REFERENCES
[1] L. A. Kappelman, J. P. Thompson, and E. R. McLean, “Converging
End-user and Corporate Computing,” Communications of the ACM,
vol. 36, pp. 79–92, 1993.
[2] C. Scaffidi, M. Shaw, and B. Myers, “Estimating the Numbers of End
Users and End User Programmers,” in Proceedings of IEEE
Symposium on Visual Languages and Human-Centric Computing
(VL/HCC), 2005, pp. 207–214.
[3] “Apache Subversion,” 2013. [Online]. Available:
http://de.wikipedia.org/wiki/Apache_Subversion.
[4] “Git.” [Online]. Available: https://git-scm.com/.
[5] M. Kim, L. Bergman, T. Lau, and D. Notkin, “An Ethnographic
Study of Copy and Paste Programming Practices in OOPL,” in
Proceedings of International Symposium on Empirical Software
Engineering (ISESE), 2004, pp. 83–92.
[6] “SpreadGit.” [Online]. Available:
https://www.crunchbase.com/organization/spreadgit.
[7] “SharePoint.” [Online]. Available:
https://products.office.com/zh-cn/sharepoint/collaboration.
[8] W. Dou, L. Xu, S.-C. Cheung, C. Gao, J. Wei, and T. Huang, “VEnron:
A Versioned Spreadsheet Corpus and Related Evolution Analysis,” in
Proceedings of the 38th International Conference on Software
Engineering Companion (ICSE), 2016, pp. 162–171.
[9] B. Jansen and F. Hermans, “Code Smells in Spreadsheet Formulas
Revisited on an Industrial Dataset,” in Proceedings of IEEE
International Conference on Software Maintenance and Evolution
(ICSME), 2015, pp. 372–380.
[10] T. Schmitz and D. Jannach, “Finding Errors in the Enron Spreadsheet
Corpus,” in Proceedings of IEEE Symposium on Visual Languages and
Human-Centric Computing (VL/HCC), 2016, pp. 157–161.
[11] W. Dou, S.-C. Cheung, C. Gao, C. Xu, L. Xu, and J. Wei, “Detecting
Table Clones and Smells in Spreadsheets,” in Proceedings of the 24th
ACM SIGSOFT International Symposium on the Foundations of
Software Engineering (FSE), 2016, pp. 787–798.
[12] F. Hermans and E. Murphy-Hill, “Enron’s Spreadsheets and Related
Emails: A Dataset and Analysis,” in Proceedings of the 37th IEEE
International Conference on Software Engineering (ICSE), 2015, pp.
7–16.
[13] T. Barik, K. Lubick, J. Smith, J. Slankas, and E. Murphy-Hill, “FUSE:
A Reproducible, Extendable, Internet-Scale Corpus of Spreadsheets,”
in Proceedings of the 12th Working Conference on Mining Software
Repositories (MSR), 2015, pp. 486–489.
[14] M. Fisher and G. Rothermel, “The EUSES Spreadsheet Corpus: A
Shared Resource for Supporting Experimentation with Spreadsheet
Dependability Mechanisms,” ACM SIGSOFT Software Engineering
Notes, pp. 1–5, 2005.
[15] “Enron Corporation.” [Online]. Available:
https://en.wikipedia.org/wiki/Enron.
[16] S. Ducasse, M. Rieger, and S. Demeyer, “A Language Independent
Approach for Detecting Duplicated Code,” in Proceedings of
International Conference on Software Maintenance (ICSM), 1999, pp.
109–118.
[17] R. Wettel and R. Marinescu, “Archeology of Code Duplication:
Recovering Duplication Chains From Small Duplication Fragments,”
in Proceedings of the 7th International Symposium on Symbolic and
Numeric Algorithms for Scientific Computing (SYNASC), 2005, 8 pp.
[18] U. Manber, “Finding Similar Files in a Large File System,” in
Proceedings of the USENIX Winter Technical Conference, 1994, pp.
1–10.
[19] C. Chambers, M. Erwig, and M. Luckey, “SheetDiff: A Tool for
Identifying Changes in Spreadsheets,” in Proceedings of IEEE
Symposium on Visual Languages and Human-Centric Computing
(VL/HCC), 2010, pp. 85–92.
[20] “xlCompare.” [Online]. Available:
http://www.xlcompare.com/product.asp.
[21] F. Hermans, M. Pinzger, and A. van Deursen, “Detecting and
Visualizing Inter-Worksheet Smells in Spreadsheets,” in Proceedings
of the 34th International Conference on Software Engineering (ICSE),
2012, pp. 441–451.
[22] F. Hermans, M. Pinzger, and A. van Deursen, “Supporting
Professional Spreadsheet Users by Generating Leveled Dataflow
Diagrams,” in Proceedings of the 33rd International Conference on
Software Engineering (ICSE), 2011, pp. 451–460.
[23] R. Abraham and M. Erwig, “UCheck: A Spreadsheet Type Checker
for End Users,” Journal of Visual Languages & Computing, vol. 18,
pp. 71–95, 2007.
[24] S. Roy, F. Hermans, E. Aivaloglou, J. Winter, and A. van Deursen,
“Evaluating Automatic Spreadsheet Metadata Extraction on a Large
Set of Responses from MOOC Participants,” in Proceedings of the
23rd International Conference on Software Analysis, Evolution, and
Reengineering (SANER), 2016, pp. 135–145.
[25] E. Greengrass, Information Retrieval: A Survey. 2000.
[26] S. K. M. Wong and V. V. Raghavan, “Vector Space Model of
Information Retrieval: A Reevaluation,” in Proceedings of the 7th
Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval (SIGIR), 1984, pp. 167–185.
[27] M. F. Porter, “An Algorithm for Suffix Stripping,” Program, vol. 14,
pp. 130–137, 1980.
[28] “Term Frequency-Inverse Document Frequency.” [Online]. Available:
https://en.wikipedia.org/wiki/Tf–idf.
[29] “Apache POI.” [Online]. Available: https://poi.apache.org/.
[30] R. L. Hale, “Cluster Analysis in School Psychology: An Example,”
Journal of School Psychology, vol. 19, pp. 51–56, 1981.
[31] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and
Techniques (3rd Edition). Elsevier, 2012.
[32] B. Larsen and C. Aone, “Fast and Effective Text Mining Using
Linear-time Document Clustering,” in Proceedings of the 5th ACM
SIGKDD International Conference on Knowledge Discovery and Data
Mining (KDD), 1999, pp. 16–22.