Culotta, A., and McCallum, A. (2005), “Joint Deduplication of Multiple Record Types in Relational Data,”
CIKM 2005.
Della Pietra, S., Della Pietra, V., and Lafferty, J. (1997), “Inducing Features of Random Fields,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, 19, 380-393.
Deming, W. E., and Gleser, G. J. (1959), “On the Problem of Matching Lists by Samples,” Journal of the
American Statistical Association, 54, 403-415.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum Likelihood from Incomplete Data via
the EM Algorithm,” Journal of the Royal Statistical Society, B, 39, 1-38.
Do, H.-H. and Rahm, E. (2002), “COMA – A System for Flexible Combination of Schema Matching
Approaches,” Very Large Data Bases 2002, 610-621.
Dong, X., Halevy, A., and Madhavan, J. (2005), “Reference Reconciliation in Complex Information
Spaces,” Proceedings of the ACM SIGMOD Conference 2005, 85-96.
Elfeky, M., Verykios, V., and Elmagarmid, A. (2002), “TAILOR: A Record Linkage Toolbox,” IEEE
International Conference on Data Engineering 2002, 17-28.
Fan, W., Davidson, I., Zadrozny, B., and Yu, P. (2005), “An Improved Categorization of Classifier’s
Sensitivity on Sample Selection Bias,” http://www.cs.albany.edu/~davidson/Publications/samplebias.pdf .
Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. (1996), “The KDD Process for Extracting Useful
Knowledge from Volumes of Data,” Communications of the Association for Computing Machinery,
39 (11), 27-34.
Fayyad, U. and Uthurusamy, R. (1996), “Data Mining and Knowledge Discovery in Databases,”
Communications of the Association for Computing Machinery, 39 (11), 24-26.
Fayyad, U. and Uthurusamy, R. (2002), “Evolving Data Mining into Solutions for Insights,”
Communications of the Association for Computing Machinery, 45 (8), 28-31.
Fellegi, I. P., and Sunter, A. B. (1969), “A Theory for Record Linkage,” Journal of the American
Statistical Association, 64, 1183-1210.
Ferragina, P. and Grossi, R. (1999), “The String B-tree: A New Data Structure for String Search in External
Memory and Its Applications,” Journal of the Association for Computing Machinery, 46 (2), 236-280.
Freund, Y. and Schapire, R. E. (1996), “Experiments with a New Boosting Algorithm,” Machine Learning:
Proceedings of the Thirteenth International Conference, 148-156.
Friedman, J., Hastie, T., and Tibshirani, R. (2000), “Additive Logistic Regression: A Statistical View of
Boosting,” Annals of Statistics, 28, 337-407.
Getoor, L., Friedman, N., Koller, D., and Taskar, B. (2003), “Learning Probabilistic Models for Link
Structure,” Journal of Machine Learning Research, 3, 679-707.
Gill, L. (1999), “OX-LINK: The Oxford Medical Record Linkage System,” in Record Linkage Techniques
1997, Washington, DC: National Academy Press, 15-33.
Gravano, L., Ipeirotis, P. G., Jagadish, H. V., Koudas, N., Muthukrishnan, S., and Srivastava, D. (2001),
“Approximate String Joins in a Database (Almost) for Free,” Proceedings of VLDB, 491-500.
Guha, S., Koudas, N., Marathe, A., and Srivastava, D. (2004), “Merging the Results of Approximate Match
Operations,” Proceedings of the 30th VLDB Conference, 636-647.
Hall, P. A. V. and Dowling, G. R. (1980), “Approximate String Matching,” ACM Computing Surveys, 12,
381-402.
Hastie, T., Tibshirani, R., and Friedman, J. (2001), The Elements of Statistical Learning: Data Mining,
Inference, and Prediction, New York: Springer.
Hernandez, M. and Stolfo, S. (1995), “The Merge-Purge Problem for Large Databases,” Proceedings of
ACM SIGMOD 1995, 127-138.
Hjaltason, G. and Samet, H. (2003), “Index-Driven Similarity Search in Metric Spaces,” ACM Transactions
on Database Systems, 28 (4), 517-580.
Ishikawa, H. (2003), “Exact Optimization of Markov Random Fields with Convex Priors,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, 25, 1333-1336.
Iyengar, V. (2002), “Transforming Data to Satisfy Privacy Constraints,” ACM KDD 2002, 279-288.
Jaro, M. A. (1989), “Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census
of Tampa, Florida,” Journal of the American Statistical Association, 84, 414-420.
Jin, L., Li, C., and Mehrotra, S. (2002), “Efficient String Similarity Joins in Large Data Sets,” UCI
Technical Report, Feb. 2002, http://www.ics.uci.edu/~chenli/pub/strjoin.pdf .
Jin, L., Li, C., and Mehrotra, S. (2003), “Efficient Record Linkage in Large Data Sets,” Eighth
International Conference on Database Systems for Advanced Applications (DASFAA 2003), 26-28