References

Ball, P. (2000). The Salvadoran Human Rights Commission: Data Processing, Data Representation, and Generating Analytical Reports. In P. Ball, H. F. Spirer, & L. Spirer (Eds.), Making the Case: Investigating Large Scale Human Rights Violations Using Information Systems and Data Analysis. AAAS.

Belin, T. R., & Rubin, D. B. (1995). A method for calibrating false-match rates in record linkage. Journal of the American Statistical Association, 90(430), 694–707.

Bhattacharya, I., & Getoor, L. (2006). A latent Dirichlet model for unsupervised entity resolution. In SDM (Vol. 5). SIAM.

Bilenko, M., & Mooney, R. J. (2003). Adaptive Duplicate Detection Using Learnable String Similarity Measures. In KDD ’03 (pp. 39–48). ACM.

Christen, P. (2008). Automatic Record Linkage Using Seeded Nearest Neighbour and Support Vector Machine Classification. In KDD ’08 (pp. 151–159). ACM.

Christen, P. (2012). A survey of indexing techniques for scalable record linkage and deduplication. IEEE Transactions on Knowledge and Data Engineering, 24(9), 1537–1555.

Cohen, W., Ravikumar, P., & Fienberg, S. (2003). A comparison of string metrics for matching names and records. In KDD workshop on data cleaning and object consolidation (Vol. 3, pp. 73–78).

Copas, J., & Hilton, F. (1990). Record linkage: Statistical models for matching computer records. Journal of the Royal Statistical Society, Series A, 153(3), 287–320.

Dai, A. M., & Storkey, A. J. (2011). The grouped author-topic model for unsupervised entity resolution. In Artificial Neural Networks and Machine Learning – ICANN 2011 (pp. 241–249). Springer.

Fellegi, I., & Sunter, A. (1969). A theory for record linkage. Journal of the American Statistical Association, 64(328), 1183–1210.

Fortini, M., Liseo, B., Nuccitelli, A., & Scanu, M. (2001). On Bayesian Record Linkage. Research in Official Statistics, 4(1), 185–198.

Gutman, R., Afendulis, C., & Zaslavsky, A. (2013). A Bayesian procedure for file linking to analyze end-of-life medical costs. Journal of the American Statistical Association, 108(501), 34–47.

Hsu, W., Lee, M. L., Liu, B., & Ling, T. W. (2000). Exploration Mining in Diabetic Patients Databases: Findings and Conclusions. In KDD ’00 (pp. 430–436). ACM.

Jain, S., & Neal, R. (2004). A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of Computational and Graphical Statistics, 13, 158–182.

Jewell, N. P., Spagat, M., & Jewell, B. L. (2013). MSE and Casualty Counts: Assumptions, Interpretation, and Challenges. In T. B. Seybolt, J. D. Aronson, & B. Fischhoff (Eds.), Counting Civilian Casualties: An Introduction to Recording and Estimating Nonmilitary Deaths in Conflict. Oxford, UK: Oxford University Press.

Larsen, M. D. (2002). Comments on Hierarchical Bayesian Record Linkage. In Proceedings of the joint statistical meetings, section on survey research methods (pp. 1995–2000). The American Statistical Association.

Larsen, M. D. (2005). Advances in Record Linkage Theory: Hierarchical Bayesian Record Linkage Theory. In Proceedings of the joint statistical meetings, section on survey research methods (pp. 3277–3284). The American Statistical Association.

Larsen, M. D. (2012). An Experiment with Hierarchical Bayesian Record Linkage. arXiv preprint. http://arxiv.org/abs/1212.5203

Larsen, M. D., & Rubin, D. B. (2001). Iterative automated record linkage using mixture models. Journal of the American Statistical Association, 96(453), 32–41.

Liseo, B., & Tancredi, A. (2013). Some advances on Bayesian record linkage and inference for linked data. Retrieved from http://www.ine.es/e/essnetdi_ws2011/ppts/Liseo_Tancredi.pdf

Lum, K., Price, M. E., & Banks, D. (2013). Applications of Multiple Systems Estimation in Human Rights Research. The American Statistician, 67(4), 191–200.

Marchant, N. G., Steorts, R. C., Kaplan, A., Rubinstein, B. I. P., & Elazar, D. N. (2019). d-blink: Distributed end-to-end Bayesian entity resolution.

Matsakis, N. E. (2010). Active Duplicate Detection with Bayesian Nonparametric Models (PhD thesis). Massachusetts Institute of Technology.

McCallum, A., & Wellner, B. (2004). Conditional Models of Identity Uncertainty with Application to Noun Coreference. In Advances in Neural Information Processing Systems (NIPS ’04) (pp. 905–912). MIT Press.

Miller, P. L., Frawley, S. J., & Sayward, F. G. (2000). IMM/Scrub: A Domain-Specific Tool for the Deduplication of Vaccination History Records in Childhood Immunization Registries. Computers and Biomedical Research, 33(2), 126–143.

Monge, A., & Elkan, C. (1997). An efficient domain-independent algorithm for detecting approximately duplicate database records.

Murphy, J., Brackbill, R. M., Thalji, L., Dolan, M., Pulliam, P., & Walker, D. J. (2007). Measuring and Maximizing Coverage in the World Trade Center Health Registry. Statistics in Medicine, 26(8), 1688–1701.

Murray, J. S. (2016). Probabilistic record linkage and deduplication after indexing, blocking, and filtering. Journal of Privacy and Confidentiality, 7(1), 3–24.

Newcombe, H. B., Kennedy, J. M., Axford, S. J., & James, A. P. (1959). Automatic linkage of vital records: Computers can be used to extract "follow-up" statistics of families from files of routine records. Science, 130(3381), 954–959.

Sadinle, M. (2014). Detecting Duplicates in a Homicide Registry Using a Bayesian Partitioning Approach. Annals of Applied Statistics, 8(4), 2404–2434.

Sadinle, M. (2016). Bayesian estimation of bipartite matchings for record linkage. Journal of the American Statistical Association (accepted manuscript), 1–35.

Sariyar, M., & Borg, A. (2010). The RecordLinkage Package: Detecting Errors in Data. The R Journal, 2(2), 61–67.

Sariyar, M., Borg, A., & Pommerening, K. (2012). Active Learning Strategies for the Deduplication of Electronic Patient Data Using Classification Trees. Journal of Biomedical Informatics, 45(5), 893–900.

Steorts, R. C. (2015). Entity Resolution with Empirically Motivated Priors. Bayesian Analysis, 10(4), 849–875. http://doi.org/10.1214/15-BA965SI

Steorts, R. C., Hall, R., & Fienberg, S. E. (2016). A Bayesian Approach to Graphical Record Linkage and Deduplication. Journal of the American Statistical Association, 111(516), 1660–1672.

Tancredi, A., & Liseo, B. (2011). A hierarchical Bayesian approach to record linkage and population size problems. Annals of Applied Statistics, 5(2B), 1553–1585.