Record linkage and deduplication using traditional blocking

Authors

  • G Somasekhar

  • SeshaSravani K

  • Keerthi P

  • Sai Sandeep G

How to Cite

Somasekhar, G., K, S., P, K., & Sandeep G, S. (2017). Record linkage and deduplication using traditional blocking. International Journal of Engineering and Technology, 7(1.1), 294-296. https://doi.org/10.14419/ijet.v7i1.1.9705

Received date: February 25, 2018

Accepted date: February 25, 2018

Published date: December 21, 2017

DOI:

https://doi.org/10.14419/ijet.v7i1.1.9705

Keywords:

Blocking, Blocking Key, Blocking Key Value, Deduplication, Record Linkage, Traditional Blocking.

Abstract

Record linkage and deduplication are two processes used to match records. Matching is done to remove duplicate records, which strongly influence the results of data mining and data processing. When matching is performed within a single database, it is called deduplication. When matching is performed across several databases, it is called record linkage. This paper also discusses an indexing technique called traditional blocking, which removes non-matching pairs and so reduces the number of record pairs that must be compared.
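
The idea of traditional blocking described above can be illustrated with a minimal Python sketch for the deduplication case. It is not the authors' implementation. The sample records, the field names `surname` and `postcode`, and the choice of blocking key (first two letters of the surname plus the postcode) are all assumptions made for illustration. Records that share a blocking key value fall into the same block, and only pairs inside a block are kept as candidates for comparison.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"id": 1, "surname": "Smith", "postcode": "2600"},
    {"id": 2, "surname": "Smyth", "postcode": "2600"},
    {"id": 3, "surname": "Smith", "postcode": "2600"},
    {"id": 4, "surname": "Jones", "postcode": "3000"},
    {"id": 5, "surname": "Jonas", "postcode": "3000"},
    {"id": 6, "surname": "Brown", "postcode": "4000"},
]


def blocking_key_value(record):
    """Assumed blocking key: first two letters of the surname plus the postcode."""
    return record["surname"][:2].lower() + record["postcode"]


def build_blocks(records):
    """Group record ids by their blocking key value (BKV)."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key_value(record)].append(record["id"])
    return blocks


def candidate_pairs(blocks):
    """Form record pairs only within each block; pairs across blocks are dropped."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


blocks = build_blocks(records)
pairs = candidate_pairs(blocks)

# Without blocking, every record would be compared with every other record.
all_pairs = len(records) * (len(records) - 1) // 2

print(f"Pairs without blocking: {all_pairs}")
print(f"Candidate pairs after blocking: {len(pairs)}")
print(sorted(pairs))
```

With these six sample records, blocking keeps 4 candidate pairs out of 15 possible pairs. The trade-off is that true duplicates whose blocking key values differ, for example because of a typo in the first letters of the surname, are never compared, so the choice of blocking key matters.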
