Research at St Andrews

Record linking using metric space similarity search

Research output: Contribution to conference › Abstract

Abstract

Record linking often employs blocking to reduce the computational complexity of full pairwise comparison. A key is formed from a subset of record attributes. Those records with the same key values are blocked together for detailed comparison. Use of a single blocking key fails to detect many true matches if records contain missing values or errors, since only those records with the same key values are compared.
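
The single-key blocking described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the records, the field names and the helper functions `blocking_key` and `candidate_pairs` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "forename": "john", "surname": "smith", "birth_year": "1861"},
    {"id": 2, "forename": "john", "surname": "smith", "birth_year": "1861"},
    {"id": 3, "forename": "john", "surname": "smyth", "birth_year": "1861"},
    {"id": 4, "forename": "mary", "surname": "brown", "birth_year": "1870"},
]


def blocking_key(record, fields=("surname", "birth_year")):
    """Form a key from a subset of the record's attributes."""
    return tuple(record[f] for f in fields)


def candidate_pairs(records, fields=("surname", "birth_year")):
    """Group records by key and return only the pairs that share a block."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record, fields)].append(record["id"])
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


pairs = candidate_pairs(records)
```

With this toy data only records 1 and 2 are passed on for detailed comparison. Record 3 differs from them by one character in the surname, so it falls into a different block and is never compared.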

To address missing values, it is common to repeat the matching process using multiple blocking keys, so that records that are identical in only a subset of the fields can still be matched. The presence of erroneous values may be addressed by blocking on key values mapped to a canonical form (e.g. Soundex). However, this does not address other problems such as single-digit transcription errors in dates.
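
A rough sketch of multi-pass blocking, under assumed data, may help here. The records, the choice of key sets and the helper `pairs_for_key` are hypothetical and are not taken from the talk; the single-digit year error in record 3 is invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: record 2 lacks a surname, and record 3 is assumed
# to carry a single-digit transcription error in the birth year.
records = [
    {"id": 1, "forename": "john", "surname": "smith", "birth_year": "1861"},
    {"id": 2, "forename": "john", "surname": None, "birth_year": "1861"},
    {"id": 3, "forename": "john", "surname": "smith", "birth_year": "1867"},
]


def pairs_for_key(records, fields):
    """Candidate pairs produced by one blocking pass over the given fields."""
    blocks = defaultdict(list)
    for record in records:
        values = tuple(record[f] for f in fields)
        if None in values:
            continue  # a record cannot be blocked on a field it lacks
        blocks[values].append(record["id"])
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Repeat the blocking step once per key and take the union of the results.
key_sets = [("surname", "birth_year"), ("forename", "birth_year")]
multi_pass_pairs = set()
for fields in key_sets:
    multi_pass_pairs |= pairs_for_key(records, fields)
```

The second key recovers the pair (1, 2) despite the missing surname. The pair (1, 3) is still lost, because the erroneous year is part of both keys and exact key equality cannot tolerate it.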

Blocking is used to categorise records that are candidate matches, in preparation for a pairwise comparison phase which may use various distance metrics, depending on the domain of the values being compared. Each blocking process defines a partition of records. The comparison operations are only applied to pairs of records within the same category.

In some contexts, it may be useful to have flexible control over the precision/recall trade-off, depending on the intended use for the matched data, and the degree of conservatism required of the identified links. With blocking, this flexibility is limited by the number of sensible blocking keys that can be identified.

In this talk, we describe experiments with a technique based on similarity searching over metric spaces, which appears to offer greater flexibility, and describe some preliminary results using an historic Scottish dataset.
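
The abstract does not say which metric or index the experiments use. As one possible reading of "similarity searching over metric spaces", the sketch below runs a threshold (range) query using a sum of per-field edit distances and a single pivot for pruning by the triangle inequality. The metric, the pivot choice, the data and all function names are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings; it satisfies the metric axioms."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


FIELDS = ("forename", "surname", "birth_date")  # hypothetical field names


def record_distance(r, s):
    """Sum of per-field edit distances; a sum of metrics is itself a metric."""
    return sum(levenshtein(r[f], s[f]) for f in FIELDS)


def build_index(records, pivot):
    """Store each record with its precomputed distance to the pivot."""
    return [(r, record_distance(r, pivot)) for r in records]


def range_search(query, index, pivot, threshold):
    """Return ids of records within `threshold` of the query."""
    d_query_pivot = record_distance(query, pivot)
    matches = []
    for record, d_record_pivot in index:
        # Triangle inequality: d(query, record) >= |d(q, p) - d(r, p)|,
        # so such a record can be excluded without a full comparison.
        if abs(d_query_pivot - d_record_pivot) > threshold:
            continue
        if record_distance(query, record) <= threshold:
            matches.append(record["id"])
    return matches


# Hypothetical data, invented for this example.
records = [
    {"id": 1, "forename": "john", "surname": "smith", "birth_date": "1861-03-12"},
    {"id": 2, "forename": "john", "surname": "smyth", "birth_date": "1861-03-12"},
    {"id": 3, "forename": "jon", "surname": "smith", "birth_date": "1861-03-17"},
    {"id": 4, "forename": "mary", "surname": "brown", "birth_date": "1870-11-02"},
]
pivot = records[0]
index = build_index(records, pivot)
query = {"forename": "john", "surname": "smith", "birth_date": "1861-03-12"}

strict = range_search(query, index, pivot, threshold=1)
relaxed = range_search(query, index, pivot, threshold=2)
```

In this sketch the threshold plays the role that the choice of blocking keys plays in blocking: a small value favours precision, a larger one favours recall, and it can be varied continuously. A single-digit date error contributes a distance of only 1, so record 3 is found once the threshold reaches 2.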

Details

Original language: English
State: Published - 2 Apr 2017
Event: UK Administrative Data Research Network Annual Research Conference: Social science using administrative data for public benefit - Royal College of Surgeons, Edinburgh, United Kingdom
Duration: 1 Jun 2017 to 2 Jun 2017
http://www.adrn2017.net

Conference

Conference: UK Administrative Data Research Network Annual Research Conference
Abbreviated title: ADRN2017
Country: United Kingdom
City: Edinburgh
Period: 1/06/17 to 2/06/17
Internet address: http://www.adrn2017.net

    Research areas

  • record linkage

Related by author

  1. Probabilistic linkage of vital event records in Scotland using familial groups

    Akgun, O., Dalton, T. S., Dearle, A., Garrett, E. & Kirby, G. N. C. 11 May 2017

    Research output: Contribution to conference › Abstract

  2. Evaluating population data linkage: assessing stability, scalability, resilience and robustness across many data sets for comprehensive linkage evaluation

    Dalton, T. S., Akgun, O., Al-Sediqi, A., Christen, P., Dearle, A., Garrett, E., Gray, A., Kirby, G. N. C. & Reid, A. 2 Apr 2017

    Research output: Contribution to conference › Abstract

  3. An identifier scheme for the Digitising Scotland project

    Akgun, O., Al-Sidiqi, A., Christen, P., Dalton, T. S., Dearle, A., Dibben, C. J. L., Garrett, E., Gray, A., Kirby, G. N. C. & Reid, A. 2 Apr 2017

    Research output: Contribution to conference › Abstract

ID: 250036196