Research at St Andrews

Evaluating record linkage: creating longitudinal synthetic data to provide gold-standard linked data sets

Research output: Contribution to conference › Abstract

Standard

Evaluating record linkage: creating longitudinal synthetic data to provide gold-standard linked data sets. / Dalton, Thomas Stanley; Dearle, Alan; Kirby, Graham Njal Cameron; Akgun, Ozgur.

2017. Abstract from Workshop for the Systematic Linking of Historical Records, Guelph, Canada.

Harvard

Dalton, TS, Dearle, A, Kirby, GNC & Akgun, O 2017, 'Evaluating record linkage: creating longitudinal synthetic data to provide gold-standard linked data sets' Workshop for the Systematic Linking of Historical Records, Guelph, Canada, 11/05/17 - 13/05/17.

APA

Dalton, T. S., Dearle, A., Kirby, G. N. C., & Akgun, O. (2017). Evaluating record linkage: creating longitudinal synthetic data to provide gold-standard linked data sets. Abstract from Workshop for the Systematic Linking of Historical Records, Guelph, Canada.

Vancouver

Dalton TS, Dearle A, Kirby GNC, Akgun O. Evaluating record linkage: creating longitudinal synthetic data to provide gold-standard linked data sets. 2017. Abstract from Workshop for the Systematic Linking of Historical Records, Guelph, Canada.

Author

Dalton, Thomas Stanley ; Dearle, Alan ; Kirby, Graham Njal Cameron ; Akgun, Ozgur. / Evaluating record linkage: creating longitudinal synthetic data to provide gold-standard linked data sets. Abstract from Workshop for the Systematic Linking of Historical Records, Guelph, Canada.

BibTeX

@conference{b23bf2749f964504a25b98b76298e524,
title = "Evaluating record linkage: creating longitudinal synthetic data to provide gold-standard linked data sets",
abstract = "‘Gold-standard’ data to evaluate linkage algorithms are rare. Synthetic data have the advantage that all the true links are known. In the domain of population reconstruction, the ability to synthesise populations on demand, with varying characteristics, allows a linkage approach to be evaluated across a wide range of data sets. We present a micro-simulation model for generating such synthetic populations, taking as input a set of desired statistical properties. We then outline how these desired properties are verified in the generated populations, and the intended approach to using generated populations to evaluate linkage algorithms. We envisage a sequence of experiments where a set of populations are generated to consider how linkage quality varies across different populations: with the same characteristics, with differing characteristics, and with differing types and levels of corruption. The performance of an approach at scale is also considered.",
keywords = "record linkage",
author = "Dalton, {Thomas Stanley} and Alan Dearle and Kirby, {Graham Njal Cameron} and Ozgur Akgun",
year = "2017",
month = "5",
day = "11",
language = "English",
note = "Workshop for the Systematic Linking of Historical Records ; Conference date: 11-05-2017 Through 13-05-2017",
url = "http://recordlink.org"
}

RIS (suitable for import to EndNote)

TY - CONF

T1 - Evaluating record linkage: creating longitudinal synthetic data to provide gold-standard linked data sets

AU - Dalton,Thomas Stanley

AU - Dearle,Alan

AU - Kirby,Graham Njal Cameron

AU - Akgun,Ozgur

PY - 2017/5/11

Y1 - 2017/5/11

N2 - ‘Gold-standard’ data to evaluate linkage algorithms are rare. Synthetic data have the advantage that all the true links are known. In the domain of population reconstruction, the ability to synthesise populations on demand, with varying characteristics, allows a linkage approach to be evaluated across a wide range of data sets. We present a micro-simulation model for generating such synthetic populations, taking as input a set of desired statistical properties. We then outline how these desired properties are verified in the generated populations, and the intended approach to using generated populations to evaluate linkage algorithms. We envisage a sequence of experiments where a set of populations are generated to consider how linkage quality varies across different populations: with the same characteristics, with differing characteristics, and with differing types and levels of corruption. The performance of an approach at scale is also considered.

AB - ‘Gold-standard’ data to evaluate linkage algorithms are rare. Synthetic data have the advantage that all the true links are known. In the domain of population reconstruction, the ability to synthesise populations on demand, with varying characteristics, allows a linkage approach to be evaluated across a wide range of data sets. We present a micro-simulation model for generating such synthetic populations, taking as input a set of desired statistical properties. We then outline how these desired properties are verified in the generated populations, and the intended approach to using generated populations to evaluate linkage algorithms. We envisage a sequence of experiments where a set of populations are generated to consider how linkage quality varies across different populations: with the same characteristics, with differing characteristics, and with differing types and levels of corruption. The performance of an approach at scale is also considered.

KW - record linkage

M3 - Abstract

ER -

Related by author

  1. Probabilistic linkage of vital event records in Scotland using familial groups

    Akgun, O., Dalton, T. S., Dearle, A., Garrett, E. & Kirby, G. N. C. 11 May 2017

    Research output: Contribution to conference › Abstract

  2. Record linking using metric space similarity search

    Dearle, A., Kirby, G. N. C., Akgun, O. & Dalton, T. S. 2 Apr 2017

    Research output: Contribution to conference › Abstract

  3. Evaluating population data linkage: assessing stability, scalability, resilience and robustness across many data sets for comprehensive linkage evaluation

    Dalton, T. S., Akgun, O., Al-Sediqi, A., Christen, P., Dearle, A., Garrett, E., Gray, A., Kirby, G. N. C. & Reid, A. 2 Apr 2017

    Research output: Contribution to conference › Abstract

  4. An identifier scheme for the Digitising Scotland project

    Akgun, O., Al-Sidiqi, A., Christen, P., Dalton, T. S., Dearle, A., Dibben, C. J. L., Garrett, E., Gray, A., Kirby, G. N. C. & Reid, A. 2 Apr 2017

    Research output: Contribution to conference › Abstract

ID: 250035874