Research at St Andrews

Evaluating population data linkage: assessing stability, scalability, resilience and robustness across many data sets for comprehensive linkage evaluation

Research output: Contribution to conference › Abstract

Standard

Evaluating population data linkage: assessing stability, scalability, resilience and robustness across many data sets for comprehensive linkage evaluation. / Dalton, Thomas Stanley; Akgun, Ozgur; Al-Sediqi, Ahmad; Christen, Peter; Dearle, Alan; Garrett, Eilidh; Gray, Alasdair; Kirby, Graham Njal Cameron; Reid, Alice.

2017. Abstract from UK Administrative Data Research Network Annual Research Conference, Edinburgh, United Kingdom.

Harvard

Dalton, TS, Akgun, O, Al-Sediqi, A, Christen, P, Dearle, A, Garrett, E, Gray, A, Kirby, GNC & Reid, A 2017, 'Evaluating population data linkage: assessing stability, scalability, resilience and robustness across many data sets for comprehensive linkage evaluation', UK Administrative Data Research Network Annual Research Conference, Edinburgh, United Kingdom, 1/06/17 - 2/06/17.

APA

Dalton, T. S., Akgun, O., Al-Sediqi, A., Christen, P., Dearle, A., Garrett, E., ... Reid, A. (2017). Evaluating population data linkage: assessing stability, scalability, resilience and robustness across many data sets for comprehensive linkage evaluation. Abstract from UK Administrative Data Research Network Annual Research Conference, Edinburgh, United Kingdom.

Vancouver

Dalton TS, Akgun O, Al-Sediqi A, Christen P, Dearle A, Garrett E et al. Evaluating population data linkage: assessing stability, scalability, resilience and robustness across many data sets for comprehensive linkage evaluation. 2017. Abstract from UK Administrative Data Research Network Annual Research Conference, Edinburgh, United Kingdom.

Author

Dalton, Thomas Stanley ; Akgun, Ozgur ; Al-Sediqi, Ahmad ; Christen, Peter ; Dearle, Alan ; Garrett, Eilidh ; Gray, Alasdair ; Kirby, Graham Njal Cameron ; Reid, Alice. / Evaluating population data linkage: assessing stability, scalability, resilience and robustness across many data sets for comprehensive linkage evaluation. Abstract from UK Administrative Data Research Network Annual Research Conference, Edinburgh, United Kingdom.

BibTeX

@conference{2c922d20ebd541c7a1673a55e379b693,
title = "Evaluating population data linkage: assessing stability, scalability, resilience and robustness across many data sets for comprehensive linkage evaluation",
abstract = "Data linkage approaches are often evaluated with small or few data sets. If a linkage approach is to be used widely, quantifying its performance with varying data sets would be beneficial. In addition, given a data set needs to be linked, the true links are by definition unknown. The success of a linkage approach is thus difficult to comprehensively evaluate. This talk focuses on the use of many synthetic data sets for the evaluation of linkage quality achieved by automatic linkage algorithms in the domain of population reconstruction. It presents an evaluation approach which considers linkage quality when characteristics of the population are varied. We envisage a sequence of experiments where a set of populations are generated to consider how linkage quality varies across different populations: with the same characteristics, with differing characteristics, and with differing types and levels of corruption. The performance of an approach at scale is also considered. The approach to generate synthetic populations with varying characteristics on demand will also be addressed. The use of synthetic populations has the advantage that all the true links are known, thus allowing evaluation as if with real-world 'gold-standard' linked data sets. Given the large number of data sets evaluated against we also give consideration as to how to present these findings. The ability to assess variations in linkage quality across many data sets will assist in the development of new linkage approaches and identifying areas where existing linkage approaches may be more widely applied.",
keywords = "data linkage",
author = "Dalton, {Thomas Stanley} and Ozgur Akgun and Ahmad Al-Sediqi and Peter Christen and Alan Dearle and Eilidh Garrett and Alasdair Gray and Kirby, {Graham Njal Cameron} and Alice Reid",
year = "2017",
month = "4",
day = "2",
language = "English",
note = "UK Administrative Data Research Network Annual Research Conference : Social science using administrative data for public benefit, ADRN2017 ; Conference date: 01-06-2017 Through 02-06-2017",
url = "http://www.adrn2017.net",

}

RIS (suitable for import to EndNote)

TY - CONF

T1 - Evaluating population data linkage: assessing stability, scalability, resilience and robustness across many data sets for comprehensive linkage evaluation

AU - Dalton,Thomas Stanley

AU - Akgun,Ozgur

AU - Al-Sediqi,Ahmad

AU - Christen,Peter

AU - Dearle,Alan

AU - Garrett,Eilidh

AU - Gray,Alasdair

AU - Kirby,Graham Njal Cameron

AU - Reid,Alice

PY - 2017/4/2

Y1 - 2017/4/2

N2 - Data linkage approaches are often evaluated with small or few data sets. If a linkage approach is to be used widely, quantifying its performance with varying data sets would be beneficial. In addition, given a data set needs to be linked, the true links are by definition unknown. The success of a linkage approach is thus difficult to comprehensively evaluate. This talk focuses on the use of many synthetic data sets for the evaluation of linkage quality achieved by automatic linkage algorithms in the domain of population reconstruction. It presents an evaluation approach which considers linkage quality when characteristics of the population are varied. We envisage a sequence of experiments where a set of populations are generated to consider how linkage quality varies across different populations: with the same characteristics, with differing characteristics, and with differing types and levels of corruption. The performance of an approach at scale is also considered. The approach to generate synthetic populations with varying characteristics on demand will also be addressed. The use of synthetic populations has the advantage that all the true links are known, thus allowing evaluation as if with real-world 'gold-standard' linked data sets. Given the large number of data sets evaluated against we also give consideration as to how to present these findings. The ability to assess variations in linkage quality across many data sets will assist in the development of new linkage approaches and identifying areas where existing linkage approaches may be more widely applied.

AB - Data linkage approaches are often evaluated with small or few data sets. If a linkage approach is to be used widely, quantifying its performance with varying data sets would be beneficial. In addition, given a data set needs to be linked, the true links are by definition unknown. The success of a linkage approach is thus difficult to comprehensively evaluate. This talk focuses on the use of many synthetic data sets for the evaluation of linkage quality achieved by automatic linkage algorithms in the domain of population reconstruction. It presents an evaluation approach which considers linkage quality when characteristics of the population are varied. We envisage a sequence of experiments where a set of populations are generated to consider how linkage quality varies across different populations: with the same characteristics, with differing characteristics, and with differing types and levels of corruption. The performance of an approach at scale is also considered. The approach to generate synthetic populations with varying characteristics on demand will also be addressed. The use of synthetic populations has the advantage that all the true links are known, thus allowing evaluation as if with real-world 'gold-standard' linked data sets. Given the large number of data sets evaluated against we also give consideration as to how to present these findings. The ability to assess variations in linkage quality across many data sets will assist in the development of new linkage approaches and identifying areas where existing linkage approaches may be more widely applied.

KW - data linkage

M3 - Abstract

ER -

Related by author

  1. Probabilistic linkage of vital event records in Scotland using familial groups

    Akgun, O., Dalton, T. S., Dearle, A., Garrett, E. & Kirby, G. N. C. 11 May 2017

    Research output: Contribution to conference › Abstract

  2. Record linking using metric space similarity search

    Dearle, A., Kirby, G. N. C., Akgun, O. & Dalton, T. S. 2 Apr 2017

    Research output: Contribution to conference › Abstract

  3. An identifier scheme for the Digitising Scotland project

    Akgun, O., Al-Sediqi, A., Christen, P., Dalton, T. S., Dearle, A., Dibben, C. J. L., Garrett, E., Gray, A., Kirby, G. N. C. & Reid, A. 2 Apr 2017

    Research output: Contribution to conference › Abstract

ID: 250035951