University of Leicester

DataSHIELD: resolving a conflict in contemporary bioscience--performing a pooled analysis of individual-level data without sharing the data

journal contribution
posted on 2015-07-29, 09:57 authored by M. Wolfson, S. E. Wallace, Nicholas Masca, G. Rowe, Nuala A. Sheehan, V. Ferretti, P. LaFlamme, Martin D. Tobin, J. Macleod, J. Little, I. Fortier, B. M. Knoppers, Paul R. Burton
BACKGROUND: Contemporary bioscience sometimes demands vast sample sizes and there is often then no choice but to synthesize data across several studies and to undertake an appropriate pooled analysis. This same need is also faced in health-services and socio-economic research. When a pooled analysis is required, analytic efficiency and flexibility are often best served by combining the individual-level data from all sources and analysing them as a single large data set. But ethico-legal constraints, including the wording of consent forms and privacy legislation, often prohibit or discourage the sharing of individual-level data, particularly across national or other jurisdictional boundaries. This leads to a fundamental conflict in competing public goods: individual-level analysis is desirable from a scientific perspective, but is prevented by ethico-legal considerations that are entirely valid.

METHODS: Data aggregation through anonymous summary-statistics from harmonized individual-level databases (DataSHIELD) provides a simple approach to analysing pooled data that circumvents this conflict. This is achieved via parallelized analysis and modern distributed computing and, in one key setting, takes advantage of the properties of the updating algorithm for generalized linear models (GLMs).

RESULTS: The conceptual use of DataSHIELD is illustrated in two different settings.

CONCLUSIONS: As the study of the aetiological architecture of chronic diseases advances to encompass more complex causal pathways (e.g. to include the joint effects of genes, lifestyle and environment), sample size requirements will increase further and the analysis of pooled individual-level data will become ever more important. An aim of this conceptual article is to encourage others to address the challenges and opportunities that DataSHIELD presents, and to explore potential extensions, for example to its use when different data sources hold different data on the same individuals.
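
The GLM setting mentioned under METHODS can be illustrated with a minimal sketch. It assumes a logistic regression fitted by Fisher scoring, which is one common form of the GLM updating algorithm; the abstract itself does not give these details. Each study computes only an aggregate score vector and information matrix at the current coefficient estimate, and a coordinator sums these across studies to take one update step. The function names (`site_summaries`, `solve`, `fit_logistic`) are hypothetical and are not the DataSHIELD API.

```python
import math
import random


def site_summaries(X, y, beta):
    """Run inside one study: return only aggregate summaries, never rows.

    X is a list of covariate rows (including the intercept column),
    y a list of 0/1 outcomes, beta the current coefficient estimate.
    """
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))  # fitted probability
        w = mu * (1.0 - mu)                # logistic variance weight
        for j in range(p):
            score[j] += xi[j] * (yi - mu)
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return score, info


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_logistic(sites, p, iters=25, tol=1e-10):
    """Coordinator: sites is a list of (X, y) pairs, one per study.

    Only the per-study summaries are combined; the update step is
    beta <- beta + (sum of information matrices)^-1 (sum of scores).
    """
    beta = [0.0] * p
    for _ in range(iters):
        total_score = [0.0] * p
        total_info = [[0.0] * p for _ in range(p)]
        for X, y in sites:
            s, info = site_summaries(X, y, beta)
            for j in range(p):
                total_score[j] += s[j]
                for k in range(p):
                    total_info[j][k] += info[j][k]
        step = solve(total_info, total_score)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return beta
```

Because the score vector and information matrix are sums over individuals, adding the per-study summaries gives the same update as computing them on the combined data. Under these assumptions, `fit_logistic` called with several studies should therefore agree, up to floating-point rounding, with a call on a single pooled data set.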

History

Citation

International Journal of Epidemiology, 2010, 39 (5), pp. 1372-1382

Author affiliation

/Organisation/COLLEGE OF MEDICINE, BIOLOGICAL SCIENCES AND PSYCHOLOGY/School of Medicine/Department of Health Sciences

Version

  • VoR (Version of Record)

Published in

International Journal of Epidemiology

Publisher

Oxford University Press for International Epidemiological Association

issn

0300-5771

eissn

1464-3685

Acceptance date

2010-05-27

Copyright date

2010

Available date

2015-07-29

Publisher version

http://ije.oxfordjournals.org/content/39/5/1372

Language

en
