posted on 2019-08-14, 15:55 authored by W Algady, S Louzada, D Carpenter, P Brajer, A Färnert, I Rooth, B Ngasala, F Yang, M-A Shaw, EJ Hollox
Glycophorin A and glycophorin B are red blood cell surface proteins and are both receptors for the parasite Plasmodium falciparum, which is the principal cause of malaria in sub-Saharan Africa. DUP4 is a complex structural genomic variant that carries extra copies of a glycophorin B-glycophorin A fusion gene and dramatically reduces the risk of severe malaria, by up to 40%. Using fiber-FISH and Illumina sequencing, we validate the structural arrangement of the glycophorin locus in the DUP4 variant and reveal somatic variation in copy number of the glycophorin B-glycophorin A fusion gene. By developing a simple, specific, PCR-based assay for DUP4, we show that the DUP4 variant reaches a frequency of 13% in the population of a malaria-endemic village in south-eastern Tanzania. We genotype a substantial proportion of that village and, using a family-based association test, demonstrate an association of DUP4 genotype with hemoglobin levels, a phenotype related to malaria. Taken together, our results show that DUP4 is a complex structural variant that may be susceptible to somatic variation and that it is associated with a malaria-related phenotype in a longitudinally followed population.
Funding
This work was funded by a SACB PhD studentship to W.A. and Wellcome Trust grant WT098051 (F.Y. and S.L.). This research used the SPECTRE High Performance Computing Facility at the University of Leicester. We wish to thank the villagers of Nyamisati and the research team for continuous engagement and contributions. We thank Ellen Leffler and Gavin Band for helpful comments on a previous version of this manuscript, Kirk Rockett for providing the HG02554 cells used by the Oxford laboratory, and Chris Tyler-Smith for support.
Citation
American Journal of Human Genetics, 2018, 103 (5), pp. 769-776
Author affiliation
/Organisation/COLLEGE OF LIFE SCIENCES/Biological Sciences/Genetics and Genome Biology
Sequence data are available from the European Nucleotide Archive for HG02554 (accession number ERP110671) and European Genome-Phenome Archive for the four Tanzanian samples (study accession number EGAS00001003239). Access to the Tanzanian sample sequences is restricted to projects related to malarial research.