Posted on 2022-03-15, 10:24. Authored by Margherita Colucci.
When a suspect cannot be identified by searching an investigative database of DNA profiles, attempts can be made to gather intelligence from DNA. These include kinship testing (finding relatives of the sample donor), and predicting externally visible characteristics (EVCs) and population of origin (biogeographical ancestry; BGA). This project explores the potential of genome-wide SNP chips and targeted massively parallel sequencing (MPS) in these areas.
SNP chip data arguably form the “gold standard” for kinship analysis. In a set of eight German pedigrees, the performance of SNP chip analysis was compared to that of the MPS-based ForenSeq DNA Signature Prep Kit, which can analyse up to 230 markers, including autosomal SNPs, and autosomal, X-chromosomal and Y-chromosomal STRs. Different methods were evaluated: PLINK, GENESIS and PRIMUS for the SNP chip data, and the forrel R package for the MPS data. Incorporating information post facto on X-, Y- and mtDNA SNPs added value in some scenarios. The three methods for dense autosomal SNP data performed comparably in kinship estimation and pedigree reconstruction. Kinship coefficients were also estimated from the MPS data. The sequence data revealed additional variation in some complex STR arrays and SNP flanking regions, and these variants, together with the set of 230 targeted markers, offered higher resolution in identity-by-descent estimation. To mimic forensic scenarios, real and simulated STR data were used in an implementation of likelihood-based kinship estimation across multiple searches (“blind search”), including founder inbreeding and the addition of X-STRs.
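As an illustration of how identity-by-descent (IBD) sharing relates to a kinship coefficient, the following minimal Python sketch uses the standard relationship for non-inbred pairs, φ = k1/4 + k2/2, where k0, k1 and k2 are the probabilities of sharing 0, 1 or 2 alleles IBD. It is not the thesis's pipeline and does not call PLINK, GENESIS, PRIMUS or forrel; the function names and the table of expected values are assumptions made purely for this example.

```python
from fractions import Fraction

def kinship_from_ibd(k0, k1, k2):
    """Kinship coefficient phi = k1/4 + k2/2 for a non-inbred pair.

    k0, k1, k2 are the probabilities of sharing 0, 1 or 2 alleles
    identical by descent; they must sum to 1.
    """
    if abs((k0 + k1 + k2) - 1) > 1e-9:
        raise ValueError("IBD probabilities must sum to 1")
    return k1 / 4 + k2 / 2

def proportion_ibd(k0, k1, k2):
    """Expected proportion of the genome shared IBD (equals 2 * phi)."""
    return k1 / 2 + k2

# Textbook expected IBD probabilities for some outbred relationships
# (illustrative table, hypothetical name; not data from the thesis).
EXPECTED_IBD = {
    "parent-offspring": (Fraction(0), Fraction(1), Fraction(0)),
    "full siblings": (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)),
    "half siblings": (Fraction(1, 2), Fraction(1, 2), Fraction(0)),
    "first cousins": (Fraction(3, 4), Fraction(1, 4), Fraction(0)),
    "unrelated": (Fraction(1), Fraction(0), Fraction(0)),
}

if __name__ == "__main__":
    for name, ks in EXPECTED_IBD.items():
        print(f"{name}: phi = {kinship_from_ibd(*ks)}")
```

Parent-offspring and full-sibling pairs share the same expected kinship coefficient (1/4) and differ only in their k0, k1, k2 pattern, which is one reason pedigree reconstruction benefits from estimating the full IBD distribution rather than a single coefficient.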
Finally, the ForenSeq kit was used to infer pigmentation phenotypes and estimate ancestry in a sample of African-Portuguese admixed individuals from Cape Verde, for whom genome-wide ancestry estimates and direct measurements of skin (melanin index) and eye colour (T-index) were available. This highlighted difficulties in both BGA and EVC estimation in admixed populations, and suggested that model-based approaches to ancestry are more useful than principal components analysis.