Posted on 2010-07-14, 11:59. Authored by Robert Kevin Hastings.
The human genome contains vast numbers of sequences that have copied themselves to new genomic locations by retrotransposition. Long Interspersed Nuclear Element-1 (LINE-1 or L1) is the only sequence in the human genome still capable of autonomous retrotransposition. L1 elements have contributed to the evolution of the human genome via insertional mutagenesis, pseudogene formation, sequence transduction, and recombination events (producing insertions, deletions and inversions). Currently, general and L1-specific sequence databases do not reflect the true level of Full Length Human Specific L1 (FL-L1HS) variation, both because these elements are polymorphic and because of the way the databases were compiled.
Methods to identify FL-L1HS were applied to three sequence assemblies (Reference, Celera and HuRef) and the nucleotide accession database from NCBI. A non-redundant set of 533 FL-L1HS was discovered in these four sources, of which 164 resided in genes. The trace archives from Ensembl were also searched and a further 48 potential FL-L1HS were found. Computational analyses showed that 154 FL-L1HS were potentially capable of retrotransposition, including 54 that resided in genes. Alongside these analyses, a Target Site Duplication (TSD) detection and analysis tool, TSDmapper, was developed to automatically detect TSDs in FL-L1HS sequences and to annotate sequence transduction. TSDmapper was used to predict the pre-insertion sequence of all 533 unique L1s, which facilitated in silico genotyping.
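As an illustration of the general idea behind TSD detection and pre-insertion site prediction, the following is a minimal sketch and not the TSDmapper algorithm itself. It assumes the two flanks have already been extracted, that the 3' flank starts after the poly-A tail and any transduced sequence, and that the TSD is an exact direct repeat. The function names `find_tsd` and `pre_insertion_site` and the length limits are hypothetical choices made for this example.

```python
from typing import Optional


def find_tsd(left_flank: str, right_flank: str,
             min_len: int = 6, max_len: int = 30) -> Optional[str]:
    """Return the longest exact repeat that ends the 5' flank and starts the 3' flank.

    Assumption: flanks are trimmed so the repeat, if present, sits directly
    against the insertion on both sides. Mismatches are not tolerated.
    """
    left = left_flank.upper()
    right = right_flank.upper()
    longest = min(max_len, len(left), len(right))
    # Try the longest candidate first so the first hit is the longest TSD.
    for k in range(longest, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return None


def pre_insertion_site(left_flank: str, right_flank: str, tsd: str) -> str:
    """Rebuild the empty (pre-insertion) site by keeping a single TSD copy."""
    # The 5' flank already ends with the TSD, so drop it from the 3' flank.
    return left_flank + right_flank[len(tsd):]


if __name__ == "__main__":
    left = "GGCTAAGAATGTTC"   # toy 5' flank, ends with the TSD
    right = "AAGAATGTTCCTGA"  # toy 3' flank, starts with the TSD
    tsd = find_tsd(left, right)
    print(tsd)
    if tsd is not None:
        print(pre_insertion_site(left, right, tsd))
```

A predicted empty site of this kind can then be compared against other assemblies to decide whether a given L1 is present or absent at that locus, which is the sense in which pre-insertion sequences support in silico genotyping.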
A new informatic resource, baseLINE (http://baseline.gene.le.ac.uk), was created to display and enable searching of all the L1 annotation information generated. Data can be viewed in a genomic context on chromosome ideograms or exported via the Distributed Annotation Service (DAS) to the Ensembl genome browser. TSDmapper is also provided as a web application at baseLINE so that users can perform TSD annotation of their own sequences of interest.