
Ancient transposable elements discovery and annotation

by Airin Ahia-Tabibi




Institution: McGill University
Department: School of Computer Science
Degree: MS
Year: 2015
Keywords: Applied Sciences - Computer Science
Record ID: 2060759
Full text PDF: http://digitool.library.mcgill.ca/thesisfile130620.pdf


Abstract

Transposable elements (TEs), the largest class of repetitive DNA fragments, are the single most abundant component of the genetic material of most eukaryotes. Their sheer number, their mechanisms of transposition, and their repetitive nature pose challenges in genomics, yet these same properties make them particularly interesting to study. Recent advances in sequencing technologies and the availability of genomic sequences have made genome-wide analysis of TEs possible. The impact of TEs on the structure, evolution, and size of the genome, as well as on genome sequencing and annotation, has created growing interest in and demand for new bioinformatics approaches to identify them. These approaches all aim to computationally discover, detect, and analyze both known and novel TE families. After insertion into the genome, most TE copies degrade relatively quickly, which makes old insertion events hard to recognize. In this thesis, we develop a new pipeline to improve the annotation of ancient transposable elements that have shaped the dynamic component of the human genome. We use an inferred ancestral mammalian genome to detect these ancient TE copies with RepeatMasker. The detected TEs are then lifted to the human genome with LiftOver and passed to our TEMapper program, which aligns them to their corresponding consensus sequences and corrects their percentage of divergence. Applying the ancient TE annotation pipeline, we revised the annotation of TEs and gained 115 Mb of coverage, corresponding to a ~7.28% improvement for the human genome. This gain corresponds to a significant 3.5% increase in the TE composition of the human genome. In addition, we discover novel TE families and investigate their association with genes and regulatory elements.
Transposable elements (TEs), the largest class of repetitive DNA fragments, are the most abundant elements in the genomes of most eukaryotes. Their sheer number, their transposition mechanisms, and their repetitive nature are responsible for some important challenges in genomics, and this is partly what makes them particularly interesting to study. The recent advent of sequencing technologies and the availability of genomic sequences have made it possible to analyze TEs across whole genomes. The impact of TEs on the structure, evolution, and size of the genome, as well as on genome sequencing and annotation, has created interest in and demand for new bioinformatics approaches to identify them. After insertion into the genome, most TE copies degrade fairly quickly, which makes old insertion events difficult to recognize. In this thesis, we develop new algorithms to improve the annotation of ancient transposable elements that have shaped the dynamic component of the human genome. We make use of the availability of inferred ancestral mammalian genomes to detect…
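The abstract reports its result as "coverage gained" once the lifted ancient TE copies are added to the existing annotation. As a minimal sketch of how such a figure could be computed, the Python below merges genomic intervals and counts the bases that the lifted annotations add beyond what is already annotated. It is not the thesis's TEMapper code: the function names, the half-open coordinate convention, and the toy intervals are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: half-open [start, end) intervals on one chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if start >= end:
            continue  # skip empty or inverted intervals
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(intervals: List[Interval]) -> int:
    """Count the bases covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def coverage_gained(existing: List[Interval], lifted: List[Interval]) -> int:
    """Bases covered by the lifted annotations but not by the existing ones."""
    return covered_bases(existing + lifted) - covered_bases(existing)


# Toy data (illustrative only, not taken from the thesis).
existing = [(100, 200), (300, 400)]
lifted = [(150, 250), (500, 560)]

gained = coverage_gained(existing, lifted)
relative_gain = gained / covered_bases(existing)
print(f"gained {gained} bases, {relative_gain:.1%} relative to existing coverage")
```

Counting bases on the merged union, rather than summing interval lengths, avoids double-counting regions where a lifted copy overlaps an element that is already annotated.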