
Novel Methods for the Computational Analysis of RNA-Seq Data with Applications to Alternative Splicing

by André Kahles




Institution: Universität Tübingen
Year: 2014
Record ID: 1118250
Full text PDF: http://hdl.handle.net/10900/58072


Abstract

Understanding how genetic information is transformed into a diverse spectrum of complex organisms is one of the longstanding questions of biology. In recent years, advances in sequencing technology have enabled the accurate measurement of the pool of ribonucleic acids (RNAs) contained in a cell at an unprecedented depth. High-throughput RNA sequencing (RNA-Seq) makes it possible to acquire quantitative measurements of all transcripts in one or more cells and provides qualitative information about isoform structures or sequence alterations. Our goal is to use this information to gain a better understanding of RNA processing and gene regulation, with a specific focus on alternative splicing. In this thesis, we present advanced computational methods for the processing of RNA-Seq data, including novel strategies for spliced alignment in the context of genomic variation, accuracy improvements through alignment post-processing, and the first high-throughput analysis pipeline for the characterization of alternative splicing events. Our first contribution is the development and extension of PALMapper, a versatile RNA-Seq alignment method. By using a variation-aware alignment approach, we markedly improved its alignment sensitivity in cases where the reference genome and the source genome of the measured RNA differ. We also greatly increased its accuracy through an additional re-alignment step for reads that span splice junctions. Due to the high-throughput nature of the data and limited computational resources, most alignment tools perform only an approximate search. To better understand the extent of variability in the alignment results and to identify possible sources of variation, we performed a comprehensive evaluation of alignment algorithms, which showed substantial differences between alignment outcomes.
Using the insights gained during the evaluation, we developed two powerful alignment post-processing tools that aim at making results more comparable and removing possible false hits from the data. The simple alignment filtering tool (SAFT) optimizes filter criteria on a given training set to increase the overall accuracy of the alignment. The tool for multiple-mapper resolution (MMR) disambiguates between several equally good alignment possibilities of the same read, using an iterative algorithm to minimize the variance of the local read coverage. In order to use RNA-Seq alignments for profiling alternative splicing (AS), we developed SplAdder, a tool that enriches a splicing graph representation of existing genome annotations and extracts AS events from this augmented graph. All presented methods were applied in analysis pipelines that align, post-process and then quantitatively analyze RNA-Seq data. We present four biological studies in which the tools presented here were an integral part of the analysis pipeline. In a study on the mRNA degradation mechanism nonsense-mediated decay (NMD) in Arabidopsis thaliana, we analyzed samples mutated in UPF1 and UPF3, and thus deficient in NMD, to investigate the connection between…
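
Illustrative sketch (not part of the original abstract): the idea behind multiple-mapper resolution, as summarized above, can be illustrated with a small toy example. The code below is not the MMR implementation. It assumes a single linear coordinate system, treats every alignment as a half-open interval, and uses a simplified objective (the sum of coverage variances in windows around each candidate location). The function names, the flank parameter and the stopping rule are all hypothetical choices made for illustration.

```python
from statistics import pvariance


def add_read(cov, start, end, delta):
    """Add delta to the coverage of every position in [start, end)."""
    for i in range(start, end):
        cov[i] += delta


def local_variance(cov, start, end, flank):
    """Population variance of coverage in the read interval plus flanks."""
    lo = max(0, start - flank)
    hi = min(len(cov), end + flank)
    return pvariance(cov[lo:hi])


def resolve_multimappers(genome_len, unique_reads, multi_reads,
                         flank=10, max_iter=10):
    """Pick one location per multi-mapped read (toy version).

    unique_reads: list of (start, end) intervals with a single location.
    multi_reads:  list of candidate lists, one list of (start, end) per read.
    Returns the chosen (start, end) for each multi-mapped read.
    """
    cov = [0] * genome_len
    for start, end in unique_reads:
        add_read(cov, start, end, 1)

    # Assumption: start from the first listed candidate of every read.
    choice = [0] * len(multi_reads)
    for cands in multi_reads:
        start, end = cands[0]
        add_read(cov, start, end, 1)

    for _ in range(max_iter):
        changed = False
        for r, cands in enumerate(multi_reads):
            # Take the read out of the coverage, then try every candidate.
            cur_start, cur_end = cands[choice[r]]
            add_read(cov, cur_start, cur_end, -1)

            costs = []
            for start, end in cands:
                add_read(cov, start, end, 1)
                # Simplified objective: summed variance around all candidates.
                costs.append(sum(local_variance(cov, s, e, flank)
                                 for s, e in cands))
                add_read(cov, start, end, -1)

            best = min(range(len(cands)), key=lambda k: costs[k])
            # Move only on a strict improvement, so ties keep the current
            # placement and the loop cannot oscillate between equal options.
            if costs[best] < costs[choice[r]]:
                choice[r] = best
                changed = True

            new_start, new_end = cands[choice[r]]
            add_read(cov, new_start, new_end, 1)

        if not changed:
            break

    return [cands[choice[r]] for r, cands in enumerate(multi_reads)]
```

In this toy setting, a read whose candidates are an uncovered gap between two covered stretches and an otherwise empty region is moved into the gap, because that placement flattens the local coverage. A real tool would additionally have to handle spliced alignments, multiple chromosomes and strands, none of which are modeled here.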