
Theoretical aspects of overlapping genes

by Katharina Schilling




Institution: Universität Ulm
Department: Ingenieurwissenschaften und Informatik
Degree: PhD
Year: 2015
Record ID: 1108691
Full text PDF: http://vts.uni-ulm.de/docs/2015/9397/vts_9397_14143.pdf


Abstract

This thesis investigates theoretical aspects of overlapping genes in prokaryotes. Overlapping genes are protein-coding sequences that are encoded in different reading frames of the same DNA region, such that the sequences overlap non-trivially. In contrast to viruses, where overlapping genes are an accepted phenomenon, and eukaryotes, where many examples have been found, they were thought to be exotic exceptions in prokaryotes. To shed light on this phenomenon, we study the theory behind overlapping protein-coding sequences in bacterial genomes. First, an analytical model based on the codon composition of genes and the length of the genome is developed. It predicts statistical properties of Open Reading Frames (ORFs) in all possible reading frames of the DNA sequence. The model is applied to overlapping ORFs and shows significant deviations in some cases, which indicates potential functionality. A detailed comparison of the model predictions with Escherichia coli is presented. This bacterium is a pathogen that can cause severe foodborne diseases when it is transmitted to humans. Additionally, a large-scale comparison across 71 bacterial genomes is performed. Further, the model is applied to alternative genetic codes to examine whether the standard genetic code is optimized to encode long overlapping genes. The second topic of this thesis is the theory of overlapping gene evolution, in particular the influence of selection pressure in different reading frames. Since evolution can be described as a communication process, concepts from information and communication theory are applied. Based on a codon evolution model in the protein-coding reading frame, the impact on alternative reading frames that may contain gene candidates is investigated. Information-theoretic measures are applied to quantify the amount of information that is transmitted over time.
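
To make the notion of ORFs "in all possible reading frames" concrete, the following minimal sketch enumerates ORFs in the three forward and three reverse-complement frames of a DNA string. It is an illustration only and not the model developed in the thesis. The function names `reverse_complement` and `find_orfs`, the `min_codons` parameter, and the choice of ATG as the only start codon are assumptions made for this example; the stop codons are those of the standard genetic code.

```python
from typing import List, Tuple

# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumption for this sketch: ATG is the only start codon considered.
START_CODON = "ATG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 1) -> List[Tuple[int, int, int]]:
    """Enumerate ORFs in all six reading frames (hypothetical helper).

    Returns tuples (frame, start, end). Frames +1..+3 lie on the given
    strand, frames -1..-3 on its reverse complement. start and end are
    0-based positions on the scanned strand; end is exclusive and
    includes the stop codon. min_codons counts codons before the stop.
    """
    orfs = []
    for sign, strand in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            start = None
            # Walk the frame codon by codon.
            for i in range(offset, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((sign * (offset + 1), start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Two ORFs returned for different frames whose coordinate ranges intersect on the genome would be candidates for the overlapping ORFs discussed in the abstract. Note that coordinates for negative frames refer to the reverse-complement strand and must be mapped back before such a comparison.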