
Computational approaches for the study of gene expression, genetic and epigenetic variation in human

by James Wagner




Institution: McGill University
Department: School of Computer Science
Degree: PhD
Year: 2015
Keywords: Applied Sciences - Computer Science
Record ID: 2058426
Full text PDF: http://digitool.library.mcgill.ca/thesisfile130447.pdf


Abstract

Recent advances in high-throughput genomic technology have enabled the measurement of gene expression, DNA sequence polymorphisms, and epigenetic marks such as DNA methylation at thousands or millions of loci in tens or hundreds of samples. Inter-individual variation in gene expression and DNA methylation is present even when samples are drawn from an ostensibly healthy population. These measurements can be expected to vary due to underlying genetic variation, environmental effects, experimental noise, and, in the case of complex tissues, tissue composition heterogeneity among the individuals studied. This thesis describes the development of computational and statistical methods and their application to the analysis of three main high-throughput genomic experiments, all with the common goal of better characterizing variation of gene expression and DNA methylation in populations and generating hypotheses about the underlying causes of this variation. The first study involved a Hidden Markov Model-based approach to detect statistically meaningful levels of allelic expression from experiments that generate a noisy measurement of allelic expression at heterozygous single nucleotide polymorphism (SNP) loci in a set of samples; we also describe results from applying our approach to a set of lymphoblastoid cell line (LCL) samples. The second study examined the relationships between DNA methylation, gene expression, and sequence variation in a set of human fibroblast samples, and showed that information about chromatin accessibility and histone modifications is a more useful predictor of the directionality of these methylation-expression relationships than the location of the CpG site relative to the gene alone.
The third study identified and analyzed co-methylation modules present in adipose tissue samples, examined the relationship of these modules with body mass index (BMI), DNA sequence variation, gene expression, open chromatin, and histone modifications, and developed an approach to remove effects caused by tissue composition variation in the adipose tissue and to re-characterize the relationships present after correcting for these effects. Together, these studies represent an important contribution to the body of research seeking to better characterize and understand the sources of population-level variation in various genetic and epigenetic properties, and introduce several useful tools and important considerations for researchers embarking on these kinds of studies.

Advances made in recent years in high-throughput genomic technology have made it possible to measure gene expression, DNA sequence polymorphisms, and epigenetic marks such as DNA methylation for hundreds of samples at millions of loci. Notably, even for samples taken from individuals in a healthy population, gene expression and DNA methylation vary. This…