
MetaPathways: a modular pipeline for the analysis of environmental sequence information

by Niels William Hanson




Institution: University of British Columbia
Department: Bioinformatics
Degree: PhD
Year: 2015
Record ID: 2058925
Full text PDF: http://hdl.handle.net/2429/52845


Abstract

The lack of cultivated reference strains for the majority of naturally occurring microorganisms has led to the development of plurality sequencing methods and the field of metagenomics, which offer a glimpse into the genomes of this so-called 'microbial dark matter' (MDM). An explosion of sequencing initiatives has followed, attempting to capture MDM and extract biological meaning from it across a wide range of ecosystems, from deep-sea vents and polar seas to waste-water bioreactors and human beings. Current analytic approaches focus on taxonomic structure and metabolic potential, combining phylogenetic anchor screening of the small subunit ribosomal RNA gene (SSU or 16S rRNA) with general sequence searches based on homology inference. Although these approaches have revealed much about microbial diversity and metabolic potential within natural and engineered ecosystems, they are insufficient to resolve the ecological relationships that couple nutrient and energy flow between community members, relationships that ultimately translate into ecosystem functions and services. This shortcoming has two sources: the data-intensive challenges that environmental sequence information poses across processing, integration, and interpretation steps, and a general lack of robust statistical and analytical methods to address these challenges directly. This dissertation addresses some of these shortcomings through the development of a modular analytical pipeline, MetaPathways, which enables large-scale, systematic processing and integration of many forms of environmental sequence information. MetaPathways is built to scale, comparing hundreds of metagenomic samples through efficient data structures, grid computing models, and interactive data queries.
Moreover, it attempts to bring functional analysis back to the metabolic map by creating environmental pathway/genome databases (ePGDBs), using the Pathway Tools software to predict metabolic pathways on the MetaCyc encyclopedia of genes and genomes. ePGDBs and the pathway-centric approach are shown to recover known insights, and to yield novel ones, into community structure and function. Finally, new taxonomic and metabolic methods supporting the pathway-centric model are derived and demonstrated; these methods extend Pathway Tools as a framework for engineering microbial communities and consortia.