|Keywords:||Genetics; Statistics; Bioinformatics|
|Full text PDF:||http://pqdtopen.proquest.com/#viewpdf?dispub=3713946|
The development of quantitative sequencing technologies, such as RNA-Seq, Bar-Seq, ChIP-Seq, and metagenomics, has offered great insight into molecular biology. Proper design and analysis of these experiments require statistical models and techniques that account for the specific nature of sequencing data, which typically consists of a matrix of read counts per feature. An issue of particular importance to the development of these methods is the role of read depth in statistical accuracy and power. The depth of an experiment affects the power to draw biological conclusions, so experimental design must weigh the tradeoff among cost, power, and the number of samples examined. Similarly, per-gene read depth affects the power and accuracy of inference for each gene, and must be taken into account in any downstream analysis. Here I explore many facets of the role of read depth in the design and analysis of sequencing experiments, and offer computational and statistical methods for addressing them. To assist in the design of sequencing experiments, I present subSeq, which examines the effect of depth in an experiment by subsampling reads to simulate lower depths. I use this method to examine the extent of read saturation across a variety of RNA-Seq experiments, and demonstrate a statistical model for predicting the effect of increasing depth in any experiment. I consider intensity-dependence in a technology comparison between microarrays and RNA-Seq, and show that the variance added by RNA-Seq depends more strongly on depth than the variance in microarrays depends on fluorescence intensity. I demonstrate that Bar-Seq data shares these depth-dependent properties with RNA-Seq and can be analyzed with the same tools, and further provide suggestions on the appropriate depth for Bar-Seq experiments. Finally, I show that per-gene read depth can be taken into account in multiple hypothesis testing to improve power, and introduce the method of functional false discovery rate (fFDR) control.
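The idea of subsampling reads to simulate a lower depth can be illustrated with a short sketch. This is not the subSeq implementation. It assumes that each read is retained independently with a fixed probability, which amounts to binomial thinning of the count matrix. The function name `subsample_counts` and the toy matrix are hypothetical and chosen only for illustration.

```python
import numpy as np


def subsample_counts(counts, proportion, seed=None):
    """Simulate a lower read depth by binomial thinning (illustrative assumption).

    counts     : array of non-negative integer read counts (features x samples)
    proportion : fraction of reads to keep, between 0 and 1
    seed       : optional seed for reproducibility
    """
    counts = np.asarray(counts)
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    if np.any(counts < 0):
        raise ValueError("counts must be non-negative")
    rng = np.random.default_rng(seed)
    # Each read in each cell is kept independently with probability `proportion`.
    return rng.binomial(counts, proportion)


# Hypothetical toy matrix: 3 features (rows) by 2 samples (columns).
counts = np.array([[100, 2500],
                   [0, 40],
                   [12, 900]])

for p in (0.1, 0.5, 1.0):
    sub = subsample_counts(counts, p, seed=0)
    # Per-sample totals shrink roughly in proportion to p.
    print(p, sub.sum(axis=0))
```

Running a downstream analysis on each thinned matrix and comparing the results across proportions gives a rough picture of how conclusions would change at lower depth.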