Behavior of statistics for genetic association in a genome-wide scan context

by Hui-Min Lin

Institution: University of Pittsburgh
Year: 2015
Posted: 02/05/2017
Record ID: 2064690
Full text PDF: http://d-scholarship.pitt.edu/25278/1/Hui-Min_Lin_dissertation_Final.pdf


Genome-wide association studies are used to detect associations between genetic variants and diseases, with hundreds of thousands to millions of SNPs tested simultaneously. The results of such a study often focus on the list of SNPs ranked by their test statistics rather than on fixed p-value cutoffs. It is therefore important to investigate the behavior of the extreme values of the statistics rather than the behavior of their expected values. “Detection probability” and “proportion positive” have been proposed to measure the success of a genomic study when ranked lists are the primary outcome. In this dissertation, we first compare statistics for X-chromosome association with rare alleles. We recommend regression with males coded as (0, 2) or with adjustment for sex as a covariate. We then evaluate statistics for detecting genetic association in the presence of an environmental covariate effect. The best choice of statistic depends on the purpose of the study and on how a researcher selects disease-associated SNPs. Studies whose goal is to find a significant signal at the whole-genome level should focus on which statistic provides the highest power. Exploratory studies that look for a list of top-ranking SNPs to be studied further should focus on which statistic provides the highest detection probability. Adjusting for the environmental covariate effect or the interaction effect may reduce power, but it can help produce more accurate ranked lists. This work will improve the statistical power of genetic association studies, which will allow us to gain a better understanding of disease processes and ultimately design better treatments and public health interventions.
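
To make the ranked-list criterion concrete, the following is a minimal simulation sketch and not the dissertation's own method. It assumes that detection probability means the probability that a given disease-associated SNP ranks among the top T statistics in a scan, that null SNPs have independent 1-df chi-square statistics, and that the disease SNP's statistic is a squared normal with a shifted mean. The function name and the parameters `n_null`, `effect`, `top_t`, and `n_rep` are hypothetical, chosen only for illustration.

```python
import random


def detection_probability(n_null=2000, effect=3.0, top_t=10, n_rep=200, seed=0):
    """Estimate P(disease SNP ranks in the top `top_t`) by simulation.

    Assumptions (illustrative only): independent SNPs, one disease SNP,
    null statistics are squared standard normals, and the disease SNP's
    statistic is a squared normal with mean `effect`.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        # Statistic for the single disease-associated SNP.
        disease_stat = rng.gauss(effect, 1.0) ** 2
        # Number of null SNPs that outrank the disease SNP in this scan.
        n_larger = sum(
            1 for _ in range(n_null) if rng.gauss(0.0, 1.0) ** 2 > disease_stat
        )
        # The disease SNP is "detected" if fewer than top_t nulls outrank it.
        if n_larger < top_t:
            hits += 1
    return hits / n_rep


if __name__ == "__main__":
    for t in (1, 10, 100):
        print(t, detection_probability(top_t=t))
```

Under these assumptions, the estimate grows with the list length T and with the effect size. This differs from power, which asks whether the statistic crosses a fixed genome-wide significance threshold.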