Statistical methods for genetic risk confidence intervals, Bayesian disease risk prediction, and estimating mutation screening saturation

Ying Shan

Abstracts Biology & Animal Science


Institution:	University of Pittsburgh
Year:	2016
Posted:	02/05/2017
Record ID:	2090688
Full text PDF:	http://d-scholarship.pitt.edu/28629/1/Ying_dissert_8_2016.pdf

Abstract

Genetic information can be used both to improve disease risk estimation and to estimate the number of genes influencing a trait. Here we explore these issues in three parts.

1) For an informed interpretation of a disease risk prediction, the confidence interval of the risk estimate should be taken into account. However, few previous studies have considered it. We propose an improved risk prediction model and a screening strategy that account for these confidence intervals. Risk models are built with varying numbers of genetic risk variants known as single nucleotide polymorphisms (SNPs). When SNPs are added to the risk model in decreasing order of effect size, including those with smaller effects modestly shifts the risk estimate but also widens its confidence interval. A more appropriate risk prediction model should therefore exclude the small-effect SNPs. The newly proposed screening method outperforms the traditional one as evaluated by net benefit.

2) Many methods have been developed for selecting associated SNPs, estimating SNP effects, and predicting risk. BayesR, a Bayesian method designed for continuous phenotypes, has shown favorable properties. Here, we developed an extension of BayesR, called BayesRB, so that the method can be applied to binary phenotypes. For SNP effect estimation, BayesRB gives unbiased estimates for large-effect SNPs and sparse estimates for small-effect SNPs. It also performs well for risk prediction, but not for selecting associated SNPs.

3) When a recessive forward genetic screening study (RFGSS) is carried out to detect disease mutations, it is important to estimate the screening saturation in order to guide the screening strategy. Here, we develop a simulation-based 'unseen species' method to estimate the screening saturation in an RFGSS. We simulated an RFGSS process based on a real study and compared our method with both nonparametric and parametric methods.
The proposed method outperforms all of the other methods except one existing 'unseen species' method.

Together, these three newly proposed methods help to construct better risk prediction models and to estimate the number of disease-contributing genes. They can be applied to studies of various diseases and may contribute to public health.
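Part 1's central observation, that small-effect SNPs barely move a risk estimate yet widen its confidence interval, can be illustrated with a minimal sketch. It assumes a logistic risk model built from per-SNP log odds ratios, independent effect estimates, a baseline log-odds treated as known, and a Wald interval on the log-odds scale mapped through the logistic function. The function names, SNP values, and 5% baseline risk are hypothetical; this is not the dissertation's own model.

```python
import math


def expit(x):
    """Logistic function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def risk_with_ci(baseline_logit, snps, z=1.96):
    """Absolute risk and approximate 95% CI from a hypothetical logistic SNP model.

    snps: list of (log_odds_ratio, standard_error, risk_allele_count) tuples.
    Assumes independent effect estimates and a known baseline, so
    Var(eta) = sum over SNPs of (allele_count * se)^2.
    """
    eta = baseline_logit + sum(b * g for b, _, g in snps)
    half = z * math.sqrt(sum((g * se) ** 2 for _, se, g in snps))
    return expit(eta), expit(eta - half), expit(eta + half)


# Hypothetical SNPs sorted by decreasing effect: (log OR, SE, risk alleles carried)
snps = [(0.40, 0.03, 1), (0.30, 0.03, 2),
        (0.05, 0.04, 2), (0.04, 0.04, 1), (0.03, 0.04, 2)]
base = math.log(0.05 / 0.95)  # hypothetical 5% baseline risk

for k in range(1, len(snps) + 1):
    r, lo, hi = risk_with_ci(base, snps[:k])
    print(f"{k} SNPs: risk={r:.3f}  95% CI=({lo:.3f}, {hi:.3f})  width={hi - lo:.3f}")
```

Each small-effect SNP here has a standard error comparable to its effect, so it adds nearly as much variance as a large-effect SNP while contributing little to the log-odds. A real analysis would also propagate baseline uncertainty and correlations between effect estimates.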

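Part 2 extends BayesR to binary phenotypes, but the abstract does not say how. One common way to adapt a Gaussian Gibbs sampler to a binary trait is probit data augmentation (the Albert and Chib approach): sample a latent liability for each individual, then reuse the continuous-trait updates. This sketch combines that step with a BayesR-style four-component mixture prior on SNP effects, one component of which is exactly zero. It is an assumption-laden illustration, not BayesRB: the function names (`sample_liability`, `bayesr_probit`), class variances, mixing proportions, and toy data are hypothetical; BayesR proper also samples the mixing proportions and the genetic variance, which are held fixed here; and the model has no intercept, so the toy cases and controls are roughly balanced.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for inverse-CDF truncated sampling


def sample_liability(mu, y, rng):
    """Draw l ~ N(mu, 1) truncated to l > 0 if y == 1, or l <= 0 if y == 0."""
    p0 = _STD.cdf(-mu)  # P(l <= 0)
    lo, hi = (p0, 1.0) if y == 1 else (0.0, p0)
    u = min(max(lo + (hi - lo) * rng.random(), 1e-12), 1 - 1e-12)  # keep inv_cdf in (0, 1)
    return mu + _STD.inv_cdf(u)


def bayesr_probit(X, y, variances=(0.0, 1e-3, 1e-2, 1e-1),
                  pi=(0.90, 0.05, 0.03, 0.02), iters=300, burn=100, seed=1):
    """Hypothetical Gibbs sampler: mixture prior on effects, probit liability with
    residual variance fixed at 1. Returns posterior-mean SNP effects."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    xtx = [sum(v * v for v in c) for c in cols]
    beta, eta = [0.0] * p, [0.0] * n  # eta holds X @ beta
    post_sum = [0.0] * p
    for it in range(iters):
        # 1) Data augmentation: liabilities given the current linear predictor
        liab = [sample_liability(eta[i], y[i], rng) for i in range(n)]
        resid = [liab[i] - eta[i] for i in range(n)]
        # 2) Single-site updates: sample a mixture class, then the effect size
        for j in range(p):
            c, b_old = cols[j], beta[j]
            rhs = sum(ci * (ri + ci * b_old) for ci, ri in zip(c, resid))  # x_j'(resid + x_j*b_old)
            logp = []
            for v, w in zip(variances, pi):
                if v == 0.0:
                    logp.append(math.log(w))  # effect fixed at exactly zero
                else:  # log marginal likelihood ratio vs. the zero class, plus log prior weight
                    logp.append(math.log(w) - 0.5 * math.log(1.0 + v * xtx[j])
                                + 0.5 * v * rhs * rhs / (1.0 + v * xtx[j]))
            top = max(logp)
            k = rng.choices(range(len(logp)), weights=[math.exp(lp - top) for lp in logp])[0]
            b_new = 0.0
            if variances[k] > 0.0:
                lhs = xtx[j] + 1.0 / variances[k]
                b_new = rng.gauss(rhs / lhs, math.sqrt(1.0 / lhs))
            if b_new != b_old:
                resid = [ri + ci * (b_old - b_new) for ci, ri in zip(c, resid)]
            beta[j] = b_new
        eta = [liab[i] - resid[i] for i in range(n)]
        if it >= burn:
            post_sum = [s + b for s, b in zip(post_sum, beta)]
    return [s / (iters - burn) for s in post_sum]


# Hypothetical toy data: 300 individuals, 10 standardized SNPs, two causal on the liability scale
rng = random.Random(7)
n, p, true_beta = 300, 10, [0.8, -0.6] + [0.0] * 8
G = [[sum(rng.random() < 0.3 for _ in range(2)) for _ in range(p)] for _ in range(n)]
means = [sum(r[j] for r in G) / n for j in range(p)]
sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in G) / n) for j in range(p)]
X = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in G]
y = [int(sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0, 1) > 0) for row in X]
est = bayesr_probit(X, y)
print([round(b, 2) for b in est])
```

The zero-variance class is what yields sparse estimates for small effects, while a large effect is assigned to the widest class, where shrinkage is mild.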
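Part 3 compares the proposed simulation-based estimator of screening saturation with existing nonparametric and 'unseen species' methods, which the abstract does not name. As an illustration only, here is the classic Chao1 nonparametric richness estimator applied to a screen: genes hit by exactly one independent mutant allele (singletons, f1) versus exactly two (doubletons, f2) indicate how many genes remain unseen, and saturation is the observed fraction. This is not the dissertation's simulation-based method; the function name and the hit counts are hypothetical.

```python
from collections import Counter


def chao1_saturation(alleles_per_gene):
    """Classic Chao1 estimate of total mutable genes and the implied saturation.

    alleles_per_gene: mapping gene -> number of independent mutant alleles
    recovered for that gene (hypothetical input format).
    """
    counts = [c for c in alleles_per_gene.values() if c > 0]
    s_obs = len(counts)
    if s_obs == 0:
        return 0, 0.0, 0.0
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]  # singletons and doubletons
    if f2 > 0:
        s_hat = s_obs + f1 * f1 / (2 * f2)
    else:
        # f2 == 0: bias-corrected variant, omitting the (n - 1)/n sample-size factor
        s_hat = s_obs + f1 * (f1 - 1) / 2
    return s_obs, s_hat, s_obs / s_hat


# Hypothetical screen: genes and how many independent alleles hit each one
hits = {"geneA": 1, "geneB": 1, "geneC": 1, "geneD": 2, "geneE": 3, "geneF": 1}
s_obs, s_hat, saturation = chao1_saturation(hits)
print(s_obs, s_hat, round(saturation, 3))  # 6 14.0 0.429
```

Many singletons relative to doubletons signal that the screen is far from saturated; as more genes are hit repeatedly, the estimate converges toward the observed count.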