
Bayesian Predictive Inference for Three Topics in Survey Samples.

by Qixuan Chen




Institution: University of Michigan
Department: Biostatistics
Degree: PhD
Year: 2009
Keywords: Bayesian Inference; Limit of Detection; Multiple Imputation; Sample Surveys; Spline Models; Unequal Probability Sampling; Statistics and Numeric Data; Science
Record ID: 1854375
Full text PDF: http://hdl.handle.net/2027.42/64741


Abstract

In this thesis, I study three problems in survey samples: inference for finite population quantities in unequal probability sampling, variable selection for multiply imputed data, and the application of multiple imputation to the problem of detection limits.

In survey samples, design-based estimators are often used for inference about finite population quantities when sample sizes are large. However, design-based inference relies on asymptotic assumptions; when the sample is small, the mean squared error can be very large and the coverage of nominal confidence intervals relatively poor. When design information is available to modelers, it can be used to improve the efficiency of the estimators. In Chapters II and III, I provide Bayesian model-based estimators for finite population proportions and quantiles in unequal probability sampling settings by regressing the survey outcomes on penalized splines of the selection probabilities. Simulation studies show that the robust Bayesian estimator for proportions is more efficient than the Hájek estimator or the generalized regression estimator, and that its 95% CI provides better coverage with shorter average width. The Bayesian estimators for quantiles also outperform the design-based estimators, with smaller mean squared errors and shorter average widths of the 95% CIs. When sparse data are selected into samples, the Bayesian estimators yield better confidence coverage.

The second part of the research is motivated by two statistical issues connected with the University of Michigan Dioxin Exposure Study, which employs a complex survey design. In Chapter IV, I propose a "combine then select" variable selection method, which calculates combined p-values using the multiple imputation combining rule and then selects variables based on the combined p-values in each step of the selection. I show through simulations and the dioxin study data that the "combine then select" method is less likely to incorrectly select variables into the model than competing methods currently used in epidemiological studies. In Chapter V, I employ a proper multiple imputation approach to impute the serum dioxin concentrations for those below the limit of detection. I then use the complete imputed data to predict the age- and sex-specific percentiles of serum dioxin concentrations among the U.S. population.
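
As an illustration of the "combine" step described for Chapter IV, the following is a minimal sketch of how a combined p-value for a single regression coefficient might be computed from multiply imputed data. It assumes the standard multiple imputation combining rule (Rubin's rules) with a t reference distribution. The function names `combine_rubin` and `t_two_sided_p`, the numerical integration of the t density, and the example numbers are assumptions made here for illustration; they are not taken from the thesis.

```python
import math
from statistics import mean, variance


def t_two_sided_p(t_stat, df, n_steps=2000):
    """Two-sided p-value for a Student t statistic.

    Integrates the t density over [0, |t|] with Simpson's rule
    (n_steps must be even), so only the standard library is needed.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t_stat)
    h = upper / n_steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_mass = 2 * total * h / 3  # P(-|t| < T < |t|)
    return max(0.0, 1.0 - central_mass)


def combine_rubin(estimates, variances):
    """Combine m per-imputation estimates and their variances.

    Returns (pooled estimate, total variance, degrees of freedom,
    two-sided p-value for the null hypothesis that the quantity is 0).
    """
    m = len(estimates)
    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # within-imputation variance
    b = variance(estimates)      # between-imputation variance (m - 1 denominator)
    total_var = u_bar + (1 + 1 / m) * b
    t_stat = q_bar / math.sqrt(total_var)

    # Degrees of freedom of the t reference distribution; infinite when b == 0.
    df = math.inf if b == 0 else (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    if df > 1e6:
        # Assumption of this sketch: use the normal limit for very large df.
        p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    else:
        p_value = t_two_sided_p(t_stat, df)
    return q_bar, total_var, df, p_value


# Hypothetical coefficient estimates and variances from m = 5 imputed data sets.
est = [0.50, 0.60, 0.55, 0.45, 0.65]
var = [0.04, 0.04, 0.04, 0.04, 0.04]
q_bar, total_var, df, p_value = combine_rubin(est, var)
```

In a stepwise procedure of the kind the abstract describes, a combined p-value like this would be computed for every candidate variable at each step, and the decision to add or drop a variable would be based on those combined values rather than on p-values from any single imputed data set.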