Evaluation of regression methods for log-normal data - linear models for environmental exposure and biomarker outcomes

by Sara Gustavsson

Institution: University of Gothenburg / Göteborgs Universitet
Year: 2015
Keywords: log-normal distribution; linear models; absolute effects
Record ID: 1342930
Full text PDF: http://hdl.handle.net/2077/37537


The identification and quantification of associations between variables is often of interest in occupational and environmental research, and regression analysis is commonly used to assess these associations. While exposures and biological data are often positively skewed and can be approximated by the log-normal distribution, much of the inference in regression analysis is based on the normal distribution. A common approach is therefore to log-transform the data before the regression analysis. However, if the regression model contains quantitative predictors, a transformation often complicates the interpretation of the coefficients. A linear model on the original scale (non-transformed data) estimates the additive effect of the predictor, while linear regression on a log-transformed response estimates the relative effect. The overall aim of this thesis was to develop and evaluate a maximum likelihood method (denoted MLLN) for estimating the absolute effects of the predictors in a regression model where the outcome follows a log-normal distribution. The MLLN estimates were compared to estimates from common regression methods, both in large-scale simulation studies and by applying the method to a number of real-life datasets. The method was also further developed to handle repeated-measurements data. Our results show that when the association is linear and the sample size is large (> 100 observations), MLLN provides essentially unbiased point estimates and has accurate coverage for both confidence and prediction intervals. Our results also show that, if the relationship is linear, log-transformation, which is the most commonly used method for regression on log-normal data, leads to erroneous point estimates, liberal prediction intervals, and erroneous confidence intervals.
For independent samples, we also studied the small-sample properties of the MLLN estimates; we suggest using bootstrap methods when samples are too small for the estimates to achieve their asymptotic properties.
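
To make the idea of estimating absolute effects for a log-normal outcome concrete, the following is a minimal sketch and not the thesis's implementation. It assumes a single predictor and the parametrization E[Y | x] = b0 + b1 * x with log Y normally distributed with standard deviation sigma, so that the mean of log Y is log(b0 + b1 * x) - sigma^2 / 2. The names `neg_log_lik` and `compass_search`, the simulated data, and the simple derivative-free optimizer are all illustrative assumptions; the abstract does not specify how the likelihood is parametrized or maximized.

```python
import math
import random


def neg_log_lik(params, xs, log_ys):
    """Negative log-likelihood (constants dropped) for a log-normal outcome
    whose mean on the original scale is assumed linear: E[Y | x] = b0 + b1 * x."""
    b0, b1, sigma = params
    if sigma <= 0:
        return math.inf
    total = 0.0
    for x, log_y in zip(xs, log_ys):
        mean = b0 + b1 * x
        if mean <= 0:  # a log-normal mean must be positive
            return math.inf
        # E[Y] = exp(mu + sigma^2 / 2), so mu follows from the linear mean.
        mu = math.log(mean) - 0.5 * sigma ** 2
        z = (log_y - mu) / sigma
        total += math.log(sigma) + 0.5 * z * z
    return total


def compass_search(f, start, step=0.5, tol=1e-4):
    """Crude derivative-free minimizer (a stand-in for a real optimizer):
    try +/- step along each coordinate, halve the step when nothing improves."""
    point = list(start)
    best = f(point)
    while step > tol:
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                value = f(trial)
                if value < best:
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2
    return point, best


# Simulated data (hypothetical values): additive effect of 0.5 per unit of x.
rng = random.Random(1)
TRUE = (2.0, 0.5, 0.5)  # b0, b1, sigma
xs = [rng.uniform(0, 10) for _ in range(500)]
ys = [
    rng.lognormvariate(math.log(TRUE[0] + TRUE[1] * x) - 0.5 * TRUE[2] ** 2, TRUE[2])
    for x in xs
]
log_ys = [math.log(y) for y in ys]

# Start from a flat line through the sample mean, which is always a valid point.
start = [sum(ys) / len(ys), 0.0, 1.0]
estimate, fitted_nll = compass_search(lambda p: neg_log_lik(p, xs, log_ys), start)
print("b0, b1, sigma estimates:", [round(v, 3) for v in estimate])
```

In this sketch the fitted `b1` is read directly as an absolute change in the mean of Y per unit of x, whereas the slope from a regression of log Y on x would be read as a relative change. For small samples, interval estimates could be obtained by refitting on resampled data, in line with the bootstrap suggestion above.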