
Weighted Lasso Analysis for Cervix Cancer Data

by Marianne Røine




Institution: University of Oslo
Department:
Year: 1000
Keywords: VDP::412
Record ID: 1294719
Full text PDF: https://www.duo.uio.no/handle/10852/37714


https://www.duo.uio.no/bitstream/10852/37714/1/ThesisMarianneRxine.pdf


Abstract

In this thesis I analyze two cervix cancer datasets supplied by the Norwegian Radium Hospital. The datasets were obtained with two different technologies, cDNA microarrays and the Illumina BeadArray, and contain gene expression measurements together with the survival times of the patients. Both are high-dimensional: the number of explanatory variables p is much larger than the number of patients n. The need for good methods for analyzing high-dimensional data has grown in recent years, and several methods have been developed for p > n problems; one of them is the weighted lasso. I focus on weighted lasso analysis in which additional data determine the weights. Specifically, I use the correlation between the gene expressions and copy number data to set the weights. I propose a simple weight in which the covariates are ranked by their correlation with the copy number data, and cross-validation decides how many of the top-ranked covariates to include in a standard lasso analysis. This weight is compared with a standard lasso analysis without weights and with an alternative weighting that uses the correlations as individual weights in the weighted lasso. The aim of the weighted lasso analysis is to find the genes that are important for patient survival and that can be used to predict the survival times of future patients.
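The two weighting schemes described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the thesis: the data are fabricated toy values, the outcome is treated as an uncensored continuous response with squared-error loss (real survival data with censoring would typically call for a Cox-type model), the individual weights are assumed to take the form 1/|correlation|, and the penalty level `lam` is fixed by hand. All function and variable names are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def rank_genes(expr, copy_number):
    """Order gene indices by |corr(expression, copy number)|, strongest first."""
    scores = [abs(pearson(e, c)) for e, c in zip(expr, copy_number)]
    return sorted(range(len(scores)), key=lambda j: -scores[j]), scores

def soft_threshold(z, t):
    """Shrink z towards zero by t; exactly zero when |z| <= t."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def weighted_lasso(cols, y, weights, lam, n_sweeps=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam * sum_j w_j*|b_j|.

    cols holds one centered column per gene; y is centered as well.
    """
    n, p = len(y), len(cols)
    beta, resid = [0.0] * p, list(y)
    scale = [sum(x * x for x in col) / n for col in cols]
    for _ in range(n_sweeps):
        for j in range(p):
            if scale[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes gene j.
            rho = sum(x * r for x, r in zip(cols[j], resid)) / n + scale[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / scale[j]
            if new != beta[j]:
                step = new - beta[j]
                resid = [r - step * x for r, x in zip(resid, cols[j])]
                beta[j] = new
    return beta

def center(v):
    """Subtract the mean from every entry."""
    m = sum(v) / len(v)
    return [x - m for x in v]

def cv_error(cols, y, lam, folds=5):
    """K-fold mean squared prediction error of an unweighted lasso."""
    n, total = len(y), 0.0
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        # Center with training-fold means only, then fit on the training fold.
        means = [sum(col[i] for i in train) / len(train) for col in cols]
        ybar = sum(y[i] for i in train) / len(train)
        tr_cols = [[col[i] - m for i in train] for col, m in zip(cols, means)]
        beta = weighted_lasso(tr_cols, [y[i] - ybar for i in train], [1.0] * len(cols), lam)
        for i in test:
            pred = ybar + sum(b * (col[i] - m) for b, col, m in zip(beta, cols, means))
            total += (y[i] - pred) ** 2
    return total / n

# Toy data (fabricated for illustration): 40 patients, 8 genes.
random.seed(0)
n, p = 40, 8
copy_number = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
# Only genes 0 and 1 have expression that tracks copy number.
expr = [[(0.9 * c if j < 2 else 0.0) + random.gauss(0, 0.4) for c in copy_number[j]]
        for j in range(p)]
y = [2.0 * expr[0][i] - 1.5 * expr[1][i] + random.gauss(0, 0.5) for i in range(n)]

order, scores = rank_genes(expr, copy_number)

# Scheme 1: standard lasso on the top-k ranked genes, k chosen by cross-validation.
best_k = min(range(1, p + 1),
             key=lambda k: cv_error([expr[j] for j in order[:k]], y, lam=0.1))

# Scheme 2: every gene gets its own penalty weight (assumed form: 1/|correlation|).
weights = [1.0 / max(s, 1e-6) for s in scores]
beta_w = weighted_lasso([center(e) for e in expr], center(y), weights, lam=0.05)
```

In this sketch the ranking uses only expression and copy number values, never the outcome, so it is computed once outside the cross-validation loop. In a fuller analysis the penalty level would normally be tuned by cross-validation as well, rather than fixed as it is here.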