|Institution:||Oklahoma State University|
|Full text PDF:||http://hdl.handle.net/11244/45247|
This study applies machine learning techniques to predict ovarian cancer survivability using Cerner Health Facts data. Specifically, it uses three popular data mining techniques: a multilayer perceptron (MLP) neural network, a radial basis function (RBF) neural network, and decision trees. All techniques are evaluated with 10-fold cross-validation to guard against overfitting. The descriptive statistics show patterns similar to those observed in studies of SEER cancer data. Across all results, the multilayer perceptron neural network trained on balanced data in IBM SPSS Modeler performed best, with a classification accuracy of 97.71%, higher than any other model compared in the study. The second-best model was the radial basis function neural network trained on unbalanced data, with a classification accuracy of 96.18%. The radial basis function neural network performed worst, with a classification accuracy of 67.80% even on the balanced dataset. This indicates that, given the input variables used in the study (for example, admission source, patient race, and census division), the multilayer perceptron neural network predicted patient survival with 97.71% accuracy. In addition to the prediction models, the study identified the most important predictors, giving better insight into the relative contribution of each variable to survivability prediction.

Advisors/Committee Members: Liu, Tieming (advisor), Delen, Dursun (advisor), Pratt, David (committee member).
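The abstract compares models evaluated with 10-fold cross-validation on both unbalanced and balanced data. The following is a minimal plain-Python sketch of that kind of evaluation loop, not the thesis's actual pipeline (which used IBM SPSS Modeler). It rests on several assumptions: balancing is done by random undersampling of the majority class, applied to the training folds only; a one-split decision stump stands in for the MLP, RBF and decision-tree models; the data are synthetic; and all function names (`k_fold_indices`, `undersample`, `train_stump`, `cross_validated_accuracy`, `make_synthetic_rows`) are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# A record is (feature vector, label). Hypothetical label encoding:
# 1 = survived, 0 = did not survive.
Row = Tuple[Tuple[float, ...], int]
Model = Callable[[Tuple[float, ...]], int]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def undersample(rows: Sequence[Row], seed: int = 0) -> List[Row]:
    """Randomly keep only as many rows per class as the smallest class has."""
    by_class: Dict[int, List[Row]] = {}
    for row in rows:
        by_class.setdefault(row[1], []).append(row)
    n_min = min(len(members) for members in by_class.values())
    rng = random.Random(seed)
    balanced: List[Row] = []
    for members in by_class.values():
        balanced.extend(rng.sample(members, n_min))
    return balanced


def train_stump(rows: Sequence[Row]) -> Model:
    """Hypothetical stand-in classifier for binary labels: the single feature
    threshold that best separates the classes (a one-split decision tree)."""
    best_correct, best_f, best_t, best_above = -1, 0, 0.0, 1
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            for above in (0, 1):
                correct = sum(
                    1 for x, y in rows if (above if x[f] > t else 1 - above) == y
                )
                if correct > best_correct:
                    best_correct, best_f, best_t, best_above = correct, f, t, above
    return lambda x: best_above if x[best_f] > best_t else 1 - best_above


def cross_validated_accuracy(
    rows: Sequence[Row],
    train: Callable[[Sequence[Row]], Model],
    k: int = 10,
    balance: bool = False,
    seed: int = 0,
) -> float:
    """k-fold CV: train on k-1 folds, test on the held-out fold, and return
    correct predictions / total records, pooled over all k test folds."""
    correct = 0
    for i, test_idx in enumerate(k_fold_indices(len(rows), k, seed)):
        held_out = set(test_idx)
        train_rows = [r for j, r in enumerate(rows) if j not in held_out]
        if balance:  # balance training folds only; test folds keep the real ratio
            train_rows = undersample(train_rows, seed + i)
        model = train(train_rows)
        correct += sum(1 for j in test_idx if model(rows[j][0]) == rows[j][1])
    return correct / len(rows)


def make_synthetic_rows(n: int = 200, seed: int = 42) -> List[Row]:
    """Synthetic, imbalanced toy data (roughly 80% label 1); not Cerner data."""
    rng = random.Random(seed)
    rows: List[Row] = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        rows.append((x, int(x[0] > 0.2)))
    return rows


if __name__ == "__main__":
    data = make_synthetic_rows()
    print("unbalanced:", cross_validated_accuracy(data, train_stump))
    print("balanced:  ", cross_validated_accuracy(data, train_stump, balance=True))
```

Because every record lands in exactly one test fold, the sketch reports pooled accuracy (correct predictions over all records); when all folds are the same size this equals the mean of the per-fold accuracies. Balancing only the training folds keeps each test fold at the original class ratio; the abstract does not say whether the thesis balanced before or after splitting.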