
A multi-stage decision algorithm for rule generation for minority class

by Soma Datta




Institution: Texas Tech University
Department:
Year: 2014
Keywords: Data mining; Clustering; Retention; Attrition; Model selection; Words, association rules; Decision tree; Multi-stage decision rules; Recursive partition; Rule sets
Record ID: 2045399
Full text PDF: http://hdl.handle.net/2346/58906


Abstract

This study analyzes student retention data to predict which students are likely to drop out. Retention is an increasingly important problem for institutions that must meet legislative mandates, that face budget shortfalls due to decreased tuition or state-based revenue, and that do not produce enough graduates in fields of need, such as technology. The study proposes a multi-stage decision method to address rule extraction issues encountered previously when using ensemble learning with clustering and decision trees. These issues include rules with anomalous classes and rules whose attributes are chosen only by the decision tree method. To improve rule extraction, the study uses a multi-stage decision method that combines clustering, controlled decision trees, and association mining. Rules are generated dynamically: the characteristics of the dataset determine the path taken toward rule generation, and a dynamic multi-stage decision tree is generated depending on the attribute dimensions and the size of the dataset. Each rule is assigned a coverage and an accuracy. The technique generates rules after mining data from the minority class. The generated rules were grouped into accuracy ranges to facilitate rule selection as needed, and were applied within those ranges to identify student attrition for intervention purposes. A different model can be created for each year, and the appropriate model can be chosen according to the distribution of the attributes for that year.
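
The abstract does not define how coverage and accuracy are computed, so the following is only a minimal sketch under the common textbook convention: coverage is the fraction of all records that a rule's condition matches, and accuracy is the fraction of matched records that belong to the target (minority) class. The attribute names (`gpa`, `credits`, `status`), the toy records, the three rules, and the range cut-offs (0.8 and 0.5) are all hypothetical and are not taken from the thesis.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Condition = Callable[[Record], bool]


def rule_stats(condition: Condition, records: List[Record],
               target: str, label_key: str = "status") -> Tuple[float, float]:
    """Return (coverage, accuracy) of one rule for the target class."""
    if not records:
        return 0.0, 0.0
    matched = [r for r in records if condition(r)]
    coverage = len(matched) / len(records)
    if not matched:
        return coverage, 0.0
    correct = sum(1 for r in matched if r[label_key] == target)
    return coverage, correct / len(matched)


def accuracy_range(accuracy: float) -> str:
    """Map an accuracy to a coarse range (cut-offs are assumptions)."""
    if accuracy >= 0.8:
        return "high"
    if accuracy >= 0.5:
        return "medium"
    return "low"


def group_rules(rules: Dict[str, Condition], records: List[Record],
                target: str) -> Dict[str, List[str]]:
    """Group rule names by the accuracy range each rule falls into."""
    groups: Dict[str, List[str]] = {}
    for name, condition in rules.items():
        _, accuracy = rule_stats(condition, records, target)
        groups.setdefault(accuracy_range(accuracy), []).append(name)
    return groups


# Toy records, invented purely for illustration.
STUDENTS: List[Record] = [
    {"gpa": 1.8, "credits": 9, "status": "dropout"},
    {"gpa": 2.1, "credits": 12, "status": "dropout"},
    {"gpa": 2.4, "credits": 15, "status": "retained"},
    {"gpa": 3.2, "credits": 15, "status": "retained"},
    {"gpa": 3.6, "credits": 12, "status": "retained"},
    {"gpa": 1.9, "credits": 6, "status": "dropout"},
    {"gpa": 2.2, "credits": 9, "status": "retained"},
    {"gpa": 3.0, "credits": 9, "status": "dropout"},
]

# Hypothetical rules that all predict the minority class "dropout".
RULES: Dict[str, Condition] = {
    "low_gpa": lambda r: r["gpa"] < 2.5,
    "low_gpa_light_load": lambda r: r["gpa"] < 2.0 and r["credits"] < 12,
    "light_load": lambda r: r["credits"] <= 9,
}

if __name__ == "__main__":
    for name, condition in RULES.items():
        cov, acc = rule_stats(condition, STUDENTS, "dropout")
        print(f"{name}: coverage={cov:.3f}, accuracy={acc:.3f}")
    print(group_rules(RULES, STUDENTS, "dropout"))
```

Grouping by range lets a user trade precision for reach: a narrow, high-accuracy rule flags few students with little risk of a false alarm, while a broader, medium-accuracy rule reaches more students at the cost of more unnecessary interventions.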