Regression Methods for Categorical Dependent Variables: Effects on a Model of Student College Choice

by Kelly E. Rapp

Institution: Indiana University
Year: 2013
Keywords: college selectivity; comparative study; logistic regression; ordinal data; student college choice
Record ID: 2016140
Full text PDF: http://hdl.handle.net/2022/15879


The use of categorical dependent variables with the classical linear regression model (CLRM) violates many of the model's assumptions and may result in biased estimates (Long, 1997; O'Connell, Goldstein, Rogers, & Peng, 2008). Many dependent variables of interest to educational researchers (e.g., professorial rank, educational attainment) are categorical in nature but are analyzed using the CLRM (Harwell & Gatti, 2001), even though alternative regression techniques for categorical dependent variables are recommended (Agresti, 1996; Long, 1997). Data obtained from ACT<super>®</super>, Inc., on 5,200 high school seniors in Illinois and Colorado were used to analyze the effects of regression method on a model of ascriptive and academic influences on the selectivity of the postsecondary institution attended. The dependent variable was measured in rank-ordered categories based on self-reported institutional admissions policies and analyzed with classical linear, multinomial logistic, and ordered logistic regressions. Choice of regression method did not affect overall model performance, as evidenced by significant <italic>F</italic> and likelihood ratio <italic>χ</italic><super>2</super> tests. The full CLRM fit the data moderately well (<italic>R</italic><super>2</super> = .391), surpassing some previous findings (Hearn, 1988, 1991; Davies & Guppy, 1997). McFadden's <italic>R</italic><super>2</super>L measure of strength of association was larger in the multinomial regression than in the ordered regression (<italic>R</italic><super>2</super>L = .191 vs. <italic>R</italic><super>2</super>L = .158). The multinomial logistic method also predicted the dependent variable category with the greatest accuracy (46.3% correct), but Somers' <italic>D</italic>yx measure of association was smallest for the multinomial model. The direction and significance of the relationships between the predictors and the dependent variable were substantively consistent across the CLRM and logistic methods.
In all regressions, ACT<super>®</super> score had the greatest impact on the selectivity of the institution attended. Threshold values were significant, supporting the assumption of an ordered dependent variable. Because of the CLRM's theoretical and predictive shortcomings and the complexity of interpreting the multinomial model, ordered logistic regression was determined to be the most appropriate method for explaining influences on the selectivity of the postsecondary institution attended.
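
For readers unfamiliar with the measures named in the abstract, the following sketch gives the standard textbook forms of McFadden's likelihood-ratio measure and of the ordered logit model with thresholds. The notation is an assumption chosen for illustration and is not taken from the thesis: L_M and L_0 denote the maximized likelihoods of the fitted and intercept-only models, tau_j the thresholds, J the number of ordered selectivity categories, and beta the coefficient vector.

```latex
% McFadden's likelihood-ratio measure of strength of association
% (L_M: fitted model, L_0: intercept-only model; symbols are illustrative)
R^2_L = 1 - \frac{\ln \hat{L}_{M}}{\ln \hat{L}_{0}}

% Ordered (proportional-odds) logit with thresholds \tau_j
% for a dependent variable y with J ordered categories
\ln \frac{\Pr(y \le j \mid \mathbf{x})}{\Pr(y > j \mid \mathbf{x})}
  = \tau_j - \mathbf{x}'\boldsymbol{\beta}, \qquad j = 1, \dots, J - 1
```

Under this parameterization a single coefficient vector is shared across all cut points, which is what makes the ordered model simpler to interpret than a multinomial model that estimates a separate coefficient set for each category contrast.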