|Keywords:||Assessment; Computer Adaptive Testing; Criterion-referenced Testing; Instructional Technology; Item Calibration; Mastery Testing|
|Full text PDF:||http://hdl.handle.net/2022/19795|
Computer Adaptive Tests (CATs) have been used successfully with standardized tests. However, CATs are rarely practical for assessment in instructional contexts, because large numbers of examinees are required a priori to calibrate items using item response theory (IRT). Computerized Classification Tests (CCTs) provide a practical alternative to IRT-based CATs. CCTs show promise for instructional contexts, since many fewer examinees are required for item parameter estimation. However, there is a paucity of clear guidelines indicating when items are sufficiently calibrated in CCTs. Is there an efficient and accurate CCT algorithm which can estimate item parameters adaptively? Automatic Racing Calibration Heuristics (ARCH) was invented as a new CCT method and was empirically evaluated in two studies. Monte Carlo simulations were run on previous administrations of a computer literacy test, consisting of 85 items answered by 104 examinees. Simulations resulted in determination of thresholds needed by the ARCH method for parameter estimates. These thresholds were subsequently used in 50 sets of computer simulations in order to compare accuracy and efficiency of ARCH with the sequential probability ratio test (SPRT) and with an enhanced method called EXSPRT. In the second study, 5,729 examinees took an online plagiarism test, where ARCH was implemented in parallel with SPRT and EXSPRT for comparison. Results indicated that new statistics were needed by ARCH to establish thresholds and to determine when ARCH could begin. The ARCH method resulted in test lengths significantly shorter than SPRT, and slightly longer than EXSPRT without sacrificing accuracy of classification of examinees as masters and nonmasters. This research was the first of its kind in evaluating the ARCH method. ARCH appears to be a viable CCT method, which could be particularly useful in massively open online courses (MOOCs). Additional studies with different test content and contexts are needed.