Computer-Automated Scoring of Written Responses
Abstract
This chapter begins by briefly discussing the human scoring procedures that preceded computer-automated scoring (CAS) of written responses and still operate alongside it. It then traces the development of CAS in two areas: extended response tasks such as essays, and limited production tasks such as short-answer questions, with limited production responses further divided according to the scoring approach used. This classification is important not only because the two task types elicit different kinds of expected responses, but also because different computational approaches are normally used to score them: various forms of key word or phrase matching for limited production responses, and systems using more complex forms of natural language processing to score both limited production and extended response tasks. The chapter next discusses current research on CAS of written responses, maintaining the organization based on extended and limited production tasks, and concludes by exploring the directions in which research, development, and operational use are likely to proceed.
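To make the distinction concrete, the sketch below shows what a simple key word or phrase matching approach to scoring a limited production response might look like. It is a minimal, hypothetical example: the item content, the key phrases, and the dichotomous scoring rule are all assumptions made for illustration, not the matching logic of any operational CAS system discussed in the chapter.

```python
import re

# Hypothetical sketch of key word / phrase matching for a limited
# production (short-answer) item. Each required element of the answer
# key is a list of acceptable alternatives, written as regular
# expressions; the item and phrases are invented for illustration.
ANSWER_KEY = [
    [r"\bwater\s+cycle\b", r"\bhydrologic(al)?\s+cycle\b"],        # names the process
    [r"\bevaporat(es|ed|ion)\b", r"\bturns?\s+into\s+vapou?r\b"],  # mentions evaporation
]

def score_response(response: str, key=ANSWER_KEY) -> int:
    """Award 1 point if every required element is matched by at least
    one of its acceptable alternatives; otherwise award 0."""
    text = response.lower()
    return int(all(
        any(re.search(alt, text) for alt in alternatives)
        for alternatives in key
    ))

if __name__ == "__main__":
    print(score_response("The water cycle begins when heat evaporates surface water."))  # 1
    print(score_response("Rain falls from clouds and collects in rivers."))              # 0
```

Even this toy example hints at the construct questions the chapter raises: the key writer must anticipate acceptable paraphrases in advance, and anything unanticipated is scored as wrong.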