Computer-Automated Scoring of Written Responses
Abstract
This chapter begins by briefly discussing the human scoring procedures that preceded computer-automated scoring (CAS) of written responses and still operate alongside it. It then traces the development of CAS in two areas: extended response tasks such as essays, and limited production tasks such as short-answer questions, with limited production responses further divided according to the scoring approach used. This classification is important not only because the two task types elicit different kinds of expected responses, but also because different computational approaches are normally used to score them: various forms of key word or phrase matching for limited production responses, and systems using more complex forms of natural language processing to score both limited production and extended response tasks. The chapter next discusses current research on CAS of written responses, maintaining the organization based on extended and limited production tasks, and concludes by exploring the directions in which research, development, and operational use are likely to proceed.
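To make the distinction concrete, the sketch below shows what a simple key word or phrase matching approach to scoring a limited production response might look like. It is a minimal, hypothetical example: the item content, the key phrases, and the dichotomous scoring rule are all assumptions made for illustration, not the matching logic of any operational CAS system discussed in the chapter.

```python
import re

# Hypothetical sketch of key word / phrase matching for a limited
# production (short-answer) item. Each required element of the answer
# key is a list of acceptable alternatives, written as regular
# expressions; the item and phrases are invented for illustration.
ANSWER_KEY = [
    [r"\bwater\s+cycle\b", r"\bhydrologic(al)?\s+cycle\b"],        # names the process
    [r"\bevaporat(es|ed|ion)\b", r"\bturns?\s+into\s+vapou?r\b"],  # mentions evaporation
]

def score_response(response: str, key=ANSWER_KEY) -> int:
    """Award 1 point if every required element is matched by at least
    one of its acceptable alternatives; otherwise award 0."""
    text = response.lower()
    return int(all(
        any(re.search(alt, text) for alt in alternatives)
        for alternatives in key
    ))

if __name__ == "__main__":
    print(score_response("The water cycle begins when heat evaporates surface water."))  # 1
    print(score_response("Rain falls from clouds and collects in rivers."))              # 0
```

Even this toy example hints at the construct questions the chapter raises: the key writer must anticipate acceptable paraphrases in advance, and anything unanticipated is scored as wrong.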