Testing Writing Skills: a Selected Bibliography

The International Research Foundation for English Language Education

ASSESSING SECOND LANGUAGE WRITING SKILLS: SELECTED REFERENCES (Last updated 22 October 2016)

Ali, K. A., & Mostafa, N. A. (2013). Errors in using past tense form in writing essays among Kurdish university learners. The Asian Journal of English Language and Pedagogy, 1, 176-189.

Ali, S. (2005). How effective is self-assessment in writing? In P. Davidson, C. Coombe, & W. Jones (Eds.), Assessment in the Arab world (pp. 307-322). Dubai: TESOL Arabia.

Allaei, S. K., & Connor, U. (1991). Using performative assessment instruments with ESL student writers. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 227-240). Norwood, NJ: Ablex.

Allison, D., & Cheung, E. (1991). ‘Good’ and ‘poor’ writing and writers: Studying individual performance as a part of placement test validation. Hong Kong Papers in Linguistics and Language Teaching, 14, 1-14.

Amena, M. (2005). Identifying the processes of assessing writing using an analytic marking criteria. In P. Davidson, C. Coombe, & W. Jones (Eds.), Assessment in the Arab world (pp. 225-249). Dubai: TESOL Arabia.

Arthur, B. (1979). Short-term changes in EFL composition skills. In C. A. Yorio, K. Perkins, & J. Schachter (Eds.), On TESOL ‘79: The learner in focus (pp. 330-342). Washington, DC: TESOL.

Attali, Y., & Burstein, J. (2006). Automated Essay Scoring with e-rater V.2. Journal of Technology, Learning, and Assessment (JTLA), 4(3).

Bae, J., & Bachman, L. F. (2010). An investigation of four writing traits and two tasks across two languages. Language Testing, 27, 213-234.

Baker, B. (2015). An updated visual representation for writing assessment research. TESL Canada, 32(9), 124-136.

Banerjee, J., Franceschina, F., & Smith, A. M. (2007). Documenting features of written language production typical at different IELTS band score levels. In P. McGovern & S. Walsh (Eds.), IELTS Research Reports (Vol. 7) (pp. 241-309). Canberra, Australia: IELTS Australia.

Barkaoui, K. (2007). Rating scale impact on EFL essay marking: A mixed-method study. Assessing Writing, 12, 86-107.

177 Webster St., #220, Monterey, CA 93940 USA / Web: www.tirfonline.org

Barkaoui, K. (2011). Effects of marking method and rater experience on ESL essay scores. Assessment in Education: Principles, Policy & Practice, 18(3), 277-291.

Barkaoui, K. (2011). Think-aloud protocols in research on essay rating: An empirical study of their veridicality and reactivity. Language Testing, 28(1), 51-75.

Beck, S. W., Llosa, L., Black, K., & Trzeszkowski-Giese, A. (2015). Beyond the rubric: Think-alouds as a diagnostic assessment tool for high school writing teachers. Journal of Adolescent and Adult Literacy, 58(8), 668-679.

Bernhardt, E., Molitoris, J., Romeo, K., Lin, N., & Valderrama, P. (2015). Designing and sustaining a foreign language writing proficiency assessment program at the postsecondary level. Foreign Language Annals, 48(3), 329-349.

Boldt, H., Valsecchi, M. I., & Cushing, S. C. (2001). Evaluation of student writing on text-responsible and non-text responsible writing tasks. MEXTESOL Journal, 24, 13-33.

Brodkey, D., & Young, R. (1981). Composition correctness scores. TESOL Quarterly, 15(2), 159-168.

Brown, J. D. (1991). Do English and ESL faculties rate writing samples differently? TESOL Quarterly, 25(4), 587-603.

Brown, J. D., & Bailey, K. M. (1984). A categorical instrument for scoring second language writing skills. Language Learning, 34(4), 21-42.

Brown, J. D., Hilgers, T., & Marsella, J. (1991). Essay prompts and topics: Minimizing the effect of mean differences. Written Communication, 8, 533-556.

Burstein, J. (2003). The e-rater scoring engine: Automated essay scoring with natural language processing. In M. D. Shermis & J. C. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 113-121). Mahwah, NJ: Lawrence Erlbaum.

Calfee, R., & Perfumo, P. (Eds.). (1996). Writing portfolios in the classroom: Policy and practice, promise and peril. Mahwah, NJ: Lawrence Erlbaum.

Camp, R. (1993). Changing the model for direct assessment of writing. In M. Williamson & B. Huot (Eds.), Holistic scoring: Theoretical foundations and validation research (pp. 56-69). Cresskill, NJ: Hampton Press.

Carlisle, R., & McKenna, E. (1991). Placement of ESL/EFL undergraduate writers in college-level writing programs. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 197-211). Norwood, NJ: Ablex.

Chambers, L. (2011). Composition and revision in computer-based written assessment. Cambridge ESOL Research Notes, 43, 25-32.

Charney, D. (1984). The validity of using holistic scoring to evaluate writing. Research in the Teaching of English, 18, 65-81.

Cho, D. (1999). A study on ESL writing assessment: Intra-rater reliability of ESL compositions. Melbourne Papers in Language Testing, 8(1), 1–24.

Cizek, G. J., & Page, B. A. (2003). The concept of reliability in the context of automated essay scoring. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 125–145). Mahwah, NJ: Lawrence Erlbaum.

Connor, U., & Carrell, P. (1983). The interpretation of tasks by writers and readers in holistically rated direct assessment of writing. In J. G. Carson & I. Leki (Eds.), Reading in the composition classroom (pp. 159-175). Mahwah, NJ: Lawrence Erlbaum.

Connor, U., & Mbaye, A. (2002). Discourse approaches to writing assessment. Annual Review of Applied Linguistics, 22, 263-278.

Connor-Linton, J. (1995a). Crosscultural comparison of writing standards: American ESL and Japanese EFL. World Englishes, 14, 99-115.

Connor-Linton, J. (1995b). Looking behind the curtain: What do L2 composition ratings really mean? TESOL Quarterly, 29, 762-765.

Cooper, C. R., & Odell, L. (Eds.). (1999). Evaluating writing: The role of teacher’s knowledge about text, learning, and culture. Urbana, IL: National Council of Teachers of English.

Cooper, T. C. (1976). Measuring written syntactic patterns of second language learners of German. Journal of Educational Research, 69, 176-183.

Cumming, A. (1989). Writing expertise and second language proficiency. Language Learning, 39, 81-141.

Cumming, A. (1990). Expertise in evaluating second language compositions. Language Testing, 7(1), 31-51.

Cumming, A. (1990). The thinking, interactions, and participation to foster in adult ESL literacy instruction. TESL Talk, 20, 34-51.

Cumming, A. (1997). The testing of second-language writing. In C. Clapham (Ed.), The encyclopedia of language and education: Volume 7. Language assessment (pp. 51-63). Dordrecht, The Netherlands: Kluwer.

Cumming, A. (1998). Theoretical perspectives on writing. Annual Review of Applied Linguistics, 18, 61-78.

Cumming, A. (2001). ESL/EFL instructors’ practices for writing assessment: Specific purposes or general purposes? Language Testing, 18(2), 207-224.

Cumming, A. (2001). Learning to write in a second language: Two decades of research. International Journal of English Studies, 1(2), 1-23.

Cumming, A., Kantor, R., Baba, K., Erdosy, U., Eouanzoui, K., & James, M. (2005). Differences in written discourse in independent and integrated prototype tasks for next generation TOEFL. Assessing Writing, 10, 5-43.

Cumming, A., Kantor, R., & Powers, D. E. (2002). Decision making while rating ESL/EFL writing tasks: A descriptive framework. Modern Language Journal, 86, 67-96.

Cumming, A., Kantor, R., Powers, D., Santos, T., & Taylor, C. (2000). TOEFL 2000 writing framework: A working paper. Princeton, NJ: Educational Testing Service.

Daly, J. A., & Dickson-Markman, F. (1982). Contrast effects in evaluating essays. Journal of Educational Measurement, 19, 309–316.

David, V. (2016). A comparison of two approaches for assessing L2 writing: Process-based and impromptu timed writing exams. Applied Language Learning, 26(1), 65-82.

Delaruelle, S. (1997). Text type and rater decision-making in the writing module. In G. Brindley & G. Wigglesworth (Eds.), Access: Issues in language test design and delivery (pp. 215–242). Sydney, Australia: National Centre for English Language Teaching and Research, Macquarie University.

DeRemer, M. (1998). Writing assessment: Raters’ elaboration of the rating task. Assessing Writing, 5, 7-29.

Diederich, P. B., French, J. W., & Carlton, S. T. (1961). Factors in judgements of writing ability. Research Bulletin, RB-61-15. Princeton, NJ: Educational Testing Service (ERIC Document Reproduction Service ED 002172).

di Gennaro, K. (2008). Assessment of Generation 1.5 learners for placement into college writing courses. Journal of Basic Writing, 27(1), 61-79.

di Gennaro, K. (2009). Investigating differences in the writing performance of international and generation 1.5 students. Language Testing, 26, 533-559.

Dobson, B. (2007). Designing effective writing assessments for classroom contexts. In C. Irvine-Niakaris & A. Nebel (Eds.), 2nd Language Testing & Evaluation Forum, Teaching and testing: Opportunities for learning (pp. 7-19). Athens, Greece: Hellenic American Union.

Douglas, D. (2000). Specific purpose tests of reading and writing. In Assessing languages for specific purposes (pp. 189-245). Cambridge, UK: Cambridge University Press.

East, M., & Young, D. (2007). Scoring L2 writing samples: Exploring the relative effectiveness of two different diagnostic methods. New Zealand Studies in Applied Linguistics, 13(1), 1-21.

Eckes, T. (2005). Examining rater effects in TestDaF writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment Quarterly, 2, 197–221.

Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25, 155–185.

Educational Testing Service. (1992). TOEFL Test of Written English guide. Princeton, NJ: Educational Testing Service.

Elbow, P. (1993). Ranking, evaluating and liking: Sorting out three forms of judgment. College English, 55(2), 187-206.

Elder, C., Barkhuizen, G., Knoch, U., & von Randow, J. (2007). Evaluating rater responses to an online training program for L2 writing assessment. Language Testing, 24, 37–64.

Elder, C., Knoch, U., & Zhang, R. (2009). Diagnosing the support needs of second language writers: Does the time allowance matter? TESOL Quarterly, 43(2), 351-360.

Engelhard, G. (1994). Examining rater errors in the assessment of written composition with a many-faceted Rasch model. Journal of Educational Measurement, 31, 93–112.

Enright, M. K., Bridgeman, B., Eignor, D., Lee, Y. W., & Powers, D. E. (2008). Prototyping measures of listening, reading, speaking, and writing. In C. A. Chapelle, M. K. Enright, & J. M. Jamieson (Eds.), Building a validity argument for the Test of English as a Foreign Language (pp. 145–186). New York, NY: Routledge.

Erdosy, M. U. (2004). Exploring variability in judging writing ability in a second language: A study of four experienced raters of ESL compositions (TOEFL Research Report No. 70, RR-03-17). Princeton, NJ: ETS. Retrieved from: http://www.ets.org/Media/Research/pdf/RR-03-17.pdf

Esmaeili, H. (2002). Integrated reading and writing tasks and ESL students’ reading and writing performance in an English language test. The Canadian Modern Language Review/La Revue Canadienne Des Langues Vivantes, 58(4), 599-622.

Evola, J., Mamer, E., & Lentz, B. (1980). Discrete point versus global scoring for cohesive devices. In J. W. Oller & K. Perkins (Eds.), Research in language testing (pp. 177-181). Rowley, MA: Newbury House.

Fahim, M., & Bijani, H. (2011). The effects of rater training on raters’ severity and bias in second language writing assessment. Iranian Journal of Language Testing, 1, 1–16.

Faigley, L., Cherry, R. D., Jolliffe, D. A., & Skinner, A. M. (1985). Assessing writers’ knowledge and processes of composing. Norwood, NJ: Ablex.

Fawcett, S., Sandberg, A., & Pittman, M. S. (1987). Test package. Grassroots: The writer’s workbook. Boston, MA: Houghton Mifflin.

Feak, C., & Dobson, B. (1996). Building on the impromptu: A source-based academic writing assessment. College ESL, 6(1), 73-84.

Flahive, D. E., & Snow, B. G. (1980). Measures of syntactic complexity in evaluating ESL compositions. In J. W. Oller & K. Perkins (Eds.), Research in language testing (pp. 171-176). Rowley, MA: Newbury House.

Freedman, S. W. (1993). Linking large-scale testing and classroom portfolio assessments of student writing. Educational Assessment, 1(1), 27-52.

Freedman, S. W., & Calfee, R. C. (1983). Holistic assessment of writing: Experimental design and cognitive theory. In P. Mosenthal, L. Tamor, & S. A. Walmsley (Eds.), Research on writing: Principles and methods (pp. 75–98). New York, NY: Longman.

Frodesen, J., & Starna, N. (1999). Distinguishing incipient and functional bilingual writers: Assessment and instructional insights gained through second-language writer profiles. In L. Harklau, K. Losey, & M. Siegal (Eds.), Generation 1.5 meets college composition: Issues in the teaching of writing to U.S.-educated learners of ESL (pp. 61-80). Mahwah, NJ: Lawrence Erlbaum.

Furneaux, C., & Rignall, M. (2007). The effect of standardization-training on rater judgements for the IELTS writing module. In L. Taylor & P. Falvey (Eds.), IELTS Collected Papers: Research in speaking and writing assessment (pp. 422–445). Cambridge, England: Cambridge University Press.

Gaies, S. J. (1980). T-unit analysis in second language research: Applications, problems and limitations. TESOL Quarterly, 14, 53-60.

Gebril, A. (2009). Score generalizability of academic writing tasks: Does one test method fit it all? Language Testing, 26(4), 507-531.

Guo, L., Crossley, S. A., & McNamara, D. S. (2013). Predicting human judgments of essay quality in both integrated and independent second language writing samples: A comparison study. Assessing Writing, 18, 218-238.

Hamp-Lyons, L. (1987). Performance profiles for academic writing. In K. M. Bailey, T. L. Dale, & R. T. Clifford (Eds.), Language testing research: Selected papers from the 1986 Colloquium (pp. 78-92). Monterey, CA: Defense Language Institute.

Hamp-Lyons, L. (1989). Raters respond to rhetoric in writing. In H. W. Dechert & M. Raupach (Eds.), Interlingual processes (pp. 229-244). Tübingen, Germany: Gunter Narr.

Hamp-Lyons, L. (Ed.). (1991). Assessing second language writing in academic contexts. Norwood, NJ: Ablex.

Hamp-Lyons, L. (1991). Basic concepts. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 5-15). Norwood, NJ: Ablex.

Hamp-Lyons, L. (1991). Scoring procedures for ESL contexts. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 241-276). Norwood, NJ: Ablex.

Hamp-Lyons, L. (1994). Interweaving assessment and instruction in college ESL writing classes. College ESL, 4(1), 43-55.

Hamp-Lyons, L. (1995). Rating non-native writing: The trouble with holistic scoring. TESOL Quarterly, 29, 759-762.

Hamp-Lyons, L., & Kroll, B. (1996). Issues in ESL writing assessment: An overview. College ESL Writing, 6(1), 57-72.

Hamp-Lyons, L. (1996). The challenges of second-language writing assessment. In E. M. White, W. D. Lutz, & S. Kamusikiri (Eds.), Assessment of writing: Politics, policies, practices (pp. 226-240). New York, NY: Modern Language Association.

Hamp-Lyons, L. (2001). Fourth generation writing assessment. In T. Silva & P. K. Matsuda (Eds.), On second language writing (pp. 117–125). Mahwah, NJ: Lawrence Erlbaum.

Hamp-Lyons, L., & Condon, W. (1993). Questioning assumptions about portfolio-based assessment. College Composition and Communication, 44, 176-190.

Hamp-Lyons, L., & Condon, W. (2000). Assessing the portfolio: Principles for practice, theory, and research. Cresskill, NJ: Hampton Press.

Hamp-Lyons, L., & Henning, G. (1991). Communicative writing profiles: An investigation of the transferability of a multiple-trait scoring instrument across ESL writing assessment contexts. Language Learning, 41, 337-373.

Hamp-Lyons, L., & Kroll, B. (1996). Issues in ESL writing assessment. College ESL, 6(1), 52-72.

Hamp-Lyons, L., & Kroll, B. (1997). TOEFL 2000—Writing: Composition, community, and assessment. Princeton, NJ: Educational Testing Service.

Hamp-Lyons, L., & Mathias, S. P. (1994). Examining expert judgments of task difficulty on essay tests. Journal of Second Language Writing, 3, 49-68.

Hanania, E., & Shikhani, M. (1986). Interrelationships among three tests of language proficiency: Standardized ESL, cloze, and writing. TESOL Quarterly, 20(1), 97-109.

Haswell, R. H. (1998). Searching for Kiyoko: Bettering mandatory ESL writing placement. Journal of Second Language Writing, 7(2), 133-174.

Hayes, J. R., Hatch, J. A., & Silk, C. M. (2000). Does holistic assessment predict writing performance? Estimating the consistency of student performance on holistically scored writing assignments. Written Communication, 17(1), 3-26.

He, L., & Shi, L. (2008). ESL students’ perceptions and experiences of standardized English writing tests. Assessing Writing, 13(2), 130-149.

He, L., & Shi, L. (2012). Topical knowledge and ESL writing. Language Testing, 29(3), 443-464.

Huang, J. (2008). How accurate are ESL students’ holistic writing scores on large-scale assessments? A generalizability theory approach. Assessing Writing, 13, 201-218.

Hayward, M. (1990). Evaluations of essay prompts by nonnative speakers of English. TESOL Quarterly, 24(4), 753-758.

Henning, G., & Davidson, F. (1987). Scalar analysis of composition ratings. In K. M. Bailey, T. L. Dale, & R. T. Clifford (Eds.), Language testing research: Selected papers from the 1986 Colloquium (pp. 24-38). Monterey, CA: Defense Language Institute.

Herman, J. L., Aschbacher, P. R., & Winters, L. (1992). A practical guide to alternative assessment. Alexandria, VA: Association for Supervision and Curriculum Development.

Herzog, M. (1988). Issues in writing proficiency assessment. Section 1: The Government Scale. In P. Lowe & C. W. Stansfield (Eds.), Second language proficiency assessment: Current issues (pp. 149-177). Englewood Cliffs, NJ: Prentice Hall Regents.

Homburg, T. J. (1984). Holistic evaluation of ESL compositions: Can it be validated objectively? TESOL Quarterly, 18(1), 87-109.

Huot, B. (1990). The literature of direct writing assessment: Major concerns and prevailing trends. Review of Educational Research, 60(2), 237-263.

Huot, B. (1990). Reliability, validity, and holistic scoring: What we know and what we need to know. College Composition and Communication, 41, 201-213.

Huot, B. (1993). The influence of holistic scoring procedures on reading and rating student essays. In M. Williamson & B. Huot (Eds.), Validating holistic scoring for writing assessment (pp. 206-236). Cresskill, NJ: Hampton Press.

Ishikawa, S. (1995). Objective measurement of low-proficiency EFL narrative writing. Journal of Second Language Writing, 4(1), 51-69.

Jacobs, H. L., Zinkgraf, S. A., Wormuth, D. R., Hartfiel, V. F., & Hughey, J. B. (1981). Testing ESL composition: A practical approach. Rowley, MA: Newbury House.

Jeffery, J. (2009). Constructs of writing proficiency in US state and national writing assessments: Exploring variability. Assessing Writing, 14(1), 3-24.

Johnson, J. S., & Lim, G. S. (2009). The influence of rater language background on writing performance assessment. Language Testing, 26, 485–505.

Kaczmarek, C. M. (1980). Scoring and rating essay tasks. In J. W. Oller & K. Perkins (Eds.), Research in language testing (pp. 151-159). Rowley, MA: Newbury House.

Katz, A. (1988). Issues in writing proficiency assessment. Section 2: The Academic Context. In P. Lowe & C. W. Stansfield (Eds.), Second language proficiency assessment: Current issues (pp. 178-201). Englewood Cliffs, NJ: Prentice Hall Regents.

Kellogg, R. T. (1988). Attentional overload and writing performance: Effects of rough draft and outline strategies. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14(2), 355-365.

Kinshi, K., Kuru, Y., Masaki, M., Yamanishi, H., & Otoshi, J. (2011). Revising a writing rubric for its improved use in the classroom. LET Kansai Chapter Collected Papers, 13, 113-124.

Klobucar, A., Elliot, N., Deess, P., Rudniy, O., & Joshi, K. (2013). Automated scoring in context: Rapid assessment for placed students. Assessing Writing, 18(1), 62-84.

Knoch, U. (2008). The assessment of academic style in EAP writing: The case of the rating scale. Melbourne Papers in Language Testing, 13(1), 34-67.

Knoch, U. (2009). Diagnostic assessment of writing: A comparison of two rating scales. Language Testing, 26(2), 275-304.

Knoch, U. (2009). Diagnostic writing assessment: The development and validation of a rating scale. Frankfurt, Germany: Peter Lang.

Knoch, U., Read, J., & von Randow, J. (2007). Re-training raters online: How does it compare with face-to-face training? Assessing Writing, 12, 26–43.

Knoch, U., & Sitajalabhorn, W. (2013). A closer look at integrated writing tasks: Towards a more focused definition for assessment purposes. Assessing Writing, 18(4), 300-308.

Kobayashi, T. (1992). Native and nonnative reactions to ESL compositions. TESOL Quarterly, 26, 81–112.

Kondo-Brown, K. (2002). A FACETS analysis of rater bias in measuring Japanese second language writing performance. Language Testing, 19(1), 3-31.

Kroll, B. (1998). Assessing writing abilities. Annual Review of Applied Linguistics, 18, 219-240.

Kroll, B., & Reid, J. (1994). Guidelines for designing writing prompts: Clarifications, caveats, and cautions. Journal of Second Language Writing, 3, 231-255.

Kroll, B., & Reid, J. (1995). Designing and assessing effective classroom writing assignments for NES and ESL students. Journal of Second Language Writing, 4, 17-41.

Landauer, T. K., Laham, D., & Foltz, P. W. (2003). Automated scoring and annotation of essays with the Intelligent Essay Assessor. In M. D. Shermis & J. C. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 87-112). Mahwah, NJ: Lawrence Erlbaum.

Lee, I. (2007). Assessment for learning. Integrating assessment, teaching, and learning in the ESL/EFL writing classroom. The Canadian Modern Language Review, 64(1), 199-214.

Lee, M. B., Shin, D. G., Yun, J. H., Cho, B. K., Park, T. J., Lee, D. J., & Song, M. Y. (2011). A study of developing English teaching methods and curriculum on a new era of national English testing of speaking and writing. Seoul, Korea: Korea Institute for Curriculum and Evaluation.

Lee, Y. (2006). The process-oriented ESL writing assessment: Promises and challenges. Journal of Second Language Writing, 15(4), 307-330.

Lee, Y. W., & Kantor, R. (2005). Dependability of ESL writing test scores: Evaluating prototype tasks and alternative rating schemes. (TOEFL Monograph MS-31). Princeton, NJ: Educational Testing Service. Retrieved from http://www.ets.org/Media/Research/pdf/RR-05-14.pdf

Lei, S., & Xiaohong, W. (2005). A new and practical way of monitoring examiner reliability in assessing writing. In A. McNeill & J. Lai (Eds.), Crosslinks in English language teaching (Vol. 2) (pp. 95-104). Hong Kong: English Language Teaching Unit, Chinese University of Hong Kong.

Lemaire, B., & Dessus, P. (2001). A system to assess the semantic content of student essays. Journal of Educational Computing Research, 24(3), 305-306.

Lenski, S., & Verbruggen, F. (2010). Writing instruction and assessment for English language learners K-8. New York, NY: Guilford.

Lim, G. S. (2010). Prompt and rater effects in second language writing performance assessment. Cambridge ESOL Research Notes, 42, 39

Lim, G. S. (2011). The development and maintenance of rating quality in performance writing assessment: A longitudinal study of new and experienced raters. Language Testing, 28, 543–560.

Llosa, L. (2005). Assessing English learners’ language proficiency: A qualitative investigation of teachers’ interpretations of the California ELD standards. The CATESOL Journal, 18(1), 7-18.

Llosa, L., Beck, S. W., & Zhao, C. G. (2011). An investigation of academic writing in secondary schools to inform the development of diagnostic classroom assessments. Assessing Writing, 16, 256-273.

Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4), 474-496.

Lumley, T. (2002). Assessment criteria in a large-scale writing test: What do they really mean to the raters? Language Testing, 19(3), 246-276.

Lumley, T. (2005). Assessing second language writing: The rater’s perspective. New York, NY: Peter Lang.

Luoma, S., & Tarnanen, M. (2003). Creating a self-rating instrument for second language writing: From idea to implementation. Language Testing, 20(4), 440-465.

Matsuno, S. (2009). Self-, peer-, and teacher-assessments in Japanese university EFL writing classrooms. Language Testing, 26, 75-100.

Myford, C. M., Marr, D. B., & Linacre, J. M. (1996). Reader calibration and its potential role in equating for the Test of Written English. (TOEFL Research Report No. 52, RR-95-40). Princeton, NJ: Educational Testing Service. Retrieved from http://www.ets.org/Media/Research/pdf/RR-95-40.pdf

Mullen, K. A. (1980). Evaluating writing proficiency in ESL. In J. W. Oller & K. Perkins (Eds.), Research in language testing (pp. 160-170). Rowley, MA: Newbury House.

Mullen, K. A. (1977). Using rater judgments in the evaluation of writing proficiency for nonnative speakers of English. In H. D. Brown, C. A. Yorio, & R. H. Crymes (Eds.), On TESOL ‘77: Teaching and learning ESL - Trends in research and practice (pp. 309-320). Washington, DC: TESOL.

Mullis, I. (1974). The primary trait system of scoring writing tasks. Denver, CO: National Assessment of Educational Progress.

Myers, M. (2003). What can computers and AES contribute to a K–12 writing program? In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 3-20). Mahwah, NJ: Lawrence Erlbaum.

Nevo, D. (1986). Comments on Stansfield: A history of the Test of Written English: The developmental year. Language Testing, 3(2), 235-236.

Norton, B., & Starfield, S. (1997). Covert language assessment in academic writing. Language Testing, 14(3), 278-294.

Odlin, T. (1987). Some problems concerning the interpretation of passage correction tests. In K. M. Bailey, T. L. Dale, & R. T. Clifford (Eds.), Language testing research: Selected papers from the 1986 Colloquium (pp. 70-77). Monterey, CA: Defense Language Institute.

O'Sullivan, B., & Rignall, M. (2007). Assessing the value of bias analysis feedback to raters for the IELTS writing module. In L. Taylor & P. Falvey (Eds.), IELTS Collected Papers: Research in speaking and writing assessment (pp. 446–478). Cambridge, England: Cambridge University Press.

Page, E. B. (1994). Computer grading of student prose, using modern concepts and software. Journal of Experimental Education, 62, 127-142.

Perkins, K. (1980). Using objective methods of attained writing proficiency to discriminate among holistic evaluations. TESOL Quarterly, 14(1), 61-67.

Perkins, K. (1983). On the use of composition scoring techniques, objective measures, and objective tests to evaluate ESL writing ability. TESOL Quarterly, 17(4), 651-671.

Peterson, R. (1995). The writing teacher’s companion: Planning, teaching, and evaluating in the composition classroom. Boston, MA: Houghton Mifflin.

Plakans, L. (2010). Independent vs. integrated writing tasks: A comparison of task representation. TESOL Quarterly, 44(1), 185-194.

Polio, C. G. (1997). Measures of linguistic accuracy in second language writing research. Language Learning, 47(1), 103-143.

Polio, C., & Williams, J. (2009). Teaching and testing writing. In M. H. Long & C. J. Doughty (Eds.), The handbook of language teaching (pp. 486-517). West Sussex, UK: Wiley-Blackwell.

Pollitt, A., & Hutchinson, C. (1987). Calibrating graded assessments: Rasch partial credit analysis of performance in writing. Language Testing, 4(1), 72-92.

Porto, M. (2001). Cooperative writing response groups and self-evaluation. ELT Journal, 55(1), 38-46.

Powell, B. (1980). ESL and composition: A report. TESOL Newsletter, 14(3), 15-16.

Powers, D., Burstein, J., Chodorow, M., Fowles, M., & Kukich, K. (2001). Stumping E-rater: Challenging the validity of automated essay scoring (Research Report 01-03). Princeton, NJ: Educational Testing Service.

Pula, J. J., & Huot, B. A. (1993). A model of background influences on holistic raters. In M. M. Williamson & B. A. Huot (Eds.), Validating holistic scoring for writing assessment: Theoretical and empirical foundations (pp. 237–265). Cresskill, NJ: Hampton Press.

Qian, D. (2005). Assessing English writing for professional communication: A perspective from employers. In A. McNeill & J. Lai (Eds.), Crosslinks in English language teaching (Vol. 2) (pp. 105-122). Hong Kong: The Chinese University of Hong Kong.

Raimes, A. (1990). The TOEFL test of written English: Causes for concern. TESOL Quarterly, 24(3), 427-442.

Read, J. (1990). Providing relevant content in an EAP writing test. English for Specific Purposes, 9(2), 109-121.

Reid, J. (1986). Using the Writer’s Workbench in composition teaching and testing. In C. W. Stansfield (Ed.), Technology and language testing (pp. 167-187). Washington, DC: TESOL.

Rinnert, C., & Kobayashi, H. (2001). Differing perceptions of EFL writing among readers in Japan. Modern Language Journal, 85, 189–209.

Ruegg, R., Fritz, E., & Holland, J. (2011). Rater sensitivity to qualities of lexis in writing. TESOL Quarterly, 45(1), 63-80.

Ruiz-Funes, M. (1999). The process of reading-to-write used by a skilled Spanish-as-a-foreign-language student: A case study. Foreign Language Annals, 32(1), 45-58.

Saal, F. E., Downey, R. G., & Lahey, M. A. (1980). Rating the ratings: Assessing the psychometric quality of rating data. Psychological Bulletin, 88, 413–428.

Sakyi, A. A. (2000). Validation of holistic scoring for ESL writing assessment: How raters evaluate compositions. In A. J. Kunnan (Ed.), Fairness and validation in language assessment: Selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida (pp. 129–152). Cambridge, England: Cambridge University Press.

Santos, T. (1988). Professors' reactions to the academic writing of nonnative-speaking students. TESOL Quarterly, 22, 69–90.

Sasaki, M., & Hirose, K. (1999). Development of an analytic rating scale for Japanese L1 writing. Language Testing, 16(4), 457-478.

Schaefer, E. (2008). Rater bias patterns in an EFL writing assessment. Language Testing, 25, 465-493.

Schoonen, R., Vergeer, M., & Eiting, M. (1997). The assessment of writing ability: Expert readers versus lay readers. Language Testing, 14, 157–184.

Sedgwick, C. (2007). Survey to investigate expectations of achievement in written English on English language degree programmes in Europe. Language Assessment Quarterly, 4(3), 235-256.

Shaw, S. D., & Weir, C. J. (2007). Examining writing: Research and practice in assessing second language writing (Studies in Language Testing, Vol. 26). Cambridge, UK: UCLES/Cambridge University Press.

Shermis, M. D., & Burstein, J. (Eds.). (2003). Automated essay scoring: A cross-disciplinary perspective. Mahwah, NJ: Lawrence Erlbaum.

Shi, L. (2001). Native- and nonnative-speaking EFL teachers’ evaluation of Chinese students’ English writing. Language Testing, 18(3), 303-325.

Singh, A. K. J., & Harun, R. N. S. B. R. (2013). Peer editing process through Wikispaces in correcting L2 students’ writing. The Asian Journal of English Language and Pedagogy, 1, 119-147.

Smith, D. (2000). Rater judgments in the direct assessment of competency-based second language writing ability. In G. Brindley (Ed.), Studies in immigrant English language assessment (pp. 159–190). Sydney, Australia: National Centre for English Language Teaching and Research, Macquarie University.

Song, B., & Caruso, I. (1996). Do English and ESL faculty differ in evaluating the essays of native English-speaking and ESL students? Journal of Second Language Writing, 5, 163–182.

Stansfield, C. W. (1986). A history of the Test of Written English: The developmental year. Language Testing, 3(2), 224-234.

Stansfield, C. W., & Ross, J. (1988). A long-term research agenda for the Test of Written English. Language Testing, 5(2), 160-186.

Sullivan, K., & Lindgren, E. (2002). Self-assessment in autonomous computer-aided second language writing. ELT Journal, 56(3), 258-266.

Sweedler-Brown, C. O. (1993). The effects of ESL errors on holistic scores assigned by English composition faculty. College ESL, 3(1), 53-69.

Tedick, D. J. (1990). ESL writing assessment: Subject-matter knowledge and its impact on performance. English for Specific Purposes, 9(2), 123-143.

Terry, R. M. (1986). Testing the productive skills: A creative focus for hybrid achievement tests. Foreign Language Annals, 19(6), 521-528.

Troia, G. A. (Ed.). (2010). Instruction and assessment for struggling writers. New York, NY: Guilford.

Usman, E. M. (2004). Exploring variability in judging writing ability in a second language: A study of four experienced raters of ESL compositions. TOEFL research report no. RR-70. Princeton, NJ: Educational Testing Service.

Valdés, G., Garo, P., & Echevarriarza, M. P. (1992). The development of writing abilities in a foreign language: Contributions toward a general theory of L2 writing. The Modern Language Journal, 76(3), 333-352.

Valenti, S., Neri, F., & Cucchiarelli, A. (2003). An overview of current research on automated essay grading. Journal of Information Technology Education, 2, 319-330.

Vaughan, C. (1991). Holistic assessment: What goes on in the rater's mind? In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 111-125). Norwood, NJ: Ablex.

Wang, P. (2015). Effects of an automated writing evaluation program: Student experiences and perceptions. Electronic Journal of Foreign Language Teaching, 12(1), 79–100.

Weigle, S. C. (1994). Effects of training on raters of ESL compositions. Language Testing, 11(2), 197-223.

Weigle, S. C. (1998). Using FACETS to model rater training effects. Language Testing, 15(2), 263-287.

Weigle, S. C. (1999). Investigating rater/prompt interactions in writing assessment: Quantitative and qualitative approaches. Assessing Writing, 6(2), 145-178.

Weigle, S. C. (2000). Test review: The Michigan English Language Assessment Battery. Language Testing, 17(4), 449-455.

Weigle, S. C. (2002). Assessing writing. Cambridge, UK: Cambridge University Press.

Weigle, S. C. (2004). Integrating reading and writing in a competency test for non-native speakers of English. Assessing Writing, 9, 28-47.

Weigle, S. C. (2007). Teaching writing teachers about assessment. Journal of Second Language Writing, 16(3), 194-209.

Weigle, S. C. (2010). Validation of automated scores of TOEFL iBT tasks against non-test indicators of writing ability. Language Testing, 27(3), 335-353.

Weigle, S. C. (2011). Validation of automated scores of TOEFL iBT tasks against non-test indicators of writing ability. TOEFL iBT Research Report TOEFL iBT-15. Princeton, NJ: Educational Testing Service.

Weigle, S. C., & Jensen, L. (1996). Reading rate improvement in University ESL classes. CATESOL Journal, 55-71.

Weigle, S. C., & Jensen, L. (1997). Issues in assessment for content-based instruction. In M. A. Snow & D. Brinton (Eds.), The content-based classroom: Perspectives on integrating language and content (pp. 210-212). White Plains, NY: Longman.

Weigle, S. C., & Lynch, B. (1996). Hypothesis testing in construct validation. In A. Cumming & R. Berwick (Eds.), Validation in language testing (pp. 58-71). Clevedon, UK: Multilingual Matters.

Weigle, S. C., & Nelson, G. (2001). Academic writing for university examinations. In I. Leki (Volume Ed.), Academic writing programs: Case studies in TESOL practice series (pp. 121-135). Alexandria, VA: TESOL.

Weir, C. J. (2013). The measurement of writing ability 1913-2012. In C. J. Weir, I. Vidaković, & E. D. Galaczi (Eds.), Measured constructs: A history of English language examinations 1913-2012. Studies in Language Testing 37 (pp. 180-256). Cambridge, UK: Cambridge University Press.

Weir, C., O’Sullivan, B., Yan, J., & Bax, S. (2007). Does the computer make a difference? The reaction of candidates to a computer-based versus a traditional hand-written form of the IELTS writing component: Effects and impact. In P. McGovern & S. Walsh (Eds.), IELTS Research Reports (Vol. 7) (pp. 311-347). Canberra, Australia: IELTS Australia.

White, E. M. (1994). Teaching and assessing writing: Recent advances in understanding, evaluating, and improving student performance (2nd ed.). San Francisco, CA: Jossey-Bass.

White, E. M. (1995). An apologia for the timed impromptu essay test. College Composition and Communication, 46(1), 30-45.

Williamson, M., & Huot, B. (Eds.). (1993). Holistic scoring: Theoretical foundations and validation research. Cresskill, NJ: Hampton Press.

Wolfe, E. W. (1997). The relationship between essay reading style and scoring proficiency in a psychometric scoring system. Assessing Writing, 4, 83–106.

Wolfe, E. W., Kao, C. W., & Ranney, M. (1998). Cognitive differences in proficient and nonproficient essay scorers. Written Communication, 15, 465–492.

Worden, D. L. (2009). Finding process in product: Prewriting and revision in timed essay responses. Assessing Writing, 14(3), 157-177.

Xianrong, H., Jinding, P., Lijun, L., & Liyuan, J. (2005). Interactive composition evaluation in the multimedia classroom. In A. McNeill & J. Lai (Eds.), Crosslinks in English language teaching (Vol. 2) (pp. 45-62). Hong Kong: The Chinese University of Hong Kong.

Yancey, K. B. (Ed.). (1992). Portfolios in the writing classroom. Urbana, IL: National Council of Teachers of English.

Zhao, C. G., & Llosa, L. (2008). Voice in high-stakes L1 academic writing assessment: Implications for L2 writing instruction. Assessing Writing, 13(3), 153-170.
