

Designing and Sustaining a Foreign Language Writing Proficiency Assessment Program at the Postsecondary Level

Elizabeth Bernhardt, Stanford University; Joan Molitoris, Stanford University; Ken Romeo, Stanford University; Nina Lin, Stanford University; Patricia Valderrama, Stanford University

Abstract: Writing in postsecondary foreign language contexts in North America has received far less attention in the curriculum than the development of oral proficiency. This article describes one institution's process of confronting the challenges not only of recognizing the contribution of writing to students' overall linguistic development, but also of implementing a program-wide process of assessing writing proficiency. The article reports writing proficiency ratings that were collected over a 5-year period for more than 4,000 learners in 10 languages, poses questions regarding the proficiency levels that postsecondary learners achieved across 2 years of foreign language instruction, and relates writing proficiency scores to Simulated Oral Proficiency Interview ratings for a subset of students. The article also articulates the crucial relationship between professional development and writing as well as the role of technology in collecting and assessing writing samples.

Key words: assessment, oral proficiency, technology, writing proficiency

Elizabeth Bernhardt (PhD, University of Minnesota) is Professor of German Studies and John Roberts Hale Director of the Language Center, Stanford University, Stanford, CA. Joan Molitoris (PhD, Columbia University) is Lecturer in Spanish and Associate Director of the Language Center, Stanford University, Stanford, CA. Ken Romeo (PhD, Stanford University) is Lecturer in English for Foreign Students and Academic Technology Specialist for the Language Center, Stanford University, Stanford, CA. Nina Lin (MA, Stanford University) is Lecturer in Chinese in the Language Center, Stanford University, Stanford, CA. Patricia Valderrama (PhD candidate, Stanford University) is Graduate Teaching Associate in Comparative Literature, Stanford University, Stanford, CA.

Foreign Language Annals, Vol. 48, Iss. 3, pp. 329–349. © 2015 by American Council on the Teaching of Foreign Languages. DOI: 10.1111/flan.12153

Introduction

Writing in postsecondary foreign language contexts in North America has received far less attention in the curriculum than the development of oral proficiency. While at least one extensive volume on the importance of advanced writing in foreign language contexts exists (Byrnes, Maxim, & Norris, 2010), there remain a number of reasons that account for the lack of both research-based and instruction-based attention to the development of foreign language writing throughout the early years of language acquisition. First, the focus of foreign language instruction at the postsecondary level in the United States is most often on oral proficiency goals. Most foreign language programs intend to prepare learners to speak and listen and to read so that they are able to negotiate overseas foreign settings with confidence. Hence, oral proficiency is emphasized, and writing in these contexts is often relegated to exercises that reveal learners' acquisition of grammatical forms or developing breadth of vocabulary, to the thank-you note to foreign hosts or the personal resume, or as a means of measuring syntactic complexity. In upper-level courses, programs may even allow compositions to be written in the native language of students, English in the American context, in order to facilitate deeper literary and cultural interpretations. Even dissertations produced in foreign language departments in many American universities are written in English. This phenomenon stands in stark contrast to the field of English as a second language, which attempts to prepare English language learners with the skills that they will need to pursue bachelor's or post-bachelor's degrees in English-speaking countries—a project that necessarily entails copious amounts of academic writing. A number of research studies such as Leki (1995) and Leki and Carson (1997) exist on this topic and have been synthesized succinctly by Hedgcock (2005).

Another speculation for the lack of focus on foreign language writing is that writing is a planned language performance, in contrast to the spontaneous language performance of oral proficiency, and, hence, is viewed as less demanding. The language development research that has dominated studies in second language acquisition has been principally rooted in oral assessments, with literacy (writing or reading) rarely acknowledged as an important dimension of input (Bernhardt, 2011). With the exception of Byrnes and colleagues (2010), who contended that writing provides learners with the ability to perform in genres and hence is a "particularly valued indicator of overall FL development toward upper levels of ability" (p. 4), most research has ignored learners' abilities to integrate spoken and written texts; even fewer studies referred to foreign language writing in terms of relatively lengthy connected discourse. Reichelt (1999) thoroughly reviewed these and other dilemmas confronted by the context of foreign language writing.

The development of the foreign language profession itself from the 1980s onward has lacked perspective on writing development. In the early years of the proficiency movement, the focus was exclusively on oral proficiency and centered on a consistent measure of student performance, most especially in their oral performance. Admittedly, the ACTFL Proficiency Guidelines entailed reading, writing, listening, and speaking from their initial publication in 1982, with the primary emphasis on oral proficiency.

This was evidenced by a full-scale certification process, attached only to oral proficiency interviewing and rating with concomitant recertification possibilities, that was developed throughout the 1980s. The potential for assessment in reading, listening, and writing, comparable to oral proficiency assessment, remained untapped for more than a decade. Writing proficiency has, of course, been included from the inception of the formal discussion around proficiency rooted in the Foreign Service Institute (FSI) guidelines. An early version of the ACTFL guidelines that included all four skills appeared in 1986, and a further version also accompanied the revised guidelines for oral proficiency in 1999 (Breiner-Sanders, Lowe, Miles, & Swender, 2000). In 2001, the Preliminary Proficiency Guidelines–Writing Revised (Breiner-Sanders, Swender, & Terry, 2002) were published. In parallel to the process involving oral proficiency, the writing guidelines were widely distributed, and ultimately a training and certification procedure was put into place by 2008. Further, a writing protocol, the Writing Proficiency Test (WPT), was developed and made available commercially via online delivery as well as in hard copy. In spite of the considerable research that exists on oral proficiency attainment, i.e., the number of hours of instruction related to proficiency levels (Omaggio, 1986) and the ultimate attainment of language majors in their oral proficiency (Glisan, Swender, & Surface, 2013; Swender, 2003), limited published data exist with regard to the role of writing in foreign language programs or the use of the ACTFL writing guidelines.

In light of the complexities of considering the role of writing in postsecondary foreign language programs, the limited research base, and a continued focus on oral proficiency, this article describes a program-wide approach to assessing students' writing proficiency. It offers writing proficiency data from 4,476 postsecondary learners in 10 languages across 2 years of foreign language instruction. Further, it compares those writing data with oral proficiency ratings for a subset of learners and provides insight into the role of technology and its influences on writing performance. Specifically, it poses the following questions:

1. What writing proficiency ratings are learners able to achieve across the levels of a 2-year foreign language sequence? Are there differences based in English cognate vs. English noncognate languages?
2. What is the relationship between foreign language learners' oral proficiency ratings and their writing proficiency ratings? Are there differences in the relationship based in English cognate and English noncognate languages?
3. Does technology use influence foreign language writing assessment?

Literature Review

Several studies have examined foreign language writing as a support mechanism for language learning and have used or mentioned the ACTFL guidelines as benchmarks. Armstrong (2010) compared graded compositions; for-credit online discussion boards; and ungraded, not-for-credit essays in order to determine the effect of grades on foreign language writing in a fourth-semester Spanish class. Specifically, she tried to understand differences in the accuracy, fluency, and complexity between graded and ungraded assignments, building on previous work done with the ACTFL writing guidelines in the curriculum; Armstrong suggested that more frequent and ungraded writing assignments should be incorporated into the foreign language classroom, since assessment had little effect on student writing. Brown, Brown, and Eggett (2009) also looked to writing as a mechanism for enhancing language development. They described a curriculum for third-year foreign language courses that was grounded in content-based instruction to enhance written argumentation and to

differentiate it from oral production. The validity of the ACTFL guidelines for writ- aim of the curricular shift was to help stu- ing. Brown, Solovieva, and Eggett (2011) dents cross the Intermediate-Advanced bor- also examined writing and described a cur- der, as defined by the writing guidelines. riculum for Advanced- and Superior-level They found that a focus on Advanced- L2 writing in Russian and discussed the use and Superior-level writing tasks proved sta- of both quantitative and qualitative evalua- tistically significant, as measured by pre- tion measures. The authors conducted a and post-WPT ratings over the course of a quantitative analysis of complexity meas- single semester. Similarly, Dodds (1997) ures in writing samples to understand up- looked to content-based instruction using take in writing proficiency. In comparing film to develop German language proficien- the holistic, qualitative WPT ratings with cy. Her curricular experiment, using Ger- these quantitative measures, their findings man films and television series as the demonstrated the importance of using both context for writing assignments and class- types of measures when analyzing L2 writ- room discussion, indicated that using the ing and led the researchers to question the writing proficiency guidelines as an orga- limits of outcome-based courses incorporat- nizing principle improved student perfor- ing proficiency scales in the curriculum. mance by providing the basis for clear goals Other studies examined the ACTFL for the course. Godfrey, Treacy, and Tarone guidelines for writing in and of themselves. (2014) turned to a study abroad context for Hubert (2013), for example, compared col- examining the foreign language writing pro- legiate students’ scores on ACTFL Oral Pro- cess. In their project, they compared the ficiency Interviews (OPIs) and WPTs in development of writing skills of a group order to better understand how the develop- of study abroad students with those of a ment of speaking and writing proficiencies domestic group and investigated how the was related. The study showed a “fairly students’ ratings on the ACTFL WPT relat- strong” positive correlation between speak- ed to the “more fine-grained measures” (p. ing and writing proficiencies among the stu- 51) of fluency, accuracy, and complexity of dents across beginning, intermediate, and linguistic form, as well as the form-function advanced Spanish. However, that correla- mapping of making claims and providing tion weakened when measured in each supporting evidence. The authors ended by course individually. Hubert concluded that advocating for a more multidimensional ap- speaking and writing proficiencies improved proach in the evaluation of second language at similar rates when viewed on a global level (L2) writing development. and with a long-term perspective and called Findings from two studies challenged for a pedagogical approach that enhanced theuseoftheproficiency guidelines for proficiency across both modalities. mapping development. Henry (1996) exam- Rifkin (2005) conducted a longitudinal ined characteristics of short essays in Rus- analysis of listening, reading, speaking, and sian at four levels of collegiate study in order writing proficiency of students in the Mid- to contribute to the empirical testing of the dlebury College summer immersion pro- ACTFL proficiency guidelines for writing. gram, the majority of whom were In response to Valdes, Haro, and Echevar- collegiate language learners. 
He measured riarza (1992), she questioned whether the proficiency in each area through computer- guidelines could be used to build a general ized tests based on the ACTFL proficiency theory of L2 writing, specifically for Novice- guidelines for listening, reading, and writing, and Intermediate-level students. Her re- and either ACTFL-certified OPIs or oral ex- search partially supported the existence of ams modeled on OPIs. He found a significant an early writing stage similar to the ACTFL’s correlation between the four modalities and Novice-level descriptors, and she concluded hours of classroom instruction (in immer- by expressing continued doubt about the sion and nonimmersion settings) as well as Foreign Language Annals VOL. 48, NO. 3 333

grammatical competence. In his data, speak- Rifkin (2005), she noted a ceiling effect ing and writing proficiencies showed the caused by the exponential nature of the closest relationship. Overall, the first part ACTFL scale and suggested that this posed of his research provided an overview of post- significant problems for developing tests secondary, nonimmersion Russian language based on the guidelines for reading, writing, instruction in the United States and sug- and listening (Clifford & Cox, 2013; Cox & gested that more than 600 hours of instruc- Clifford, 2014; Glisan et al., 2013). tion were required to bring students to As a whole, these studies provide a Advanced-level reading, writing, speaking, suggestive yet shadowy knowledge base re- and listening proficiency in a noncognate garding foreign language writing specifical- language. The second part of his research ly in postsecondary settings. The field needs calculated “the benefit of immersion” (Rif- to continue to investigate a range of valid, kin, 2005, p. 10), which he deemed more reliable, and efficient tools for examining efficient in bringing students to higher pro- and gauging foreign language writing profi- ficiency levels. Rifkin suggested a possible ciency and thus develop a deeper under- ceiling effect in the higher levels of the standing of what writing as writing—as a ACTFL proficiency pyramid in the tradition- process in and of itself rather than as a tool al university course sequence that offers 400 for grammatical practice—provides the hours or fewer of instruction over 4 years. He field as an insight into written language concluded by advocating for changes in cur- development, a view passionately and co- ricular policy that allowed for more immer- gently expressed by Byrnes et al. (2010). sion experiences as well as more hours of classroom instruction, and that integrated The Institutional Context the teaching of grammar and syntax. I. Thompson (1996) rated the speak- Establishing a Proficiency-Oriented ing, reading, listening, and writing profi- Curriculum ciencies of students of Russian after 1, 2, The trajectory of the Stanford Language 3, 4, and 5 years of study using tests based Center, established in 1995, is parallel to on the ACTFL guidelines for each area. The that of the modern foreign language profes- goal was assessing whether the proficiency sion across the same time period. In its descriptors were realistic and attainable for beginnings, it too emphasized oral profi- collegiate foreign language programs, and ciency without apology. First, because whether there existed a significant positive oral proficiency is the most difficult skill correlation between proficiency levels in the to acquire in formal settings, it was impor- four skills as well as between the four skills tant to measure student progress within this and levels of study. Her data revealed over- challenging context. Second, oral proficien- lapping ranges of performances with no cy was the dimension of language study exact correspondence between levels of perceived as lacking by the wider university study and levels of proficiency in the four community at the founding of the Language modalities. Although the median proficien- Center (Stanford University Board of Trust- cy level increased with each additional year ees, 1994). Third, a nationally recognized of study, students with no change in profi- scale and a concomitant training program, ciency scores were found at almost all levels namely, the ACTFL/FSI scale and related of study. 
The four modalities themselves rater and interview workshops attached to were found to have a significant positive the OPI, were available. Further, a cost-ef- correlation, although none of the correla- fective, validated mechanism existed that tions “were particularly impressive” (I. enabled large-scale assessment in the form Thompson, 1996, pp. 54–55). Thompson of the Simulated Oral Proficiency Interview, concluded that each skill followed a slightly or SOPI (Kenyon & Tschirner, 2000; Ma- different and nonparallel development. Like lone, 2000; Shohamy, Gordon, Kenyon, & 334 FALL 2015

Stansfield, 1989; Stansfield & Kenyon, speaking, was integrated within the presen- 1992). The SOPI has been used successfully tational or interpersonal mode according to in large-scale program assessment (R. the type of task, purpose, and audience and Thompson et al., 2014). was also articulated developmentally From 1995 on, the Language Center throughout the course sequence as it became conducted program-level assessments of progressively more complex and demon- oral proficiency development (Bernhardt, strated features of increasing proficiency. 1997, 2009), using the SOPI for all learners When the ACTFL scale for assessing exiting a course sequence and the OPI in writing proficiency, after years of discus- subsets of those same learners. The intention sion, debate, and refinement, was finalized, of this systematic assessment program was it followed the general outline of the oral twofold: (1) to be able to document the proficiency scale and focused on functional progress of students through programs in writing ability in a foreign language by mea- Spanish, French, Portuguese, Italian, Ger- suring the performance of specific writing man, Russian, Arabic, Chinese, Japanese, tasks against the criteria stated in the Pre- Korean, and Hebrew, ensuring that students liminary Proficiency Guideline–Writing Re- met established benchmarks across their vised (Breiner-Sanders et al., 2002). In language learning experiences, and (2) to parallel to the OPI scale, the writing scale examine the extent to which individual pro- also had an assessment as well as a certifi- grams met and perhaps exceeded their stated cation procedure for raters attached to it: objectives. Findings from this process have the WPT. What it did not have was a vali- been documented throughout the literature, dated, simulated protocol for writing that most recently by Bernhardt and Brillantes was parallel to the SOPI, which could ac- (2014), and all data reported are available commodate wide-scale programmatic as- at http://www.language.stanford.edu sessment at a reasonably low cost. Alongside assessment, the Language In 2007, the staff at the Stanford Lan- Center also implemented a curricular model guage Center took up the challenge of devel- based on the most current research in L2 oping and piloting a protocol that would literacies. Target objectives were developed capture the intention of the WPT, provide in each language program, following a pro- the potential for appropriate test statistics in totype crafted in 1997 by Spanish and Por- relation to the WPT, and do so in a cost- tuguese language instructors (representing effective and efficient manner. It was impor- the largest enrollments) and respectful of tant to take up the challenge of adding writ- the unique features of each language (Bern- ing assessment to the already-established hardt, Valdes, & Miano, 2009). These docu- systematic program in oral proficiency as- ments, now available in their revised sessment for three significant reasons. First, versions at http://www.language.stanford. adding writing provided a more complete edu, all have as their foundation the National picture of what students were actually able Standards for Foreign Language Learning to do with the language; in other words, it (National Standards, 2006), with particular was a view into their literacy, which is of the emphasis on the interpersonal, interpretive, utmost relevance to their academic future. and presentational modes of communica- Second, writing provided a concrete view tion. 
In contrast to more traditional curricu- into learners’ linguistic development, unaid- la, which are often textbook- or four-skill- ed by external supports that oral discourse, driven, these objectives laid out concrete particularly interactive speech, provides. A developmental goals by detailing what stu- third reason was that the ACTFL training dents should be able to do with the language and certification component attached to within each of the three modes and within writing assessment had recently been estab- and across courses that form a yearlong se- lished. Having this procedure available en- quence. Writing, specifically in parallel to abled continuing substantive professional Foreign Language Annals VOL. 48, NO. 3 335

development for the teaching staff, a key factor in the success of programs that aim to bring language learners to higher levels of proficiency.

Maintaining and Enhancing a Systemic Professional Development Program

Over the years, professional programming at the Language Center had been grounded in a crucial process: OPI tester training and certification. All staff, numbering approximately 65 full-time instructors across 14 languages, continue to participate in the initial stages of oral proficiency interview training by attending the corresponding workshop in a 2-day or 4-day format. Almost all complete the full 4-day training. More than 70% of instructors to date have received full certification in oral interview rating and testing. Maintaining the momentum in this process to include writing was critical. Instructors began in 2008 to pursue WPT rater training, adding it to their already established OPI certification. Within 5 years, more than half of the entire language teaching staff was ACTFL-certified in both oral rating and testing and in writing.

A successful professional development program pushes a staff forward intellectually. For the Language Center, this meant that instructors showed a growing interest in writing and in including it as a corollary to the already established oral assessment program. This professional stance meant that there needed to be parameters and prompts for a writing proficiency assessment that would be consistent with instructors' formal knowledge about writing proficiency assessment garnered through their WPT certification process. Hence, a core group of ACTFL writing-certified instructors across English cognate and English noncognate languages collaborated on drafting a writing proficiency assessment (WPA). The collaboration focused on format, duration, and level and design of prompts; generated potential writing contexts; and contributed sample prompts to a wiki, with an eye toward creating a template that all languages could use.

Ultimately, the collaborative group developed prompt types, based on the established proficiency-based program objectives of the language program. Prompts were constructed to elicit Novice- and Intermediate-level functions, with a "mini-probe" to test for the Advanced level, or to elicit Advanced-level functions with a probe-like task of increasing difficulty targeted at the Superior level. Using these prompts, two forms of writing assessment were created: a short form and a long form. The short form was intended for students completing the first year of a noncognate language (150 hours of instruction). The long form was administered at the end of first-year cognate languages (150 hours) and at the end of all second-year languages (300 hours), cognate and noncognate alike. This framework was consistent with the general structure of the SOPI, in that duration and type of tasks corresponded to the anticipated proficiency range of the test-takers. Similarly, the WPA was structured to align with the proficiency objectives of the established curriculum and to respond to institutional constraints such as delivery within a 50-minute class session. In contrast to oral proficiency assessments, however, writing proficiency assessments are obviously not interactive. It was crucial therefore that each prompt elicit sufficient writing that reflected the writer's proficiency, and at the same time that a broad range of contexts and functions be represented in a given test. Sample prompts developed through professional collaboration are provided in the Appendix. This form of a WPA allowed ratable samples to be collected for noncognate as well as cognate languages, in first- as well as second-year language courses.

Using Technology in Writing Assessment

A critical arena within any modern institutional context is technology.

In the initial years of developing and implementing the writing assessment, tasks and topics were administered on paper, which was complicated and time-consuming. For each course section in a classroom, test prompts had to be delivered from the administrative offices and returned along with student responses. Responses then had to be delivered to raters who then had to return them along with the ratings, all without losing a single sheet of paper, since the loss of a student response was a potentially serious breach of student privacy as dictated by FERPA (Family Educational Rights and Privacy Act) as well as a potential threat to test integrity. The teaching staff also lamented the inefficiency of this process.

The obvious solution to this problem was to administer the writing assessment via the university's learning management system (LMS). Enrollment information was readily available in a system built to be secure, and prompts could be distributed and responses collected and rated without the risk of losing physical artifacts. Using the LMS also enabled students to use a system with which they were familiar and comfortable, since most of the work in their other courses was done on a computer with a keyboard. Yet questions and concerns remained, specifically about student knowledge of foreign characters and how to find them on a keyboard, about how much writing students could reasonably produce in a standard examination format of 50 minutes, and about how to ensure that learners were not relying on Web-based help such as grammar and spell-checking as well as locating and copying passages from the Web.

Methods

Participants

Students at Stanford University are required to complete the first year of language instruction or its equivalent. Most students enter the university having completed the requirement by testing out with an Intermediate Mid level of oral proficiency as well as grammatical knowledge and/or scores of 5 on an Advanced Placement examination. Each year, the approximately 800 students who do not test out of the language requirement complete either a first-year (150 hours of instruction) or a second-year sequence (an additional 150 hours of instruction) and are assessed in their oral and writing proficiency. These data from the academic years 2009 through 2014 are provided in Tables 1–4, for a total of 3,310 students who completed the first-year sequence and 1,166 students who completed the second-year sequence across 10 languages.

TABLE 1 Writing Proficiency Ratings of 2,066 Learners in English-Cognate Languages After 150 Hours of Instruction, 2009–2014

Ratings After 150 Hours (in percentages) NH IL IM IH AL AM

French (N = 440) 2 21 67 10
German (N = 205) 9 45 36
Italian (N = 378) 2 26 57 13 1
Portuguese (N = 86) 1 48 29 13 9
Spanish (N = 957) 1 16 57 24 2

Notes: NH = Novice High, IL = Intermediate Low, IM = Intermediate Mid, IH = Intermediate High, AL = Advanced Low, AM = Advanced Mid.
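The distributions in Tables 1–4 were produced by sorting the nominal WPA ratings and calculating the percentage of learners at each sublevel. The following is a minimal sketch of that tabulation, assuming Python with pandas and using invented ratings rather than the study's data.

import pandas as pd

# Hypothetical records: one row per learner, with language and ACTFL writing rating.
records = pd.DataFrame({
    "language": ["French", "French", "Spanish", "Spanish", "Spanish"],
    "rating": ["IM", "IH", "IL", "IM", "IM"],
})

# Percentage of learners at each sublevel, by language (each row sums to 100),
# with the columns forced into ACTFL scale order as in Tables 1-4.
levels = ["NH", "IL", "IM", "IH", "AL", "AM"]
distribution = pd.crosstab(records["language"], records["rating"], normalize="index") * 100
distribution = distribution.reindex(columns=levels, fill_value=0).round(0)
print(distribution)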

TABLE 2 Writing Proficiency Ratings of 1,244 Learners in Non-English-Cognate Languages After 150 Hours of Instruction, 2009–2014

Ratings After 150 Hours (in percentages) NH IL IM IH AL AM

Arabic (N = 218) 9 77 13
Chinese (N = 483) 34 58 3
Japanese (N = 390) 16 49 17 2
Korean (N = 63) 24 62 12
Russian (N = 90) 30 45 5

Notes: NH = Novice High, IL = Intermediate Low, IM = Intermediate Mid, IH = Intermediate High, AL = Advanced Low, AM = Advanced Mid.

To better understand the relationship between students' writing performance, as measured by the WPA, and their oral performance, as measured by the SOPI, data generated by all students who were enrolled in first- and second-year course sequences during the 2013–2014 academic year were targeted for analysis. WPA and SOPI data from 444 first-year students and 209 second-year students in the 2013–2014 cohort group are reported below in the Findings for Question 2.

Procedures

WPAs were completed at the end of each course from 2009 to 2014 along with SOPIs. The correlation between the SOPI and the OPI has been reported at between 0.85 and 0.91 across a number of studies (Clark & Li, 1986; Kenyon & Tschirner, 2000; Shohamy et al., 1989; Stansfield & Kenyon, 1992). In a Language Center internal comparison of SOPI ratings with Language Testing International (LTI) ratings of telephonic OPIs (N = 156), the correlation was 0.85 (LTI is the testing branch of the ACTFL). The Stanford SOPIs were assessed by certified OPI raters and testers, many of whom regularly test for LTI. Interrater reliability was calculated each year and ranged from 0.87 to 0.99 across all languages. SOPIs were always administered first.
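The interrater reliability figures reported above can be expressed as a correlation between the two raters' scores for the same samples; the article does not specify the exact statistic, so the sketch below simply assumes a Pearson correlation (Python with scipy) over invented rater scores already converted to the numerical scale described under Analyses.

from scipy.stats import pearsonr

# Hypothetical numerical equivalents of two certified raters' scores for the same ten samples.
rater_a = [1.1, 1.2, 1.5, 1.2, 1.1, 1.8, 1.5, 1.2, 1.1, 1.5]
rater_b = [1.1, 1.2, 1.5, 1.5, 1.1, 1.8, 1.5, 1.2, 1.2, 1.5]

r, p = pearsonr(rater_a, rater_b)
print(f"interrater reliability r = {r:.2f} (p = {p:.3f})")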

TABLE 3 Writing Proficiency Ratings of 680 Learners in English-Cognate Languages After 300 Hours of Instruction, 2009–2014

Ratings After 300 Hours (in percentages) NH IL IM IH AL AM

French (N = 149) 1 16 54 23 6
German (N = 14) 8 67 28
Italian (N = 71) 8 33 48 18
Portuguese (N = 51) 2 21 32 24 21
Spanish (N = 395) 1 6 31 40 21

Notes: NH = Novice High, IL = Intermediate Low, IM = Intermediate Mid, IH = Intermediate High, AL = Advanced Low, AM = Advanced Mid.

TABLE 4 Writing Proficiency Ratings of 486 Learners in Non-English-Cognate Languages After 300 Hours of Instruction, 2009–2014

Ratings After 300 Hours (in percentages) NH IL IM IH AL AM

Arabic (N = 113) 2 25 56 16
Chinese (N = 155) 1 37 47 1
Japanese (N = 138) 23 50 24 2
Korean (N = 21) 23 68 9
Russian (N = 59) 6 26 59 7 1

Notes: NH = Novice High, IL = Intermediate Low, IM = Intermediate Mid, IH = Intermediate High, AL = Advanced Low, AM = Advanced Mid.

Several class sessions later, learners sat for the internal WPA anchored in the ACTFL WPT. Each WPA was assessed by two WPT certified raters, many of whom also test regularly for LTI. Interrater reliability was calculated each year and ranged from 0.85 to 0.92.

These oral and writing performances were uploaded electronically within the digital language laboratory and delivered to raters via the university's course management system. The certified OPI and WPA raters were paid for their assessments via 1 or 2 months of summer salary. Ratings were delivered to the Language Center for reliability analyses and for final data entry. When there were any discrepancies between raters, the lower rating was accepted. In the rare instances when there were discrepancies at the main proficiency level border, a third rater rerated the sample.

Analyses

Writing data across five administrations of the WPA (2009–2014) were sorted and percentages were calculated for each proficiency level. To analyze the relationship between oral and writing ratings, data were taken from a subsample of more than 600 participants from the 2013–2014 academic year cohort for whom ratings on each assessment (writing and oral) could be precisely matched. Data for Chinese and Korean were eliminated from these analyses due to a lack of precisely matched data. All nominal rater data were converted to numerical equivalents based on Dandonoli and Henning (1990) for statistical analysis. In their scheme, values range from 0.1 for a Novice Low to 2.3 for Advanced Mid. Matched pair t tests were then conducted for each language. To assess differences in handwriting and keyboarding, analyses of variance were conducted based on word counts across samples of French and Spanish learners who responded to identical prompts.

Findings

Question 1: What writing proficiency ratings are learners able to achieve across the levels of a two-year foreign language sequence? Are there differences based in English cognate vs. English noncognate languages?

Table 1 displays data for writing performances across five academic years of students completing a first-year sequence in the English cognate languages of French, German, Italian, Portuguese, and Spanish (N = 2,066), and Table 2 displays data generated in English noncognate languages of Arabic, Chinese, Japanese, Korean, and Russian (N = 1,244).

Tables 3 and 4 illustrate data from students completing second-year language sequences, grouped by English-cognate (Table 3; N = 680) and English-noncognate languages (Table 4; N = 486). Generally speaking, first-year students learning languages that are cognates with English achieved ratings that were principally (55% on average) in the Intermediate Mid range. Learners of languages that are not cognate with English tended to achieve an Intermediate Low rating (an average of 58%), with 77% of Arabic learners achieving this average. Second-year learners (Tables 3 and 4) tended to cross at least one sublevel when compared with first-year learners. In the case of noncognate languages such as Chinese and Japanese, learners moved from Intermediate Low to Mid, while in Arabic, many learners moved two sublevels, namely from Low to High. Learners in cognate languages such as French, Spanish, and Italian frequently (around 25%) moved into the Advanced range at the end of the second-year sequence. In summary, not surprisingly, learners who did not need to conquer an orthographic distance were able to apply their orthographic background knowledge and achieve a proficiency rating that was higher than those achieved in languages in which learners were required to learn not only the language, but also the written script. The data also revealed some of the advantages that alphabetic languages, such as Arabic, have over languages written with characters, such as Chinese.

Question 2: What is the relationship between foreign language learners' oral proficiency ratings and their writing proficiency ratings? Are there differences in the relationship based in English cognate and English noncognate languages?

In order to examine the relationships between speaking and writing ratings, all proficiency ratings, oral and written, from the 2013–2014 academic year (eight languages) were converted into numerical ratings (Dandonoli & Henning, 1990).

TABLE 5 Relationship Between Speaking and Writing Ratings of 513 Learners Completing First- or Second-Year Sequence in One of Five English-Cognate Languages, 2013–2014

French: N, SOPI, SD, WPA, SD, r, t test, probability
150 hours: 70, 1.198, 0.02, 1.26, 0.02, 0.374, t(69) = –3.11, p < 0.001
300 hours: 46, 1.51, 0.09, 1.624, 0.09, 0.714, t(45) = –3.14, p < 0.001
German: N, SOPI, SD, WPA, SD, r, t test, probability
150 hours: 36, 1.244, 0.01, 1.436, 0.06, 0.44, t(35) = –5.08, p < 0.001
300 hours: 14, 1.657, 0.05, 1.95, 0.06, 0.19, t(13) = –3.5, p < 0.001
Italian: N, SOPI, SD, WPA, SD, r, t test, probability
150 hours: 56, 1.267, 0.02, 1.28, 0.02, 0.51, t(55) = –0.66, p < 0.25
300 hours: 8, 1.88, 0.08, 1.95, 0.03, 0.81, t(7) = –1, p < 0.17
Portuguese: N, SOPI, SD, WPA, SD, r, t test, probability
150 hours: 14, 1.407, 0.05, 1.371, 0.03, 0.51, t(13) = 1, p < 0.16
300 hours: 22, 1.83, 0.05, 1.786, 0.1, 0.81, t(21) = 0.82, p < 0.21
Spanish: N, SOPI, SD, WPA, SD, r, t test, probability
150 hours: 167, 1.19, 0.02, 1.57, 0.09, 0.45, t(166) = –18.2, p < 0.001
300 hours: 80, 1.68, 0.11, 2.19, 0.03, 0.27, t(79) = –13.4, p < 0.001
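A minimal sketch of the comparison reported in Tables 5 and 6, assuming Python with scipy. Only the endpoints of the Dandonoli and Henning (1990) scheme are given in the article (0.1 for Novice Low, 2.3 for Advanced Mid), so the intermediate values in the mapping below, like the ratings themselves, are illustrative placeholders.

from scipy.stats import ttest_rel

# Illustrative nominal-to-numerical mapping; only NL = 0.1 and AM = 2.3 are stated in the article.
SCALE = {"NL": 0.1, "NM": 0.3, "NH": 0.5, "IL": 1.0, "IM": 1.2, "IH": 1.5, "AL": 2.0, "AM": 2.3}

# Hypothetical matched SOPI (speaking) and WPA (writing) ratings for the same learners in one language.
sopi = ["IL", "IM", "IL", "IM", "IH", "IL"]
wpa = ["IM", "IM", "IM", "IH", "IH", "IM"]

speaking = [SCALE[r] for r in sopi]
writing = [SCALE[r] for r in wpa]

# Matched-pair t test, run separately for each language and level of instruction.
t, p = ttest_rel(speaking, writing)
print(f"t = {t:.2f}, p = {p:.4f}")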

Table 5 describes the relationships between learners' oral proficiency ratings, as documented by SOPI ratings, and their WPA ratings.

In Spanish, French, and German, in both the first- and second-year programs, writing performances were always statistically significantly higher than the respective oral ratings (French, t(69) = –3.11, p < 0.001 and t(45) = –3.14, p < 0.001; German, t(35) = –5.08, p < 0.001 and t(13) = –3.5, p < 0.001; Spanish, t(166) = –18.2, p < 0.001 and t(79) = –13.4, p < 0.001). This finding was also consistent with first-year Arabic and Japanese (Table 6) (Arabic, t(24) = –4.71, p < 0.001 and Japanese, t(58) = –4.44, p < 0.001), but not with second-year Arabic (t(10) = –1.86, p < 0.255) and Japanese (t(16) = 1, p < 0.166). The finding was generally unsurprising in that learners had more time to compose and to correct in any writing assessment than they did in their spontaneous oral performances. This underlines the Byrnes et al. (2010) contention that writing is an excellent measure of consolidated skills and can reveal language acquisition in a fashion that an oral performance cannot.

Interestingly, Portuguese and Italian did not follow the pattern of other English-cognate languages. The data indicated no statistically significant differences in learner performances in speaking as compared with writing (Portuguese, t(13) = 1, p < 0.16 and t(21) = 0.82, p < 0.21; Italian, t(55) = –0.66, p < 0.25 and t(7) = –1, p < 0.17). A possible explanation for this phenomenon is that each language program—Portuguese and Italian—attracts a majority of students who are already familiar with a closely related language (Spanish or French, respectively). Perhaps the consolidation of grammatical skills for this population in Portuguese and Italian was more facile as compared to learner processes in the other English-cognate languages under investigation.

Table 6 indicates that first- and second-year Russian and second-year Japanese and Arabic were the outliers in the present data set: There was no difference between writing and speaking proficiency ratings in first-year Russian (t(16) = –1.6, p < 0.06) and in second-year Arabic (t(10) = –1.86, p < 0.255) and Japanese (t(16) = 1, p < 0.166). Admittedly, the total number of subjects in these languages was much smaller than in the English-cognate languages examined, and this lack of statistical power may have skewed the data.

TABLE 6 Relationship Between Speaking and Writing Ratings of 140 Learners Completing First- or Second-Year Sequence in One of Three Non-English-Cognate Languages, 2013–2014

Arabic: N, SOPI, SD, WPA, SD, r, t test, probability
150 hours: 25, 0.844, 0.10, 1.12, 0.02, 0.42, t(24) = –4.71, p < 0.001
300 hours: 11, 1.39, 0.18, 1.71, 0.14, 0.76, t(10) = –1.86, p < 0.255
Japanese: N, SOPI, SD, WPA, SD, r, t test, probability
150 hours: 59, 0.98, 0.05, 1.11, 0.03, 0.49, t(58) = –4.44, p < 0.001
300 hours: 17, 1.217, 0.01, 1.182, 0.01, 0.02, t(16) = 1, p < 0.166
Russian: N, SOPI, SD, WPA, SD, r, t test, probability
150 hours: 17, 1.017, 0.06, 1.094, 0.01, 0.75, t(16) = –1.6, p < 0.06
300 hours: 11, 1.418, 0.06, 1.263, 0.01, 0.42, t(10) = 2.23, p < 0.02

It is also possible that students adopted a "write what I can say" strategy in these languages that use non-Roman orthographies, whereas English-speaking learners of English cognate languages were more willing to utilize their first-language literacy knowledge to support and enhance their performances. That literacy knowledge is just not as useful in the learning of non-English cognate languages. A further oddity within the data set is that second-year Russian and Japanese writing ratings were lower than the speaking ratings.

Question 3: Does technology use influence foreign language writing assessment?

Given that the WPA had already been conducted via handwriting, data existed on how much written language learners were able to produce. The question became one of comparing total production across identical prompts within a 15-minute time frame (Prompt 1) or a 30-minute assessment (Prompt 2) in handwriting and via a computer in two languages (Spanish and French).

Baseline data provided in Table 7 indicate that with Intermediate-level prompts, learners completing the first year of instruction in Spanish (N = 204) and French (N = 81) produced approximately 190 handwritten words (185 and 196, respectively, in Spanish and French) in 15 minutes and approximately 197 while keyboarding (220 and 174 words, respectively). With the more advanced prompt, for which 30 minutes were allocated, first-year learners of Spanish and French produced an average of 236 handwritten words (243 and 229 words, respectively) and around 287 words (305 and 270 words, respectively) while keyboarding. All differences were statistically significant (Spanish, df(1,202) = 24.12, p < 0.001 and df(1,202) = 32.72, p < 0.001; French, df(1,79) = 5.97, p < 0.02; the performance of first-year French learners responding to the 15-minute prompt was inconsistent, as they wrote less on the computer (174 words) than by hand (196 words), df(1,79) = 5.487, p < 0.002). Table 8 displays data generated by second-year learners of Spanish (N = 82) and French (N = 32). Not surprisingly, second-year learners produced more language than first-year learners of Spanish and French: around 219 handwritten words (231 and 206 words, respectively) and 252 words (288 and 230 words, respectively) while keyboarding when responding to the Intermediate-level prompts, and around 298 handwritten words (314 and 282 words, respectively) and 360 typed words (378 and 352 words, respectively) when responding to the Advanced-level prompts. While it may be possible to question the importance of keyboarding for first-year learners with any level of prompt due to their limited language proficiency, the advantage for more advanced learners is indisputable; using a computer almost always provided statistically significant findings (Spanish, df(1,80) = 18.16, p < 0.001 and df(1,80) = 7.96, p < 0.01; French, df(1,30) = 4.66, p < 0.04).

TABLE 7 Mean Number of Words Produced by Students Writing by Hand or Using a Computer Across Spanish and French Responding to Two Prompts After 150 Hours of Instruction, 2013–2014

Spanish: Hand, Computer, p
Prompt 1: 185, 220, df(1,202) = 24.12, p < 0.001
Prompt 2: 243, 305, df(1,202) = 32.72, p < 0.001
French: Hand, Computer, p
Prompt 1: 196, 174, df(1,79) = 5.487, p < 0.002
Prompt 2: 229, 270, df(1,79) = 5.97, p < 0.02

TABLE 8 Mean Number of Words Produced by Students Writing by Hand or Using a Computer Across Spanish and French Responding to Two Prompts After 300 Hours of Instruction, 2013–2014

Spanish: Hand, Computer, p
Prompt 1: 231, 288, df(1,80) = 18.16, p < 0.001
Prompt 2: 314, 378, df(1,80) = 7.96, p < 0.01
French: Hand, Computer, p
Prompt 1: 206, 230, df(1,30) = 2.18, p < 0.15
Prompt 2: 282, 352, df(1,30) = 4.66, p < 0.04
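The handwriting versus keyboarding comparisons behind Tables 7 and 8 were analyses of variance over word counts for learners responding to identical prompts. A minimal sketch, assuming Python with scipy and invented word counts rather than the study's data.

from scipy.stats import f_oneway

# Hypothetical word counts for one prompt in one language.
handwritten = [182, 190, 175, 201, 188, 179]
keyboarded = [221, 215, 230, 208, 226, 219]

# One-way ANOVA across the two writing conditions; with two groups this yields df = (1, N - 2).
f, p = f_oneway(handwritten, keyboarded)
print(f"F = {f:.2f}, p = {p:.4f}")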

An outlier was the French performance with Prompt 1. Even though the subjects wrote more via computer than by hand, findings did not reach significance (df(1,30) = 2.18, p < 0.15). Allowing more advanced learners to keyboard afforded them the opportunity to produce significantly more language and hence resulted in richer samples.

Discussion

This article provides baseline data from more than 4,000 writing samples across 10 languages (Tables 1–4), data from a subsample of more than 600 learners' written and oral proficiencies in eight languages (Tables 5–6), and analyses of assessments of learners' handwritten and keyboarding production in Spanish and French (Tables 7–8). This extensive data set demonstrates how postsecondary learners at this institution developed their writing proficiency as measured against ACTFL writing proficiency criteria and provides a platform for gauging the achievement of learners across different institutions and program configurations in writing over 2 years of instruction. These data also open an array of research questions that may lead to productive exploration. Of equal importance are the curricular and pedagogical implications for including writing instruction and proficiency-based assessments in the foreign language curriculum.

Research Implications

A first question to pose is whether the data offered here are consistent or inconsistent with previous explorations of writing in foreign language classrooms. The writing proficiency tests conducted by Brown et al. (2009) differed in both structure and prompts, yet the present findings are consistent in that their subjects were able to cross at least one sublevel over the course of the semester. Brown et al. (2011) utilized a similar method in a course for third- and fourth-year Russian students and found students moving at least one sublevel up in written proficiency over the course of the semester, as measured by pre- and post-WPTs, also consistent with the present study. The second-year German students in Dodds's study (1997) entered with Intermediate Low or Intermediate Mid writing proficiency and exited with the ability "to achieve Advanced level proficiency at least some of the time" (p. 143), also consistent with the present study. Henry's (1996) findings were also generally consistent with the findings from this investigation. Rifkin (2005) conducted his study over 3 years at the Middlebury Russian school. Overall, the present data reflect the results of Rifkin's study, in that students in Russian achieved the same writing proficiency levels after 150 and 300 hours of study and that first-year Russian students in the present study did not have different speaking and writing scores either. On the other hand, this comparison also outlines the outlier

character of second-year Russian students of an English noncognate language (Arabic, from this sample, who had writing ratings Chinese, Japanese, Korean, Russian) were, on that were lower than their speaking ratings. the whole, in the Intermediate Low to Inter- The data from the present investigation are mediate Mid range. Their performances indi- somewhat consistent with those of I. cate that they were writing at the sentence Thompson (1996), also focused on Russian, level with little emergent discourse structure. who found that first-year students gained a More advanced students, having completed a median writing proficiency score of Novice second year or a total of 300 hours of instruc- High—one sublevel lower than the students tion, were Intermediate High to Advanced in this data set—with second-year students among English cognate language learners, reaching a median writing score of Interme- meaning they were capable of composing diate Mid, which is consistent with the pres- structured paragraphs using a relatively com- ent data set. Comparing the median of the plete grammatical arsenal. Those at the same spoken and written proficiency scores in her level in the English noncognate languages study shows an inconsistency with the re- were generally Intermediate Mid, indicating sults offered within the context of this arti- that their ability to write in paragraphs was cle: Thompson’s students scored higher in just emerging. Generally speaking, the sec- written proficiency in both their first and ond-year students were approaching the second years of study of a non-English-cog- writing criterion that is expected of a number nate language. of professions such as secondary school Despite some inconsistencies, all of the teaching and some bilingual secretarial data across multiple studies point toward a work. More research is required to refine steady growth in writing proficiency related the target-level descriptors and to understand to the total amount of time spent in instruc- more thoroughly whether topic, for example, tion. The level of growth clearly varies, influences writing performance (see Cox, though not remarkably so, across an array Brown, & Burdis, 2015). of institutions. Further research is obvious- The data also contribute to our under- ly critical to determine the amount of standing of the role of English language growth across different institutional and literacy in the learning of writing in En- instructional configurations: forthcoming glish-cognate languages as well as alphabet- studies need to examine class size in rela- ic languages, as compared with languages tion to writing proficiency outcomes as well that utilize character systems. The baseline as level of instructor professional develop- data suggest that, indeed, English-speaking ment. In a multivariate world, research learners have an advantage in developing must approach issues in foreign language proficiency in writing in English-cognate learning and instruction in a multivariate languages over English-speaking students manner in order to uncover optimal combi- who are learning noncognate languages. Al- nations of factors that lead to student suc- though this may be intuitively obvious, un- cess and instructor satisfaction. derstanding the nature of the advantage is A more specific area for research is in- the critical insight. The advantage appears vestigating writing proficiency attainment to be at least a sublevel (Low to Mid or Mid between and among languages. Approxi- to High) on the ACTFL rating scale. 
The mately 75% of the learners completing their extent to which this research finding holds first year of an English cognate language across learners in other academic contexts (French, German, Italian, Portuguese, Span- and whether their level of English language ish) were in the Intermediate Mid to Inter- literacy has an impact on writing rating are mediate High range. This indicates that most important areas for further investigation. were emergent paragraph-level writers, thus Factor analytic research designs will lend evidencing ability well beyond lists or isolat- themselves to productive explorations in ed sentences. Learners completing a first year this arena. 344 FALL 2015

An additional area for research is con- work that examined Advanced-level writers. tinuing to understand the relationship be- Are the data generated with first- and sec- tween oral and writing performance. As ond-year learners consistent with the ex- noted in the literature review, when Hubert pectations expressed by researchers (2013) compared oral and writing proficien- examining more advanced learners? In other cy in Spanish for students enrolled in sec- words, is there a potential gap between what ond-year and third-year courses, he found is conventionally defined in the profession that “speaking and writing proficiencies ap- as early language learning vs. the knowledge peared to rise at fairly similar rates as learners and skills that learners are expected to ac- passed from beginning through intermediate quire in more advanced courses? Exploring to advanced levels of Spanish study” (p. 92). such questions will provide the profession This finding is inconsistent with the present with a critical research base that will inform data collection. The discrepancy should be more nuanced curriculum development. investigated. Hubert’s ratings were between Finally, technology and its impact on Novice Mid and Intermediate High, different writing performance need to be probed in ratings from those collected in the present depth. While the data collected within this study, and those ratings may account for the investigation permitted no ancillary assis- difference. Indeed, the current study did tance in the writing process, that lack of indicate a positive relationship between the assistance is fundamentally artificial. Con- oral and writing performance, yet that rela- temporary writers almost inevitably use tionship within this database was not strong. electronic assistance. The foreign language Writing proficiency in the English cognate profession needs to understand thoroughly languages was almost always higher than the implications of various kinds of assis- oral proficiency within the present database, tance for foreign language writers. Does but not exclusively. Again, numbers of sub- enabling writers to employ outside assis- jects, the nature of the relationship between tance enhance their performance? What is and among languages, and the dedication to the difference between generating a sponta- the writing process within postsecondary neous writing performance without assis- curricula probably are influential. These as- tance and a planned writing performance sociations should be explored in greater with permitted assistance? depth to fundamentally understand how spoken and written language proficiencies are linked. Interestingly, in the direct com- Pedagogical Implications parisons between speaking and writing, The descriptive and inferential data have im- writing ratings were statistically significantly plications for developing a broad understand- higher in most cases, but not all, at both the ing of student foreign language writing 150-hour and 300-hour levels of instruction. performance in basic language sequences. It appears that in many cases, as learners Historically, writing has been viewed instruc- became more knowledgeable and comfort- tionally as a vehicle for assessing grammatical able in the language, that relationship be- performance. In other words, students have tween the two modes widened. 
Toward the traditionally been asked to “write a composi- upper proficiency ranges, vocabulary and tion” and then have received feedback about syntax changed, making written language both content and grammar as separate enti- more nuanced and complex and far less ties. Generally speaking, the grammar score like oral language. The data collected here is the more highly weighted component of provide the profession with a view on the assessment. However, a grammar score how learners cope with that additional does not communicate the full array of what a complexity. learner can do with writing and the extent to The data offered here should be inter- which a learner can communicate ideas in preted in light of the Byrnes et al. (2010) written language. A proficiency orientation Foreign Language Annals VOL. 48, NO. 3 345

calls for a holistic perspective that examines development of expectations and specifica- features of content, function, accuracy, and tions for classroom tasks and assessments text type as critical components. Using the that target discrete elements of written lan- gauge of the ACTFL guidelines for writing guage such as discourse type, function, proficiency offers a more complex lens grammar, and cohesion. Similarly, the de- through which instructors can view their velopment of higher-stakes tests, such as students’ performance. Furthermore, basing midterm and final examinations, where assessment tasks and rubrics on the guide- learners often lament that they do not lines assists in realistic goal setting. The have enough time, can benefit from an un- assessment development process, the assess- derstanding of these baseline data. Implicit ments themselves, and the data provide a in these data is the development of better window into how much this sample of learn- understanding on the part of instructors of ers was able to accomplish across multiple how to construct authentic and research- languages. based assessments that reflect the holistic These data also encourage instructors to nature of writing. think about how they consider their stu- In addition, the data imply that instruc- dents’ collective language performances: tors need to understand that, at the end of a Learners’ ability to communicate messages course or instructional sequence, students orally, for example, may not necessarily be at will demonstrate differing levels of knowledge the same level as their ability to complete and skills, and that instructors must expect similar tasks in writing. In fact, students’ that students will receive a range of proficien- writing ability in the English cognate lan- cy ratings. While the data offered here cluster guages was almost always higher than their around the Intermediate to Advanced ranges, speaking ability within this investigation. there were nevertheless some performances Explanation for this finding must lie in the in the Novice range. When examining curric- fact that writing is a planned performance, ula, it is important for programs to discuss the even when it is impromptu, and this context extent to which word-level writing perform- enables learners to have more thinking time ances are acceptable either at the end of a first- in contrast to an impromptu speaking per- year sequence for the English cognate formance. Note that for English noncognate languages or at the end of a second-year se- languages, instructors need to recognize that quence in the English noncognate languages. their learners may be more adept at speaking The data also offer an opportunity for com- than at writing because writing places extra paring and thus better understanding individ- cognitive burdens on them—not only con- ual students’ performance. tent, but the written form itself. These find- The data offered in this study also pro- ings may help instructors develop curricula vide support for learners’ use of keyboard- that can reflect the realities of individual ing from the beginning of instruction: The languages, enabling a more sensitive view data indicate that learners almost always of the distribution of instructional time. Per- provided a larger sample of language haps in the English cognate languages, when they were permitted to type. 
In addition, the data imply that instructors need to understand that, at the end of a course or instructional sequence, students will demonstrate differing levels of knowledge and skills, and that instructors must expect that students will receive a range of proficiency ratings. While the data offered here cluster around the Intermediate to Advanced ranges, there were nevertheless some performances in the Novice range. When examining curricula, it is important for programs to discuss the extent to which word-level writing performances are acceptable either at the end of a first-year sequence for the English cognate languages or at the end of a second-year sequence in the English noncognate languages. The data also offer an opportunity for comparing and thus better understanding individual students' performance.

The data offered in this study also provide support for learners' use of keyboarding from the beginning of instruction: The data indicate that learners almost always provided a larger sample of language when they were permitted to type. Samples of written language provide instructors with more data from which to make sound instructional and assessment judgments. In addition, instructors need to understand that time spent teaching keyboarding—e.g., how to locate an accent mark—is not time wasted on clerical skills. Rather, foreign language instructors need to take the responsibility of understanding different language keyboards and of instructing students in the use of these keyboards, both to facilitate the production of written work at their home university and to better understand keyboarding as a cultural component in instruction when studying abroad. Providing a window into the type and level of technology used by members of the culture students are learning is critical.
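Because learning to type accent marks and non-Latin scripts is a concrete skill, a small illustration may be useful. The helper below is hypothetical and not part of the program described here: it simply flags typed samples that contain none of the characters specific to the target language's writing system, a rough cue that a student may not yet know how to produce them on the keyboard. The character sets are deliberately incomplete examples.

```python
# A hypothetical screening helper, offered only as an illustration.
# The character sets are intentionally partial examples, not authoritative inventories.
TARGET_CHARACTERS = {
    "French": set("àâçéèêëîïôöùûü"),
    "German": set("äöüß"),
    "Spanish": set("áéíñóúü¿¡"),
    "Russian": {chr(cp) for cp in range(0x0410, 0x0450)},  # basic Cyrillic letters
}

def uses_target_script(sample: str, language: str) -> bool:
    """Return True if at least one language-specific character appears in the typed sample."""
    markers = TARGET_CHARACTERS[language]
    return any(char in markers or char.lower() in markers for char in sample)

print(uses_target_script("Ma ville est tres petite.", "French"))  # False: no accented letters typed
print(uses_target_script("Ma ville est très petite.", "French"))  # True
```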

Finally, this investigation implies a need for a particular kind and level of professional development. The writing prompts and ratings used throughout this investigation were generated by certified WPT raters, who were also perforce OPI certified. These raters were able to bring an expert level of professional knowledge to the project and to program development, curriculum design, instruction, and assessment. Instructors with this level of knowledge display confidence in their ratings and in their learners. Such instructors also display confidence in each other. The data across cognate and noncognate languages were astonishingly similar. Indeed, time in instruction was different (1 year as opposed to 2 years), but a similar progression occurred across the wide array of languages investigated. These types of data enable instructors to see that ostensible language difficulty for English-speaking learners (e.g., Arabic learning vs. German learning) is not as critical as allocated—and engaged—time. Coming to this kind of understanding of written language development enables instructors to emerge from linguistic silos and brings them into a more collaborative form of collegial interaction across all languages.

Caveats and Concerns

A critical step in designing any program in writing assessment is to validate a protocol that is convenient and cost-effective. In the present case, this means examining the WPA in light of its relationship to the WPT. This validation must take the form of having learners sit for each exam in close time proximity to probe whether they are awarded the same, or virtually identical, ratings. This classic validation process is, of course, costly in terms of both time and money, yet is critical toward establishing the credibility of the WPA.
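The comparison implied by this validation step can be stated concretely: for each learner who completes both assessments, check whether the two ratings are identical or fall within one ACTFL sublevel of each other. The sketch below assumes hypothetical paired ratings supplied as simple label pairs; the scale listing and the sample data are illustrative, not results reported here.

```python
# A minimal sketch of the comparison such a validation implies, assuming paired
# ratings are available as (WPA rating, WPT rating) tuples on the ACTFL sublevels.
# The pairing format and the sample data are illustrative only.
ACTFL_SCALE = [
    "Novice Low", "Novice Mid", "Novice High",
    "Intermediate Low", "Intermediate Mid", "Intermediate High",
    "Advanced Low", "Advanced Mid", "Advanced High",
    "Superior",
]
RANK = {level: index for index, level in enumerate(ACTFL_SCALE)}

def agreement(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Share of learners rated identically, and within one sublevel, on both assessments."""
    exact = sum(1 for wpa, wpt in pairs if wpa == wpt)
    adjacent = sum(1 for wpa, wpt in pairs if abs(RANK[wpa] - RANK[wpt]) <= 1)
    total = len(pairs)
    return {"exact": exact / total, "within_one_sublevel": adjacent / total}

# Hypothetical paired ratings, purely for illustration.
sample_pairs = [
    ("Intermediate Mid", "Intermediate Mid"),
    ("Intermediate High", "Advanced Low"),
    ("Advanced Low", "Intermediate High"),
]
print(agreement(sample_pairs))
```

Reporting exact and adjacent agreement side by side is one way of operationalizing "the same, or virtually identical, ratings."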
In the especially at the Novice and Intermediate lev- present case, this means examining the WPA el. Since the use of familiar or semi-familiar in light of its relationship to the WPT. This writing contexts supports lower-proficiency validation must take the form of having learn- writers in producing as much language as ers sit for each exam in close time proximity possible, the contexts of the writing prompts to probe whether they are awarded the same, must be general enough to be accessible to a or virtually identical, ratings. This classic val- broad range of students, yet close enough to idation process is, of course, costly in terms of students’ experiences to stimulate them to Foreign Language Annals VOL. 48, NO. 3 347

Conclusion

This article outlines the staff and assessment development procedures that resulted in the creation of an assessment of students' writing that was aligned with the ACTFL guidelines and consistent with WPT training and certification processes. The assessments used protocols that replicated those used by certified WPT trainers. In addition, the study reports data that were gathered over a 5-year period using those assessments from more than 4,000 samples of undergraduate writing after 1 year (150 hours) and more than 1,000 samples after 2 years (300 hours) of language instruction in French, Italian, Spanish, German, Portuguese, Arabic, Chinese, Japanese, Korean, and Russian. The data provide insight into foreign language learners' productive capacities in writing across languages. Thanks to SOPI ratings for more than 600 learners in a cohort group, the article also illustrates how those writing ratings related to students' oral ratings. In addition, the data support the advantages offered by allowing students to prepare and submit their assessments by computer as well as the ease and security that are provided when assessments are managed using secure, electronic submission and rating processes. Moreover, the data provide insight into curriculum development: Having substantive information helps set expectations for students across languages and levels and allows instructors to set appropriate student learning outcomes and design cohesive learning experiences. Further, the overall assessment project demonstrates the impact of an extensive and ongoing professional training program for instructors: Instructors' deep and consistent level of knowledge of both the oral and writing proficiency guidelines as well as their formal OPI and WPT training underpinned the development of the writing protocol and its use across languages and instructional levels.

References

Armstrong, K. M. (2010). Fluency, accuracy, and complexity in graded and ungraded writing. Foreign Language Annals, 43, 690–702.

Bernhardt, E. (1997). Victim narratives or victimizing narratives? Discussions of the reinvention of language departments and language programs. ADFL Bulletin, 29, 13–19.

Bernhardt, E. (2009). Systemic and systematic assessment as a keystone for language and literature programs. ADFL Bulletin, 40, 14–19.

Bernhardt, E. (2011). Understanding advanced second-language reading. New York: Routledge.

Bernhardt, E., & Brillantes, M. (2014). The development, management, and costs of a large-scale foreign language assessment program. In N. Mills & J. Norris (Eds.), Innovation and accountability in language program evaluation (pp. 41–61). Boston: Heinle & Heinle.

Bernhardt, E., Valdés, G., & Miano, A. (2009). A chronicle of standards-based curricular reform in a research university. In V. Scott (Ed.), Principles and practices of the standards in college foreign language education (pp. 54–85). Boston: Heinle & Heinle.

Breiner-Sanders, K., Lowe, P., Miles, J., & Swender, E. (2000). ACTFL proficiency guidelines: Speaking, revised 1999. Foreign Language Annals, 33, 13–18.

Breiner-Sanders, K., Swender, E., & Terry, R. (2002). Preliminary proficiency guidelines–Writing revised 2001. Foreign Language Annals, 35, 9–15.

Brown, N. A., Brown, J., & Eggett, D. L. (2009). Making rapid gains in second language writing: A case study of a third-year Russian language course. Foreign Language Annals, 42, 424–452.

Brown, N. A., Solovieva, R. V., & Eggett, D. L. (2011). Qualitative and quantitative measures of second language writing: Potential outcomes of informal target language learning abroad. Foreign Language Annals, 44, 105–121.

Byrnes, H., Maxim, H. H., & Norris, J. M. (2010). Realizing advanced foreign language writing development in collegiate education: Curricular design, pedagogy, assessment. Modern Language Journal, 94 [Supplement], 1–235.

Clark, J. L., & Li, Y. C. (1986). Development, validation, and dissemination of a proficiency-based test of speaking ability in Chinese and an associated assessment model for other less commonly taught languages. Washington, DC: Center for Applied Linguistics.

Clifford, R., & Cox, T. L. (2013). Empirical validation of reading proficiency guidelines. Foreign Language Annals, 46, 45–61.

Cox, T., Bown, J., & Burdis, J. (2015). Exploring proficiency-based vs. performance-based items with elicited imitation assessment. Foreign Language Annals. doi:10.1111/flan.12152

Cox, T. L., & Clifford, R. (2014). Empirical validation of listening proficiency guidelines. Foreign Language Annals, 47, 379–403.

Dandonoli, P., & Henning, G. (1990). An investigation of the construct validity of the ACTFL Proficiency Guidelines and oral interview procedure. Foreign Language Annals, 23, 11–21.

Dodds, D. (1997). Using film to build writing proficiency in a second-year language class. Foreign Language Annals, 30, 140–147.

Glisan, E. W., Swender, E., & Surface, E. A. (2013). Oral proficiency standards and foreign language teacher candidates: Current findings and future research directions. Foreign Language Annals, 46, 264–289.

Godfrey, L., Treacy, C., & Tarone, E. (2014). Change in French second language writing in study abroad and domestic contexts. Foreign Language Annals, 47, 48–65.

Hedgcock, J. (2005). Taking stock of research and pedagogy in L2 writing. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 597–613). Mahwah, NJ: Erlbaum.

Henry, K. (1996). Early L2 writing development: A study of autobiographical essays by university-level students of Russian. Modern Language Journal, 80, 309–326.

Hubert, M. D. (2013). The development of speaking and writing proficiencies in the Spanish language classroom: A case study. Foreign Language Annals, 46, 88–95.

Kenyon, D., & Tschirner, E. (2000). The rating of direct and semi-direct oral proficiency interviews: Comparing performance at lower proficiency levels. Modern Language Journal, 84, 85–101.

Leki, I. (1995). Coping strategies of ESL students in writing tasks across the curriculum. TESOL Quarterly, 29, 235–260.

Leki, I., & Carson, J. (1997). "Completely different worlds": EAP and the writing experiences of ESL students in university courses. TESOL Quarterly, 31, 39–69.

Malone, M. (2000). Simulated oral proficiency interviews: Recent developments [Online resource digest]. Retrieved July 10, 2013, from http://www.cal.org/resources/digest/0014sumulated.html

National Standards in Foreign Language Education Project. (2006). Standards for foreign language learning: Preparing for the 21st century. Yonkers, NY: ACTFL.

Omaggio, A. (1986). Teaching language in context: Proficiency-oriented instruction. Boston: Heinle & Heinle.

Reichelt, M. (1999). Toward a more comprehensive view of L2 writing: Foreign language writing in the U.S. Journal of Second Language Writing, 8, 181–204.

Rifkin, B. (2005). A ceiling effect in traditional classroom foreign language instruction: Data from Russian. Modern Language Journal, 89, 3–18.

Shohamy, E., Gordon, C., Kenyon, D. M., & Stansfield, C. W. (1989). The development and validation of a semi-direct test for assessing oral proficiency in Hebrew. Bulletin of Hebrew Higher Education, 4, 4–9.

Stanford University Board of Trustees. (1994). Report of the commission on undergraduate education. Stanford, CA: Stanford University.

Stansfield, C. W., & Kenyon, D. M. (1992). The development and validation of a simulated oral proficiency interview. Modern Language Journal, 76, 129–141.

Swender, E. (2003). Oral proficiency testing in the real world: Answers to frequently asked questions. Foreign Language Annals, 36, 520–526.

Thompson, I. (1996). Assessing foreign language skills: Data from Russian. Modern Language Journal, 80, 47–65.

Thompson, R. J., Jr., Walter, I., Tufts, C., Lee, K. C., Paredes, L., Fellin, L., et al. (2014). Development and assessment of the effectiveness of an undergraduate general education foreign language requirement. Foreign Language Annals, 47, 653–668.

Valdés, G., Haro, P., & Echevarriarza, M. P. (1992). The development of writing abilities in a foreign language: Contributions toward a general theory of L2 writing. Modern Language Journal, 76, 333–352.

Submitted May 12, 2015

Accepted June 17, 2015

APPENDIX

Sample Prompts From 2014 WPA

Short Form

You and your family will be hosting an exchange student from [place] this summer. The student wants to know about your hometown and the surrounding area, and some of the things to see or do while s/he is there. Write an e-mail in [language] to this student in which you:

1. Briefly describe your town (or neighborhood), its location, geography, attractions, etc.
2. Describe, in one or two paragraphs, a local event or tradition typical of your community, for example, a celebration, festival, social or religious practice, etc. Compare it with an event or tradition that may be similar to one where the exchange student is from.
3. Ask four or five questions to find out more about the student in order to plan for her/his arrival.

Note: Be sure to include an appropriate greeting, introduction, and closing in your message.

Suggested length: 2–3 paragraphs
Suggested time: 20–25 minutes

Long Form (includes Short Form)

Imagine that you have been asked to contribute a short article to a [language] blog. The blog has recently published a series of articles on the presence of individual and team sports within American universities. You have been asked to write a short essay in [language] that focuses on the role that organized sports play in campus life. In your essay, you should:

1. First, give a snapshot description of the issue from your perspective as a Stanford student. For example, how prevalent are sports on campus? Does participation in a team sport change the college experience for those students? Second, briefly compare this with another campus organization you feel is of equal importance, e.g., sorority or fraternity, student government, creative arts, professional club, etc.
2. Now recount a specific past experience or event that you observed or heard about (or in which you yourself participated), relating to a sport or other campus group. Describe in detail what happened and how this event illustrated the relationship of the particular organization to campus life.
3. Finally, present your opinion on what you think the role of sports should be within a university setting. For example, is it essential to developing school spirit and community, or could this be accomplished through an alternate structure? To what degree should universities support organizations more closely related to academics? If you were a campus administrator, what changes would you make to the current balance between sports and academics on campus?

Suggested length: 3–4 paragraphs
Suggested time: 30 minutes