A Cross-Genre Analysis of Chinese EFL Learners' Writing Performance Journal of Second Lang

Journal of Second Language Writing 33 (2016) 3–17

Contents lists available at ScienceDirect

journal homepage: www.elsevier.com/locate/seclan

Same language, different functions: A cross-genre analysis of Chinese EFL learners’ writing performance

Wenjuan Qina,*, Paola[172_TD$IF] Uccellib a Graduate School of Education, Harvard University, Larsen Hall 303, 14 Appian Way, Cambridge, MA 02138, United States b Graduate School of Education, Harvard University, Larsen Hall 320, 14 Appian Way, Cambridge, MA 02138, United States

ARTICLE INFO ABSTRACT

Article history: Received 20 May 2015 Secondary-school learners of English as a foreign language (EFL) in China constitute a Received in revised form 24 May 2016 rapidly growing yet understudied population. This study examined Chinese secondary- Accepted 1 June 2016 school EFL learners’ writing performance in two genres, argumentative essays and Available online xxx narratives. Crossgenre research on native language writing has documented that, at the macro-text level, secondary-school students’ narratives are typically of higher quality than Keywords: their written essays; while at the lexico-syntactic level, essays display higher complexity Chinese EFL learners than narratives. To investigate cross-genre differences in EFL learners, 200 English written EFL writing texts (100 essays; 100 narratives) were collected from 100 EFL Chinese secondary school School genre learners and scored for quality, lexico-syntactic, and genrespecific discourse features. Narrative writing fi Argumentative writing Unlike prior research on native language writing, no signi cant differences in quality Sociocultural pragmatics ratings were found across the two genres. However, in line with prior research, results revealed that argumentative essays displayed a higher lexico-syntactic complexity. Regression analyses identified distinct sets of predictors of writing quality ratings for each genre. Controlling for length, lexico-syntactic complexity and diversity of organizational markers were identified as predictors of argumentative essay quality. Conversely, controlling for length, narrative quality was only predicted by the frequency of stance markers. Results are discussed in relation to pedagogical implications and directions for future research. ã 2016 Elsevier Inc. All rights reserved.

Since its ﬁrst appearance in the literature in the 1980s (Swales, 1981; Tarone, Dwyer, Gillette, & Icke, 1981), the notion of genre and its application to language teaching and learning has attracted increasing attention in writing studies, and particularly in second or foreign language writing (Tardy, 2006, 2011). Genre refers to conventionalized or socially recognized ways of using language within certain discourse communities (Swales, 1981). In writing across genres, English- as-foreign-language (EFL) writers face the dual challenge of recognizing privileged forms of discourse within a communicative context and at the same time deploying their available linguistic resources to serve that function appropriately. The last decade has seen signiﬁcant developments in genre-based pedagogies designed to support second language writers’ understanding of how language is structured to achieve social purposes (Gebhard & Harman, 2011; Hyland, 2007; Johns, 2011; Swales,1990). However, the majority of empirical studies evaluating EFL writing focused solely on a single genre – argumentative writing in academic context – without examining how EFL writers perform across genres. Moreover,

* Corresponding author. E-mail addresses: [email protected] (W. Qin), [email protected] (P. Uccelli). http://dx.doi.org/10.1016/j.jslw.2016.06.001 1060-3743/ã 2016 Elsevier Inc. All rights reserved. 4 W. Qin, P. Uccelli / Journal of Second Language Writing 33 (2016) 3–17 considerable EFL writing research has focused on the post-secondary years, with only minimal research devoted to understanding secondary school students’ performances. The current study investigated secondary school Chinese EFL students’ writing performance in two different genres: narrative and argumentation. Writing performance is defined by two major constructs: 1) holistic writing quality ratings evaluated by experienced EFL teachers; 2) text-based lexical, syntactic, and discourse features. By comparing writing performance across narrative and argumentative genres, this study seeks to reveal differences in performance that result from the genre-specific language demands faced by these learners. Two main questions drove this study. First, do EFL learners’ overall writing quality ratings and/or lexico-syntactic features vary by genre? Second, which lexico-syntactic and discourse features predict overall writing quality ratings within each genre? The ultimate goal of this line of research is to generate findings that would inform the design of pedagogical approaches that will be specially attuned to the needs of Chinese EFL students as they learn to participate in two prevalent discourse genres: narrative production and evidence-based argumentative writing.

1. Theoretical framework: a sociocultural pragmatics-based view

A sociocultural pragmatics-based view of language development understands language learning as the result of individuals’ socialization and enculturation histories (Halliday, Matthiessen, & Matthiessen, 2014; Ninio & Snow,1996; Ochs, 1993; Snow & Uccelli, 2009; Uccelli et al., 2015). This view of language entails that being a skilled language user in some social contexts does not guarantee adequate performance in other social contexts. From a pragmatics-based developmental perspective, learners are first enculturated at home into the language of face-to-face interaction, which typically prepares them for colloquial conversations in their respective communities (Heath, 1983, 2012; Ochs, 1993). However, being able to successfully participate in academic discourses has been documented as a challenging task for many colloquially fluent monolingual or bilingual students with scarce opportunities to be socialized into more academic discourse (Schleppegrell, 2004; Snow & Uccelli, 2009; Uccelli et al., 2015). In the context of his research with heritage language speakers growing up in English-dominant countries, Cummins proposed the well-known distinction between Basic Interpersonal Communicative Skill (BICS) and Cognitive Academic Language Proficiency (CALP). This distinction has triggered substantive research that documents both the more challenging nature of CALP and also the often unsupportive instructional conditions in which it is expected to be mastered (Cummins, 1980, 1991). Writing is a highly complex task that integrates cognitive processing, deployment of linguistic knowledge, and awareness of the social context in which the written communication takes place (Bereiter & Scardamalia, 2013; Flower & Hayes, 1981; Gee, 2001; Graham & Perin, 2007). The performance on writing is expected to be influenced by the writer’s ability to flexibly use a variety of language forms and functions that are attuned to different communicative contexts, specific audiences, and purposes (Hyland, 2009; Ravid & Tolchinsky, 2002; Schleppegrell, 2002). Certain types of texts are believed to be linguistically and cognitively more demanding than others and the frequency of exposure to different communicative contexts differs remarkably across individuals. Consequently, we anticipate that EFL learners might be able to excel at writing in one genre but not in another. The present study was conducted to advance our current understanding of EFL writing by addressing two research gaps. First, this study focuses on secondary school EFL learners, an understudied population in L2 writing research. Second, informed by a pragmatics-based view of language in which writers are understood as not necessarily equally skilled across all uses of language, this study compares EFL students’ writing performance across genres. Adolescent second language learners represent the largest and fastest growing population in international settings, yet research that documents their needs in writing development to inform instruction is scarce (Leki, Cumming, & Silva, 2008; Matsuda & De Pew, 2002; Ortmeier-Hooper & Enright, 2011). Though recent studies have investigated adolescent L2 writing development in some English-dominant contexts (Enright & Gilliland, 2011; Fránquiz & Salinas, 2011), in EFL settings, adolescents continue to be an understudied population. In China, more specifically, the majority of recent empirical studies focused on undergraduate or graduate learners (Li & Wharton, 2012; Liardet, 2013; Liu, 2013; Miao & Lei, 2008; Qin & Karabacak, 2010). Additionally, many genre-based studies using L2 corpus focused on advanced academic discourse such as academic research articles (Marco, 2000; Swales et al.,1998) or writing across academic disciplines (Hardy & Römer, 2013). Yet, less is understood about EFL adolescent learners’ writing and how their skills might vary across argumentative and narrative genres.

2. Writing performance across two genres

Narrative and argumentative texts are two distinct discourse genres deﬁned by different communicative functions (Berman, 2008; Grabe, 2002; Paltridge, 2001). The progress in mastering new genres in one’s native language has been characterized by Martin (1989) and Schleppegrell (2004) as moving progressively across three categories: (1) personal genres (e.g., narratives); (2) factual genres (e.g., procedures); and (3) analytic genres (e.g., argumentative essays). Recent empirical data on native language writing development support this developmental sequence, showing that while written narrative structures tend to be well mastered by around age ten, argumentative writing constitutes a later developmental accomplishment (Berman & Nir-Sagiv, 2007). A number of empirical studies examining writing produced by middle and high school students further suggest that adolescents are typically able to produce higher-quality narratives than argumentative essays (Crowhurst, 1980, 1990; Engelhard, Gordon, & Gabrielson, 1992; Hall-Mills & Apel, 2013; Reed, 1992; Scott & Windsor, W. Qin, P. Uccelli / Journal of Second Language Writing 33 (2016) 3–17 5

2000). However, all of these studies investigated native English writing. To our knowledge, to date no similar contrastive studies have been conducted for EFL adolescent learners. This documented developmental sequence can be explained in relation to the different linguistic and cognitive demands posed by each genre. Narratives are agent-oriented, that is, they focus on people, their actions, and the unfolding of events in a temporal order that tends to mimic real world events (Berman & Slobin, 2013). Argumentative essays are, instead, topic- oriented, require the writer to impose a logical structure to interrelate ideas in a coherent manner, and to organize claims and arguments in a stepwise hierarchical format (Grabe, 2002). Apart from the different macro-level organization, the two genres also vary in their micro-level linguistic features. At the lexical level, research on monolingual students has reported that argumentative texts contain a higher proportion of structurally complex, semantically abstract and low-frequency vocabulary items than narratives. At the syntactic level, compared to written narratives, argumentative texts produced by native language writers tend to display more complex structures (Beers & Nagy, 2009; Berman & Nir-Sagiv, 2007; Ravid and[173_TD$IF] Berman, 2010). Though few L2 writing studies focused on writing across genres, longitudinal studies have shed light on the development of lexical and syntactic complexity in L2 writing over time in a single genre. In studying a common corpus of L2 written descriptions produced by students in a semester-long ESL writing course at a U.S. university, research has shown that growth in L2 syntactic complexity occurred as a function of time spent learning English (Crossley & McNamara, 2014). Using the same corpus, Bulté and Housen (2014) also found increases in measures of syntactic complexity at all levels of structure (phrasal, clausal, and sentential), but no significant change in measures of lexical complexity (e.g., lexical diversity/richness). Interestingly, in this study, the only syntactic measure that did not demonstrate significant change over time was frequency of subordinate clauses. This finding might echo a critical evaluation on this measure done by Biber, Gray, and Poonpon (2011), showing that most clausal subordination measures are actually more effective in discriminating language proficiency in spoken rather than in writing registers. In light of the cross-genre lexical and syntactic differences documented in research on native language writers, and a lack of knowledge in cross-genre performance among EFL writers, this study will first explore if Chinese EFL learners’ writings in two genres achieve different levels of quality, as rated by experienced EFL teachers. Additionally, guided by the inconclusive argument regarding the best-fitted lexical-syntactic measures within and across genres, various lexical and syntactic measures – commonly used in both native language and L2 writing research – will be used to examine how lexico-syntactic features vary by genre.

3. Predictors of writing quality ratings within each genre

Unpacking the predictive relationship between certain text-based features and overall writing quality rated by teachers, research on both native English speakers and EFL learners has identiﬁed lexical, syntactic, and discourse features that are signiﬁcantly associated with overall writing quality ratings. However, as reviewed below, this line of research has mostly been conducted with post-secondary students, with only a few studies focused on secondary school students. Besides, most of them examine only argumentative essays with minimal attention to narratives.

3.1. Research on native English writing

Research on native English speakers, at both post-secondary and secondary level, has been inconclusive on which lexico- syntactic or discourse factors are predictive of overall quality ratings of argumentative and narrative writing. For instance, using an automated text analysis tool (Coh-Metrix), McNamara, Crossley, and McCarthy (2009) found that the three most predictive indices of essay quality were syntactic complexity, lexical diversity and word frequency, but none of the cohesion indices correlated with essay ratings. However, a later study conducted by the same group of researchers on post-secondary freshmen’s writing revealed that essays rated as higher-quality were characterized by not only lexical diversity, but also more cohesive textual information (Crossley, Roscoe, & McNamara, 2011). At the secondary school level, Connor (1990) analyzed argumentative essays written by high school students and found that higher rated essays used sophisticated syntactic factors, including nominalizations, propositions, passives and specific conjuncts. On the other hand, Uccelli, Dobbs, and Scott (2013) and Dobbs (2014) identified additional discourse-level components that are predictive of writing quality, including organizational markers that signal argumentative structure and epistemic stance markers which entail degree of possibility, certainty or acknowledgment of the writer’s beliefs about the truth of certain assertions or state of affairs (e.g., Certainly, Possibly, It is unlikely that . . . ). Fewer studies have explored the relation between linguistic features and writing quality ratings of narrative texts, and most of them have been conducted with younger students. For instance, studies with 2nd and 4th grade students have found that length and diversity of vocabulary predicted narrative writing quality ratings (Olinghouse & Leaird, 2009). Additionally, in 9th graders’ writing, frequency of cohesive indices was found to explain a significant amount of variations in narrative writing quality ratings (Cameron et al., 1995). 6 W. Qin, P. Uccelli / Journal of Second Language Writing 33 (2016) 3–17

3.2. Research on EFL writing

The majority of studies on predictors of EFL writing quality ratings focused on argumentative writing at the undergraduate or graduate level. Using the Test of Written English (TWE) corpus, researchers found that highly rated EFL essays were usually longer, with longer average word length, and higher frequencies of certain lexical and grammatical categories (e.g., nouns, hedges, conjuncts)(Ferris, 1994; Frase, Faletti, Ginther, & Grant, 1999; Grant & Ginther, 2000). Interestingly, ﬁndings by Jarvis, Grant, Bikowski, and Ferris (2003) and Friginal, Li, and Weigle (2014) suggest that the quality of a written text may depend less on the use of individual linguistic features than on the underlying patterns of co- occurrence of these features. Apart from these lower-level linguistic measures, Zhao (2013) operationalized “voice” as a macro level construct characterized by hedges and boosters that convey the author’s attitude and credibility, and directives or questions that establish engagement with the reader. In her study, a strong positive correlation was detected between EFL writers’ authorial voice and writing quality rating. Thus, for university-level EFL argumentative writing, raters’ perception of quality is associated with both micro-level lexical features and macro-level discourse features. EFL narrative writing has received less attention than argumentative writing. To our knowledge, only two studies have focused on this question. Narrative writing quality ratings have been found to be related to length and syntactic complexity in Japanese post-secondary students’ writing (Ishikawa, 1995). In a second study, Hungary EFL learners, when compared to native English speakers’ higher-rated narrative writing, tended to use less diverse and sophisticated vocabulary (Kormos, 2011). Thus, the rating of university-level EFL narrative writing quality, so far, has only been found to relate to micro-level lexico-syntactic features.

3.3. Contrasting predictive relations across genres

To our knowledge, there is only one study contrasting predictive relation between linguistic features and writing quality ratings across genres, conducted by Beers and Nagy (2009). They found syntactic complexity – as measured by words per clause – to be positively correlated with writing quality ratings for argumentative essays but not for narratives. In other words, the syntactic complexity that contributed to argumentative quality was clause-internal, as demonstrated in the use of clause-lengthening prepositional phrases, attributive adjectives, gerund phrases and infinitive phrases. On the other hand, Beers and Nagy (2009) found that clauses per T-unit was positively correlated with quality ratings for narratives, but negatively correlated with quality ratings for essays. This finding can be interpreted as aligned with the claim that increased subordination is a more effective discriminator of conversational proficiency, which is arguably closer to the narrative genre, rather than to academic writing (Biber et al., 2011). In light of the interesting contrast found in Beers and Nagy (2009), the present study will further unpack the relation between these two syntactic measures and overall writing quality ratings within and across genres. In summary, research on both native English speakers and EFL learners has been inconclusive about which lexico- syntactic and discourse features are associated with writing quality ratings in each genre. Moreover, limited research is available to contrast such predictive relations across genres, especially in EFL adolescent learners’ writing.

4. Research questions

In order to better understand Chinese secondary school EFL learners’ writing performance across genres and unpack the relations between certain linguistic features and writing quality ratings within each genre, the present study seeks to answer the following two sets of research questions: 1. Comparing writing performance across genres: (a) Do overall quality ratings of Chinese secondary school EFL learners’ written personal narratives and argumentative essays differ by genre? (b) Do incidences of key lexico-syntactic features vary by genre in these Chinese secondary school EFL learners’ written texts? 2. Predicting overall writing quality ratings within each genre: (a) What genre-speciﬁc discourse features characterize Chinese secondary school EFL learners’ writing? (b) Controlling for essay length and participants’ grade level, what lexical, syntactic, and discourse features are predictive of overall writing quality ratings in each genre?

5. Methods

5.1. Participants

The sample consisted of 100 secondary school EFL learners, whose ages ranged from 11 to 17 years (see Table 1). Students’ grade-level ranged from 6th to 11th grade. The sample was relatively balanced by gender, with 53 boys and 47 girls. All participants received comparable standard instruction in the same language institute in east China. Besides the regular English classes they took at local middle/high schools, these participants attended an extra language program outside of school in the same language institute, mostly for the purpose of test preparation (e.g., TOEFL, IELTS) in order to apply for undergraduate study in English-dominant countries, such as the U.S. and U.K. As described in the curriculum and by teachers W. Qin, P. Uccelli / Journal of Second Language Writing 33 (2016) 3–17 7

Table 1 Demographic characteristics (n = 100).

Frequency Gender Male 53 Female 47

Grade level Grade 6 23 Grade 7 24 Grade 8 27 Grade 9+ 26

English proﬁciency level (Common European Framework) B1.1—Intermediate 1 15 B1.2—Intermediate 2 32 B2.1—Upper intermediate 1 35 B2.2—Upper Intermediate 2 4 Not available 14 at the language institute, most writing practices focused on argumentative genre, which is not surprising given that this is the most commonly tested genre in high-stake examinations.

5.2. Data collection

Compositions written by Chinese secondary school EFL learners were collected using a digital platform as part of students’ regular classroom activities in the language institute. During the computer-based writing assessment, each student was asked to respond in writing to a narrative prompt and an argumentative prompt in 90 min (40 min for each text plus a 10-min break). Both writing prompts were on a similar topic to optimize comparison across genres (see Appendix A for the full writing prompt).

5.3. Data analysis

Data were transcribed, segmented into clauses, coded, and analyzed using the transcription conventions and automated language analysis tools of the CHILDES program (MacWhinney, 2000). To homogenize the formatting of the data and avoid any subjective impressions of writing quality ratings due to mechanical mistakes, all unconventional spellings, capitalizations and punctuations were removed from the texts, and were recorded on a separate coding tier. After data were transcribed and veriﬁed by a second researcher, the following measures were generated:

5.3.1. Writing quality Writing quality was rated using two genre-specific six-point-scale holistic scoring rubrics (The National Assessment of Educational Progress, 2011). These rubrics offered the advantage of providing genre-specific yet comparable scores across genres along similar dimensions, including content, organization, use of details, voice, and effective use of language. In addition, scorers were made aware of the different demands expected in each genre and were asked to assess “whether the text elicited appropriate information and language style for the particular genre.” Scorers were also provided with a packet of prototypical examples, selected by an experienced native-English-speaking scorer and the authors, which represented different levels of writing quality in both genres. The writing quality ratings are comparable across genres in that, for instance, a 6-point argumentative essay and a 6-point written narrative both represent the best possible writing performance in the corresponding genre. Two native English-speaking, experienced teachers who were blind to the research questions scored each text. All texts were double-scored and a satisfactory level of inter-rater reliability was achieved (k = 0.83 for argumentative; k = 0.80 for narrative). Following standard SAT scoring practices, when two scorers reached either exact or adjacent agreement, their ratings were summed to form the final scores; when the two scorers were more than one point apart, a third expert scorer intervened to settle the disagreement, and the expert’s score was doubled as the final score.

5.3.2. Length Text length was measured by the total number of clauses. A clause is defined as “a unit that contains a unified predicate, . . . [i.e.,] a predicate that expresses a single situation (activity, event, state). Predicates include finite and nonfinite verbs, as well as predicate adjectives” (Berman & Slobin, 2013; p. 660). 8 W. Qin, P. Uccelli / Journal of Second Language Writing 33 (2016) 3–17

5.3.3. Lexical measures This set of measures captured word-level characteristics for both narrative and argumentative texts, measuring the frequency of: a)[174_TD$IF] Long words: or polysyllabic words; speciﬁcally, words with three or more syllables (e.g., perspective, transportation) (Wimmer, Köhler, Grotjahn, & Altmann, 1994). b)[175_TD$IF] Abstract nouns: using a four-level semantic abstractness scale (Berman & Nir-Sagiv, 2007; Ravid, 2006), nouns used in students’ writing were categorized into: concrete objects or proper names (e.g., bike, Mary); categorical and generic nouns (e.g., people, doctor); abstract but high-frequency nouns (e.g., idea, exam); abstract and low-frequency nouns or derivational nouns (e.g., perspective, communication). c)[176_TD$IF] Academic vocabulary: words that appear on the Academic Vocabulary List (Coxhead, 2000). d)[17_TD$IF] Lexical diversity: was captured through the widely used VocD measure, which reduces the impact of length in estimating the variety of words used in a text (McKee, Malvern, & Richards, 2000).

The measures for frequency of long words, academic vocabulary, and VocD were automatically generated using the CHILDES programs (MacWhinney, 2000). Two researchers doubly coded all narrative and argumentative texts for frequency of abstract nouns, with an inter-rater reliability of k = 0.92.

5.3.4. Syntactic measures Two measures based on Beers and Nagy (2009) were generated to assess syntactic complexity in both narrative and argumentative texts. a)[174_TD$IF] Words per clause: A higher ratio of words per clause is associated with the literate or academic register, which indicates the writers’ skill to convey information in a more concise manner. b)[175_TD$IF] Clauses per T-unit: T-units are deﬁned as thematic units of complete and autonomous meaning, corresponding to a main clause plus all the subordinate clauses embedded in it (Hunt, 1965). Using multiple clauses per T-unit is one way of expressing complex relations between ideas and the number of clauses per T-unit is found to increase with students’ age (Beers & Nagy, 2009).

Two researchers double-coded 20% of transcripts. Inter-rater reliability was estimated by applying Cohen’s kappa statistics, with k = 0.89 for T-unit coding and k = 0.93 for clause coding.

5.3.5. Genre-specific discourse measures While the same lexico-syntactic features can be investigated across genres, well-formed text construction in each genre requires the writer to apply different discourse markers to generate the text structure and express a personal evaluative stance. Using research-based genre-specific features, two discourse dimensions were coded for each genre: (1) discourse organization and (2) evaluative stance: Organizational markers in argumentative texts: Following research on metadiscourse analysis (Dobbs, 2014; Hyland, 2005; Uccelli et al., 2013), markers used to explicitly signal the organization of argumentative text structure were identified and classified into four subcategories: a)[174_TD$IF] Frame markers signal the sequence of arguments or counter-arguments (e.g., first of all, on the other hand); b)[175_TD$IF] Code glosses introduce an example or paraphrase (e.g., for example, in other words); c)[176_TD$IF] Transition markers signal additive, adversative, or causal relations between clauses and paragraphs (e.g., moreover, even though, because). Temporal markers and the coordinating conjunction “and” were not coded; d)[17_TD$IF] Conclusion markers explicitly state the writer’s summary or conclusion of the essay (e.g., in conclusion; all in all).

Stance markers in argumentative texts: Based on,[178_TD$IF] Reilly, Baruch, Jisa, and Berman (2002), Hyland (2005), and Uccelli et al. (2013), stance markers in argumentative essays were identiﬁed and coded as: a)[174_TD$IF] Deontic markers: indicate a writer’s absolute or categorical stance or viewpoint towards an assertion (e.g., everybody should do . . . , it is wrong to . . . ); b)[175_TD$IF] Epistemic markers: display the writer’s stance towards the truth of an assertion. Three subtypes of epistemic markers were identiﬁed: a) Epistemic hedges which express degree of uncertainty, signaling a writer’s cautiousness when making assertions (e.g., it might be true . . . , it is possible that . . . ); b) Epistemic boosters which emphasize the writers’ commitment to the truth of an assertion (e.g., it is absolutely true . . . ); c) Personal beliefs signal that assertions are the result of one’s or others’ personal beliefs (e.g., I think, people assume that . . . ).

Organizational markers in narrative texts: Following a classic framework for cohesion analysis (Halliday & Hasan, 2014), narrative texts were coded for transitional connectives that denote temporal and logical relations at the inter-clausal level. These markers were identified and subsequently coded for type of relation signaled including 1) additive (e.g., furthermore, W. Qin, P. Uccelli / Journal of Second Language Writing 33 (2016) 3–17 9 that is), 2) adversative (e.g., but, although), 3) causal (e.g., because, therefore) and 4) temporal/aspectual (e.g., first, last, finally) relations. The coordinating conjunction and was not included in this coding. Stance markers in narrative texts: Following Peterson and McCabe (1983) tools for identifying evaluative markers in narratives, narrative texts were coded for four types of evaluative stance markers used by the writer to offer an implicit or explicit subjective interpretation of the narrated events. Stance markers in narrative texts were subsequently coded for: a) internal states (e.g., markers that express emotions, thoughts), b) rhetorical moves (e.g., similes and metaphors; exaggeration), c) objective judgments (e.g., a means by which the narrator uses other people to evaluate the narrated event) d) evaluative qualifiers (e.g., adjective, adverbs, intensifier). Two researchers doubly coded 20% of the data for discourse markers and achieved adequate reliability (k = 0.88 for argumentative organizational markers; k = 0.91 for argumentative stance markers; k = 0.92 for narrative organizational markers; k = 0.87 for narrative stance markers).

5.4. Analytic plan

To address the first set of research questions, descriptive statistics were generated for writing quality ratings by genre. After examining the score distribution, we conducted paired sample t-tests with writing quality scores as the dependent variable. Then descriptive statistics were generated for lexical and syntactic measures by genre. After examining the distribution of variables and checking the important assumptions, paired sample t-tests were conducted to investigate whether the incidences of lexical and syntactic features varied by genre. To address the second set of research questions, a descriptive analysis was first conducted to examine genre-specific discourse features. Principal Component Analysis (PCA) was performed to reduce collinearity among certain linguistic variables. Then, correlational analysis informed the construction of a series of theory-based hierarchical regression models, independently for narrative and argumentative genres.

6. Results

6.1. RQ1: writing quality and lexico-syntactic variations across genres

6.1.1. Comparing writing quality across genres Argumentative essays and narratives were double-scored for overall writing quality (with a possible range of 2–12). Narrative mean quality score was 7.31 (SD = 2.80). Argumentative essays had a slightly higher mean quality score of 7.53 (SD = 2.11). A paired sample t-test showed that the difference between mean quality ratings of argumentative and narrative writing was not statistically signiﬁcant (t(99) = 0.89, p = 0.37). Furthermore, a closer examination of the data revealed enormous individual variability within and across genres. As illustrated in Fig. 1, 80% of argumentative essays displayed a score that fell in the upper half of the scale (6–10). In comparison, the distribution of narrative scores was more spread-out across all levels of writing quality scale, with a high number of narratives displaying either a high score (11–12) or a score at the lower-end (2–5) on the scale.

6.1.2. Comparing lexical-syntactic features across genres Table 2 summarizes descriptive statistics for text length and lexico-syntactic features in sampled argumentative and narrative writing. Given that we are investigating eight measures and therefore performing eight tests on the same dataset simultaneously, we employ the Bonferroni correction to avoid spurious positives, following (Lu, 2010). This sets the alpha value for each comparison to 0.05/8, or 0.006. Paired t-tests demonstrated that argumentative and narrative genres varied [(Fig._1)TD$IG]

Fig. 1. Distribution of writing quality scores by genre. 10 W. Qin, P. Uccelli / Journal of Second Language Writing 33 (2016) 3–17

Table 2 Descriptive statistics for text length, lexical, and syntactic features by genre (N = 100).

Variable Argumentative Narrative t(99)

M (SD) Min–Max M (SD) Min–Max Length Number of clauses 32.81 (12.07) 6–68 36.7 (18.19) 6–82 3.08* À Number of words 197.61 (76.57) 38–409 200.23 (101.83) 34–491 0.40 À Lexical features Freq of Long words 11.91 (7.41) 0–31 9.07 (6.90) 0–27 4.32* Freq of Abstract nouns 19.28 (8.81) 4–44 12.53 (9.28) 0–39 7.99* Freq of Academic words 1.36 (1.95) 0–13 2.76 (2.70) 0–11 4.68* À Lexical diversity 67.26 (21.42) 22.12–117.34 57.27 (15.77) 21.06–98.41 5.17*

Syntactic features Clauses per T-unit 1.77 (0.33) 1.17–3.1 1.68 (0.33) 1–3 2.03 Words per Clause 6.02 (0.73) 4.55–8.40 5.44 (0.51) 4.13–7.23 7.64*

* p < 0.006, adjusted alpha value based on Bonferroni correction. significantly at most lexico-syntactic dimensions. When length is measured by number of clauses, narratives were significantly longer than argumentative essays (p = 0.003). Despite that no significant difference is found in the total number of words across genres (p = 0.69), argumentative texts displayed considerably higher frequencies of complex lexical features, namely: a significantly greater frequency of long words that contain three or more syllables (p < 0.001), semantically abstract nouns (p < 0.001), and more diverse vocabulary (p < 0.001). Surprisingly, narratives contained, on average, more academic vocabulary than argumentative essays (p < 0.001). However, it is noteworthy that both genres displayed only limited use of academic vocabulary, ranging from an average of 1.36–2.76 academic words per text. In addition, argumentative texts also differed from narratives in syntactic features when measured by the number of words per clauses (p < 0.001), but there is no statistically significant cross-genre difference in the number of clauses per T-unit (p = 0.05). Thus, argumentative texts tended to display a higher level of lexico-syntactic sophistication than narrative texts, as indexed by most measures, with the exception of frequency of academic vocabulary and number of clauses per T-unit.

6.2. RQ2: predicting writing quality within genre

6.2.1. Genre-specific discourse features Table 3 exhibits both frequency and diversity of discourse markers coded in these secondary school EFL learners’ argumentative and narrative texts. In argumentative essays, among the four types of organizational markers coded, inter- clausal transitional markers displayed the highest frequency within each essay and appeared in 97 out of the 100 essays. Additionally, more than 60% of essays contained code glosses (e.g., for example, such as), and nearly half of the sample used frame markers to explicitly signal sequence and organization of arguments (e.g., first, second; on the other hand). The least frequently used markers within a single essay were markers of conclusion, which were only present in 37 out of the 100 essays in the sample. Stance markers in argumentative essays were used less frequently than organizational markers. The most widely used type of stance markers were markers of personal beliefs (e.g., I think), which appeared in 85 essays. However, a closer look at the data revealed overabundant usage of this type of stance markers in repetitive pattern, such as using “I think . . . ” every time a new argument was introduced. Consistently with prior research (Reilly et al., 2002), it was not always possible to decide whether this is a form of epistemic stance or was only used as discourse-sequencer. On the other hand, only a third of the essays included epistemic hedges (e.g., it might be, it is possible), with an average of less than one instance per essay. Finally, epistemic boosters (e.g., it is absolutely true) were the least frequently used stance marker both within and across essays. Narrative discourse organization differs from that of argumentative discourse, in that it usually follows a sequence of events in a temporal order, and thus the organizational markers in narrative discourse mostly denote the transition at micro inter-clausal level. As expected, temporal markers (e.g., earlier, finally) displayed the highest frequency. Comparably, 88 students used causal markers (e.g., because, the reason that . . . ) followed by adversative markers (e.g., though, however). Finally, additive markers (e.g., also, too) had a lower frequency, given that the most frequently used colloquial additive marker and was excluded from the coding. Finally, written narratives demonstrated a variety of stance markers. All four types of stance markers were identified in 80 out of the 100 narratives. Evaluative qualifiers, that is, the use of adjectives (e.g., unforgettable), and adverbs (e.g., actively), and intensifiers (e.g., really, very much), were the most frequently used type of evaluation. Internal states (e.g., emotion, hypothesis) and rhetorical moves (e.g., metaphor or exaggeration) showed comparable frequencies, with more than three instances per text, on average. Finally, 80 students also used other people’s perspectives to evaluate the narrated events. W. Qin, P. Uccelli / Journal of Second Language Writing 33 (2016) 3–17 11

Table 3 Descriptive statistics for frequency and diversity of genre-speciﬁc discourse markers in argumentative and narrative writing (N = 100).

Variables Mean SD Min–Max # of Essaysa Argumentative texts Organizational markers Frequency of organizational markers 7.07 4.10 Diversity of organizational markers 2.42 1.02 Frequency by type Frame markers 0.85 1.19 0–5 46 Transition markers 4.85 3.12 0–14 97 Code Glosses 0.99 0.98 0–4 62 Conclusion markers 0.38 0.51 0–237

Stance markers Frequency of stance markers 3.26 2.32 Diversity of stance markers 1.72 0.85 Frequency by type Deontic markers 1.15 1.64 0–9 52 Epistemic hedges 0.51 1.00 0–4 29 Personal beliefs 1.52 1.11 0–6 85 Epistemic boosters 0.08 0.34 0–26

Narrative texts Organizational markers Frequency of organizational markers 7.07 9.10 Diversity of organizational markers 3.06 1.04 Frequency by type Additive 1.00 1.16 0–5 55 Adversative 1.91 1.67 0–876 Causal 3.05 2.76 0–21 88 Temporal 3.54 2.70 0–12 87

Stance markers Frequency of stance markers 17.18 10.12 Diversity of stance markers 3.54 0.81 Frequency by type Evaluative qualiﬁers 8.56 5.74 0–25 98 Internal state 3.14 2.82 0–15 87 Rhetorical moves 3.48 2.63 0–13 89 Objective judgments 2.00 1.62 0–6 80

a # of texts in which the frequency of the discourse marker is not zero.

6.2.2.[179_TD$IF] Correlations of linguistic features with writing quality Table 4 presents the intercorrelations among argumentative and narrative writing quality ratings and all linguistic measures. As expected, text length, as measured by number of clauses, displayed statistically significant high correlations with both argumentative and narrative quality ratings, indicating the necessity to include it as a control variable in subsequent regression analyses. All lexical measures not only showed moderate-to-high correlations with quality ratings across genres, but also strong correlations among themselves (see Table 4). Therefore, following previous research (Uccelli et al., 2013), we created a composite score named lexical complexity using Principal Component Analyses (Eigenvalue = 2.06, 70% of the variation was explained), and weighted the three lexical measures – frequency of long words, abstract nouns and lexical diversity – by their factor loadings of 0.62, 0.60, and 0.51, respectively. At the syntactic level, words per clause showed a moderate and significant relation with argumentative quality, and a low-to-moderate correlation with narrative quality that approached significance. However, no statistically significant relation was detected between clauses per T-unit and writing quality ratings in either genre. We were encouraged to see that most discourse markers coded in both argumentative and narrative writing captured individual variability relevant to predicting the variability in overall writing quality ratings. For argumentative essays, both frequency and diversity of organizational markers displayed moderate pairwise correlations with writing quality, but only diversity of stance markers showed a weak positive (but non-significant) association with argumentative quality. As for narratives, both frequency and diversity of organizational and stance markers showed high correlations with writing quality ratings, but it needs to be tested in regression analyses whether such association is confounded by text length.

6.2.3.[180_TD$IF] Distinct predictors of writing quality within genre Informed by the correlation analysis, a series of hierarchical regression models was built to explore the predictive power of lexical, syntactic, and discourse features in explaining ratings of argumentative and narrative writing quality (see Tables 5 and 6, respectively). Visual inspection of the distribution of standardized residuals for each model conﬁrms that all assumptions of Multiple Regression are met. 12 W. Qin, P. Uccelli / Journal of Second Language Writing 33 (2016) 3–17

Table 4 Intercorrelations between argumentative/narrative quality ratings and all linguistic measures.

12 3 4 567 8910 Argumentative 1. quality – 2. text length 0.63** – 3. w_long 0.58** 0.66** – 4. w_abstract 0.58** 0.61** 0.60** – 5. w_diversity 0.48** 0.34** 0.42** 0.25* – 6. sx_w/c 0.32** 0.01 0.27** 0.36** 0.25* – À * ** 7. sx_c/t 0.13 0.18 0.24 0.17 0.05 0.28 – À ** À ** À ** À ** À 8. org_freq 0.47 0.58 0.41 0.47 0.09 0.19 0.18 – ** ** ** ** * À ** 9. org_divs 0.46 0.42 0.39 0.35 0.19 0.22 0.09 0.63 – À 10. sta_freq 0.04 0.29** 0.15 0.08 0.03 0.06 0.02 0.34** 0.28** – ** À ** ** ** 11. sta_divs 0.15 0.28 0.17 0.11 0.16 0.04 0.14 0.36 0.27 0.75

Narrative 1. quality – 2. text length 0.73** – 3. w_long 0.62** 0.74** – 4. w_abstract 0.59** 0.70** 0.82** – 5. w_diversity 0.44** 0.45** 0.51** 0.48** – ** ** 6. sx_w/c 0.19 0.05 0.28 0.28 0.10 – 7. sx_c/t 0.09 0.25* 0.22* 0.23* 0.27** 0.08 – À 8. org_freq 0.60** 0.73** 0.53** 0.47** 0.23* 0.30** 0.12 – ** ** ** ** * ** 9. org_divs 0.55 0.62 0.44 0.45 0.20 0.21 0.10 0.67 – 10. sta_freq 0.70** 0.69** 0.56** 0.50** 0.27** 0.21* 0.02 0.58** 0.52** – ** ** ** ** ** ** ** 11. sta_divs 0.48 0.56 0.34 0.38 0.19 0.04 0.17 0.53 0.50 0.53

Note: quality = writing quality ratings; text length = number of clauses; w_long = frequency of long words; w_abstract = frequency of abstract nouns; w_diversity = vocd; sx_w/c = words per clause; sx_c/t = clauses per t-unit; org_freq = frequency of organizational markers org_divs = diversity of organizational markers; sta_freq = frequency of stance markers; sta_divs = diversity of stance markers. p < 0.10. * p < 0.05. ** p < 0.01.

We first used a theory driven incremental approach of hierarchical multiple regression analyses to explore linguistic features that predict writing quality ratings of argumentative essays. First, we entered text length as a control variable, which accounted for 40% of the variation in writing quality ratings. Then, we introduced the key predictor variables one at a time, starting with lexical complexity. We found a significant main effect of lexical complexity on writing quality ratings, controlling for text length. In Model A3, syntactic complexity explained an additional 5% of the variance in quality scores. In Model A4, the effect of the diversity of organizational markers on writing quality approached significance. Though adding this predictor only increased the R2 by 1%, it is indicative of a potential relation worth exploring further in a larger sample. In Model A5, we added diversity of stance markers as an additional predictor, which, however, showed no statistically significant association with writing quality, controlling for other variables. Using Akaike information criterion (AIC), Model A4 was retained as the final model as it is the most parsimonious model containing promising predictors of writing quality scores, which explain 51.6% of the variation in writing quality ratings. All possible interactions were tested, but none was found to be statistically significant.

Table 5 Hierarchical multiple regression analyses predicting argumentative writing quality from lexical, syntactic and discourse features.

Model A1 Model A2 Model A3 Model A4 Model A5 Length 0.11*** (0.014) 0.04* (0.018) 0.06** (0.018) 0.05** (0.019) 0.06** (0.019) Lexical complexity 0.72*** (0.146) 0.47** (0.164) 0.46** (0.163) 0.45** (0.164) Syntactic complexity 0.70** (0.233) 0.62** (0.236) 0.63** (0.237) Organizational markers (diversity) 0.24 (0.157) 0.26 (0.161) Stance markers (diversity) 0.11 (0.178) À _cons 3.93*** (0.475) 5.99*** (0.584) 1.29 (1.669) 1.36 (1.658) 1.38 (1.664)

N 100 96 96 96 96 R2 0.400 0.456 0.504 0.516 0.518 AIC 383.680 347.986 341.094 340.747 342.33

p < 0.1. * p < 0.05. ** p < 0.01. *** p < 0.001. W. Qin, P. Uccelli / Journal of Second Language Writing 33 (2016) 3–17 13

Table 6 Hierarchical multiple regression analyses predicting narrative writing quality from lexical, syntactic, and discourse features.

Model N1 Model N2 Model N3 Model N4 Model N5 Model N6 Length 0.11*** (0.011) 0.08*** (0.017) 0.11*** (0.010) 0.10*** (0.016) 0.07*** (0.017) 0.11*** (0.024) Lexical complexity 0.47* (0.207) Syntactic complexity 0.82* (0.374) 0.73 (0.405) 0.44 (0.390) 0.49 (0.380) Organizational markers (Freq) 0.03 (0.053) 0.02 (0.050) 0.01 (0.050) À Stance markers (Freq) 0.09*** (0.025) 0.19*** (0.048) Organizational markers (Freq) Length 0.00* (0.001) Â À _cons 3.18*** (0.433) 4.65*** (0.740) 1.23 (2.062) 0.78 (2.211) 0.49 (2.110) 1.04 (2.144) À À À N 100 94 100 100 100 100 R2 0.536 0.508 0.557 0.559 0.613 0.637 AIC 415.903 399.801 413.100 414.750 403.663 399.232

In a similar process, text length was first entered into a baseline model to predict narrative writing quality (see Table 6), which accounted for 53.6% of the variance in narrative writing quality ratings. In Model N2, when introducing lexical complexity as a key predictor, we saw a decrease in the R2, which indicates that lexical complexity did not contribute to explain additional variation in writing quality ratings above and beyond text length. This is expected given the high correlation between length and lexical complexity (r = 0.72, p < 0.001). Therefore, lexical complexity was dropped from subsequent models to resolve the collinearity issue. In Model N3, syntactic complexity showed a positive main effect controlling for text length, but in contrast to what we observed in argumentative models, such effect became non-significant in Models N4 and N5 when introducing the frequency of the two types of discourse markers. We were also encouraged to see that the frequency of stance markers was a significant predictor of writing quality. Moreover, the significant interaction (in Model N6) denoted that the effect of frequency of stance markers in written narratives was weaker for shorter texts than longer ones. Ultimately, using Akaike information criterion (AIC), Model N6 was retained as the final model, explaining 63.7% of the variability in narrative writing quality.

6.2.4.[18_TD$IF] Illustrating predictors of writing quality In the appendices, we display excerpts from two examples per genre that illustrate the higher and lower ends of the writing quality continuum for the sample of students. These examples were selected to illustrate how specific lexico- syntactic and discourse features contributed to differentiated higher-quality and lower-quality writing in each genre. It is worth noticing that all four examples show space for improvement in lexico-grammatical accuracy. Given that a large amount of research has been conducted to specifically address issues of mechanical or grammatical accuracy (Chandler, 2003; Polio, 1997), the present study focused instead on lexico-syntactic and discourse resources. Argumentative Essay 1 received a score of 4 out of 12, whereas Argumentative essay 2 received the highest score possible on the writing quality scale. The holistic rubric used in this study was calibrated to capture the variability within the sample; thus, despite the notable opportunities for improvement in Essay 2, this essay represents the best writing performance produced by this sample. In comparing the two argumentative texts displayed in Appendix B, we can observe that the high- quality essay differed from the low-quality essay in lexical complexity, syntactic complexity, and diversity of organizational markers. At the lexico-syntactic level, Essay 2 demonstrated more incidences of structurally complex (e.g., extremely, talented) and semantically abstract vocabulary (e.g., achievement, courage) compared to Essay 1. Moreover, the syntactic complexity in Essay 2, as measured by average words per clause, was 6.63 compared to 5.31 in Essay 1, which indicated a higher frequency of embedded clauses used in Essay 2 (e.g., because a person who doesn’t work hard may not be successful though he is extremely talented.). At the discourse level, Essay 2 displayed the author’s strategic use of diverse organizational markers to signal argumentative structure. In addition to several transitional markers adopted to explicitly indicate the logical relations across sentences (e.g., because, that is), the author also used a variety of frame markers (e.g., on the one hand, the final thing), code glosses (e.g., for example) and conclusion markers (e.g., in summary), which successfully oriented the reader to the progression of arguments. Appendix B displays excerpts of relatively low- and high-quality narrative writing, with scores of 4 and 12 respectively. As observed in the text, the two narratives do not differ remarkably in lexical and syntactic complexity, as both make use of mostly simple vocabulary (e.g., wide, big) and syntactic structure. However, we can observe a considerable difference in the use of evaluative markers between these two narratives. Narrative 2 displays abundant use of evaluative stance markers compared to Narrative 1, including description of internal states (e.g., nervous), evaluative qualifiers (e.g., scary, suddenly), exaggeration (e.g., I almost couldn’t move), and objective judgments (e.g., my mum looked at me and gave me a smile). In doing so, the author vividly incorporated her evaluative stance to the narration and actively engaged readers in the story.

7. Discussion

In sum, in contrast to the findings for native English writers, the present study revealed no statistically significant differences in Chinese secondary school EFL learners’ writing quality in narrative and argumentative essay. However, in line 14 W. Qin, P. Uccelli / Journal of Second Language Writing 33 (2016) 3–17 with research on native English adolescent writers, argumentative writing demonstrated more sophisticated lexico- syntactic features, showing higher frequency of long words, abstract nouns, and words per clause. Finally, distinct lexical, syntactic, and discourse features were identified as predictors of overall writing quality for each genre. Controlling for text length, lexical complexity, syntactic complexity, and diversity of discourse organizational markers was found to significantly and independently contribute to explain the variability in argumentative writing. For narrative writing quality, only frequency of stance markers was found to be predictive after controlling for length.

7.1. Similarities and differences with previous research

Our finding of higher-level lexico-syntactic complexity in EFL learners’ argumentative writing compared to their narratives is similar to what researchers have found with native English speakers (Berman & Nir-Sagiv, 2007). This finding indicates that EFL learners not only distinguished each genre’s communicative purposes—which would be expected at this age, but also adequately deployed relevant linguistic resources from their non-native language to serve these distinct purposes. Nevertheless, our results differ from previous research on native English speakers’ cross-genre writing in several aspects. First, in contrast to the replicated finding of native English secondary school writers achieving higher overall quality in narrative as compared to argumentative writing, our analysis with EFL learners revealed no differences in quality across genres. Moreover, different from Beers and Nagy (2009), who found that the quality of argumentative and narrative writing was correlated with different measures of syntactic complexity, our results revealed that words per clause had positive and stronger correlations with quality in both genres. One possible reason that clauses per T-unit did not correlate with writing quality might be these EFL learners’ redundant use of the same syntactic structure. For example, many essays started with a sentence of the form “I think . . . because . . . ” and repeated this formula a number of times. The finding that clause subordination was correlated with neither narrative nor argumentative writing quality also lends support to Biber et al. (2011), showing that dense use of clause subordination may not necessarily indicate higher writing quality.

7.2. Limitations and future directions

While promising, our results must be viewed in light of some critical limitations. First, it is necessary to acknowledge that our ﬁndings reveal characteristics of these writers’ performances on what is called on-demand writing, without possibly capturing their full capabilities as EFL writers under different conditions. Thus, the study did not attempt to assess students’ overall writing ability; rather, it had the modest goal of assessing the features of particular writing products. Moreover, the sample contains a wide age range without large numbers of participants per grade. Thus, results only demonstrate individual variability and could by no means offer general developmental trends. Cross-sectional studies with larger sample sizes at each grade level or, ideally, longitudinal research that follows writers throughout the secondary school years would be helpful to examine developmental trajectories and individual variability in the development of EFL writing performance across genres. Thirdly, the linguistic measures adopted in the present study only assessed certain aspects of linguistic complexity (e.g., clausal complexity, clause subordination), whereas other important measures deemed to be critical in advanced academic writing (e.g., complex noun phrase constituents) were not explored. Future research could conduct a more comprehensive evaluation of linguistic complexity in this population. Lastly, students’ writing performance in L1 plays an important role in their EFL writing performance, yet information on students’ L1 was not available for this sample. In future research, it will be interesting to explore cross-linguistic relations between EFL learners’ L1 and L2 genre knowledge and cognitive abilities. Moreover, it would be insightful to examine the relative importance of L1 genre knowledge and L2 language proﬁciency in predicting L2 writing performance across genres.

7.3. Instructional implications

Our study revealed several possible areas that merit special instructional attention. First, results showed that Chinese secondary-school EFL learners in the sample used a limited number of academic vocabulary in writing. This suggests that these students would benefit from instruction that was designed to expand their academic vocabulary through purposeful selection of words and optimal conditions for students to actively use academic vocabulary orally and in writing (Coxhead, 2000). EFL learners’ limited use of stance markers, especially the epistemic stance markers, is worth noticing as well. These markers constitute grammatically complex and cognitively advanced forms, which usually develop around the adolescent years in native English writers. It would be interesting to explore whether the use of these markers could indeed be scaffolded through EFL instruction designed to scaffold both language and cognitive development; or whether, instead, this constitutes an area of later development in EFL writing compared to native language writing. That EFL learners in the sample did not perform better on narrative writing seemed to counter the data-driven conventional belief about the narrative–argumentation developmental trajectory. We will surely need a longitudinal study to confirm the developmental trajectory for EFL learners, but the present study provides some preliminary understanding of EFL learners’ writing performance across genres. The findings are in line with a view of language learning as a repertoire of distinct context-specific types of discourses that learners acquire on the basis of opportunities to learn—learning one set of W. Qin, P. Uccelli / Journal of Second Language Writing 33 (2016) 3–17 15 discourse practices relevant for a particular context does not guarantee language performance in other contexts. In the case of EFL learners, we cannot assume that those students who are proficient in complex uses of language (e.g., argumentative writing) would automatically master other supposedly “easier” discourse practices or those acquired earlier by native language learners (e.g., narrative writing). Teaching EFL cannot be understood as promoting a global English proficiency expected to be equally functional across contexts. Instead, EFL researchers and practitioners should reflect on the diverse needs students would encounter in authentic communicative contexts in order to make informed decisions regarding which discourse practices should be selected for instruction. Another important motivation of this work is to make visible to EFL practitioners and researchers a repertoire of linguistic features that are closely associated with high writing quality, at least as measured by experienced native-English-speaking teachers. The predictive relation differs by genre. Therefore, our findings highlight the need to make students aware of the different language resources expected across genres. Our findings are relevant to inform the design of pedagogical approaches attuned to different genres, in particular, narrative and argumentative writing. Future research could design interventions informed by the strengths and weaknesses identified in this study in order to explore how to effectively teach these research-based repertoires of skills. Testing research-based interventions can inform classroom practices that seek to promote EFL learners’ writing performance and especially encourage students to understand language as a functional solution to specific contexts of communication.

Appendix A. : The writing prompt

Argumentative writing prompt

Some people believe that success in life comes from risks or chances. Others believe that success results from careful planning. In your opinion, what does success come from? Use speciﬁc reasons and examples to explain your position. You will have 40 min to complete this essay. A typical effective response would be between 300 and 350 words.

Narrative writing prompt

Write a personal story about a time when you achieved success. Please include detailed memories about that experience, including the context, your actions, feelings, etc. You will have 40 min to complete this narrative story. A typical effective response would be between 300 and 350 words.

Appendix B.

Students’ argumentative writing: low-quality and high-quality examples

Essay 1: Low-quality writing (7th grade, male student) Lots of people think success is a further question, it’s hard to be. But I think, successful is just around us. Everyone wants to be success, but a few people are really hardworking for it. I think be success is very easy. A very warm wish, a hot drink in winter all can be little success. Doing good in the Final exam, get much money by the ﬁrst month to work. They all are success. So, do well in everything you will be successful.

Essay 2: High-quality writing (9th grade, female student) I think success comes from three things, hard work, being careful with details and courage. Hard work is the easiest, the most basic and the most important thing of the three. You can get abilities to succeed. It doesn't need talent. Because a person who doesn't work hard may not be successful though he is extremely talented. On the contrary, a not so talented man will succeed because of his hard work. It's the thing that everybody can do. Next is being careful with details. On the one hand, it means to do everything carefully. Let me show you an example. In China, students need to take exams before they go to high schools or universities. This kind of exam is very important to students. But the exam is not very difficult. Students just need to be careful and make mistakes as few as possible. On the other hand, being careful with details can bring you some chances. That is, details in our life can show us the way to success. So, we should pay attention to them and then get the chance. The final thing is courage. You need courage to do many things. For example, when you face your chance, you need courage to decide to take it. Another example, when you meet some difficulties, you need some courage to move on, to fight against it, to continue to walk on your way to success. In summary, courage can help you never give up. Maybe, we are not as successful as those celebrities. And maybe, we don’t get many achievements in our life. But I think it doesn't matter. Because I think everybody is the biggest success of his generation.

Bold: Organizational markers Underline: Stance markers

Students’ narrative writing: low-quality and high-quality examples

Narrative 1: Low-quality writing (7th grade, male student) I have a story let me tell you. It is about my success. When I was 11 years old in summer, I played computer games at home. I ﬁnished my homework and have a break. It was ﬁve o’clock. At that moment, I heard someone was shouting, so I ran out of the door and look what wrong. Suddenly, I know at six 16 W. Qin, P. Uccelli / Journal of Second Language Writing 33 (2016) 3–17

floor was get on fire. Everybody were ran outside the building. I get back to my home. There was no people in it, only me. So I pack something very important. And I ran outside the building. A few minutes later, firemen put down the fire. Narrative 2: High-quality writing (8th grade, male student) I didn’t like swimming at all when I was young. But after that rainy night, I think that I’m interested in it. I’m afraid of water when I was young. I think there might be something scary in the deep water. Because the water is so wide and big, I can't imagine what can I move in the scary place. But I change my opinion after one night. That night, I was taken by my mother. She wanted me to learn how to swim. It was such bad information for me. But I couldn’t be against my mother, I followed her to the swimming pool near my home. We wore swimming suits and walked to the scary water quickly. My mother first dived into the water. But I just waited on the bank. My mother was a little angry and asked me to jump into the pool. I was very nervous at that moment. I didn’t want to swim at all because of the scary water. But I should obeyed my mum. I almost couldn’t move then. My mum saw the situation of me and said ‘You just can't swim forever, you will never be great because of your heart.’ I was surprised when I heard that. My heart? What’s wrong with my heart? I was thinking about the question and sitting on the bank. Then I saw a boy who is very young, he was swimming difficultly but very hard. I admired him and I suddenly got a point. Such a young boy could swim, but why I couldn’t make it? I felt so shame and I thought about my mum’s words again. After a while, I suddenly stood up and rushed into the water without hesitation. When I got into the water, I felt so weird. Why should I afraid of it? Then I became to swim slowly. My mum looked at me and gave me a smile. I was full of confidence at that time and became more and more faster. I know that I should believe myself. The things that disturb you are not so difficult. I can make it!

Bold: Organizational markers. Underline: Stance markers.

References

Beers, S. F., & Nagy, W. E. (2009). Syntactic complexity as a predictor of adolescent writing quality: which measures? Which genre?. Reading and Writing, 22, 185–200. Bereiter, C., & Scardamalia, M. (2013). The psychology of written composition. New York: Routledge. Berman, R. A. (2008). The psycholinguistics of developing text construction. Journal of Child Language, 35, 735–771. Berman, R. A., & Nir-Sagiv, B. (2007). Comparing narrative and expository text construction across adolescence: a developmental paradox. Discourse Processes, 43, 79–120. Berman, R. A., & Slobin, D. I. (2013). Relating events in narrative: a crosslinguistic developmental study. New York: Psychology Press. Biber, D., Gray, B., & Poonpon, K. (2011). Should we use characteristics of conversation to measure grammatical complexity in L2 writing development? TESOL Quarterly, 45,5–35. Bulté, B., & Housen, A. (2014). Conceptualizing and measuring short-term changes in L2 writing complexity. Journal of Second Language Writing, 26, 42–65. Cameron, C. A., Lee, K., Webster, S., Munro, K., Hunt, A. K., & Linton, M. J. (1995). Text cohesion in children’s narrative writing. Applied Psycholinguistics, 16, 257–269. Chandler, J. (2003). The efficacy of various kinds of error feedback for improvement in the accuracy and fluency of L2 student writing. Journal of Second Language Writing, 12, 267–296. Connor, U. (1990). Linguistic/rhetorical measures for international persuasive student writing. Research in the Teaching of English, 24, 67–87. Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34,213–238. Crossley, S. A., & McNamara, D. S. (2014). Does writing development equal writing quality? A computational investigation of syntactic complexity in L2 learners. Journal of Second Language Writing, 26,66–79. Crossley, S. A., Roscoe, R., & McNamara, D. S. (2011). Predicting human scores of essay quality using computational indices of linguistic and textual features. In G. Biswas, S. Bull, J. Kay, & A. Mitrovic (Eds.), Artificial intelligence in education: 15th international conference, AIED 2011, Auckland, New Zealand, June 28– July 2011 (pp. 438–440).Berlin, Heidelberg: Springer Berlin Heidelberg. Crowhurst, M. (1980). Syntactic complexity in narration and argument at three grade levels. Canadian Journal of Education/Revue canadienne de l’e´ducation, 5, 6–13. Crowhurst, M. (1990). The development of persuasive/argumentative writing. Developing Discourse Practices in Adolescence and Adulthood, 1, 200–223. Cummins, J. (1980). The cross-lingual dimensions of language proficiency: implications for bilingual education and the optimal age issue. TESOL Quarterly, 14, 175–187. Cummins, J. (1991). Interdependence of first- and second-language proficiency in bilingual children. Language Processing in Bilingual Children70–89. Dobbs, C. L. (2014). Signaling organization and stance: academic language use in middle grade persuasive writing. Reading and Writing, 27, 1327–1352. Engelhard, G. Jr., Gordon, B., & Gabrielson, S. (1992). The influences of mode of discourse, experiential demand, and gender on the quality of student writing. Research in the Teaching of English, 26,315–336. Enright, K. A., & Gilliland, B. (2011). Multilingual writing in an age of accountability: from policy to practice in US high school classrooms. Journal of Second Language Writing, 20, 182–195. Ferris, D. R. (1994). Lexical and syntactic features of ESL writing by students at different levels of L2 proficiency. TESOL Quarterly, 28,414–420. Flower, L., & Hayes, J. R. (1981). A cognitive process theory of writing. College Composition and Communication, 32, 365–387. Fránquiz, M. E., & Salinas, C. S. (2011). Newcomers developing English literacy through historical thinking and digitized primary sources. Journal of Second Language Writing, 20, 196–210. Frase, L. T., Faletti, J., Ginther, A., & Grant, L. (1999). Computer[182_TD$IF] analysis of the TOEFL test of written English. Educational Testing Service. Friginal, E., Li, M., & Weigle, S. C. (2014). Revisiting multiple profiles of learner compositions: a comparison of highly rated NS and NNS essays. Journal of Second Language Writing, 23,1–16. http://dx.doi.org/10.1016/j.jslw.2013.10.001. Gebhard, M., & Harman, R. (2011). Reconsidering genre theory in K-12 schools: a response to school reforms in the United States. Journal of Second Language Writing, 20, 45–55. Gee, J. P. (2001). Reading as situated language: a sociocognitive perspective. Journal of Adolescent & Adult Literacy, 44,714–725. Grabe, W. (2002). Narrative and expository macro-genres. In A. Johns (Ed.), Genre in the classroom: multiple perspectives (pp. 249–267).Mahwah, N.J: L. Erlbaum. Graham, S., & Perin, D. (2007). Writing next: effective strategies to improve writing of adolescents in middle and high schools. A report to Carnegie Corporation of New York. Alliance for Excellent Education. Grant, L., & Ginther, A. (2000). Using computer-tagged linguistic features to describe L2 writing differences. Journal of Second Language Writing, 9, 123–145. http://dx.doi.org/10.1016/S1060-3743(00)00019-9. Hall-Mills, S., & Apel, K. (2013). Narrative and expository writing of adolescents with language-learning disabilities a pilot study. Communication Disorders Quarterly, 34, 135–143. Halliday, M. A. K., & Hasan, R. (2014). Cohesion in English. New York: Routledge. Halliday, M., Matthiessen, C. M., & Matthiessen, C. (2014). An introduction to functional grammar. New York: Routledge. Hardy, J. A., & Römer, U. (2013). Revealing disciplinary variation in student writing: a multi-dimensional analysis of the Michigan Corpus of Upper-level Student Papers (MICUSP). Corpora, 8, 183–207. W. Qin, P. Uccelli / Journal of Second Language Writing 33 (2016) 3–17 17

Heath, S. B. (1983). Ways with words: language, life and work in communities and classrooms. New York: Cambridge University Press. Heath, S. B. (2012). Words at work and play: three decades in family and community life. New York: Cambridge University Press. Hunt, K. W. (1965). Grammatical structures written at three grade levels. NCTE Research Report No. 3. Hyland, K. (2005). Metadiscourse: exploring interaction in writing. New York: Bloomsbury Publishing. Hyland, K. (2007). Genre pedagogy: language, literacy and L2 writing instruction. Journal of Second Language Writing, 16,148–164. Hyland, K. (2009). Teaching and researching writing. New York: Routledge. Ishikawa, S. (1995). Objective measurement of low-proficiency EFL narrative writing. Journal of Second Language Writing, 4,51–69. Jarvis, S., Grant, L., Bikowski, D., & Ferris, D. (2003). Exploring multiple profiles of highly rated learner compositions. Journal of Second Language Writing, 12, 377–403. http://dx.doi.org/10.1016/j.jslw.2003.09.001. Johns, A. M. (2011). The future of genre in L2 writing: fundamental, but contested, instructional decisions. Journal of Second Language Writing, 20, 56–68. Kormos, J. (2011). Task complexity and linguistic and discourse features of narrative writing performance. Journal of Second Language Writing, 20,148–161. Leki, I., Cumming, A., & Silva, T. (2008). A synthesis of research on L2 writing in English. Mahwah, NJ: Lawrence Erlbaum. Li, T., & Wharton, S. (2012). Metadiscourse repertoire of L1 Mandarin undergraduates writing in English: a cross-contextual, cross-disciplinary study. Journal of English for Academic Purposes, 11, 345–356. Liardet, C. L. (2013). An exploration of Chinese EFL learner’s deployment of grammatical metaphor: learning to make academically valued meanings. Journal of Second Language Writing, 22,161–178. Liu, X. (2013). Evaluation in Chinese university EFL students’ English argumentative writing: an appraisal study. Electronic Journal of Foreign Language Teaching, 10, 40–53. Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15,474–496. MacWhinney, B. (2000). The CHILDES project: tools for analyzing talk: volume I: transcription format and programs, volume II: the database. Computational Linguistics, 26, 657. Marco, M. J. L. (2000). Collocational frameworks in medical research papers: a genre-based study. English for Specific Purposes, 19, 63–86. Martin, J. R. (1989). Factual writing: exploring and challenging social reality. Oxford: Oxford University Press. Matsuda, P. K., & De Pew, K. E. (2002). Early second language writing: an introduction. Journal of Second Language Writing, 11, 261–268. McKee, G., Malvern, D., & Richards, B. (2000). VOCD: Software for measuring vocabulary diversity through mathematical modeling. Version II for PC and Macintosh. Pittsburgh: Carnegie Mellon University, Child Language Data Exchange System (CHILDES) at http://childes.psy.cmu.edu. McNamara, D. S., Crossley, S. A., & McCarthy, P. M. (2009). Linguistic features of writing quality. Written Communication, 27,57–86. Miao, R., & Lei, X. (2008). Discourse features of argumentative essays written by chinese EFL students. ITL International Journal of Applied Linguistics, 156,179– 200. Ninio, A., & Snow, C. E. (1996). Pragmatic development. Boulder, Colo: Westview Press. Ochs, E. (1993). Constructing social identity: a language socialization perspective. Research on Language and Social Interaction, 26, 287–306. Olinghouse, N. G., & Leaird, J. T. (2009). The relationship between measures of vocabulary and narrative writing quality in second- and fourth-grade students. Reading and Writing, 22, 545–565. Ortmeier-Hooper, C., & Enright, K. A. (2011). Mapping new territory: toward an understanding of adolescent L2 writers and writing in US contexts. Journal of Second Language Writing, 20, 167–181. Paltridge, B. (2001). Genre and the language learning classroom. Ann Arbor: University of Michigan Press. Peterson, C., & McCabe, A. (1983). Developmental psycholinguistics: three ways of looking at a child’s narrative. New York: Plenum Press. Polio, C. G. (1997). Measures of linguistic accuracy in second language writing research. Language Learning, 47,101–143. Qin, J., & Karabacak, E. (2010). The analysis of Toulmin elements in Chinese EFL university argumentative writing. System, 38,444–456. Ravid, D. (2006). Semantic development in textual contexts during the school years: noun scale analyses. Journal of Child Language, 33, 791–821. Ravid, D., & Berman, R. A. (2010). Developing noun phrase complexity at school age: a text-embedded cross-linguistic analysis. First Language, 30,3–26. Ravid, D., & Tolchinsky, L. (2002). Developing linguistic literacy: a comprehensive model. Journal of Child Language, 29,417–447. Reed, W. M. (1992). The effects of computer-based writing tasks and mode of discourse on the performance and attitudes of writers of varying abilities. Computers in Human Behavior, 8,97–119. Reilly, J. S., Baruch, E., Jisa, H., & Berman, R. A. (2002). Propositional attitudes in written and spoken language. Written Language & Literacy, 5, 183–218. Schleppegrell, M. J. (2002). Linguistic features of the language of schooling. Linguistics and Education, 12, 431–459. Schleppegrell, M. J. (2004). The language of schooling: a functional linguistics perspective. New York: Routledge. Scott, C. M., & Windsor, J. (2000). General language performance measures in spoken and written narrative and expository discourse of school-age children with language learning disabilities. Journal of Speech, Language, and Hearing Research, 43, 324–339. Snow, C. E., & Uccelli, P. (2009). The challenge of academic language. The[187_TD$IF] Cambridge handbook of literacy. New York: Cambridge University Press112–133. Swales, J. M. (1981). Aspects of article introductions: language studies unit. Birmingham: University of Aston. Swales, J. (1990). Genre analysis: English in academic and research settings. New York: Cambridge University Press. Swales, J. M., Ahmad, U. K., Chang, Y.-Y., Chavez, D., Dressen, D. F., & Seymour, R. (1998). Consider this: the role of imperatives in scholarly writing. Applied Linguistics, 19, 97–121. Tardy, C. M. (2006). Researching first and second language genre learning: a comparative review and a look ahead. Journal of Second Language Writing,15, 79– 101. Tardy, C. M. (2011). The history and future of genre in second language writing. Journal of Second Language Writing, 20,1–5. Tarone, E., Dwyer, S., Gillette, S., & Icke, V. (1981). On the use of the passive in two astrophysics journal papers. The ESP Journal, 1, 123–140. Uccelli, P., Dobbs, C. L., & Scott, J. (2013). Mastering academic language organization and stance in the persuasive writing of high school students. Written Communication, 30, 36–62. Uccelli, P., Barr, C. D., Dobbs, C. L., Galloway, E. P., Meneses, A., & Sanchez, E. (2015). Core academic language skills: an expanded operational construct and a novel instrument to chart school-relevant language proficiency in preadolescent and adolescent learners. Applied Psycholinguistics, 36, 1077–1109. Wimmer, G., Köhler, R., Grotjahn, R., & Altmann, G. (1994). Towards a theory of word length distribution. Journal of Quantitative Linguistics, 1, 98–106. Zhao, C. G. (2013). Measuring authorial voice strength in L2 argumentative writing: the development and validation of an analytic rubric. Language Testing, 30, 201–230.

Wenjuan Qin is a doctoral candidate at the Harvard Graduate School of Education. Her current research focuses on writing development of learners using English as a Foreign Language (EFL). Beyond lexico-grammatical accuracy, she is particularly interested in how EFL writers learn to ﬂexibly and effectively deploy a variety of lexical, syntactic and discourse features that are attuned to different communicative contexts, genres and registers.

Paola Uccelli is associate professor at the Harvard Graduate School of Education. With a background in linguistics, she studies socio-cultural and individual differences in language and literacy development throughout the school years. Her research focuses on how different language skills (at the lexical, grammatical, and discourse levels) interact with each other to either promote or hinder advances in language expression and comprehension in monolingual and bilingual students. Uccelli’s current projects focus on describing individual trajectories of school-relevant language development; on the design and validation of a research instrument to assess school-relevant language skills in elementary and middle school students; and on understanding how monolingual and multilingual speakers and writers learn to use a variety of discourse structures ﬂexibly and effectively for diverse communicative and learning purposes.