Universal Dependencies for Learner English
Yevgeni Berzak (CSAIL, MIT), Jessica Kenney (EECS & Linguistics, MIT), Carolyn Spadine (Linguistics, MIT), Jing Xian Wang (EECS, MIT),
Lucia Lam (MechE, MIT), Keiko Sophie Mori (Linguistics, MIT), Sebastian Garza (Linguistics, MIT), Boris Katz (CSAIL, MIT)
Abstract

We introduce the Treebank of Learner English (TLE), the first publicly available syntactic treebank for English as a Second Language (ESL). The TLE provides manually annotated POS tags and Universal Dependency (UD) trees for 5,124 sentences from the Cambridge First Certificate in English (FCE) corpus. The UD annotations are tied to a pre-existing error annotation of the FCE, whereby full syntactic analyses are provided for both the original and error corrected versions of each sentence. Further on, we delineate ESL annotation guidelines that allow for consistent syntactic treatment of ungrammatical English. Finally, we benchmark POS tagging and dependency parsing performance on the TLE dataset and measure the effect of grammatical errors on parsing accuracy. We envision the treebank to support a wide range of linguistic and computational research on second language acquisition as well as automatic processing of ungrammatical language¹.

1 Introduction

The majority of the English text available worldwide is generated by non-native speakers (Crystal, 2003). Such texts introduce a variety of challenges, most notably grammatical errors, and are of paramount importance for the scientific study of language acquisition as well as for NLP. Despite the ubiquity of non-native English, there is currently no publicly available syntactic treebank for English as a Second Language (ESL).

To address this shortcoming, we present the Treebank of Learner English (TLE), a first of its kind resource for non-native English, containing 5,124 sentences manually annotated with POS tags and dependency trees. The TLE sentences are drawn from the FCE dataset (Yannakoudakis et al., 2011), and authored by English learners from 10 different native language backgrounds. The treebank uses the Universal Dependencies (UD) formalism (De Marneffe et al., 2014; Nivre et al., 2016), which provides a unified annotation framework across different languages and is geared towards multilingual NLP (McDonald et al., 2013). This characteristic allows our treebank to support computational analysis of ESL using not only English based but also multilingual approaches which seek to relate ESL phenomena to native language syntax.

While the annotation inventory and guidelines are defined by the English UD formalism, we build on previous work in learner language analysis (Díaz-Negrillo et al., 2010; Dickinson and Ragheb, 2013) to formulate an additional set of annotation conventions aiming at a uniform treatment of ungrammatical learner language. Our annotation scheme uses a two-layer analysis, whereby a distinct syntactic annotation is provided for the original and the corrected version of each sentence. This approach is enabled by a pre-existing error annotation of the FCE (Nicholls, 2003) which is used to generate an error corrected variant of the dataset. Our inter-annotator agreement results provide evidence for the ability of the annotation scheme to support consistent annotation of ungrammatical structures.

¹The treebank is available at universaldependencies.org. The annotation manual used in this project and a graphical query engine are available at esltreebank.org.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 737–746, Berlin, Germany, August 7-12, 2016. © 2016 Association for Computational Linguistics

Finally, a corpus that is annotated with both grammatical errors and syntactic dependencies paves the way for empirical investigation of the relation between grammaticality and syntax. Understanding this relation is vital for improving tagging and parsing performance on learner language (Geertzen et al., 2013), syntax based grammatical error correction (Tetreault et al., 2010; Ng et al., 2014), and many other fundamental challenges in NLP. In this work, we take the first step in this direction by benchmarking tagging and parsing accuracy on our dataset under different training regimes, and obtaining several estimates for the impact of grammatical errors on these tasks.

To summarize, this paper presents three contributions. First, we introduce the first large scale syntactic treebank for ESL, manually annotated with POS tags and universal dependencies. Second, we describe a linguistically motivated annotation scheme for ungrammatical learner English and provide empirical support for its consistency via inter-annotator agreement analysis. Third, we benchmark a state of the art parser on our dataset and estimate the influence of grammatical errors on the accuracy of automatic POS tagging and dependency parsing.

The remainder of this paper is structured as follows. We start by presenting an overview of the treebank in section 2. In sections 3 and 4 we provide background information on the annotation project, and review the main annotation stages leading to the current form of the dataset. The ESL annotation guidelines are summarized in section 5. Inter-annotator agreement analysis is presented in section 6, followed by parsing experiments in section 7. Finally, we review related work in section 8 and present the conclusion in section 9.

2 Treebank Overview

The TLE currently contains 5,124 sentences (97,681 tokens) with POS tag and dependency annotations in the English Universal Dependencies (UD) formalism (De Marneffe et al., 2014; Nivre et al., 2016). The sentences were obtained from the FCE corpus (Yannakoudakis et al., 2011), a collection of upper intermediate English learner essays, containing error annotations with 75 error categories (Nicholls, 2003). Sentence level segmentation was performed using an adaptation of the NLTK sentence tokenizer². Under-segmented sentences were split further manually. Word level tokenization was generated using the Stanford PTB word tokenizer³.

The treebank represents learners with 10 different native language backgrounds: Chinese, French, German, Italian, Japanese, Korean, Portuguese, Spanish, Russian and Turkish. For every native language, we randomly sampled 500 automatically segmented sentences, under the constraint that selected sentences have to contain at least one grammatical error that is not punctuation or spelling.

The TLE annotations are provided in two versions. The first version is the original sentence authored by the learner, containing grammatical errors. The second, corrected sentence version, is a grammatical variant of the original sentence, generated by correcting all the grammatical errors in the sentence according to the manual error annotation provided in the FCE dataset. The resulting corrected sentences constitute a parallel corpus of standard English. Table 1 presents basic statistics of both versions of the annotated sentences.

                       original            corrected
  sentences            5,124               5,124
  tokens               97,681              98,976
  sentence length      19.06 (std 9.47)    19.32 (std 9.59)
  errors per sentence  2.67 (std 1.9)      -
  authors              924
  native languages     10

Table 1: Statistics of the TLE. Standard deviations are denoted in parentheses.

To avoid potential annotation biases, the annotations of the treebank were created manually from scratch, without utilizing any automatic annotation tools. To further assure annotation quality, each annotated sentence was reviewed by two additional annotators. To the best of our knowledge, TLE is the first large scale English treebank constructed in a completely manual fashion.

3 Annotator Training

The treebank was annotated by six students, five undergraduates and one graduate. Among the undergraduates, three are linguistics majors and two are engineering majors with a linguistics minor. The graduate student is a linguist specializing in syntax. An additional graduate student in NLP participated in the final debugging of the dataset.
²http://www.nltk.org/api/nltk.tokenize.html
³http://nlp.stanford.edu/software/tokenizer.shtml
Prior to annotating the treebank sentences, the annotators were trained for about 8 weeks. During the training, the annotators attended tutorials on dependency grammars, and learned the English UD guidelines⁴, the Penn Treebank POS guidelines (Santorini, 1990), the grammatical error annotation scheme of the FCE (Nicholls, 2003), as well as the ESL guidelines described in section 5 and in the annotation manual.

Furthermore, the annotators completed six annotation exercises, in which they were required to annotate POS tags and dependencies for practice sentences from scratch. The exercises were done individually, and were followed by group meetings in which annotation disagreements were discussed and resolved. Each of the first three exercises consisted of 20 sentences from the UD gold standard for English, the English Web Treebank (EWT) (Silveira et al., 2014). The remaining three exercises contained 20-30 ESL sentences from the FCE. Many of the ESL guidelines were introduced

(IND) and the second the word itself (WORD). The remaining four columns had to be filled in with a Universal POS tag (UPOS), a Penn Treebank POS tag (POS), a head word index (HIND) and a dependency relation (REL) according to version 1 of the English UD guidelines.

The annotation section of the sentence is preceded by a metadata header. The first field in this header, denoted with SENT, contains the FCE error coded version of the sentence. The annotators were instructed to verify the error annotation, and add new error annotations if needed. Corrections to the sentence segmentation are specified in the SEGMENT field⁵. Further down, the field TYPO is designated for literal annotation of spelling errors and ill formed words that happen to form valid words (see section 5.2).

The example below presents a pre-annotated original sentence given to an annotator.

#SENT=That time I had to sleep in
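A reader for the two-part layout just described — a `#KEY=value` metadata header (SENT, SEGMENT, TYPO, ...) followed by six-column token rows (IND, WORD, UPOS, POS, HIND, REL) — might look as follows. The whitespace-separated row format is an assumption for illustration, not the released file specification.

```python
def read_tle_sentence(lines):
    """Parse one annotated sentence: '#KEY=value' metadata lines
    followed by token rows with the six columns
    IND, WORD, UPOS, POS, HIND, REL (whitespace-separated here
    for illustration)."""
    meta, tokens = {}, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Metadata header, e.g. "#SENT=That time I had to sleep in"
            key, _, value = line[1:].partition("=")
            meta[key] = value
        else:
            ind, word, upos, pos, hind, rel = line.split()
            tokens.append({"ind": int(ind), "word": word, "upos": upos,
                           "pos": pos, "hind": int(hind), "rel": rel})
    return meta, tokens
```

Keeping the metadata and the token list separate mirrors the file layout, so error-coded sentence versions and syntactic annotations can be processed independently.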
4.2 Review

All annotated sentences were randomly assigned to a second annotator (henceforth reviewer), in a double blind manner. The reviewer's task was to mark all the annotations that they would have annotated differently. To assist the review process, we compiled a list of common annotation errors, available in the released annotation manual.

The annotations were reviewed using an active editing scheme in which an explicit action was required for all the existing annotations. The scheme was introduced to prevent reviewers from overlooking annotation issues due to passive approval. Specifically, an additional # sign was added at the seventh column of each token's annotation. The reviewer then had to either "sign off" on the existing annotation by erasing the # sign, or provide an alternative annotation following the # sign.

4.3 Disagreement Resolution

In the final stage of the annotation process all annotator-reviewer disagreements were resolved by a third annotator (henceforth judge), whose main task was to decide in favor of the annotator or the reviewer. Similarly to the review process, the judging task was carried out in a double blind manner. Judges were allowed to resolve annotator-reviewer disagreements with a third alternative, as well as introduce new corrections for annotation issues overlooked by the reviewers.

Another task performed by the judges was to mark acceptable alternative annotations for ambiguous structures determined through review disagreements or otherwise present in the sentence. These annotations were specified in an additional metadata field called AMBIGUITY. The ambiguity markings are provided along with the resolved version of the annotations.

4.4 Final Debugging

After applying the resolutions produced by the judges, we queried the corpus with debugging tests for specific linguistic constructions. This additional testing phase further reduced the number of annotation errors and inconsistencies in the treebank. Including the training period, the treebank creation lasted over a year, with an aggregate of more than 2,000 annotation hours.

5 Annotation Scheme for ESL

Our annotations use the existing inventory of English UD POS tags and dependency relations, and follow the standard UD annotation guidelines for English. However, these guidelines were formulated with grammatical usage of English in mind and do not cover non canonical syntactic structures arising due to grammatical errors⁶. To encourage consistent and linguistically motivated annotation of such structures, we formulated a complementary set of ESL annotation guidelines.

Our ESL annotation guidelines follow the general principle of literal reading, which emphasizes syntactic analysis according to the observed language usage. This strategy continues a line of work in SLA which advocates for centering analysis of learner language around morpho-syntactic surface evidence (Ragheb and Dickinson, 2012; Dickinson and Ragheb, 2013). Similarly to our framework, which includes a parallel annotation of corrected sentences, such strategies are often presented in the context of multi-layer annotation schemes that also account for error corrected sentence forms (Hirschmann et al., 2007; Díaz-Negrillo et al., 2010; Rosen et al., 2014).

Deploying a strategy of literal annotation within UD, a formalism which enforces cross-linguistic consistency of annotations, will enable meaningful comparisons between non-canonical structures in English and canonical structures in the author's native language. As a result, a key novel characteristic of our treebank is its ability to support cross-lingual studies of learner language.

5.1 Literal Annotation

With respect to POS tagging, literal annotation implies adhering as much as possible to the observed morphological forms of the words. Syntactically, argument structure is annotated according to the usage of the word rather than its typical distribution in the relevant context. The following list of conventions defines the notion of literal reading for some of the common non canonical structures associated with grammatical errors.

⁶The English UD guidelines do address several issues encountered in informal genres, such as the relation "goeswith", which is used for fragmented words resulting from typos.

Argument Structure

Extraneous prepositions We annotate all nominal dependents introduced by extraneous prepositions
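The active editing scheme of section 4.2 can be made concrete with a small helper that classifies the reviewer's edit of the seventh column. The three-state encoding below is our reading of the scheme, not code from the project.

```python
def review_state(seventh_column):
    """Interpret the seventh-column field of a token row, which is
    pre-filled with '#'. Erasing the sign means the reviewer signed
    off; leaving it untouched means the token was not yet reviewed;
    text after '#' proposes an alternative annotation."""
    if seventh_column == "#":
        return ("unreviewed", None)
    if seventh_column == "":
        return ("approved", None)
    if seventh_column.startswith("#"):
        return ("disputed", seventh_column[1:])
    raise ValueError("unexpected review field: %r" % seventh_column)
```

Because every token starts in the "unreviewed" state, a simple scan for remaining '#' fields verifies that no annotation was passively approved.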
as nominal modifiers. In the following sentence, "him" is marked as a nominal modifier (nmod) instead of an indirect object (iobj) of "give".

#SENT=...I had to give

Similarly, in the example below we annotate "necessaryiest" as a superlative.

#SENT=The necessaryiest things...

Tense

#SENT=...we

...
6  interestings  ADJ   JJ   7  amod
7  things        NOUN  NNS  3  dobj
...

Wrong word formations that result in a valid, but contextually implausible word form are also annotated according to the word correction. In the example below, the nominal form "sale" is likely to be an unintended result of an ill formed verb. Similarly to spelling errors that result in valid words, we mark the typical literal POS annotation in the TYPO metadata field.

#SENT=...they do not

                      UPOS    POS     HIND    REL
Annotator-Reviewer
  original            98.83   98.35   97.74   96.98
  corrected           99.02   98.61   97.97   97.20
Reviewer-Judge
  original            99.72   99.68   99.37   99.15
  corrected           99.80   99.77   99.45   99.28

Table 2: Inter-annotator agreement on the entire TLE corpus. Agreement is measured as the fraction of tokens that remain unchanged after an editing round. The four evaluation columns correspond to universal POS tags, PTB POS tags, unlabeled attachment, and dependency labels. Co-
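Table 2's agreement measure — the fraction of tokens left unchanged by an editing round — can be computed per annotation column as sketched below. The per-token dicts are a hypothetical in-memory layout, not the released file format.

```python
def editing_agreement(before, after, fields=("upos", "pos", "hind", "rel")):
    """Fraction of tokens whose value in each annotation field is
    unchanged between two editing rounds (e.g., annotator vs. reviewer).
    `before` and `after` are token-aligned lists of dicts."""
    assert len(before) == len(after) and before
    return {f: sum(b[f] == a[f] for b, a in zip(before, after)) / len(before)
            for f in fields}
```

Computing the four fields separately reproduces the column structure of Table 2, where attachment (HIND) and label (REL) agreement can diverge.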
Test set   Train set      UPOS   POS    UAS    LA     LAS
TLEorig    EWT            91.87  94.28  86.51  88.07  81.44
TLEcorr    EWT            92.9   95.17  88.37  89.74  83.8
TLEorig    TLEorig        95.88  94.94  87.71  89.26  83.4
TLEcorr    TLEcorr        96.92  95.17  89.69  90.92  85.64
TLEorig    EWT+TLEorig    93.33  95.77  90.3   91.09  86.27
TLEcorr    EWT+TLEcorr    94.27  96.48  92.15  92.54  88.3

Table 3: Tagging and parsing results on a test set of 500 sentences from the TLE corpus. EWT is the English UD treebank. TLEorig are original sentences from the TLE. TLEcorr are the corresponding error corrected sentences.

Tokens         Train set      UPOS   POS    UAS    LA     LAS
Ungrammatical  EWT            87.97  88.61  82.66  82.66  74.93
Grammatical    EWT            92.62  95.37  87.26  89.11  82.7
Ungrammatical  TLEorig        90.76  88.68  83.81  83.31  77.22
Grammatical    TLEorig        96.86  96.14  88.46  90.41  84.59
Ungrammatical  EWT+TLEorig    89.76  90.97  86.32  85.96  80.37
Grammatical    EWT+TLEorig    94.02  96.7   91.07  92.08  87.41

Table 4: Tagging and parsing results on the original version of the TLE test set for tokens marked with grammatical errors (Ungrammatical) and tokens not marked for errors (Grammatical).

...rics. An additional factor which negatively affects performance in this regime are systematic differences in the EWT annotation of possessive pronouns, expletives and names compared to the UD guidelines, which are utilized in the TLE. In particular, the EWT annotates possessive pronoun UPOS as PRON rather than DET, which leads the UPOS results in this setup to be lower than the PTB POS results. Improved results are obtained using the TLE training data, which, despite its smaller size, is closer in genre and syntactic characteristics to the TLE test set. The strongest PTB POS tagging and parsing results are obtained by combining the EWT with the TLE training data, yielding 95.77 POS accuracy and a UAS of 90.3 on the original version of the TLE test set.

The dual annotation of sentences in their original and error corrected forms enables estimating the impact of grammatical errors on tagging and parsing by examining the performance gaps between the two sentence versions. Averaged across the three training conditions, the POS tagging accuracy on the original sentences is lower than the accuracy on the sentence corrections by 1.0 UPOS and 0.61 POS. Parsing performance degrades by 1.9 UAS, 1.59 LA and 2.21 LAS.

To further elucidate the influence of grammatical errors on parsing quality, table 4 compares performance on tokens in the original sentences appearing inside grammatical error tags to those appearing outside such tags. Although grammatical errors may lead to tagging and parsing errors with respect to any element in the sentence, we expect erroneous tokens to be more challenging to analyze compared to grammatical tokens. This comparison indeed reveals a substantial difference between the two types of tokens, with an average gap of 5.0 UPOS, 6.65 POS, 4.67 UAS, 6.56 LA and 7.39 LAS. Note that differently from the global measurements in the first experiment, this analysis, which focuses on the local impact of remove/replace errors, suggests a stronger effect of grammatical errors on the dependency labels than on the dependency structure.

Finally, we measure tagging and parsing performance relative to the fraction of sentence tokens marked with grammatical errors. Similarly to the previous experiment, this analysis focuses on remove/replace rather than insert errors.

[Figure 1: line plot. Y axis: mean per sentence score (76-100). X axis: % of original sentence tokens marked as grammatical errors, in bins 0-5 (362), 5-10 (1033), 10-15 (1050), 15-20 (955), 20-25 (613), 25-30 (372), 30-35 (214), 35-40 (175).]

Figure 1: Mean per sentence POS accuracy, UAS and LAS of the Turbo tagger and Turbo parser, as a function of the percentage of original sentence tokens marked with grammatical errors. The tagger and the parser are trained on the EWT corpus, and tested on all 5,124 sentences of the TLE. Points connected by continuous lines denote performance on the original TLE sentences. Points connected by dashed lines denote performance on the corresponding error corrected sentences. The number of sentences whose errors fall within each percentage range appears in parentheses.

Figure 1 presents the average sentential performance as a function of the percentage of tokens in the original sentence marked with grammatical errors. In this experiment, we train the parser on the EWT training set and test on the entire TLE corpus. Performance curves are presented for POS, UAS and LAS on the original and error corrected versions of the annotations. We observe that while the performance on the corrected sentences is close to constant, original sentence performance decreases as the percentage of erroneous tokens in the sentence grows.

Overall, our results suggest a negative, albeit limited effect of grammatical errors on parsing. This outcome contrasts with a study by Geertzen et al. (2013) which reported a larger performance gap of 7.6 UAS and 8.8 LAS between sentences with and without grammatical errors. We believe that our analysis provides a more accurate estimate of this impact, as it controls for both sentence content and sentence length. The latter factor is crucial, since it correlates positively with the number of grammatical errors in the sentence, and negatively with parsing accuracy.

8 Related Work

Previous studies on learner language proposed several annotation schemes for both POS tags and syntax (Hirschmann et al., 2007; Díaz-Negrillo et al., 2010; Dickinson and Ragheb, 2013; Rosen et al., 2014). The unifying theme in these proposals is a multi-layered analysis aiming to decouple the observed language usage from conventional structures in the foreign language.

In the context of ESL, Díaz-Negrillo et al. (2010) propose three parallel POS tag annotations for the lexical, morphological and distributional forms of each word. In our work, we adopt the distinction between morphological word forms, which roughly correspond to our literal word readings, and distributional forms as the error corrected words. However, we account for morphological forms only when these constitute valid existing PTB POS tags and are contextually plausible. Furthermore, while the internal structure of invalid word forms is an interesting object of investigation, we believe that it is more suitable for annotation as word features rather than POS tags. Our treebank supports the addition of such features to the existing annotations.

The work of Ragheb and Dickinson (2009; 2012; 2013) proposes ESL annotation guidelines for POS tags and syntactic dependencies based on the CHILDES annotation framework. This approach, called "morphosyntactic dependencies", is related to our annotation scheme in its focus on surface structures. Differently from this proposal, our annotations are grounded in a parallel annotation of grammatical errors and include an additional layer of analysis for the corrected forms. Moreover, we refrain from introducing new syntactic categories and dependency relations specific to ESL, thereby supporting computational treatment of ESL using existing resources for standard English. At the same time, we utilize a multilingual formalism which, in conjunction with our literal annotation strategy, facilitates linking the annotations to native language syntax.

While the above mentioned studies focus on annotation guidelines, attention has also been drawn to the topic of parsing in the learner language domain. However, due to the shortage of syntactic resources for ESL, much of the work in this area resorted to using surrogates for learner data. For example, in Foster (2007) and Foster et al. (2008) parsing experiments are carried out on synthetic learner-like data, created by automatic insertion of grammatical errors into well formed English text. In Cahill et al. (2014) a treebank of secondary level native student texts was used to approximate learner text in order to evaluate a parser that utilizes unlabeled learner data.

Syntactic annotations for ESL were previously developed by Nagata et al. (2011), who annotate an English learner corpus with POS tags and shallow syntactic parses. Our work departs from shallow syntax to full syntactic analysis, and provides annotations on a significantly larger scale. Furthermore, differently from this annotation effort, our treebank covers a wide range of learner native languages. An additional syntactic dataset for ESL, currently not publicly available, consists of 1,000 sentences from the EFCamDat dataset (Geertzen et al., 2013), annotated with Stanford dependencies (De Marneffe and Manning, 2008). This dataset was used to measure the impact of grammatical errors on parsing by comparing performance on sentences with grammatical errors to error free sentences. The TLE enables a more direct way of estimating the magnitude of this performance gap by comparing performance on the same sentences in their original and error corrected versions. Our comparison suggests that the effect of grammatical errors on parsing is smaller than the one reported in that study.
9 Conclusion

We present the first large scale treebank of learner language, manually annotated and double-reviewed for POS tags and universal dependencies. The annotation is accompanied by a linguistically motivated framework for handling syntactic structures associated with grammatical errors. Finally, we benchmark automatic tagging and parsing on our corpus, and measure the effect of grammatical errors on tagging and parsing quality. The treebank will support empirical study of learner syntax in NLP, corpus linguistics and second language acquisition.

10 Acknowledgements

We thank Anna Korhonen for helpful discussions and insightful comments on this paper. We also thank Dora Alexopoulou, Andrei Barbu, Markus Dickinson, Sue Felshin, Jeroen Geertzen, Yan Huang, Detmar Meurers, Sampo Pyysalo, Roi Reichart and the anonymous reviewers for valuable feedback on this work. This material is based upon work supported by the Center for Brains, Minds, and Machines (CBMM), funded by NSF STC award CCF-1231216.

References

Aoife Cahill, Binod Gyawali, and James V. Bruno. 2014. Self-training for parsing learner text. In Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages, pages 66–73.

Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37.

David Crystal. 2003. English as a Global Language. Ernst Klett Sprachen.

Marie-Catherine de Marneffe and Christopher D. Manning. 2008. Stanford typed dependencies manual. Technical report, Stanford University.

Marie-Catherine de Marneffe, Timothy Dozat, Natalia Silveira, Katri Haverinen, Filip Ginter, Joakim Nivre, and Christopher D. Manning. 2014. Universal Stanford dependencies: A cross-linguistic typology. In Proceedings of LREC, pages 4585–4592.

Ana Díaz-Negrillo, Detmar Meurers, Salvador Valera, and Holger Wunsch. 2010. Towards interlanguage POS annotation for effective learner corpora in SLA and FLT. Language Forum, 36(1–2):139–154.

Markus Dickinson and Marwa Ragheb. 2009. Dependency annotation for learner corpora. In Proceedings of the Eighth Workshop on Treebanks and Linguistic Theories (TLT-8), pages 59–70.

Markus Dickinson and Marwa Ragheb. 2013. Annotation for learner English guidelines, v. 0.1. Technical report, Indiana University, Bloomington, IN, June 9, 2013.

Jennifer Foster. 2007. Treebanks gone bad. International Journal of Document Analysis and Recognition (IJDAR), 10(3-4):129–145.

Jennifer Foster, Joachim Wagner, and Josef van Genabith. 2008. Adapting a WSJ-trained parser to grammatically noisy text. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pages 221–224. Association for Computational Linguistics.

Jeroen Geertzen, Theodora Alexopoulou, and Anna Korhonen. 2013. Automatic linguistic annotation of large scale L2 databases: The EF-Cambridge Open Language Database (EFCamDat). In Proceedings of the 31st Second Language Research Forum. Somerville, MA: Cascadilla Proceedings Project.

Hagen Hirschmann, Seanna Doolittle, and Anke Lüdeling. 2007. Syntactic annotation of non-canonical linguistic structures.

André F. T. Martins, Miguel Almeida, and Noah A. Smith. 2013. Turning on the turbo: Fast third-order non-projective turbo parsers. In ACL (2), pages 617–622.

Ryan T. McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith B. Hall, Slav Petrov, Hao Zhang, Oscar Täckström, et al. 2013. Universal dependency annotation for multilingual parsing. In ACL (2), pages 92–97.

Ryo Nagata, Edward Whittaker, and Vera Sheinman. 2011. Creating a manually error-tagged and shallow-parsed learner corpus. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 1210–1219. Association for Computational Linguistics.

Hwee Tou Ng, Siew Mei Wu, Ted Briscoe, Christian Hadiwinoto, Raymond Hendy Susanto, and Christopher Bryant. 2014. The CoNLL-2014 shared task on grammatical error correction. In CoNLL Shared Task, pages 1–14.

Diane Nicholls. 2003. The Cambridge Learner Corpus: Error coding and analysis for lexicography and ELT. In Proceedings of the Corpus Linguistics 2003 conference, pages 572–581.

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, et al. 2016. Universal Dependencies v1: A multilingual treebank collection. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016).

Marwa Ragheb and Markus Dickinson. 2012. Defining syntax for learner language annotation. In COLING (Posters), pages 965–974.

Victoria Rosén and Koenraad De Smedt. 2010. Syntactic annotation of learner corpora. Systematisk, variert, men ikke tilfeldig, pages 120–132.

Alexandr Rosen, Jirka Hana, Barbora Štindlová, and Anna Feldman. 2014. Evaluating and automating the annotation of a learner corpus. Language Resources and Evaluation, 48(1):65–92.

Beatrice Santorini. 1990. Part-of-speech tagging guidelines for the Penn Treebank Project (3rd revision). Technical Reports (CIS).

Natalia Silveira, Timothy Dozat, Marie-Catherine de Marneffe, Samuel R. Bowman, Miriam Connor, John Bauer, and Christopher D. Manning. 2014. A gold standard dependency corpus for English. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014).

Joel Tetreault, Jennifer Foster, and Martin Chodorow. 2010. Using parse features for preposition selection and error detection. In Proceedings of the ACL 2010 Conference Short Papers, pages 353–358. Association for Computational Linguistics.

Helen Yannakoudakis, Ted Briscoe, and Ben Medlock. 2011. A new dataset and method for automatically grading ESOL texts. In ACL, pages 180–189.