Stanza: A Python Natural Language Processing Toolkit for Many Human Languages

Peng Qi*  Yuhao Zhang*  Yuhui Zhang  Jason Bolton  Christopher D. Manning
Stanford University
Stanford, CA 94305
{pengqi, yuhaozhang, yuhuiz}@stanford.edu
{jebolton, manning}@stanford.edu

* Equal contribution. Order decided by a tossed coin.

Abstract

We introduce Stanza, an open-source Python natural language processing toolkit supporting 66 human languages. Compared to existing widely used toolkits, Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition. We have trained Stanza on a total of 112 datasets, including the Universal Dependencies treebanks and other multilingual corpora, and show that the same neural architecture generalizes well and achieves competitive performance on all languages tested. Additionally, Stanza includes a native Python interface to the widely used Java Stanford CoreNLP software, which further extends its functionality to cover other tasks such as coreference resolution and relation extraction. Source code, documentation, and pretrained models for 66 languages are available at https://stanfordnlp.github.io/stanza/.

[Figure 1: Overview of Stanza's neural NLP pipeline. Stanza takes multilingual text as input, and produces annotations accessible as native Python objects. Besides this neural pipeline, Stanza also features a Python client interface to the Java CoreNLP software.]

1 Introduction

The growing availability of open-source natural language processing (NLP) toolkits has made it easier for users to build tools with sophisticated linguistic processing. While existing NLP toolkits such as CoreNLP (Manning et al., 2014), FLAIR (Akbik et al., 2019), spaCy¹, and UDPipe (Straka, 2018) have had wide usage, they also suffer from several limitations. First, existing toolkits often support only a few major languages. This has significantly limited the community's ability to process multilingual text. Second, widely used tools are sometimes under-optimized for accuracy, either due to a focus on efficiency (e.g., spaCy) or use of less powerful models (e.g., CoreNLP), potentially misleading downstream applications and insights obtained from them. Third, some tools assume input text has been tokenized or annotated with other tools, lacking the ability to process raw text within a unified framework. This has limited their wide applicability to text from diverse sources.

We introduce Stanza², a Python natural language processing toolkit supporting many human languages. As shown in Table 1, compared to existing widely-used NLP toolkits, Stanza has the following advantages:

¹ https://spacy.io/
² The toolkit was called StanfordNLP prior to v1.0.0.

• From raw text to annotations. Stanza features a fully neural pipeline which takes raw text as input, and produces annotations including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition.

• Multilinguality. Stanza's architectural design is language-agnostic and data-driven, which allows us to release models supporting 66 languages, by training the pipeline on the Universal Dependencies (UD) treebanks and other multilingual corpora.

System    # Human Languages   Programming Language   Raw Text Processing   Fully Neural   Pretrained Models   State-of-the-art Performance
CoreNLP   6                   Java                   ✓                                    ✓
FLAIR     12                  Python                                       ✓              ✓                   ✓
spaCy     10                  Python                 ✓                                    ✓
UDPipe    61                  C++                    ✓                     ✓              ✓
Stanza    66                  Python                 ✓                     ✓              ✓                   ✓

Table 1: Feature comparisons of Stanza against other popular natural language processing toolkits.

• State-of-the-art performance. We evaluate Stanza on a total of 112 datasets, and find its neural pipeline adapts well to text of different genres, achieving state-of-the-art or competitive performance at each step of the pipeline.

Additionally, Stanza features a Python interface to the widely used Java CoreNLP package, allowing access to additional tools such as coreference resolution and relation extraction.

Stanza is fully open source and we make pretrained models for all supported languages and datasets available for public download. We hope Stanza can facilitate multilingual NLP research and applications, and drive future research that produces insights from human languages.

2 System Design and Architecture

At the top level, Stanza consists of two individual components: (1) a fully neural multilingual NLP pipeline; (2) a Python client interface to the Java Stanford CoreNLP software. In this section we introduce their designs.

2.1 Neural Multilingual NLP Pipeline

Stanza's neural pipeline consists of models that range from tokenizing raw text to performing syntactic analysis on entire sentences (see Figure 1). All components are designed with processing many human languages in mind, with high-level design choices capturing common phenomena in many languages and data-driven models that learn the difference between these languages from data. Moreover, the implementation of Stanza components is highly modular, and reuses basic model architectures when possible for compactness. We highlight the important design choices here, and refer the reader to Qi et al. (2018) for modeling details.

[Figure 2: An example of multi-word tokens in French.
(fr) L'Association des Hôtels            (en) The Association of Hotels
(fr) Il y a des hôtels en bas de la rue  (en) There are hotels down the street
The des in the first sentence corresponds to two syntactic words, de and les; the second des is a single word.]

Tokenization and Sentence Splitting. When presented raw text, Stanza tokenizes it and groups tokens into sentences as the first step of processing. Unlike most existing toolkits, Stanza combines tokenization and sentence segmentation from raw text into a single module. This is modeled as a tagging problem over character sequences, where the model predicts whether a given character is the end of a token, end of a sentence, or end of a multi-word token (MWT, see Figure 2).³ We choose to predict MWTs jointly with tokenization because this task is context-sensitive in some languages.

³ Following Universal Dependencies (Nivre et al., 2020), we make a distinction between tokens (contiguous spans of characters in the input text) and syntactic words. These are interchangeable aside from the cases of MWTs, where one token can correspond to multiple words.
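As a concrete illustration, the short sketch below (ours, not from the paper) runs only the tokenize and mwt processors on the French sentences from Figure 2 and prints each token alongside its underlying syntactic words; the attribute names (doc.sentences, sentence.tokens, token.words) follow Stanza's documented API:

    import stanza

    # download the French models (run once)
    stanza.download('fr')
    # a tokenize-only pipeline; mwt expands multi-word tokens such as "des"
    nlp = stanza.Pipeline('fr', processors='tokenize,mwt')
    doc = nlp("L'Association des Hôtels. Il y a des hôtels en bas de la rue.")
    for i, sentence in enumerate(doc.sentences):
        print(f'Sentence {i + 1}:')
        for token in sentence.tokens:
            # for an MWT, token.words holds the expanded syntactic words,
            # e.g. the first "des" should yield "de" + "les"
            print(' ', token.text, '->', [word.text for word in token.words])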

Multi-word Token Expansion. Once MWTs are identified by the tokenizer, they are expanded into the underlying syntactic words as the basis of downstream processing. This is achieved with an ensemble of a frequency lexicon and a neural sequence-to-sequence (seq2seq) model, to ensure that frequently observed expansions in the training set are always robustly expanded while maintaining flexibility to model unseen words statistically.

POS and Morphological Feature Tagging. For each word in a sentence, Stanza assigns it a part-of-speech (POS), and analyzes its universal morphological features (UFeats, e.g., singular/plural, 1st/2nd/3rd person, etc.). To predict POS and UFeats, we adopt a bidirectional long short-term memory network (Bi-LSTM) as the basic architecture. For consistency among universal POS (UPOS), language-specific POS (XPOS), and UFeats, we adopt the biaffine scoring mechanism from Dozat and Manning (2017) to condition XPOS and UFeats prediction on that of UPOS.
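To make the tagger's outputs concrete, here is a small usage sketch of ours (not from the paper): each Word object carries upos, xpos, and feats attributes per Stanza's documented API.

    import stanza

    # run the tagger and inspect its three outputs for each word
    nlp = stanza.Pipeline('en', processors='tokenize,pos')
    doc = nlp('The cats sat quietly.')
    for word in doc.sentences[0].words:
        # upos: universal POS; xpos: treebank-specific POS; feats: UFeats string
        print(word.text, word.upos, word.xpos, word.feats)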

Lemmatization. Stanza also lemmatizes each word in a sentence to recover its canonical form (e.g., did→do). Similar to the multi-word token expander, Stanza's lemmatizer is implemented as an ensemble of a dictionary-based lemmatizer and a neural seq2seq lemmatizer. An additional classifier is built on the encoder output of the seq2seq model, to predict shortcuts such as lowercasing and identity copy for robustness on long input sequences such as URLs.

Dependency Parsing. Stanza parses each sentence for its syntactic structure, where each word in the sentence is assigned a syntactic head that is either another word in the sentence, or in the case of the root word, an artificial root symbol. We implement a Bi-LSTM-based deep biaffine neural dependency parser (Dozat and Manning, 2017). We further augment this model with two linguistically motivated features: one that predicts the linearization order of two words in a given language, and the other that predicts the typical distance in linear order between them. We have previously shown that these features significantly improve parsing accuracy (Qi et al., 2018).

Named Entity Recognition. For each input sentence, Stanza also recognizes named entities in it (e.g., person names, organizations, etc.). For NER we adopt the contextualized string representation-based sequence tagger from Akbik et al. (2018). We first train a forward and a backward character-level LSTM language model, and at tagging time we concatenate the representations at the end of each word position from both language models with word embeddings, and feed the result into a standard one-layer Bi-LSTM sequence tagger with a conditional random field (CRF)-based decoder.

2.2 CoreNLP Client

Stanford's Java CoreNLP software provides a comprehensive set of NLP tools especially for the English language. However, these tools are not easily accessible with Python, the programming language of choice for many NLP practitioners, due to the lack of official support. To facilitate the use of CoreNLP from Python, we take advantage of the existing server interface in CoreNLP, and implement a robust client as its Python interface.

When the CoreNLP client is instantiated, Stanza will automatically start the CoreNLP server as a local process. The client then communicates with the server through its RESTful APIs, after which annotations are transmitted in Protocol Buffers, and converted back to native Python objects. Users can also specify JSON or XML as annotation format. To ensure robustness, while the client is being used, Stanza periodically checks the health of the server, and restarts it if necessary.

3 System Usage

Stanza's user interface is designed to allow quick out-of-the-box processing of multilingual text. To achieve this, Stanza supports automated model download via Python code and pipeline customization with processors of choice. Annotation results can be accessed as native Python objects to allow for flexible post-processing.

3.1 Neural Pipeline Interface

Stanza's neural NLP pipeline can be initialized with the Pipeline class, taking language name as an argument. By default, all processors will be loaded and run over the input text; however, users can also specify the processors to load and run with a list of processor names as an argument. Users can additionally specify other processor-level properties, such as batch sizes used by processors, at initialization time.

The following code snippet shows a minimal usage of Stanza for downloading the Chinese model, annotating a sentence with customized processors, and printing out all annotations:

    import stanza

    # download Chinese model
    stanza.download('zh')
    # initialize Chinese neural pipeline
    nlp = stanza.Pipeline('zh', processors='tokenize,pos,ner')
    # run annotation over a sentence
    doc = nlp('斯坦福是一所私立研究型大学。')
    print(doc)

After all processors are run, a Document instance will be returned, which stores all annotation results. Within a Document, annotations are further stored in Sentences, Tokens and Words in a top-down fashion (Figure 1). The following code snippet demonstrates how to access the text and POS tag of each word in a document and all named entities in the document:

    # print the text and POS of all words
    for sentence in doc.sentences:
        for word in sentence.words:
            print(word.text, word.pos)

    # print all entities in the document
    print(doc.entities)
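The lemma and dependency annotations described in Section 2.1 are exposed on the same Word objects. A sketch of ours, assuming a pipeline loaded with the lemma and depparse processors (the Chinese pipeline above loaded only tokenize, pos, and ner):

    import stanza

    nlp = stanza.Pipeline('en', processors='tokenize,pos,lemma,depparse')
    doc = nlp('Emily said that she liked the movie.')
    for sentence in doc.sentences:
        for word in sentence.words:
            # word.head is a 1-based index into sentence.words; 0 marks the root
            head = sentence.words[word.head - 1].text if word.head > 0 else 'root'
            print(word.text, word.lemma, word.deprel, head)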

Stanza is designed to be run on different hardware devices. By default, CUDA devices will be used whenever they are visible to the pipeline; otherwise, CPUs will be used. However, users can force all computation to be run on CPUs by setting use_gpu=False at initialization time.
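For instance (our sketch; use_gpu is the flag named above):

    import stanza

    # force all computation onto the CPU, e.g., on a machine without CUDA
    nlp = stanza.Pipeline('en', processors='tokenize,pos', use_gpu=False)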

3.2 CoreNLP Client Interface

The CoreNLP client interface is designed in a way that the actual communication with the backend CoreNLP server is transparent to the user. To annotate an input text with the CoreNLP client, a CoreNLPClient instance needs to be initialized, with an optional list of CoreNLP annotators. After the annotation is complete, results will be accessible as native Python objects.

This code snippet shows how to establish a CoreNLP client and obtain the NER and coreference annotations of an English sentence:

    from stanza.server import CoreNLPClient

    # start a CoreNLP client
    with CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos', 'lemma',
                                   'ner', 'parse', 'coref']) as client:
        # run annotation over input
        ann = client.annotate('Emily said that she liked the movie.')
        # access all entities
        for sent in ann.sentence:
            print(sent.mentions)
        # access coreference annotations
        print(ann.corefChain)

With the client interface, users can annotate text in 6 languages as supported by CoreNLP.
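Beyond the annotator list, the client constructor also accepts server-level options. The sketch below is ours; memory, endpoint, and output_format are keyword arguments we believe the client exposes, and the specific values are illustrative:

    from stanza.server import CoreNLPClient

    # a sketch of server-level client options (values are illustrative)
    with CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos'],
                       memory='4G',                       # Java heap size for the server
                       endpoint='http://localhost:9001',  # address the local server binds to
                       output_format='json') as client:   # JSON instead of Protocol Buffers
        ann = client.annotate('Stanford is a private research university.')
        print(ann)  # with output_format='json', the result is a plain Python dict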

3.3 Interactive Web-based Demo

To help visualize documents and their annotations generated by Stanza, we build an interactive web demo that runs the pipeline interactively. For all languages and all annotations Stanza provides in those languages, we generate predictions from the models trained on the largest treebank/NER dataset, and visualize the result with the Brat rapid annotation tool.⁴ This demo runs in a client/server architecture, and annotation is performed on the server side. We make one instance of this demo publicly available at http://stanza.run/. It can also be run locally with proper Python libraries installed. An example of running Stanza on a German sentence can be found in Figure 3.

[Figure 3: Stanza annotates a German sentence, as visualized by our interactive demo. Note am is expanded into syntactic words an and dem before downstream analyses are performed.]

⁴ https://brat.nlplab.org/

3.4 Training Pipeline Models

For all neural processors, Stanza provides command-line interfaces for users to train their own customized models. To do this, users need to prepare the training and development data in compatible formats (i.e., CoNLL-U format for the Universal Dependencies pipeline and BIO format column files for the NER model). The following command trains a neural dependency parser with user-specified training and development data:

    $ python -m stanza.models.parser \
        --train_file train.conllu \
        --eval_file dev.conllu \
        --gold_file dev.conllu \
        --output_file output.conllu
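Once training finishes, the resulting model file can be plugged back into the Python pipeline. In this sketch of ours, the saved path is hypothetical, and the processor-specific depparse_model_path keyword argument is how we understand custom weights are supplied to the Pipeline class:

    import stanza

    # load a custom parser; the path below is hypothetical and depends on
    # where the training command above saved its model file
    nlp = stanza.Pipeline('en', processors='tokenize,pos,lemma,depparse',
                          depparse_model_path='saved_models/depparse/en_parser.pt')
    doc = nlp('We can now parse text with the retrained model.')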

4 Performance Evaluation

To establish benchmark results and compare with other popular toolkits, we trained and evaluated Stanza on a total of 112 datasets. All pretrained models are publicly downloadable.

Datasets. We train and evaluate Stanza's tokenizer/sentence splitter, MWT expander, POS/UFeats tagger, lemmatizer, and dependency parser with the Universal Dependencies v2.5 treebanks (Zeman et al., 2019). For training we use 100 treebanks from this release that have non-copyrighted training data, and for treebanks that do not include development data, we randomly split out 20% of the training data as development data. These treebanks represent 66 languages, mostly European languages, but spanning a diversity of language families, including Indo-European, Afro-Asiatic, Uralic, Turkic, Sino-Tibetan, etc. For NER, we train and evaluate Stanza with 12 publicly available datasets covering 8 major languages as shown in Table 3 (Nothman et al., 2013; Tjong Kim Sang and De Meulder, 2003; Tjong Kim Sang, 2002; Benikova et al., 2014; Mohit et al., 2012; Taulé et al., 2008; Weischedel et al., 2013). For the WikiNER corpora, as canonical splits are not available, we randomly split them into 70% training, 15% dev and 15% test splits. For all other corpora we used their canonical splits.

Training. On the Universal Dependencies treebanks, we tuned all hyper-parameters on several large treebanks and applied them to all other treebanks. We used the word2vec embeddings released as part of the 2018 UD Shared Task (Zeman et al., 2018), or the fastText embeddings (Bojanowski et al., 2017) whenever word2vec is not available. For the character-level language models in the NER component, we pretrained them on a mix of the Common Crawl and Wikipedia dumps, and the news corpora released by the WMT19 Shared Task (Barrault et al., 2019), except for English and Chinese, for which we pretrained on the Google One Billion Word (Chelba et al., 2013) and the Chinese Gigaword corpora⁵, respectively. We again applied the same hyper-parameters to models for all languages.

⁵ https://catalog.ldc.upenn.edu/LDC2011T13

Treebank                  System   Tokens   Sents.   Words   UPOS    XPOS    UFeats   Lemmas   UAS     LAS
Overall (100 treebanks)   Stanza   99.09    86.05    98.63   92.49   91.80   89.93    92.78    80.45   75.68
Arabic-PADT               Stanza   99.98    80.43    97.88   94.89   91.75   91.86    93.27    83.27   79.33
                          UDPipe   99.98    82.09    94.58   90.36   84.00   84.16    88.46    72.67   68.14
Chinese-GSD               Stanza   92.83    98.80    92.83   89.12   88.93   92.11    92.83    72.88   69.82
                          UDPipe   90.27    99.10    90.27   84.13   84.04   89.05    90.26    61.60   57.81
English-EWT               Stanza   99.01    81.13    99.01   95.40   95.12   96.11    97.21    86.22   83.59
                          UDPipe   98.90    77.40    98.90   93.26   92.75   94.23    95.45    80.22   77.03
                          spaCy    97.30    61.19    97.30   86.72   90.83   –        87.05    –       –
French-GSD                Stanza   99.68    94.92    99.48   97.30   –       96.72    97.64    91.38   89.05
                          UDPipe   99.68    93.59    98.81   95.85   –       95.55    96.61    87.14   84.26
                          spaCy    98.34    77.30    94.15   86.82   –       –        87.29    67.46   60.60
Spanish-AnCora            Stanza   99.98    99.07    99.98   98.78   98.67   98.59    99.19    92.21   90.01
                          UDPipe   99.97    98.32    99.95   98.32   98.13   98.13    98.48    88.22   85.10
                          spaCy    99.47    97.59    98.95   94.04   –       –        79.63    86.63   84.13

Table 2: Neural pipeline performance comparisons on the Universal Dependencies (v2.5) test treebanks. For our system we show macro-averaged results over all 100 treebanks. We also compare our system against UDPipe and spaCy on treebanks of five major languages where the corresponding pretrained models are publicly available. All results are F1 scores produced by the 2018 UD Shared Task official evaluation script.

Universal Dependencies Results. For performance on UD treebanks, we compared Stanza (v1.0) against UDPipe (v1.2) and spaCy (v2.2) on treebanks of 5 major languages whenever a pretrained model is available. As shown in Table 2, Stanza achieved the best performance on most scores reported. Notably, we find that Stanza's language-agnostic architecture is able to adapt to datasets of different languages and genres. This is also shown by Stanza's high macro-averaged scores over 100 treebanks covering 66 languages.

NER Results. For performance of the NER component, we compared Stanza (v1.0) against FLAIR (v0.4.5) and spaCy (v2.2). For spaCy we reported results from its publicly available pretrained model whenever one trained on the same dataset can be found, otherwise we retrained its model on our datasets with default hyper-parameters, following the publicly available tutorial.⁶ For FLAIR, since their downloadable models were pretrained on dataset versions different from canonical ones, we retrained all models on our own dataset splits with their best reported hyper-parameters. All test results are shown in Table 3. We find that on all datasets Stanza achieved either higher or close F1 scores when compared against FLAIR. When compared to spaCy, Stanza's NER performance is much better. It is worth noting that Stanza's high performance is achieved with much smaller models compared with FLAIR (up to 75% smaller), as we intentionally compressed the models for memory efficiency and ease of distribution.

⁶ https://spacy.io/usage/training#ner. Note that, following this public tutorial, we did not use pretrained word embeddings when training spaCy NER models, although using pretrained word embeddings may potentially improve the NER results.

Language   Corpus       # Types   Stanza   FLAIR   spaCy
Arabic     AQMAR        4         74.3     74.0    –
Chinese    OntoNotes    18        79.2     –       –
Dutch      CoNLL02      4         89.2     90.3    73.8
           WikiNER      4         94.8     94.8    90.9
English    CoNLL03      4         92.1     92.7    81.0
           OntoNotes    18        88.8     89.0    85.4*
French     WikiNER      4         92.9     92.5    88.8*
German     CoNLL03      4         81.9     82.5    63.9
           GermEval14   4         85.2     85.4    68.4
Russian    WikiNER      4         92.9     –       –
Spanish    CoNLL02      4         88.1     87.3    77.5
           AnCora       4         88.6     88.4    76.1

Table 3: NER performance across different languages and corpora. All scores reported are entity micro-averaged test F1. For each corpus we also list the number of entity types. * marks results from publicly available pretrained models on the same dataset, while others are from models retrained on our datasets.

Speed comparison. We compare Stanza against existing toolkits to evaluate the time it takes to annotate text (see Table 4). For GPU tests we use a single NVIDIA Titan RTX card. Unsurprisingly, Stanza's extensive use of accurate neural models makes it take significantly longer than spaCy to annotate text, but it is still competitive when compared against toolkits of similar accuracy, especially with the help of GPU acceleration.

Task   Stanza (CPU)   Stanza (GPU)   UDPipe (CPU)   FLAIR (CPU)   FLAIR (GPU)
UD     10.3×          3.22×          4.30×          –             –
NER    17.7×          1.08×          –              51.8×         1.17×

Table 4: Annotation runtime of various toolkits relative to spaCy (CPU) on the English EWT treebank and OntoNotes NER test sets. For reference, on the compared UD and NER tasks, spaCy is able to process 8140 and 5912 tokens per second, respectively.

5 Conclusion and Future Work

We introduced Stanza, a Python natural language processing toolkit supporting many human languages. We have shown that Stanza's neural pipeline not only has wide coverage of human languages, but also is accurate on all tasks, thanks to its language-agnostic, fully neural architectural design. Simultaneously, Stanza's CoreNLP client extends its functionality with additional NLP tools.

For future work, we consider the following areas of improvement in the near term:

• Models downloadable in Stanza are largely trained on a single dataset. To make models robust to many different genres of text, we would like to investigate the possibility of pooling various sources of compatible data to train "default" models for each language;

• The amount of computation and resources available to us is limited. We would therefore like to build an open "model zoo" for Stanza, so that researchers from outside our group can also contribute their models and benefit from models released by others;

• Stanza was designed to optimize for accuracy of its predictions, but this sometimes comes at the cost of computational efficiency and limits the toolkit's use. We would like to further investigate reducing model sizes and speeding up computation in the toolkit, while still maintaining the same level of accuracy;

• We would also like to expand Stanza's functionality by adding other processors such as neural coreference resolution or relation extraction for richer text analytics.

Acknowledgments

The authors would like to thank the anonymous reviewers for their comments, Arun Chaganty for his early contribution to this toolkit, Tim Dozat for his design of the original architectures of the tagger and parser models, Matthew Honnibal and Ines Montani for their help with spaCy integration and helpful comments on the draft, Ranting Guo for the logo design, and John Bauer and the community contributors for their help with maintaining and improving this toolkit. This research is funded in part by Samsung Electronics Co., Ltd. and in part by the SAIL-JD Research Initiative.

References

Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland Vollgraf. 2019. FLAIR: An easy-to-use framework for state-of-the-art NLP. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). Association for Computational Linguistics.

Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018. Contextual string embeddings for sequence labeling. In Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics.

Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, and Marcos Zampieri. 2019. Findings of the 2019 conference on machine translation (WMT19). In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1). Association for Computational Linguistics.

Darina Benikova, Chris Biemann, and Marc Reznicek. 2014. NoSta-D named entity annotation for German: Guidelines and dataset. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14).

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5.

Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, and Tony Robinson. 2013. One billion word benchmark for measuring progress in statistical language modeling. Technical report, Google.

Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In International Conference on Learning Representations (ICLR).

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations.

Behrang Mohit, Nathan Schneider, Rishav Bhowmick, Kemal Oflazer, and Noah A. Smith. 2012. Recall-oriented learning of named entities in Arabic Wikipedia. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics.

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, and Daniel Zeman. 2020. Universal Dependencies v2: An evergrowing multilingual treebank collection. In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC'20).

Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy, and James R. Curran. 2013. Learning multilingual named entity recognition from Wikipedia. Artificial Intelligence, 194:151–175.

Peng Qi, Timothy Dozat, Yuhao Zhang, and Christopher D. Manning. 2018. Universal dependency parsing from scratch. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Association for Computational Linguistics.

Milan Straka. 2018. UDPipe 2.0 prototype at CoNLL 2018 UD shared task. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Association for Computational Linguistics.

Mariona Taulé, M. Antònia Martí, and Marta Recasens. 2008. AnCora: Multilevel annotated corpora for Catalan and Spanish. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08). European Language Resources Association (ELRA).

Erik F. Tjong Kim Sang. 2002. Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition. In COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002).

Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003.

Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, et al. 2013. OntoNotes release 5.0. Linguistic Data Consortium.

Daniel Zeman, Jan Hajič, Martin Popel, Martin Potthast, Milan Straka, Filip Ginter, Joakim Nivre, and Slav Petrov. 2018. CoNLL 2018 shared task: Multilingual parsing from raw text to universal dependencies. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Association for Computational Linguistics.

Daniel Zeman, Joakim Nivre, Mitchell Abrams, et al. 2019. Universal Dependencies 2.5. LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.
108