Parsing Universal Dependencies without Training

Héctor Martínez Alonso, Željko Agić, Barbara Plank, Anders Søgaard


To cite this version: Héctor Martínez Alonso, Željko Agić, Barbara Plank, Anders Søgaard. Parsing Universal Dependencies without training. EACL 2017 - 15th Conference of the European Chapter of the Association for Computational Linguistics, Apr 2017, Valencia, Spain. pp. 229-239. HAL Id: hal-01677405 (https://hal.inria.fr/hal-01677405), submitted on 8 Jan 2018.

Héctor Martínez Alonso♠  Željko Agić♥  Barbara Plank♣  Anders Søgaard♦
♠Univ. Paris Diderot, Sorbonne Paris Cité – Alpage, INRIA, France
♥IT University of Copenhagen, Denmark
♣Center for Language and Cognition, University of Groningen, The Netherlands
♦University of Copenhagen, Denmark
[email protected], [email protected], [email protected], [email protected]

Abstract

We propose UDP, the first training-free parser for Universal Dependencies (UD). Our algorithm is based on PageRank and a small set of head attachment rules. It features two-step decoding to guarantee that function words are attached as leaf nodes. The parser requires no training, and it is competitive with a delexicalized transfer system. UDP offers a linguistically sound unsupervised alternative to cross-lingual parsing for UD, which can be used as a baseline for such systems. The parser has very few parameters and is distinctly robust to domain change across languages.

1 Introduction

Grammar induction and unsupervised dependency parsing are active fields of research in natural language processing (Klein and Manning, 2004; Gelling et al., 2012). However, many data-driven approaches struggle with learning relations that match the conventions of the test data: e.g., Klein and Manning reported the tendency of their DMV parser to make determiners the heads of German nouns, which would not be an error if the test data used a DP analysis (Abney, 1987). Even supervised transfer approaches (McDonald et al., 2011) suffer from target adaptation problems when facing word order differences.

The Universal Dependencies (UD) project (Nivre et al., 2015; Nivre et al., 2016) offers a dependency formalism that aims at providing a consistent representation across languages, while enforcing a few hard constraints. The arrival of such treebanks, expanded and improved on a regular basis, provides a new milestone for cross-lingual dependency parsing research (McDonald et al., 2013).

Furthermore, given that UD rests on a series of simple principles like the primacy of lexical heads (cf. Johannsen et al. (2015) for more details), we expect that such a formalism lends itself more naturally to a simple and linguistically sound rule-based approach to cross-lingual parsing. In this paper we present such an approach.

Our system is a dependency parser that requires no training, and relies solely on the explicit part-of-speech (POS) constraints that UD imposes. In particular, UD prescribes that trees are single-rooted, and that function words like adpositions, auxiliaries, and determiners are always dependents of content words, while other formalisms might treat them as heads (De Marneffe et al., 2014). We ascribe our work to the viewpoints of Bender (2009) about the incorporation of linguistic knowledge in language-independent systems.

Contributions  We introduce, to the best of our knowledge, the first unsupervised rule-based dependency parser for Universal Dependencies. Our method goes substantially beyond the existing work on rule-aided unsupervised dependency parsing, specifically by:

i) adapting the dependency head rules to UD-compliant POS relations,
ii) incorporating the UD restriction of function words being leaves,
iii) applying personalized PageRank to improve main predicate identification, and
iv) making the parsing entirely free of language-specific parameters by estimating adposition attachment direction at runtime.

We evaluate our system on 32 languages¹ in three setups, depending on the reliability of available POS tags, and compare to a multi-source delexicalized transfer system. In addition, we evaluate the systems' sensitivity to domain change for a subset of UD languages for which domain information was retrievable. The results expose a solid and competitive system for all UD languages. Our unsupervised parser compares favorably to delexicalized parsing, while being more robust to domain change.

¹Out of 33 languages in UD v1.2. We exclude Japanese because the treebank is distributed without word forms, and hence we cannot provide results on predicted POS.

2 Related work

Cross-lingual learning  Recent years have seen exciting developments in cross-lingual linguistic structure prediction based on transfer or projection of POS and dependencies (Das and Petrov, 2011; McDonald et al., 2011). These works mainly use supervised learning and domain adaptation techniques for the target language.

The first group of approaches deals with annotation projection (Yarowsky et al., 2001), whereby parallel corpora are used to transfer annotations between resource-rich source languages and low-resource target languages. Projection relies on the availability and quality of parallel corpora, source-side taggers and parsers, but also tokenizers, sentence aligners, and word aligners for sources and targets. Hwa et al. (2005) were the first to project syntactic dependencies, and Tiedemann et al. (2014; 2016) improved on their projection algorithm. The current state of the art in cross-lingual dependency parsing involves leveraging parallel corpora for annotation projection (Ma and Xia, 2014; Rasooli and Collins, 2015).

The second group of approaches deals with transferring source parsing models to target languages. Zeman and Resnik (2008) were the first to introduce the idea of delexicalization: removing lexical features by training and cross-lingually applying parsers solely on POS sequences. Søgaard (2011) and McDonald et al. (2011) independently extended the approach by using multiple sources, requiring uniform POS and dependency representations (McDonald et al., 2013).

Both model transfer and annotation projection rely on a large number of presumptions to derive their competitive parsing models. By and large, these presumptions are unrealistic and exclusive to a group of very closely related, resource-rich Indo-European languages. Agić et al. (2015; 2016) exposed some of these biases in their proposal for realistic cross-lingual tagging and parsing, as they emphasized the lack of perfect sentence- and word-splitting for truly low-resource languages. Further, Johannsen et al. (2016) introduced joint projection of POS and dependencies from multiple sources, sharing the outlook on bias removal in real-world multilingual processing.

Rule-based parsing  Cross-lingual methods, realistic or not, depend entirely on the availability of data: for the sources, for the targets, or most often for both sets of languages. Moreover, they typically do not exploit the constraints placed on linguistic structures by a formalism, and they do so by design.

With the emergence of UD as the practical standard for multilingual POS and syntactic dependency annotation, we argue for an approach that takes a fresh angle on both aspects. Specifically, we propose a parser that i) requires no training data, and in contrast ii) critically relies on exploiting the UD constraints.

These two characteristics make our parser unsupervised. Data-driven unsupervised dependency parsing is now a well-established discipline (Klein and Manning, 2004; Spitkovsky et al., 2010a; Spitkovsky et al., 2010b). Still, the performance of these parsers falls far behind approaches involving any sort of supervision.

Our work builds on the line of research on rule-aided unsupervised dependency parsing by Gillenwater et al. (2010) and Naseem et al. (2010), and also relates to Søgaard's (2012a; 2012b) work. Our parser, however, features two key differences:

i) the usage of PageRank personalization (Lofgren, 2015), and
ii) two-step decoding to treat content and function words differently according to the UD formalism.

Through these differences, even without any training data, we parse nearly as well as a delexicalized transfer parser, and with increased stability to domain change.

3 Method

Our approach does not use any training or unlabeled data. We have used the English treebank during development to assess the contribution of individual head rules, and to tune PageRank parameters (Sec. 3.1) and function-word directionality (Sec. 3.2). Adposition direction is calculated on the fly at runtime. We refer henceforth to our UD parser as UDP.
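The runtime estimate of adposition direction can be pictured with a small example. The sketch below is our own illustration, not the authors' estimator: it simply counts in which order adpositions and nominals co-occur in the POS-tagged input, and the function name, the nominal tag set, and the bigram heuristic are all our assumptions.

```python
from collections import Counter

NOMINAL = {"NOUN", "PROPN", "PRON"}  # simplified nominal tag set (our assumption)

def adposition_direction(tagged_sentences):
    """Vote on attachment direction from ADP/nominal bigram order.

    A toy stand-in for a runtime estimate: prepositional languages tend to
    show ADP-NOUN bigrams, postpositional ones NOUN-ADP bigrams."""
    votes = Counter()
    for tags in tagged_sentences:
        for left, right in zip(tags, tags[1:]):
            if left == "ADP" and right in NOMINAL:
                votes["prepositional"] += 1
            elif left in NOMINAL and right == "ADP":
                votes["postpositional"] += 1
    # Default to prepositional when the evidence is tied or absent.
    if votes["postpositional"] > votes["prepositional"]:
        return "postpositional"
    return "prepositional"

# Only POS sequences are inspected -- no lexical information is used.
print(adposition_direction([["NOUN", "VERB", "ADP", "NOUN"]]))          # prepositional
print(adposition_direction([["NOUN", "ADP", "NOUN", "ADP", "VERB"]]))   # postpositional
```

Because the decision is a single aggregate vote over the whole input, such an estimate needs no language-specific parameter, which is exactly the property the method aims for.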
3.1 PageRank setup

Our system uses the PageRank (PR) algorithm (Page et al., 1999) to estimate the relevance of the content words of a sentence. PR uses a random walk to estimate which nodes in the graph are more likely to be visited often, and thus it gives higher rank to nodes with more incoming edges, as well as to nodes connected to those.

1:  H = ∅; D = ∅
2:  C = ⟨c1, ..., cm⟩; F = ⟨f1, ..., fm⟩
3:  for c ∈ C do
4:      if |H| = 0 then
5:          h = root
6:      else
7:          h = argmin_{j ∈ H} {γ(j, c) | δ(j, c) ∧ κ(j, c)}
8:      end if
9:      H = H ∪ {c}
10:     D = D ∪ {(h, c)}
11: end for
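To make the two ingredients above concrete, ranking content words with personalized PageRank and attaching them before function words, here is a self-contained toy re-implementation. It is a sketch under our own simplifications, not the authors' UDP code: the word graph is a plain chain over adjacent content words, the personalization vector simply favors verbs, and the argmin over γ(j, c) with the δ and κ constraint checks is reduced to nearest-linear-distance attachment.

```python
CONTENT = {"NOUN", "PROPN", "VERB", "ADJ", "NUM"}  # simplified content-tag set

def pagerank(n, edges, personalization, damping=0.85, iters=50):
    """Power-iteration personalized PageRank over an undirected graph on 0..n-1."""
    neigh = {i: [] for i in range(n)}
    for a, b in edges:
        neigh[a].append(b)
        neigh[b].append(a)
    total = sum(personalization)
    rank = [1.0 / n] * n
    for _ in range(iters):
        rank = [(1 - damping) * personalization[i] / total
                + damping * sum(rank[j] / len(neigh[j]) for j in neigh[i])
                for i in range(n)]
    return rank

def parse(tags):
    """Two-step decoding: content words first (by PR rank), function words as leaves.

    Returns one head index per token; -1 marks the single root. Assumes the
    sentence contains at least one content tag."""
    content = [i for i, t in enumerate(tags) if t in CONTENT]
    # Chain graph over adjacent content words, with personalization biased
    # toward verbs -- our stand-in for UDP's main-predicate personalization.
    edges = [(k, k + 1) for k in range(len(content) - 1)]
    pers = [5.0 if tags[i] == "VERB" else 1.0 for i in content]
    rank = pagerank(len(content), edges, pers)
    head, attached = {}, []
    # Step 1: attach content words, highest-ranked first; the nearest already
    # attached word plays the role of argmin γ (δ and κ checks are dropped).
    for k in sorted(range(len(content)), key=lambda k: -rank[k]):
        c = content[k]
        head[c] = -1 if not attached else min(attached, key=lambda j: abs(j - c))
        attached.append(c)
    # Step 2: every function word becomes a leaf under its nearest content word.
    for i in range(len(tags)):
        if i not in head:
            head[i] = min(content, key=lambda j: abs(j - i))
    return [head[i] for i in range(len(tags))]

# "The cat sat on mats" -> DET NOUN VERB ADP NOUN
print(parse(["DET", "NOUN", "VERB", "ADP", "NOUN"]))  # [1, 2, -1, 2, 2]
```

On the toy sentence the verb receives the highest rank and becomes the single root, while the determiner and the adposition end up as leaves, which is the behavior the two-step scheme is designed to guarantee.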
