Frame-Semantic Role Labeling with Heterogeneous Annotations

Meghana Kshirsagar∗  Sam Thomson∗  Nathan Schneider†  Jaime Carbonell∗  Noah A. Smith∗  Chris Dyer∗
∗School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
†School of Informatics, University of Edinburgh, Edinburgh, Scotland, UK
‡Corresponding author: [email protected]

Abstract

We consider the task of identifying and labeling the semantic arguments of a predicate that evokes a FrameNet frame. This task is challenging because there are only a few thousand fully annotated sentences for supervised training. Our approach augments an existing model with features derived from FrameNet and PropBank and with partially annotated exemplars from FrameNet. We observe a 4% absolute increase in F1 versus the original model.

Figure 1: Part of a sentence from FrameNet full-text annotation ("... do you want me to hold off until I finish July and August ?"). 3 frames and their arguments are shown: DESIRING is evoked by want, ACTIVITY_FINISH by finish, and HOLDING_OFF_ON by hold off. Thin horizontal lines representing argument spans are labeled with role names. (Not shown: July and August evoke CALENDRIC_UNIT and fill its Unit role.)

Figure 2: A PropBank-annotated sentence from OntoNotes (Hovy et al., 2006): "the people really want us to stay the course and finish the job ." (rolesets want-v-01, stay-v-01, and finish-v-01, with arguments labeled A0, A1, A3, and AM-ADV). The PB lexicon defines rolesets (verb sense–specific frames) and their core roles: e.g., finish-v-01 'cause to stop', A0 'intentional agent', A1 'thing finishing', and A2 'explicit instrument, thing finished with'. (finish-v-03, by contrast, means 'apply a finish, as to wood'.) Clear similarities to the FrameNet annotations in figure 1 are evident, though PB uses lexical frames rather than deep frames and makes some different decisions about roles (e.g., want-v-01 has no analogue to Focal_participant).

1 Introduction

Paucity of data resources is a challenge for semantic analyses like frame-semantic parsing (Gildea and Jurafsky, 2002; Das et al., 2014) using the FrameNet lexicon (Baker et al., 1998; Fillmore and Baker, 2009).[1] Given a sentence, a frame-semantic parse maps word tokens to frames they evoke, and for each frame, finds and labels its argument phrases with frame-specific roles. An example appears in figure 1.

In this paper, we address this argument identification subtask, a form of semantic role labeling (SRL), a task introduced by Gildea and Jurafsky (2002) using an earlier version of FrameNet. Our contribution addresses the paucity of annotated data for training using standard domain adaptation techniques. We exploit three annotation sources:

• the frame-to-frame relations in FrameNet, by using hierarchical features to share statistical strength among related roles (§3.2),
• FrameNet's corpus of partially-annotated exemplar sentences, by using "frustratingly easy" domain adaptation (§3.3), and
• a PropBank-style SRL system, by using guide features (§3.4).[2]

These expansions of the training corpus and the feature set for supervised argument identification are integrated into SEMAFOR (Das et al., 2014), the leading open-source frame-semantic parser for English. We observe a 4% F1 improvement in argument identification on the FrameNet test set, leading to a 1% F1 improvement on the full frame-semantic parsing task. Our code and models are available at http://www.ark.cs.cmu.edu/SEMAFOR/.

[1] http://framenet.icsi.berkeley.edu
[2] Preliminary experiments training on PropBank annotations mapped to FrameNet via SemLink 1.2.2c (Bonial et al., 2013) hurt performance, likely due to errors and coverage gaps in the mappings.
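As a concrete picture of the output described above, the sketch below encodes the figure 1 annotation as a small Python structure: each frame-evoking target is paired with its frame and a list of role-labeled arguments. This is an illustrative sketch only, not SEMAFOR's internal representation; the frame and role names come from the figure 1 caption, while the argument filler strings are an approximate reading of the figure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class FrameInstance:
    frame: str                                  # FrameNet frame evoked by the target
    target: str                                 # predicate word(s) evoking the frame
    arguments: List[Tuple[str, Optional[str]]]  # (role, filler text); None = null-instantiated

# Approximate reading of the figure 1 annotation of
# "... do you want me to hold off until I finish July and August ?"
# Frame and role names are from the caption; filler strings are guesses.
parse = [
    FrameInstance("DESIRING", "want",
                  [("Experiencer", "you"),
                   ("Focal_participant", "me"),
                   ("Event", "me to hold off until I finish July and August")]),
    FrameInstance("HOLDING_OFF_ON", "hold off",
                  [("Agent", "me"),
                   ("End_point", "until I finish July and August"),
                   ("Desirable_action", None)]),   # null instantiation (∅) in the figure
    FrameInstance("ACTIVITY_FINISH", "finish",
                  [("Agent", "I"),
                   ("Activity", "July and August")]),
]

for fi in parse:
    print(fi.frame, "<-", fi.target, fi.arguments)
```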
2 FrameNet

FrameNet represents events, scenarios, and relationships with an inventory of frames (such as SHOPPING and SCARCITY). Each frame is associated with a set of roles (or frame elements) called to mind in order to understand the scenario, and lexical predicates (verbs, nouns, adjectives, and adverbs) capable of evoking the scenario. For example, the BODY_MOVEMENT frame has Agent and Body_part as its core roles, and lexical entries including verbs such as bend, blink, crane, and curtsy, plus the noun use of curtsy. In FrameNet 1.5, there are over 1,000 frames and 12,000 lexical predicates.

Table 1: Characteristics of the training and test data. (These statistics exclude the development set, which contains 4,463 frames over 746 sentences.)

                                      Full-Text            Exemplars
                                      train     test       train      test
  Sentences                           2,780     2,420      137,515    4,132
  Frames                              15,019    4,458      137,515    4,132
  Overt arguments                     25,918    7,210      278,985    8,417
  TYPES
  Frames                              642       470        862        562
  Roles                               2,644     1,420      4,821      1,224
  Unseen frames vs. train                       46                    0
  Roles in unseen frames vs. train              178                   0
  Unseen roles vs. train                        289                   38
  Unseen roles vs. combined train               103                   32

2.1 Hierarchy

The FrameNet lexicon is organized as a network, with several kinds of frame-to-frame relations linking pairs of frames and (subsets of) their arguments (Ruppenhofer et al., 2010). In this work, we consider two kinds of frame-to-frame relations:

Inheritance: E.g., ROBBERY inherits from COMMITTING_CRIME, which inherits from MISDEED. Crucially, roles in inheriting frames are mapped to corresponding roles in inherited frames: ROBBERY.Perpetrator links to COMMITTING_CRIME.Perpetrator, which links to MISDEED.Wrongdoer, and so forth.

Subframe: This indicates a subevent within a complex event. E.g., the CRIMINAL_PROCESS frame groups together subframes ARREST, ARRAIGNMENT and TRIAL. CRIMINAL_PROCESS.Defendant, for instance, is mapped to ARREST.Suspect, TRIAL.Defendant, and SENTENCING.Convict.

We say that a parent of a role is one that has either the Inheritance or Subframe relation to it. There are 4,138 Inheritance and 589 Subframe links among role types in FrameNet 1.5.

Prior work has considered various ways of grouping role labels together in order to share statistical strength. Matsubayashi et al. (2009) observed small gains from using the Inheritance relationships and also from grouping by the role name (SEMAFOR already incorporates such features). Johansson (2012) reports improvements in SRL for Swedish, by exploiting relationships between both frames and roles. Baldewein et al. (2004) learn latent clusters of roles and role-fillers, reporting mixed results. Our approach is described in §3.2.
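To make the parent relation concrete, the sketch below stores a few of the role-to-role links named above and follows them transitively. Treating the complex event's role as the parent of a subframe's role, and using the full ancestor set rather than only direct parents, are assumptions of this sketch rather than claims about the model in §3.2.

```python
# Role-to-role links (child role -> parent roles), using the examples from section 2.1.
# FrameNet 1.5 itself has 4,138 Inheritance and 589 Subframe links among role types.
PARENT_ROLES = {
    # Inheritance: roles in an inheriting frame map to roles in the inherited frame.
    ("ROBBERY", "Perpetrator"):          [("COMMITTING_CRIME", "Perpetrator")],
    ("COMMITTING_CRIME", "Perpetrator"): [("MISDEED", "Wrongdoer")],
    # Subframe: here the complex event's role is treated as the parent of the subevent's role.
    ("ARREST", "Suspect"):               [("CRIMINAL_PROCESS", "Defendant")],
    ("TRIAL", "Defendant"):              [("CRIMINAL_PROCESS", "Defendant")],
    ("SENTENCING", "Convict"):           [("CRIMINAL_PROCESS", "Defendant")],
}

def ancestors(role):
    """All roles reachable from `role` by repeatedly following parent links."""
    seen, stack = set(), list(PARENT_ROLES.get(role, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENT_ROLES.get(parent, []))
    return seen

print(ancestors(("ROBBERY", "Perpetrator")))
# -> {('COMMITTING_CRIME', 'Perpetrator'), ('MISDEED', 'Wrongdoer')}
```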
2.2 Annotations

Statistics for the annotations appear in table 1.

Full-text (FT): This portion of the FrameNet corpus consists of documents and has about 5,000 sentences for which annotators assigned frames and arguments to as many words as possible. Beginning with the SemEval-2007 shared task on FrameNet analysis, frame-semantic parsers have been trained and evaluated on the full-text data (Baker et al., 2007; Das et al., 2014).[3] The full-text documents represent a mix of genres, prominently including travel guides and bureaucratic reports about weapons stockpiles.

Exemplars: To document a given predicate, lexicographers manually select corpus examples and annotate them only with respect to the predicate in question. These singly-annotated sentences from FrameNet are called lexicographic exemplars. There are over 140,000 sentences containing argument annotations, and relative to the FT dataset, these contain an order of magnitude more frame annotations and over two orders of magnitude more sentences. As these were manually selected, the rate of overt arguments per frame is noticeably higher than in the FT data. The exemplars formed the basis of early studies of frame-semantic role labeling (e.g., Gildea and Jurafsky, 2002; Thompson et al., 2003; Fleischman et al., 2003; Litkowski, 2004; Kwon et al., 2004). Exemplars have not yet been exploited successfully to improve role labeling performance on the more realistic FT task.[4]

[3] Though these were annotated at the document level, and train/development/test splits are by document, the frame-semantic parsing is currently restricted to the sentence level.
[4] Das and Smith (2011, 2012) investigated semi-supervised techniques using the exemplars and WordNet for frame identification. Hermann et al. (2014) also improve frame identification by mapping frames and predicates into the same continuous vector space, allowing statistical sharing.

2.3 PropBank

PropBank (PB; Palmer et al., 2005) is a lexicon and corpus of predicate–argument structures that takes a shallower approach than FrameNet. FrameNet frames cluster lexical predicates that evoke similar kinds of scenarios. In comparison, PropBank frames are purely lexical and there are no formal relations between different predicates or their roles. PropBank's sense distinctions are generally coarser-grained than FrameNet's. Moreover, FrameNet lexical entries cover many different parts of speech, while PropBank focuses on verbs and (as of recently) eventive noun and adjective predicates. An example with PB annotations is shown in figure 2.

[…] statistical generalizations about the kinds of arguments seen in those roles. Our hypothesis is that this will be beneficial given the small number of training examples for individual roles. All roles that have a common parent based on the Inheritance and Subframe relations will share a set of features in common. Specifically, for each base feature f which is conjoined with the role r […]
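The extract breaks off mid-sentence here. A plausible continuation of the idea, consistent with the definition of parent roles in §2.1, is sketched below: each base feature is conjoined both with the role itself and with each of its parent roles, so that roles sharing a parent also share features and thus pool their training signal. The feature-string templates and the example links are illustrative assumptions, not the paper's exact templates.

```python
# Illustrative parent links; ROBBERY and THEFT are assumed here to share the
# parent role COMMITTING_CRIME.Perpetrator (mirroring the earlier sketch).
PARENT_ROLES = {
    ("ROBBERY", "Perpetrator"): [("COMMITTING_CRIME", "Perpetrator")],
    ("THEFT", "Perpetrator"):   [("COMMITTING_CRIME", "Perpetrator")],
}

def role_features(base_features, frame, role):
    """Conjoin each base feature with the role and with each of its parent roles."""
    feats = []
    for f in base_features:
        feats.append(f"{f}&role={frame}.{role}")              # ordinary role-conjoined feature
        for pframe, prole in PARENT_ROLES.get((frame, role), []):
            feats.append(f"{f}&parent={pframe}.{prole}")       # shared across sibling roles
    return feats

# ROBBERY.Perpetrator and THEFT.Perpetrator share the
# "...&parent=COMMITTING_CRIME.Perpetrator" feature, so evidence for one
# informs the other even with few training examples per role.
print(role_features(["head=thief"], "ROBBERY", "Perpetrator"))
print(role_features(["head=thief"], "THEFT", "Perpetrator"))
```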