Conferenceabstracts
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Preparation and Exploitation of Bilingual Texts Dusko Vitas, Cvetana Krstev, Eric Laporte
Preparation and exploitation of bilingual texts Dusko Vitas, Cvetana Krstev, Eric Laporte To cite this version: Dusko Vitas, Cvetana Krstev, Eric Laporte. Preparation and exploitation of bilingual texts. Lux Coreana, 2006, 1, pp.110-132. hal-00190958v2 HAL Id: hal-00190958 https://hal.archives-ouvertes.fr/hal-00190958v2 Submitted on 27 Nov 2007 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Preparation and exploitation of bilingual texts Duško Vitas Faculty of Mathematics Studentski trg 16, CS-11000 Belgrade, Serbia Cvetana Krstev Faculty of Philology Studentski trg 3, CS-11000 Belgrade, Serbia Éric Laporte Institut Gaspard-Monge, Université de Marne-la-Vallée 5, bd Descartes, 77454 Marne-la-Vallée CEDEX 2, France Introduction A bitext is a merged document composed of two versions of a given text, usually in two different languages. An aligned bitext is produced by an alignment tool or aligner, that automatically aligns or matches the versions of the same text, generally sentence by sentence. A multilingual aligned corpus or collection of aligned bitexts, when consulted with a search tool, can be extremely useful for translation, language teaching and the investigation of literary text (Veronis, 2000). -
A Data-Driven Framework for Assisting Geo-Ontology Engineering Using a Discrepancy Index
University of California Santa Barbara A Data-Driven Framework for Assisting Geo-Ontology Engineering Using a Discrepancy Index A Thesis submitted in partial satisfaction of the requirements for the degree Master of Arts in Geography by Bo Yan Committee in charge: Professor Krzysztof Janowicz, Chair Professor Werner Kuhn Professor Emerita Helen Couclelis June 2016 The Thesis of Bo Yan is approved. Professor Werner Kuhn Professor Emerita Helen Couclelis Professor Krzysztof Janowicz, Committee Chair May 2016 A Data-Driven Framework for Assisting Geo-Ontology Engineering Using a Discrepancy Index Copyright c 2016 by Bo Yan iii Acknowledgements I would like to thank the members of my committee for their guidance and patience in the face of obstacles over the course of my research. I would like to thank my advisor, Krzysztof Janowicz, for his invaluable input on my work. Without his help and encour- agement, I would not have been able to find the light at the end of the tunnel during the last stage of the work. Because he provided insight that helped me think out of the box. There is no better advisor. I would like to thank Yingjie Hu who has offered me numer- ous feedback, suggestions and inspirations on my thesis topic. I would like to thank all my other intelligent colleagues in the STKO lab and the Geography Department { those who have moved on and started anew, those who are still in the quagmire, and those who have just begun { for their support and friendship. Last, but most importantly, I would like to thank my parents for their unconditional love. -
Arxiv:1908.07448V1
Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing Milan Straka and Jana Strakova´ and Jan Hajicˇ Charles University Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics {strakova,straka,hajic}@ufal.mff.cuni.cz Abstract Shared Task (Zeman et al., 2018). • We report our best results on UD 2.3. The We present an extensive evaluation of three addition of contextualized embeddings im- recently proposed methods for contextualized 25% embeddings on 89 corpora in 54 languages provements range from relative error re- of the Universal Dependencies 2.3 in three duction for English treebanks, through 20% tasks: POS tagging, lemmatization, and de- relative error reduction for high resource lan- pendency parsing. Employing the BERT, guages, to 10% relative error reduction for all Flair and ELMo as pretrained embedding in- UD 2.3 languages which have a training set. puts in a strong baseline of UDPipe 2.0, one of the best-performing systems of the 2 Related Work CoNLL 2018 Shared Task and an overall win- ner of the EPE 2018, we present a one-to- A new type of deep contextualized word repre- one comparison of the three contextualized sentation was introduced by Peters et al. (2018). word embedding methods, as well as a com- The proposed embeddings, called ELMo, were ob- parison with word2vec-like pretrained em- tained from internal states of deep bidirectional beddings and with end-to-end character-level word embeddings. We report state-of-the-art language model, pretrained on a large text corpus. results in all three tasks as compared to results Akbik et al. -
CUASI NOMÁS INGLÉS: PROSODY at the CROSSROADS of SPANISH and ENGLISH in 20TH CENTURY NEW MEXICO Jackelyn Van Buren Doctoral Student, Linguistics
University of New Mexico UNM Digital Repository Linguistics ETDs Electronic Theses and Dissertations Fall 11-15-2017 CUASI NOMÁS INGLÉS: PROSODY AT THE CROSSROADS OF SPANISH AND ENGLISH IN 20TH CENTURY NEW MEXICO Jackelyn Van Buren Doctoral Student, Linguistics Follow this and additional works at: https://digitalrepository.unm.edu/ling_etds Part of the Anthropological Linguistics and Sociolinguistics Commons, and the Phonetics and Phonology Commons Recommended Citation Van Buren, Jackelyn. "CUASI NOMÁS INGLÉS: PROSODY AT THE CROSSROADS OF SPANISH AND ENGLISH IN 20TH CENTURY NEW MEXICO." (2017). https://digitalrepository.unm.edu/ling_etds/55 This Dissertation is brought to you for free and open access by the Electronic Theses and Dissertations at UNM Digital Repository. It has been accepted for inclusion in Linguistics ETDs by an authorized administrator of UNM Digital Repository. For more information, please contact [email protected]. Jackelyn Van Buren Candidate Linguistics Department This dissertation is approved, and it is acceptable in quality and form for publication: Approved by the Dissertation Committee: Dr. Chris Koops, Chairperson Dr. Naomi Lapidus Shin Dr. Caroline Smith Dr. Damián Vergara Wilson i CUASI NOMÁS INGLÉS: PROSODY AT THE CROSSROADS OF SPANISH AND ENGLISH IN 20TH CENTURY NEW MEXICO by JACKELYN VAN BUREN B.A., Linguistics, University of Utah, 2009 M.A., Linguistics, University of Montana, 2012 DISSERTATION Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Linguistics The University of New Mexico Albuquerque, New Mexico December 2017 ii Acknowledgments A dissertation is not written without the support of a community of peers and loved ones. Now that the journey has come to an end, and I have grown as a human and a scholar and a friend throughout this process (and have gotten married, become an aunt, bought a house, and gone through an existential crisis), I can reflect on the people who have been the foundation for every change I have gone through. -
The Iafor European Conference Series 2014 Ece2014 Ecll2014 Ectc2014 Official Conference Proceedings ISSN: 2188-1138
the iafor european conference series 2014 ece2014 ecll2014 ectc2014 Official Conference Proceedings ISSN: 2188-1138 “To Open Minds, To Educate Intelligence, To Inform Decisions” The International Academic Forum provides new perspectives to the thought-leaders and decision-makers of today and tomorrow by offering constructive environments for dialogue and interchange at the intersections of nation, culture, and discipline. Headquartered in Nagoya, Japan, and registered as a Non-Profit Organization 一般社( 団法人) , IAFOR is an independent think tank committed to the deeper understanding of contemporary geo-political transformation, particularly in the Asia Pacific Region. INTERNATIONAL INTERCULTURAL INTERDISCIPLINARY iafor The Executive Council of the International Advisory Board IAB Chair: Professor Stuart D.B. Picken IAB Vice-Chair: Professor Jerry Platt Mr Mitsumasa Aoyama Professor June Henton Professor Frank S. Ravitch Director, The Yufuku Gallery, Tokyo, Japan Dean, College of Human Sciences, Auburn University, Professor of Law & Walter H. Stowers Chair in Law USA and Religion, Michigan State University College of Law Professor David N Aspin Professor Emeritus and Former Dean of the Faculty of Professor Michael Hudson Professor Richard Roth Education, Monash University, Australia President of The Institute for the Study of Long-Term Senior Associate Dean, Medill School of Journalism, Visiting Fellow, St Edmund’s College, Cambridge Economic Trends (ISLET) Northwestern University, Qatar University, UK Distinguished Research Professor of Economics, -
Semi-Automated Ontology Based Question Answering System Open Access
Journal of Advanced Research in Computing and Applications 17, Issue 1 (2019) 1-5 Journal of Advanced Research in Computing and Applications Journal homepage: www.akademiabaru.com/arca.html ISSN: 2462-1927 Open Semi-automated Ontology based Question Answering System Access 1, Khairul Nurmazianna Ismail 1 Faculty of Computer Science, Universiti Teknologi MARA Melaka, Malaysia ABSTRACT Question answering system enable users to retrieve exact answer for questions submit using natural language. The demand of this system increases since it able to deliver precise answer instead of list of links. This study proposes ontology-based question answering system. Research consist explanation of question answering system architecture. This question answering system used semi-automatic ontology development (Ontology Learning) approach to develop its ontology. Keywords: Question answering system; knowledge Copyright © 2019 PENERBIT AKADEMIA BARU - All rights reserved base; ontology; online learning 1. Introduction In the era of WWW, people need to search and get information fast online or offline which make people rely on Question Answering System (QAS). One of the common and traditionally use QAS is Frequently Ask Question site which contains common and straight forward answer. Currently, QAS have emerge as powerful platform using various techniques such as information retrieval, Knowledge Base, Natural Language Processing and Hybrid Based which enable user to retrieve exact answer for questions posed in natural language using either pre-structured database or a collection of natural language documents [1,4]. The demand of QAS increases day by day since it delivers short, precise and question-specific answer [10]. QAS with using knowledge base paradigm are better in Restricted- domain QA system since it ability to focus [12]. -
Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format
Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format Alina Wróblewska Institute of Computer Science Polish Academy of Sciences ul. Jana Kazimierza 5 01-248 Warsaw, Poland [email protected] Abstract even for languages with rich morphology and rel- atively free word order, such as Polish. The paper presents the largest Polish Depen- The supervised learning methods require gold- dency Bank in Universal Dependencies for- mat – PDBUD – with 22K trees and 352K standard training data, whose creation is a time- tokens. PDBUD builds on its previous ver- consuming and expensive process. Nevertheless, sion, i.e. the Polish UD treebank (PL-SZ), dependency treebanks have been created for many and contains all 8K PL-SZ trees. The PL- languages, in particular within the Universal De- SZ trees are checked and possibly corrected pendencies initiative (UD, Nivre et al., 2016). in the current edition of PDBUD. Further The UD leaders aim at developing a cross- 14K trees are automatically converted from linguistically consistent tree annotation schema a new version of Polish Dependency Bank. and at building a large multilingual collection of The PDBUD trees are expanded with the en- hanced edges encoding the shared dependents dependency treebanks annotated according to this and the shared governors of the coordinated schema. conjuncts and with the semantic roles of some Polish is also represented in the Universal dependents. The conducted evaluation exper- Dependencies collection. There are two Polish iments show that PDBUD is large enough treebanks in UD: the Polish UD treebank (PL- for training a high-quality graph-based depen- SZ) converted from Składnica zalezno˙ sciowa´ 1 and dency parser for Polish. -
ELECTRONIC DICTIONARY Press Y to Select Alphabet Character Input Or Press N to Selecting a Menu Item 12 Select Japanese Input
ELECTRONIC DICTIONARY Press Y to select alphabet character input or press N to Selecting a menu item 12 select Japanese input. PW-AC890 The date/time settings screen is displayed. Press メニュー . QUICK REFERENCE 1 13 Select the date items using or , and then enter “年” Use or to select a category menu item. (year), “月” (month) and “日” (day) (e.g. June 23th, 2009 → 2 Or, use the numeric keys to enter the category number to Layout 09 06 23) using the number buttons on the handwriting pad. select the item. Utility keys for Display(Main display) dictionaries / functions Confirm that the cursor is on “AM(午前)” or “PM(午後)”, The individual menu for the selected category menu item is displayed. / touch pad and then select one of them using or . In the individual menu, use or to select the content/ Library key 3 Selection keys Press , select the time items using or and then function and then press 検索/決定 . for contents / functions enter “時” (hour) and “分” (minute) (e.g. 9:00 → 09 00). Or, use the numeric keys ( 1 to 9 ) to enter the number Charge lamp Stylus holder(side) Confirm that the information entered is correct and press in front of the content/function ( 1 to 9 ). Global search keys 14 検索/決定 . The selected content/function screen is displayed. Power ON/OFF key The menu display appears. ● The selected content/function screen can also be selected by touching the relevant item on the category menu or the individual menu. Menu key Function key Selecting a content in the menu display Touch operations AC adapter connector (side) Character size (large/small) The PW-AC890 can be operated by touching the main screen with the stylus. -
Kernerman Kdictionaries.Com/Kdn DICTIONARY News the European Network of E-Lexicography (Enel) Tanneke Schoonheim
Number 22 ● July 2014 Kernerman kdictionaries.com/kdn DICTIONARY News The European Network of e-Lexicography (ENeL) Tanneke Schoonheim On October 11th 2013, the kick-off meeting of the European production and reception of dictionaries. The internet offers Network of e-Lexicography (ENeL) project took place in entirely new possibilities for developing and presenting Brussels. This meeting was the outcome of an idea ventilated dictionary information, such as with the integration of sound, a year and a half earlier, in March 2012 in Berlin, at the maps or video, and various novel ways of interacting with European Workshop on Future Standards in Lexicography. dictionary users. For editors of scholarly dictionaries the new The workshop participants then confirmed the imperative to medium is not only a source of inspiration, it also generates coordinate and harmonise research in the field of (electronic) new and serious challenges that demand cooperation and lexicography across Europe, namely to share expertise relating standardization on various levels: to standards, discuss new methodologies in lexicography that a. Through the internet scholarly dictionaries can potentially fully exploit the possibilities of the digital medium, reflect on reach large audiences. However, at present scholarly the pan-European nature of the languages of Europe and attain dictionaries providing reliable information are often not easy a wider audience. to find and are hard to decode for a non-academic audience; A proposal was written by a team of researchers from -
Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
information Article Modeling Popularity and Reliability of Sources in Multilingual Wikipedia Włodzimierz Lewoniewski * , Krzysztof W˛ecel and Witold Abramowicz Department of Information Systems, Pozna´nUniversity of Economics and Business, 61-875 Pozna´n,Poland; [email protected] (K.W.); [email protected] (W.A.) * Correspondence: [email protected] Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020 Abstract: One of the most important factors impacting quality of content in Wikipedia is presence of reliable sources. By following references, readers can verify facts or find more details about described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, therefore information about the same topic may be inconsistent. This also applies to use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability in time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different languages versions of Wikipedia. -
Semantic Role Labeling 2
Semantic Role Labeling 2 Outline • Semantic role theory • Designing semantic role annotation project ▫ Granularity ▫ Pros and cons of different role schemas ▫ Multi-word expressions 3 Outline • Semantic role theory • Designing semantic role annotation project ▫ Granularity ▫ Pros and cons of different role schemas ▫ Multi-word expressions 4 Semantic role theory • Predicates tie the components of a sentence together • Call these components arguments • [John] opened [the door]. 5 Discovering meaning • Syntax only gets you so far in answering “Who did what to whom?” John opened the door. Syntax: NPSUB V NPOBJ The door opened. Syntax: NPSUB V 6 Discovering meaning • Syntax only gets you so far in answering “Who did what to whom?” John opened the door. Syntax: NPSUB V NPOBJ Semantic roles: Opener REL thing opened The door opened. Syntax: NPSUB V Semantic roles: thing opened REL 7 Can the lexicon account for this? • Is there a different sense of open for each combination of roles and syntax? • Open 1: to cause something to become open • Open 2: become open • Are these all the senses we would need? (1) John opened the door with a crowbar. Open1? (2) They tried the tools in John’s workshop one after the other, and finally the crowbar opened the door. Still Open1? 8 Fillmore’s deep cases • Correspondence between syntactic case and semantic role that participant plays • “Deep cases”: Agentive, Objective, Dative, Instrument, Locative, Factitive • Loosely associated with syntactic cases; transformations result in the final surface case 9 The door opened. Syntax: NPSUB V Semantic roles: Objective REL John opened the door. Syntax: NPSUB V NPOBJ Semantic roles: Agentive REL Objective The crowbar opened the door. -
A Comparison of Knowledge Extraction Tools for the Semantic Web
A Comparison of Knowledge Extraction Tools for the Semantic Web Aldo Gangemi1;2 1 LIPN, Universit´eParis13-CNRS-SorbonneCit´e,France 2 STLab, ISTC-CNR, Rome, Italy. Abstract. In the last years, basic NLP tasks: NER, WSD, relation ex- traction, etc. have been configured for Semantic Web tasks including on- tology learning, linked data population, entity resolution, NL querying to linked data, etc. Some assessment of the state of art of existing Knowl- edge Extraction (KE) tools when applied to the Semantic Web is then desirable. In this paper we describe a landscape analysis of several tools, either conceived specifically for KE on the Semantic Web, or adaptable to it, or even acting as aggregators of extracted data from other tools. Our aim is to assess the currently available capabilities against a rich palette of ontology design constructs, focusing specifically on the actual semantic reusability of KE output. 1 Introduction We present a landscape analysis of the current tools for Knowledge Extraction from text (KE), when applied on the Semantic Web (SW). Knowledge Extraction from text has become a key semantic technology, and has become key to the Semantic Web as well (see. e.g. [31]). Indeed, interest in ontology learning is not new (see e.g. [23], which dates back to 2001, and [10]), and an advanced tool like Text2Onto [11] was set up already in 2005. However, interest in KE was initially limited in the SW community, which preferred to concentrate on manual design of ontologies as a seal of quality. Things started changing after the linked data bootstrapping provided by DB- pedia [22], and the consequent need for substantial population of knowledge bases, schema induction from data, natural language access to structured data, and in general all applications that make joint exploitation of structured and unstructured content.