Republic of Turkey Çukurova University Institute of Social Sciences Department of English Language Teaching

Total Page:16

File Type:pdf, Size:1020Kb

Republic of Turkey Çukurova University Institute of Social Sciences Department of English Language Teaching REPUBLIC OF TURKEY ÇUKUROVA UNIVERSITY INSTITUTE OF SOCIAL SCIENCES DEPARTMENT OF ENGLISH LANGUAGE TEACHING A CORPUS-BASED STUDY ON TURKISH EFL LEARNERS’ WRITTEN ENGLISH: THE USE OF ADVERBIAL CONNECTORS BY TURKISH LEARNERS M. Pınar BABANOĞLU A PHD DISSERTATION ADANA, 2012 REPUBLIC OF TURKEY ÇUKUROVA UNIVERSITY INSTITUTE OF SOCIAL SCIENCES DEPARTMENT OF ENGLISH LANGUAGE TEACHING A CORPUS-BASED STUDY ON TURKISH EFL LEARNERS’ WRITTEN ENGLISH: THE USE OF ADVERBIAL CONNECTORS BY TURKISH LEARNERS M. Pınar BABANOĞLU Supervisor: Asst. Prof. Dr. Cem CAN A PHD DISSERTATION ADANA, 2012 To the Directorship of the Institute of Social Sciences, Çukurova University We certify that this dissertation is satisfactory for the award of the Degree of Doctor of Philosophy. Supervisor: Asst. Prof. Dr. Cem CAN Member of Examining Committee: Prof. Dr. Hatice SOFU Member of Examining Committee: Assoc. Prof. Dr. Ahmet DOĞANAY Member of Examining Committee: Asst. Prof. Dr. Hülya YUMRU Member of Examining Committee: Asst. Prof. Dr. Hasan BEDİR I certify that this dissertation confirms to the formal standards of the Institute of Social Sciences. ...../...... /2012 Prof. Dr. Azmi YALÇIN Director of the Institute Note: The uncited usage of the reports, charts, figures, photographs in this thesis, whether original or quoted from other sources, is subject to the law of Works of Art and Thought No: 5846. Not: Bu tezde kullanılan özgün ve başka kaynaktan yapılan bildirişlerin, çizelge, şekil ve fotoğrafların kaynak gösterilmeden kullanımı, 5846 Sayılı Fikir ve Sanat Eserleri Kanunu’ndaki hükümlere tabidir. iii ÖZET YABANCI DİL OLARAK İNGİLİZE ÖĞRENEN TÜRK ÖĞRENENLERİN YAZILI ANLATIMLARINDA DERLEME DAYALI BİR ÇALIŞMA: TÜRK ÖĞRENCİLER TARAFINDAN ZARF BAĞLAÇLARIN KULLANIMI M. Pınar BABANOĞLU Doktora Tezi, İngiliz Dili Eğitimi Anabilim Dalı Danışman: Yrd. Doç. Dr. Cem CAN Haziran 2012, 202 Sayfa Aradil, uzun zamandır ikinci/yabancı dil edinimi araştırmalarının önemli bir konusu olmuştur. Birincil amaç ikinci dil edinimi ve süreci ile ilgili daha iyi tanımlamalar yapmaktır. Öğrenen dili ile ilgili yeni bir düşünce olan Bilgisayarlı Aradil Derlemi (Computer Learner Corpus), ikinci dil edinimi alanında tanımlayıcı ve dikkate değer deneysel bir öğrenen veri kaynağı sunmaktadır (Granger, 2004). Aradil derlemi, yabancı dil öğrenenlerce üretilen dilden oluşturulan bilgisayar kaynaklı veri tabanıdır (Leech, 1998). Bu aradil derlemi, İngilizce öğrenenlerin dilbilgisi, sözcük düzlemi ve bir kompozisyon yazarken karşlaştıkları zorlukları araştırmak için öğrenenlerin yazılı ürünlerinden oluşan güvenilir bir veri sağlamaktadır. Aradil hakkında daha iyi bir anlayışa sahip olmak için aradil derlemi yoluyla aradil ile aradil araştırması üzerine kurulu birçok derleme dayalı çalışma yapılmıştır (Altenber & Tapper, 1998; Granger & Tyson, 1998; Aijmer, 2002; Housen; 2002; Neff et al., 2003; Narita et. al., 2004). Bu çalışmada, ikinci dil olarak İngilizce öğrenen Türk öğrenenlerin İngilizce metinlerinde ki zarf bağlaç kullanımı araştırılmıştır. Zarf bağlaçlar, bu kullanımın eğer varsa olası bir anadil aktarımından ve farklı anadil artlanlarından gelen öğrenenler arasında ortak aradil özelliklerinin bulunup bulunmadığı açısından incelenmiştir. Çalışmada, zarf bağlaç kullanımında farklı öğrenenler arasında bazı ortak aradil özelliklerine rastlanmıştır. Ayrıca, Türk öğrenenlerin zarf bağlaç kullanmlarında anadil aktarımı adına bazı anadil etkileri bulunmuştur. Anahtar Kelimeler: Bilgisayarlı Aradil Derlemi, Zarf Bağlaçlar, Uluslararası İngilizce Öğrenen Derlemi, Uluslararası Türk Öğrenenler Derlemi. iv ABSTRACT A CORPUS-BASED STUDY ON TURKISH EFL LEARNERS’ WRITTEN ENGLISH: THE USE OF ADVERBIAL CONNECTORS BY TURKISH LEARNERS M. Pınar BABANOĞLU Ph.D. Dissertation, English Language Teaching Department Supervisor: Asst. Prof. Dr. Cem CAN June 2012, 202 Pages Investigation of interlanguage has long been an important subject of second and foreign language acquisition research. The primary goal is to provide better descriptions for SLA and its process. Computer Learner Corpus (CLC), which is a new way of thinking about learner language (Granger, 2004), offers a source of learner data suggesting empirical base for a remarkable and descriptive contributions in the field of SLA. Learner corpus is the computer texture database formed by the language produced by foreign language learners (Leech, 1998). This interlanguage corpora provides a reliable data of learners written production in order to examine the learner grammar and lexis and the main difficulties experienced by learners of English when writing an essay. Many corpus-based studies have been conducted on interlanguage investigation through learner corpora (Altenber & Tapper, 1998; Granger & Tyson, 1998; Aijmer, 2002; Housen; 2002; Neff et al., 2003; Narita et. al., 2004) to gain insight for a better understanding of learner language. In the present study, the use of adverbial connector in L2 writings of Turkish adult learners has been investigated. Adverbial connectors have been examined whether, such usage is effected by a possible transfer from mother tongue and there is a common interlanguage properties among learners from different mother tongue backgrounds. In the study, some common interlanguage properties among different EFL learners have been identified in use of adverbial connectors. In addition, some features in the use of adverbial connectors by Turkish EFL learners have been found in respect of L1 transfer. v Keywords: Computer Learner Corpus (CLC), Adverbial Connectors, Turkish International Corpus of Learner English (TICLE), International Corpus of Learner English (ICLE). vi ACKNOWLEDGEMENTS This dissertation represents a milestone in my educational and professional career. During the process of preparation and completing of this project, I have received valuable support anf help from a number of people to whom I owe thanks and would like to acknowledge. First of all, I would like to express my deepest gratitude to my supervisor, Professor Cem CAN who has provided tremendious support and mentorship througout my doctoral time . His guidiance not only has led to the complation of this study, but also he fostered my academic development. My most sincere thanks go to him. I would like to present my great thanks to Professor Hatice Sofu who has provided valuable support for writing this dissertation with her profound knowledge. Her insightful comments and constructive suggestions on each draft have facilitated the creation of this dissertation. I would also like to extend my great thanks to Professor Ahmet Doğanay for his invaluable comments and suggestions especially in statistical considerations of the dissertation. I am also very greatful to him for his patience and kindness. I would also like to thank to Professor Hülya Yumru and Professor Hasan Bedir for accepting to be a comitee member. My special thanks go to Professor Mike Scott who provided stimulating comments and important suggestions for the dissertations during my research visit at Aston University. I am fortunate for receiving his co-supervision and great advices. I am also thankful to my close friend Eliz Can for her valuable friendship and for encouraging me during the process with her generous kindness. Finally, my special thanks go to my precious family member who deserve my special thanks most. I eternally gratitude to my dear mom Fatma Babanoğlu for her constant love, patience and invaluable support. Without her precious parentship, I do not belive that I can go on in my life and I owe her so much. I am also greatful to my dad for his strong supports and always being there. I truly thank to my brother Onur Babanoğlu for his valuable helps as well as to my brother Cumhur Babanoğlu for being there as my family member. vii TABLE OF CONTENTS Pages ÖZET ........................................................................................................................ iii ABSTRACT....................................... ........................................................................ iv ACKNOWLEDGEMENTS ........................ ................................................................ v LIST OF ABBREVIATIONS ............................. ........................................................ x LIST OF TABLES .................................................. .................................................. xii LIST OF FIGURES ...................................................... ............................................ xv LIST OF APPENDICES ................................................... ...................................... xvii CHAPTER I INTRODUCTION 1.1. General Background .............................................................................................. 1 1.1.1. Corpus Linguistics and Corpus Research ...................................................... 1 1.1.2. Corpus Approaches in Language Studies ..................................................... 2 1.2. Research Background ............................................................................................ 4 1.3. Statement of the Problem........................................................................................ 6 1.4. Purpose of the Study .............................................................................................. 7 1.5. Research Questions ............................................................................................... 7 1.6. Limitations ...........................................................................................................
Recommended publications
  • Machine-Translation Inspired Reordering As Preprocessing for Cross-Lingual Sentiment Analysis
    Master in Cognitive Science and Language Master’s Thesis September 2018 Machine-translation inspired reordering as preprocessing for cross-lingual sentiment analysis Alejandro Ramírez Atrio Supervisor: Toni Badia Abstract In this thesis we study the effect of word reordering as preprocessing for Cross-Lingual Sentiment Analysis. We try different reorderings in two target languages (Spanish and Catalan) so that their word order more closely resembles the one from our source language (English). Our original expectation was that a Long Short Term Memory classifier trained on English data with bilingual word embeddings would internalize English word order, resulting in poor performance when tested on a target language with different word order. We hypothesized that the more the word order of any of our target languages resembles the one of our source language, the better the overall performance of our sentiment classifier would be when analyzing the target language. We tested five sets of transformation rules for our Part of Speech reorderings of Spanish and Catalan, extracted mainly from two sources: two papers by Crego and Mariño (2006a and 2006b) and our own empirical analysis of two corpora: CoStEP and Tatoeba. The results suggest that the bilingual word embeddings that we are training our Long Short Term Memory model with do not improve any English word order learning by part of the model when used cross-lingually. There is no improvement when reordering the Spanish and Catalan texts so that their word order more closely resembles English, and no significant drop in result score even when applying a random reordering to them making them almost unintelligible, neither when classifying between 2 options (positive-negative) nor between 4 (strongly positive, positive, negative, strongly negative).
    [Show full text]
  • FERSIWN GYMRAEG ISOD the National Corpus of Contemporary
    FERSIWN GYMRAEG ISOD The National Corpus of Contemporary Welsh Project Report, October 2020 Authors: Dawn Knight1, Steve Morris2, Tess Fitzpatrick2, Paul Rayson3, Irena Spasić and Enlli Môn Thomas4. 1. Introduction 1.1. Purpose of this report This report provides an overview of the CorCenCC project and the online corpus resource that was developed as a result of work on the project. The report lays out the theoretical underpinnings of the research, demonstrating how the project has built on and extended this theory. We also raise and discuss some of the key operational questions that arose during the course of the project, outlining the ways in which they were answered, the impact of these decisions on the resource that has been produced and the longer-term contribution they will make to practices in corpus-building. Finally, we discuss some of the applications and the utility of the work, outlining the impact that CorCenCC is set to have on a range of different individuals and user groups. 1.2. Licence The CorCenCC corpus and associated software tools are licensed under Creative Commons CC-BY-SA v4 and thus are freely available for use by professional communities and individuals with an interest in language. Bespoke applications and instructions are provided for each tool (for links to all tools, refer to section 10 of this report). When reporting information derived by using the CorCenCC corpus data and/or tools, CorCenCC should be appropriately acknowledged (see 1.3). § To access the corpus visit: www.corcencc.org/explore § To access the GitHub site: https://github.com/CorCenCC o GitHub is a cloud-based service that enables developers to store, share and manage their code and datasets.
    [Show full text]
  • Distributional Properties of Verbs in Syntactic Patterns Liam Considine
    Early Linguistic Interactions: Distributional Properties of Verbs in Syntactic Patterns Liam Considine The University of Michigan Department of Linguistics April 2012 Advisor: Nick Ellis Acknowledgements: I extend my sincerest gratitude to Nick Ellis for agreeing to undertake this project with me. Thank you for cultivating, and partaking in, some of the most enriching experiences of my undergraduate education. The extensive time and energy you invested here has been invaluable to me. Your consistent support and amicable demeanor were truly vital to this learning process. I want to thank my second reader Ezra Keshet for consenting to evaluate this body of work. Other thanks go out to Sarah Garvey for helping with precision checking, and Jerry Orlowski for his R code. I am also indebted to Mary Smith and Amanda Graveline for their participation in our weekly meetings. Their presence gave audience to the many intermediate challenges I faced during this project. I also need to thank my roommate Sean and all my other friends for helping me balance this great deal of work with a healthy serving of fun and optimism. Abstract: This study explores the statistical distribution of verb type-tokens in verb-argument constructions (VACs). The corpus under investigation is made up of longitudinal child language data from the CHILDES database (MacWhinney 2000). We search a selection of verb patterns identified by the COBUILD pattern grammar project (Francis, Hunston, Manning 1996), these include a number of verb locative constructions (e.g. V in N, V up N, V around N), verb object locative caused-motion constructions (e.g.
    [Show full text]
  • The National Corpus of Contemporary Welsh 1. Introduction
    The National Corpus of Contemporary Welsh Project Report, October 2020 1. Introduction 1.1. Purpose of this report This report provides an overview of the CorCenCC project and the online corpus resource that was developed as a result of work on the project. The report lays out the theoretical underpinnings of the research, demonstrating how the project has built on and extended this theory. We also raise and discuss some of the key operational questions that arose during the course of the project, outlining the ways in which they were answered, the impact of these decisions on the resource that has been produced and the longer-term contribution they will make to practices in corpus-building. Finally, we discuss some of the applications and the utility of the work, outlining the impact that CorCenCC is set to have on a range of different individuals and user groups. 1.2. Licence The CorCenCC corpus and associated software tools are licensed under Creative Commons CC-BY-SA v4 and thus are freely available for use by professional communities and individuals with an interest in language. Bespoke applications and instructions are provided for each tool (for links to all tools, refer to section 10 of this report). When reporting information derived by using the CorCenCC corpus data and/or tools, CorCenCC should be appropriately acknowledged (see 1.3). § To access the corpus visit: www.corcencc.org/explore § To access the GitHub site: https://github.com/CorCenCC o GitHub is a cloud-based service that enables developers to store, share and manage their code and datasets.
    [Show full text]
  • Actes Des 2Èmes Journées Scientifiques Du Groupement De Recherche Linguistique Informatique Formelle Et De Terrain (LIFT)
    Actes des 2èmes journées scientifiques du Groupement de Recherche Linguistique Informatique Formelle et de Terrain (LIFT). Thierry Poibeau, Yannick Parmentier, Emmanuel Schang To cite this version: Thierry Poibeau, Yannick Parmentier, Emmanuel Schang. Actes des 2èmes journées scientifiques du Groupement de Recherche Linguistique Informatique Formelle et de Terrain (LIFT).. Poibeau, Thierry and Parmentier, Yannick and Schang, Emmanuel. Dec 2020, Montrouge, France. CNRS, 2020. hal-03066031 HAL Id: hal-03066031 https://hal.archives-ouvertes.fr/hal-03066031 Submitted on 3 Jan 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. LIFT 2020 Actes des 2èmes journées scientifiques du Groupement de Recherche Linguistique Informatique Formelle et de Terrain (LIFT) Thierry Poibeau, Yannick Parmentier, Emmanuel Schang (Éds.) 10-11 décembre 2020 Montrouge, France (Virtuel) Comités Comité d’organisation Thierry Poibeau, LATTICE/CNRS Yannick Parmentier, LORIA/Université de Lorraine Emmanuel Schang, LLL/Université d’Orléans Comité de programme Angélique Amelot LPP/CNRS
    [Show full text]
  • Projecto Em Engenharia Informatica
    UNIVERSIDADE DE LISBOA Faculdade de Cienciasˆ Departamento de Informatica´ SPEECH-TO-SPEECH TRANSLATION TO SUPPORT MEDICAL INTERVIEWS Joao˜ Antonio´ Santos Gomes Rodrigues PROJETO MESTRADO EM ENGENHARIA INFORMATICA´ Especializac¸ao˜ em Interacc¸ao˜ e Conhecimento 2013 UNIVERSIDADE DE LISBOA Faculdade de Cienciasˆ Departamento de Informatica´ SPEECH-TO-SPEECH TRANSLATION TO SUPPORT MEDICAL INTERVIEWS Joao˜ Antonio´ Santos Gomes Rodrigues PROJETO MESTRADO EM ENGENHARIA INFORMATICA´ Especializac¸ao˜ em Interacc¸ao˜ e Conhecimento Projeto orientado pelo Prof. Doutor Antonio´ Manuel Horta Branco 2013 Agradecimentos Agradec¸o ao meu orientador, Antonio´ Horta Branco, por conceder a oportunidade de realizar este trabalho e pela sua sagacidade e instruc¸ao˜ oferecida para a concretizac¸ao˜ do mesmo. Agradec¸o aos meus parceiros do grupo NLX, por terem aberto os caminhos e cri- ado inumeras´ ferramentas e conhecimento para o processamento de linguagem natural do portugues,ˆ das quais fac¸o uso neste trabalho. Assim como pela imprescind´ıvel partilha e companhia. Agradec¸o a` minha fam´ılia e a` Joana, pelo apoio inefavel.´ Este trabalho foi apoiado pela Comissao˜ Europeia no ambitoˆ do projeto METANET4U (ECICTPSP:270893), pela Agenciaˆ de Inovac¸ao˜ no ambitoˆ do projeto S4S (S4S / PLN) e pela Fundac¸ao˜ para a Cienciaˆ e Tecnologia no ambitoˆ do projeto DP4LT (PTDC / EEISII / 1940 / 2012). As` tresˆ expresso a minha sincera gratidao.˜ iii Resumo Este relatorio´ apresenta a criac¸ao˜ de um sistema de traduc¸ao˜ fala-para-fala. O sistema consiste na captac¸ao˜ de voz na forma de sinal audio´ que de seguida e´ interpretado, tra- duzido e sintetizado para voz. Tendo como entrada um enunciado numa linguagem de origem e como sa´ıda um enunciado numa linguagem destino.
    [Show full text]
  • Edited Proceedings 1July2016
    UK-CLC 2016 Conference Proceedings UK-CLC 2016 preface Preface UK-CLC 2016 Bangor University, Bangor, Gwynedd, 18-21 July, 2016. This volume contains the papers and posters presented at UK-CLC 2016: 6th UK Cognitive Linguistics Conference held on July 18-21, 2016 in Bangor (Gwynedd). Plenary speakers Penelope Brown (Max Planck Institute for Psycholinguistics, Nijmegen, NL) Kenny Coventry (University of East Anglia, UK) Vyv Evans (Bangor University, UK) Dirk Geeraerts (University of Leuven, BE) Len Talmy (University at Buffalo, NY, USA) Dedre Gentner (Northwestern University, IL, USA) Organisers Chair Thora Tenbrink Co-chairs Anouschka Foltz Alan Wallington Local management Javier Olloqui Redondo Josie Ryan Local support Eleanor Bedford Advisors Christopher Shank Vyv Evans Sponsors and support Sponsors: John Benjamins, Brill Poster prize by Tracksys Student prize by De Gruyter Mouton June 23, 2016 Thora Tenbrink Bangor, Gwynedd Anouschka Foltz Alan Wallington Javier Olloqui Redondo Josie Ryan Eleanor Bedford UK-CLC 2016 Programme Committee Programme Committee Michel Achard Rice University Panos Athanasopoulos Lancaster University John Barnden The University of Birmingham Jóhanna Barðdal Ghent University Ray Becker CITEC - Bielefeld University Silke Brandt Lancaster University Cristiano Broccias Univeristy of Genoa Sam Browse Sheffield Hallam University Gareth Carrol University of Nottingham Paul Chilton Lancaster University Alan Cienki Vrije Universiteit (VU) Timothy Colleman Ghent University Louise Connell Lancaster University Seana Coulson
    [Show full text]
  • The Spoken British National Corpus 2014
    The Spoken British National Corpus 2014 Design, compilation and analysis ROBBIE LOVE ESRC Centre for Corpus Approaches to Social Science Department of Linguistics and English Language Lancaster University A thesis submitted to Lancaster University for the degree of Doctor of Philosophy in Linguistics September 2017 Contents Contents ............................................................................................................. i Abstract ............................................................................................................. v Acknowledgements ......................................................................................... vi List of tables .................................................................................................. viii List of figures .................................................................................................... x 1 Introduction ............................................................................................... 1 1.1 Overview .................................................................................................................................... 1 1.2 Research aims & structure ....................................................................................................... 3 2 Literature review ........................................................................................ 6 2.1 Introduction ..............................................................................................................................
    [Show full text]
  • New Kazakh Parallel Text Corpora with On-Line Access
    New Kazakh parallel text corpora with on-line access Zhandos Zhumanov1, Aigerim Madiyeva2 and Diana Rakhimova3 al-Farabi Kazakh National University, Laboratory of Intelligent Information Systems, Almaty, Kazakhstan [email protected], [email protected], [email protected] Abstract. This paper presents a new parallel resource – text corpora – for Ka- zakh language with on-line access. We describe 3 different approaches to col- lecting parallel text and how much data we managed to collect using them, par- allel Kazakh-English text corpora collected from various sources and aligned on sentence level, and web accessible corpus management system that was set up using open source tools – corpus manager Mantee and web GUI KonText. As a result of our work we present working web-accessible corpus management sys- tem to work with collected corpora. Keywords: parallel text corpora, Kazakh language, corpus management system 1 Introduction Linguistic text corpora are large collections of text used for different language studies. They are used in linguistics and other fields that deal with language studies as an ob- ject of study or as a resource. Text corpora are needed for almost any language study since they are basically a representation of the language itself. In computer linguistics text corpora are used for various parsing, machine translation, speech recognition, etc. As shown in [1] text corpora can be classified by many categories: Size: small (to 1 million words); medium (from 1 to 10 million words); large (more than 10 million words). Number of Text Languages: monolingual, bilingual, multilingual. Language of Texts: English, German, Ukrainian, etc. Mode: spoken; written; mixed.
    [Show full text]
  • The Corpus of Contemporary American English As the First Reliable Monitor Corpus of English
    The Corpus of Contemporary American English as the first reliable monitor corpus of English ............................................................................................................................................................ Mark Davies Brigham Young University, Provo, UT, USA ....................................................................................................................................... Abstract The Corpus of Contemporary American English is the first large, genre-balanced corpus of any language, which has been designed and constructed from the ground up as a ‘monitor corpus’, and which can be used to accurately track and study recent changes in the language. The 400 million words corpus is evenly divided between spoken, fiction, popular magazines, newspapers, and academic journals. Most importantly, the genre balance stays almost exactly the same from year to year, which allows it to accurately model changes in the ‘real world’. After discussing the corpus design, we provide a number of concrete examples of how the corpus can be used to look at recent changes in English, including morph- ology (new suffixes –friendly and –gate), syntax (including prescriptive rules, Correspondence: quotative like, so not ADJ, the get passive, resultatives, and verb complementa- Mark Davies, Linguistics and tion), semantics (such as changes in meaning with web, green, or gay), and lexis–– English Language, Brigham including word and phrase frequency by year, and using the corpus architecture Young University.
    [Show full text]
  • Multilingual Semantic Role Labeling with Unified Labels
    POLYGLOT: Multilingual Semantic Role Labeling with Unified Labels Alan Akbik Yunyao Li IBM Research Almaden Research Center 650 Harry Road, San Jose, CA 95120, USA fakbika,[email protected] Abstract Semantic role labeling (SRL) identifies the predicate-argument structure in text with semantic labels. It plays a key role in un- derstanding natural language. In this pa- per, we present POLYGLOT, a multilingual semantic role labeling system capable of semantically parsing sentences in 9 differ- Figure 1: Example of POLYGLOT predicting English Prop- ent languages from 4 different language Bank labels for a simple German sentence: The verb “kaufen” is correctly identified to evoke the BUY.01 frame, while “ich” groups. The core of POLYGLOT are SRL (I) is recognized as the buyer,“ein neues Auto”(a new car) models for individual languages trained as the thing bought, and “dir”(for you) as the benefactive. with automatically generated Proposition Banks (Akbik et al., 2015). The key fea- Not surprisingly, enabling SRL for languages ture of the system is that it treats the other than English has received increasing atten- semantic labels of the English Proposi- tion. One relevant key effort is to create Proposi- tion Bank as “universal semantic labels”: tion Bank-style resources for different languages, Given a sentence in any of the supported such as Chinese (Xue and Palmer, 2005) and languages, POLYGLOT applies the corre- Hindi (Bhatt et al., 2009). However, the conven- sponding SRL and predicts English Prop- tional approach of manually generating such re- Bank frame and role annotation. The re- sources is costly in terms of time and experts re- sults are then visualized to facilitate the quired, hindering the expansion of SRL to new tar- understanding of multilingual SRL with get languages.
    [Show full text]
  • Language, Communities & Mobility
    The 9th Inter-Varietal Applied Corpus Studies (IVACS) Conference Language, Communities & Mobility University of Malta June 13 – 15, 2018 IVACS 2018 Wednesday 13th June 2018 The conference is to be held at the University of Malta Valletta Campus, St. Paul’s Street, Valletta. 12:00 – 12:45 Conference Registration 12:45 – 13:00 Conferencing opening welcome and address University of Malta, Dean of the Faculty of Arts: Prof. Dominic Fenech Room: Auditorium 13:00 – 14:00 Plenary: Dr Robbie Love, University of Leeds Overcoming challenges in corpus linguistics: Reflections on the Spoken BNC2014 Room: Auditorium Rooms Auditorium (Level 2) Meeting Room 1 (Level 0) Meeting Room 2 (Level 0) Meeting Room 3 (Level 0) 14:00 – 14:30 Adverb use in spoken ‘Yeah, no, everyone seems to On discourse markers in Lexical Bundles in the interaction: insights and be saying that’: ‘New’ Lithuanian argumentative Description of Drug-Drug implications for the EFL pragmatic markers as newspaper discourse: a Interactions: A Corpus-Driven classroom represented in fictionalized corpus-based study Study - Pascual Pérez-Paredes & Irish English - Anna Ruskan - Lukasz Grabowski Geraldine Mark - Ana Mª Terrazas-Calero & Carolina Amador-Moreno 1 14:30 – 15:00 “Um so yeah we’re <laughs> So you have to follow The Study of Ideological Bias Using learner corpus data to you all know why we’re different methods and through Corpus Linguistics in inform the development of a here”: hesitation in student clearly there are so many Syrian Conflict News from diagnostic language tool and
    [Show full text]