NWAV 46 Booklet-Oct29
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Transfer Learning for Singlish Universal Dependencies Parsing and POS Tagging
From Genesis to Creole language: Transfer Learning for Singlish Universal Dependencies Parsing and POS Tagging HONGMIN WANG, University of California Santa Barbara, USA JIE YANG, Singapore University of Technology and Design, Singapore YUE ZHANG, West Lake University, Institute for Advanced Study, China Singlish can be interesting to the computational linguistics community both linguistically as a major low- resource creole based on English, and computationally for information extraction and sentiment analysis of regional social media. In our conference paper, Wang et al. [2017], we investigated part-of-speech (POS) tagging and dependency parsing for Singlish by constructing a treebank under the Universal Dependencies scheme, and successfully used neural stacking models to integrate English syntactic knowledge for boosting Singlish POS tagging and dependency parsing, achieving the state-of-the-art accuracies of 89.50% and 84.47% for Singlish POS tagging and dependency respectively. In this work, we substantially extend Wang et al. [2017] by enlarging the Singlish treebank to more than triple the size and with much more diversity in topics, as well as further exploring neural multi-task models for integrating English syntactic knowledge. Results show that the enlarged treebank has achieved significant relative error reduction of 45.8% and 15.5% on the base model, 27% and 10% on the neural multi-task model, and 21% and 15% on the neural stacking model for POS tagging and dependency parsing respectively. Moreover, the state-of-the-art Singlish POS tagging and dependency parsing accuracies have been improved to 91.16% and 85.57% respectively. We make our treebanks and models available for further research. -
Recognition of Nasalized and Non- Nasalized Vowels
RECOGNITION OF NASALIZED AND NON- NASALIZED VOWELS Submitted By: BILAL A. RAJA Advised by: Dr. Carol Espy Wilson and Tarun Pruthi Speech Communication Lab, Dept of Electrical & Computer Engineering University of Maryland, College Park Maryland Engineering Research Internship Teams (MERIT) Program 2006 1 TABLE OF CONTENTS 1- Abstract 3 2- What Is Nasalization? 3 2.1 – Production of Nasal Sound 3 2.2 – Common Spectral Characteristics of Nasalization 4 3 – Expected Results 6 3.1 – Related Previous Researches 6 3.2 – Hypothesis 7 4- Method 7 4.1 – The Task 7 4.2 – The Technique Used 8 4.3 – HMM Tool Kit (HTK) 8 4.3.1 – Data Preparation 9 4.3.2 – Training 9 4.3.3 – Testing 9 4.3.4 – Analysis 9 5 – Results 10 5.1 – Experiment 1 11 5.2 – Experiment 2 11 6 – Conclusion 12 7- References 13 2 1 - ABSTRACT When vowels are adjacent to nasal consonants (/m,n,ng/), they often become nasalized for at least some part of their duration. This nasalization is known to lead to changes in perceived vowel quality. The goal of this project is to verify if it is beneficial to first recognize nasalization in vowels and treat the three groups of vowels (those occurring before nasal consonants, those occurring after nasal consonants, and the oral vowels which are vowels that are not adjacent to nasal consonants) separately rather than collectively for recognizing the vowel identities. The standard Mel-Frequency Cepstral Coefficients (MFCCs) and the Hidden Markov Model (HMM) paradigm have been used for this purpose. The results show that when the system is trained on vowels in general, the recognition of nasalized vowels is 17% below that of oral vowels. -
Universidaddesonora
U N I V E R S I D A D D E S O N O R A División de Humanidades y Bellas Artes Maestría en Lingüística Nominal and Adjectival Predication in Yoreme/Mayo of Sonora and Sinaloa TESIS Que para optar por el grado de Maestra en Lingüística presenta Rosario Melina Rodríguez Villanueva 2012 Universidad de Sonora Repositorio Institucional UNISON Excepto si se seala otra cosa, la licencia del tem se describe como openAccess CONTENTS DEDICATION……………………………………………………………………….. 4 ACKNOWLEDGEMENTS………………………………………………………….. 5 ABBREVIATIONS………………………………………………………………….. 7 INTRODUCTION…………………………………………………………………… 12 CHAPTER 1: The Yoreme/Mayo and their language……………………………….. 16 1.1 Ethnographic and Sociolinguistic Context………………………………………. 16 1.1.1 Geographic Location of the Yoreme/Mayo…………………………………… 16 1.1.2 Social Organization of the Yoreme/Mayo……………………………………... 20 1.1.3 Economy and Working Trades………………………………………………… 21 1.1.4 Religion and Cosmogony……………………………………………………… 22 1.2 The Yoreme/Mayo language…………………………………………………….. 23 1.2.1 Geographical location and genetic affiliation………………………………….. 23 1.2.2 Phonology……………………………………………………………………… 27 1.2.2.1 Consonants………………………………………………………………….. 27 1.2.2.2 Vowels……………………………………………………………………….. 30 1.2.3 Typological Characteristics……………………………………………………. 33 1.2.3.1 Classification………………………………………………………………… 33 1.2.3.2 Marking: head or dependent?........................................................................... 35 1.2.3.3 Word Order…………………………………………………………………. 40 1.2.3.4 Case-Marking………………………………………………………………. 43 1.3 Previous Documentation and Description of Yoreme/Mayo……………………. 48 1 CHAPTER 2: Theoretical Preliminaries…………………………………………….. 52 2.0 Introduction……………………………………………………………………… 52 2.1 Predication: Verbal and Non-verbal 53 2.1.1 Definition………………………………………………………………………. 53 2.1.2 Verbal Predication……………………………………………………………. 56 2.1.3 Non-verbal Predication………………………………………………………… 61 2.2. The Syntax of Non-verbal Predication………………………………………… 71 2.2.1 Nominal Predication…………………………………………………………. -
Grammatical Gender in the German Multiethnolect Peter Auer & Vanessa Siegel
1 Grammatical gender in the German multiethnolect Peter Auer & Vanessa Siegel Contact: Deutsches Seminar, Universität Freiburg, D-79089 Freiburg [email protected], [email protected] Abstract While major restructurations and simplifications have been reported for gender systems of other Germanic languages in multiethnolectal speech, the article demonstrates that the three-fold gender distinction of German is relatively stable among young speakers of immigrant background. We inves- tigate gender in a German multiethnolect, based on a corpus of appr. 17 hours of spontaneous speech by 28 young speakers in Stuttgart (mainly of Turkish and Balkan backgrounds). German is not their second language, but (one of) their first language(s), which they have fully acquired from child- hood. We show that the gender system does not show signs of reduction in the direction of a two gender system, nor of wholesale loss. We also argue that the position of gender in the grammar is weakened by independent processes, such as the frequent use of bare nouns determiners in grammatical contexts where German requires it. Another phenomenon that weakens the position of gender is the simplification of adjective/noun agreement and the emergence of a generalized, gender-neutral suffix for pre-nominal adjectives (i.e. schwa). The disappearance of gender/case marking in the adjective means that the grammatical cat- egory of gender is lost in A + N phrases (without determiner). 1. Introduction Modern German differs from most other Germanic languages -
A Grammar of Gyeli
A Grammar of Gyeli Dissertation zur Erlangung des akademischen Grades doctor philosophiae (Dr. phil.) eingereicht an der Kultur-, Sozial- und Bildungswissenschaftlichen Fakultät der Humboldt-Universität zu Berlin von M.A. Nadine Grimm, geb. Borchardt geboren am 28.01.1982 in Rheda-Wiedenbrück Präsident der Humboldt-Universität zu Berlin Prof. Dr. Jan-Hendrik Olbertz Dekanin der Kultur-, Sozial- und Bildungswissenschaftlichen Fakultät Prof. Dr. Julia von Blumenthal Gutachter: 1. 2. Tag der mündlichen Prüfung: Table of Contents List of Tables xi List of Figures xii Abbreviations xiii Acknowledgments xv 1 Introduction 1 1.1 The Gyeli Language . 1 1.1.1 The Language’s Name . 2 1.1.2 Classification . 4 1.1.3 Language Contact . 9 1.1.4 Dialects . 14 1.1.5 Language Endangerment . 16 1.1.6 Special Features of Gyeli . 18 1.1.7 Previous Literature . 19 1.2 The Gyeli Speakers . 21 1.2.1 Environment . 21 1.2.2 Subsistence and Culture . 23 1.3 Methodology . 26 1.3.1 The Project . 27 1.3.2 The Construction of a Speech Community . 27 1.3.3 Data . 28 1.4 Structure of the Grammar . 30 2 Phonology 32 2.1 Consonants . 33 2.1.1 Phonemic Inventory . 34 i Nadine Grimm A Grammar of Gyeli 2.1.2 Realization Rules . 42 2.1.2.1 Labial Velars . 43 2.1.2.2 Allophones . 44 2.1.2.3 Pre-glottalization of Labial and Alveolar Stops and the Issue of Implosives . 47 2.1.2.4 Voicing and Devoicing of Stops . 51 2.1.3 Consonant Clusters . -
Recurrent Neural Networks in Speech Disfluency Detection And
Recurrent Neural Networks in Speech Disfluency Detection and Punctuation Prediction Master’s Thesis Matthias Reisser 1538923 At the Department of Informatics Interactive Systems Lab (ISL) Institute of Anthropomatics and Robotics Reviewer: Prof. Dr. Alexander Waibel Second reviewer: Prof. Dr. J. Marius Zöllner Advisor: Kevin Kilgour, Ph.D Second advisor: Eunah Cho 15th of November 2015 KIT – University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association www.kit.edu Abstract With the increased performance of automatic speech recognition systems in recent years, applications that process spoken speech transcripts become in- creasingly relevant. These applications, such as automated machine transla- tion systems, dialogue systems or information extraction systems, usually are trained on large amount of text corpora. Since acquiring, manually transcribing and annotating spoken language transcripts is prohibitively expensive, natural language processing systems are trained on existing text corpora of well-written texts. However, since spoken language transcripts, as they are generated by automatic speech recognition systems, are full of speech disfluencies and lack punctuation marks as well as sentence boundaries, there is a mismatch be- tween the training corpora and the actual use case. In order to achieve high performance on spoken language transcripts, it is necessary to detect and re- move these disfluencies, as well as insert punctuation marks prior to processing them. The focus of this thesis therefore lies in addressing the tasks of disfluency de- tection and punctuation prediction on two data sets of spontaneous speech: The multi-party meeting data set (Cho, Niehues, & Waibel, 2014) and the switchboard data set of telephone conversations (Godfrey et al., 1992). -
Identification of Zero Copulas in Hungarian Using
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 4802–4810 Marseille, 11–16 May 2020 c European Language Resources Association (ELRA), licensed under CC-BY-NC Much Ado About Nothing Identification of Zero Copulas in Hungarian Using an NMT Model Andrea Dömötör1;2, Zijian Gyoz˝ o˝ Yang1, Attila Novák1 1MTA-PPKE Hungarian Language Technology Research Group, Pázmány Péter Catholic University, Faculty of Information Technology and Bionics Práter u. 50/a, 1083 Budapest, Hungary 2Pázmány Péter Catholic University, Faculty of Humanities and Social Sciences Egyetem u. 1, 2087 Piliscsaba, Hungary {surname.firstname}@itk.ppke.hu Abstract The research presented in this paper concerns zero copulas in Hungarian, i.e. the phenomenon that nominal predicates lack an explicit verbal copula in the default present tense 3rd person indicative case. We created a tool based on the state-of-the-art transformer architecture implemented in Marian NMT framework that can identify and mark the location of zero copulas, i.e. the position where an overt copula would appear in the non-default cases. Our primary aim was to support quantitative corpus-based linguistic research by creating a tool that can be used to compile a corpus of significant size containing examples of nominal predicates including the location of the zero copulas. We created the training corpus for our system transforming sentences containing overt copulas into ones containing zero copula labels. However, we first needed to disambiguate occurrences of the massively ambiguous verb van ‘exist/be/have’. We performed this using a rule-base classifier relying on English translations in the English-Hungarian parallel subcorpus of the OpenSubtitles corpus. -
A Corpus-Based Study of Speech Fluency Across English Dialects
1 A CORPUS-BASED STUDY OF SPEECH FLUENCY ACROSS ENGLISH DIALECTS A thesis submitted in partial fulfillment of the Requirements for the degree of Master of Science in Speech and Language Sciences In the University of Canterbury By Nandini Shanmukha University of Canterbury 2017 2 Table of Contents List of Figures .................................................................................................................... 5 List of Tables ...................................................................................................................... 6 Acknowledgements ............................................................................................................ 7 Abstract: ............................................................................................................................. 8 1. BACKGROUND ........................................................................................................... 9 1.1. Speech Fluency vs Disfluency ........................................................................... 10 1.2. Normal disfluencies Vs Stuttering-like disfluencies: ...................................... 11 1.2.1. Normative fluency data ................................................................................. 11 1.3. Contributing factors to speech disfluency: ...................................................... 13 1.3.1. Situational factors: ........................................................................................ 14 1.3.2. Topic familiarity:.......................................................................................... -
Problematizing Ethnolects: Naming Linguistic Practices in an Antwerp Secondary School Jürgen Jaspers University of Antwerp
‘International Journal of Bilingualism’ • Volume 12 • ProblematizingNumber 1 &2• 2008, ethnolects 85–103|| 85 Problematizing ethnolects: Naming linguistic practices in an Antwerp secondary school Jürgen Jaspers University of Antwerp Acknowledgments The author wishes to thank Michael Meeuwis, Frank Brisard and two anonymous referees for their useful comments. Abstract Key words This article argues that naming linguistic practices “ethnolectal” is a adolescents praxis with ideological consequences that sociolinguists fail sufficiently to address. It suggests that a transformation of linguistic differences Belgium into ethnolect-codes quickly falls prey to homogenizing groups and their language use, obscures speakers’ styling practices as well as the rela- Dutch tions between “ethnolect” and standard language speakers. Furthermore, “ethnolect” as an analytical concept buttresses the idea that linguistic practices are caused by ethnicity, when it is more likely to assume language ethnolect use is shaped by how speakers interpret prevailing representations of ethnicity and style their language use in relation to that. As an alternative, identity I argue that ethnolects be viewed as representations of particular ways of speaking that do not necessarily correspond to systematic linguistic representation practices. Sociolinguists therefore need to investigate how local and general perceptions of ways of speaking lead to specific styling practices, and styling integrate these into their descriptions. In addition, they need to be aware that their own work is social action as well, which requires taking into account the concerns of who gets labeled. This is illustrated with data from a case study showing how Belgian adolescents of Moroccan background resist an ethnolectal categorization of their routine Dutch. 1Introduction Inner-city ethnic minority youth in western societies have in recent decades drawn increasing attention as a problematic and somewhat dangerous group. -
Exploring the Role of Fillers in the Automatic Prediction of a Speaker's
How Confident are You? Exploring the Role of Fillers in the Automatic Prediction of a Speaker’s Confidence Tanvi Dinkar, Ioana Vasilescu, Catherine Pelachaud, Chloé Clavel To cite this version: Tanvi Dinkar, Ioana Vasilescu, Catherine Pelachaud, Chloé Clavel. How Confident are You? Ex- ploring the Role of Fillers in the Automatic Prediction of a Speaker’s Confidence. Workshop sur les Affects, Compagnons artificiels et Interactions, CNRS, Université Toulouse Jean Jaurès, Université de Bordeaux, Jun 2020, Saint Pierre d’Oléron, France. hal-02933476 HAL Id: hal-02933476 https://hal.inria.fr/hal-02933476 Submitted on 8 Sep 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. How Confident are You? Exploring the Role of Fillers in the Automatic Prediction of a Speaker’s Confidence Tanvi Dinkar Ioana Vasilescu [email protected] [email protected] Institut Mines-Telecom, Telecom Paris, CNRS-LTCI, Paris, LIMSI, CNRS, Université Paris-Saclay, Orsay, France France Catherine Pelachaud Chloé Clavel [email protected] [email protected] CNRS-ISIR, UPMC, Paris, France Institut Mines-Telecom, Telecom Paris, CNRS-LTCI, Paris, France ABSTRACT structure of their utterance, such as in their (difficulties of) selec- “Fillers", example “um" in English, have been linked to the “Feel- tion of appropriate vocabulary while maintaining their turn (in ing of Another’s Knowing (FOAK)" or the listener’s perception of dialogue). -
UC San Diego UC San Diego Electronic Theses and Dissertations
UC San Diego UC San Diego Electronic Theses and Dissertations Title Divination & Decision-Making: Ritual Techniques of Distributed Cognition in the Guatemalan Highlands Permalink https://escholarship.org/uc/item/2v42d4sh Author McGraw, John Joseph Publication Date 2016 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA, SAN DIEGO Divination and Decision-Making: Ritual Techniques of Distributed Cognition in the Guatemalan Highlands A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Anthropology and Cognitive Science by John J. McGraw Committee in charge: Professor Steven Parish, Chair Professor David Jordan, Co-Chair Professor Paul Goldstein Professor Edwin Hutchins Professor Craig McKenzie 2016 Copyright John J. McGraw, 2016 All rights reserved. The dissertation of John J. McGraw is approved, and it is acceptable in quality and form for publication on microfilm and electronically: ___________________________________________________________ ___________________________________________________________ ___________________________________________________________ ___________________________________________________________ Co-chair ___________________________________________________________ Chair University of California, San Diego 2016 iii TABLE OF CONTENTS Signature Page …....……………………………………………………………… iii Table of Contents ………………….……………………………….…………….. iv List of Figures ….…………………………………………………….…….……. -
Kent Academic Repository Full Text Document (Pdf)
Kent Academic Repository Full text document (pdf) Citation for published version Rehman, Ishrat (2019) Urdu Vowel System and Perception of English Vowels by Punjabi-Urdu Speakers. Doctor of Philosophy (PhD) thesis, University of Kent,. DOI Link to record in KAR https://kar.kent.ac.uk/80850/ Document Version UNSPECIFIED Copyright & reuse Content in the Kent Academic Repository is made available for research purposes. Unless otherwise stated all content is protected by copyright and in the absence of an open licence (eg Creative Commons), permissions for further reuse of content should be sought from the publisher, author or other copyright holder. Versions of research The version in the Kent Academic Repository may differ from the final published version. Users are advised to check http://kar.kent.ac.uk for the status of the paper. Users should always cite the published version of record. Enquiries For any further enquiries regarding the licence status of this document, please contact: [email protected] If you believe this document infringes copyright then please contact the KAR admin team with the take-down information provided at http://kar.kent.ac.uk/contact.html Urdu Vowel System and Perception of English Vowels by Punjabi-Urdu Speakers Thesis submitted in partial fulfilment of requirements for the degree of Doctor of Philosophy to Department of English Language and Linguistics (ELL) School of European Culture and Languages (SECL) University of Kent, Canterbury by Ishrat Rehman 2019 Abstract A well-defined vocalic and consonantal system is a prerequisite when investigating the perception and production of a second language. The lack of a well-defined Urdu vowel system in the multilingual context of Pakistan motivated investigation of the acoustic and phonetic properties of Urdu vowels.