The Relationship Between Learners' Lexical

Université Sétif2

PEOPLE’S DEMOCRATIC REPUBLIC OF ALGERIA
Ministry of Higher Education and Scientific Research
Ferhat Abbas University, Setif
Faculty of Letters and Languages
Department of English Language and Literature

THE RELATIONSHIP BETWEEN LEARNERS’ LEXICAL COVERAGE AND THE READABILITY LEVELS OF THE ALGERIAN ENGLISH TEXTBOOKS

By Saad TORKI

Thesis submitted in candidature for the degree of doctorate “ès-sciences” in Applied Linguistics

Supervisor: Prof. H. SAADI
Co-Supervisor: Prof. M. MEHRACH

Board of Examiners:
Chairman: Prof. S. KESKES, Prof., University of Setif
Supervisor: Prof. H. SAADI, Prof., University of Constantine
Co-Supervisor: Prof. M. MEHRACH, Prof., University of Tetouan, Morocco
Member: Dr. H. HAMADA, M.C., University of Constantine
Member: Dr. S. LARABA, M.C., University of Constantine
Member: Dr. A. NEMOUCHI, M.C., University of Constantine

2012

Abstract

Key words: readability, lexical coverage, reading comprehension, EFL textbooks

In Algerian schools, textbooks provide the major, if not the only, written lexical input for students in classrooms. This study examined the seven English as a Foreign Language textbooks in use in these schools in order to investigate their lexical coverage and readability. The research project was designed to i) compile a textbook corpus, ii) compare the lexical coverage of the textbooks, and iii) assess their readability. The main concern was to determine whether learners’ lexical coverage was at the textbook readability level (Independent Reading Level), above it (Instructional Reading Level), or below it (Frustrational Reading Level).
Furthermore, the list of all lexical items occurring in the seven textbooks was compared to West's General Service List and Coxhead’s Academic Word List to assess whether the textbooks provide sufficient, useful and appropriate vocabulary items. Another purpose was to provide English teachers and educationalists in general with a means (computer software) for comparing the vocabulary levels of reading materials and textbooks intended for Algerian learners of English, in order to determine what the readability and vocabulary levels are and what additional vocabulary students need to reach the 95% rate of comprehension. The methodology adopted to explore lexical coverage was characterised by a multi-instrument, computer-based approach. The sets of textbooks were processed to generate textbook word lists. Results have shown that all the EFL textbooks in use have low lexical coverage and readability, putting them at the frustrational level. Moreover, except for the first three textbooks, there was a total discrepancy in lexical coverage among the other four textbooks, as the rate of common vocabulary across the textbooks was very low. Comparison of the lexical coverage of the seven textbooks to standard vocabulary lists has revealed that Algerian students are not learning sufficient, useful and appropriate vocabulary, as they are exposed to a low proportion of high-frequency words. The study ends with implications and recommendations as to how to remedy the problem.

ملخص (Arabic abstract, translated)

Textbooks for teaching English in Algeria are the most important, if not the only, source for learning vocabulary. This research therefore studied these textbooks, seven in number, in terms of lexical coverage and readability. The research was designed to (a) build a lexical corpus, (b) compare the textbooks in terms of lexical coverage, and (c) determine their readability. The main aim was to establish whether the readability level of the textbooks matches learners' lexical coverage, that is, whether it falls at the independent reading level, the instructional reading level, or the frustrational reading level. In addition, the list of all words occurring in the seven textbooks was compared to the General Service List and the Academic Word List to assess whether the textbooks are able to provide sufficient, useful and appropriate vocabulary. A final aim was to provide teachers of English with a means (software) for comparing levels, so that the additional vocabulary needed to reach the 95% rate of comprehension can be determined. The methodology adopted was characterised by the use of multiple instruments (computer programs), through which lists of all the words in the textbooks studied were extracted. The results showed that the level of readability and lexical coverage is low in all the textbooks. The lexical coverage also showed that students do not have the opportunity to learn sufficient, useful and appropriate vocabulary. The study concludes with recommendations on how to address this problem.

Résumé (French abstract, translated)

Key words: readability, lexical coverage, reading comprehension, English as a foreign language textbooks.

In Algeria, English as a foreign language textbooks are the main, if not the only, source of lexical input for pupils. This research aims to examine the seven textbooks in use in these schools in order to determine their lexical coverage and readability. The research project was designed to i) compile a corpus of the textbooks, ii) compare their lexical coverage, and iii) assess their readability. The main concern was to determine whether or not the readability level of the textbooks matched the range of the student's lexical coverage. In addition, the lists of lexical items were compared to the General Service List and the Academic Word List to assess whether the textbooks are able to provide sufficient, useful and appropriate vocabulary.
Another objective was to provide English teachers and educationalists in general with a means (software) for comparing the vocabulary levels of reading texts intended for Algerian learners of English, in order to determine their readability and vocabulary levels and what additional vocabulary these learners need to reach the 95% rate of comprehension. The methodology adopted to explore lexical coverage was characterised by a multi-instrument approach involving computer programs. The sets of textbooks were processed to generate lists of the words they contain. The results showed that all the textbooks in use have both low lexical coverage and low readability, placing them at the frustration level. Apart from the first three textbooks, there is a total divergence in lexical coverage among the other four, since the rate of vocabulary common across the textbooks is very low. Comparison of the lexical coverage of the seven textbooks with the vocabulary lists revealed that Algerian pupils are not exposed to sufficient, useful and appropriate vocabulary, since they are exposed to a low proportion of high-frequency words. The study ends with implications and recommendations on how to remedy the problem.

DEDICATION

To my mother
To the memory of my father, whose last words as he lay dying were: “My son, never quit learning”.
To the memory of my father and mother-in-law, who helped me to heed my father's advice.
To my sisters,
To Afef-Zineb, Mohamed-Abdallah, Taki-Eddine, and their Mum
To my brothers and sisters-in-law

ACKNOWLEDGEMENTS

First of all, I would like to express my deepest gratitude to Allah for His innumerable graces on me, one of which is the completion of this work. Second, my most profound feelings and immense gratitude go to my mother who, though widowed at an early age, sacrificed her youth, life and energy and went out to work for my education. Third, the journey that produced this study owes much to many people.
Some of them can be named, others cannot, but all of them have my deepest appreciation and deserve to be acknowledged for their support in this effort. I wish to convey my gratitude to my supervisor, Professor Hacene Saadi, Mentouri University, Constantine. I