Research Methods in Linguistics, edited by Robert J. Podesva and Devyani Sharma

Cambridge University Press 978-1-107-01433-6 - Research Methods in Linguistics, edited by Robert J. Podesva and Devyani Sharma
© in this web service Cambridge University Press, www.cambridge.org

Index

1 Advancement Exclusiveness Law, 416
absolute amplitude, 379
acceptability judgment (also see grammaticality judgment), 6, 27–47, 68, 102–3, 126, 257, 307, 308, 320
access, 19, 24, 65, 69, 76, 80, 99, 181, 193, 196, 198, 199–200, 201–4, 224, 229, 239, 480, 500
acoustic analysis, 3, 5, 6, 58, 171, 289, 396
  fricatives, 171, 302, 380, 389, 391, 392
  laterals, 394
  nasals, 389, 394
  place of articulation, 391–2
  stops, 379–80, 389, 391, 392
  trills, 380–1, 389, 391, 392, 393
  vowels, 171, 376–9, 389, 390–1, 394
aerodynamic measures, 187–8
African American Vernacular English, 103, 446, 471, 475, 476
age, 17, 68, 75, 78, 79, 81, 89, 90, 98, 99, 104, 118, 121, 122, 123, 260, 261, 263, 271, 288, 349, 452, 496, 498, 499, 501, 502, 503, 507, 511
age grading, 66, 502, 503, 505, 507, 508, 509
agent-based models, 430–1, 435
air pressure, 187, 375, 376, 386
air sacs, 426, 427
alternative hypothesis, 317, 335, 398, 400, 402, 406, 418, 445
analysis of variance (ANOVA), 339, 340, 341, 344
analysis window, 382, 390
anchor, 37
annotation, 63, 64, 65, 66, 70, 180, 182, 209, 223, 248–51, 259, 264–9, 277, 279, 335
anonymization, 14, 22, 24, 238, 253, 263
apparent time, 496–9, 503, 506, 507, 511
Archer Corpus, 85
archiving data (also see data management), 69, 253
argumentation, 6, 421
association, 302–7, 310–13, 364, 366
autocorrelation, 382–3, 384
average (see mean)

behavioral methods, 28, 30, 137–57
between-subjects design, 106, 117–18, 121, 122
bilingualism, 29, 55, 106, 128, 237, 247, 313, 498
bit, 171
block design, 119
box plot, 298, 299, 330, 331
British National Corpus (BNC), 267, 272
broker, 201–2, 203
Brown Corpus, 264, 265, 266, 269, 270, 295
Buckeye Corpus, 88, 273

CallHome American English Speech Corpus, 262
carrier phrase, 88, 175
carryover effects, 120
categorical context, 444
categorical variable, 290, 307, 322, 324, 325, 339, 431
categorization, 398, 404, 405–6, 408, 410, 419
central tendency (also see mean, median, mode), 45, 288, 295–8, 310, 328–35
children, 3, 17, 20–1, 89–90, 131, 394, 505
chi-squared test, 311–13, 327
CLAN, 249, 250
classification, 338, 361–7
clipping, 172, 175
cluster analysis, 45, 321
code-switching, 5–6, 29, 128, 456, 457
collaborator, 13, 23, 211
collinearity, 348–9, 451
collocation, 222, 276–7, 444
community (also see speech community, community of practice), 5, 11, 13, 14, 16, 19–24, 51, 52, 58, 59, 74, 79, 109, 195, 196, 199, 200, 201–4, 205, 210, 211, 213, 236, 496, 500, 501, 502, 508, 511
community artefacts, 211
community of practice, 81, 199
compensation, 23
competence (also see performance), 12, 46, 47, 85, 216, 224, 225–7, 228, 455, 457, 497
compression, 171
computational linguistics, 367, 422–39, 456, 457
computational models, 367, 422–39
concordance, 64, 222, 250, 277–8, 279, 280, 282
conditional inference trees, 364–6
confidence interval, 323, 335, 345, 346
connectionist models, 433
consent, 14–17, 20, 21, 24, 131, 207, 209
constraint hierarchy, 451
consultant (also see informant, native speaker), 5, 6, 13, 51, 56, 178, 212
contextualization cue, 239, 469, 470, 471, 472, 474, 475, 476, 477
continuous variable, 289–90, 292, 294, 295, 297, 298, 300, 307, 310, 339
contrast, 404–8
convergent parallel design, 125
conversation, 6, 51, 55, 62, 63, 66, 75, 96, 101, 105, 108, 111, 127, 176, 178, 179, 198, 222, 236, 243, 244, 249, 250, 259, 261, 262, 263, 271, 273, 462, 468–9
Conversation Analysis, 55, 242–6, 468–9
corpus, 222–4, 257–87
  analysis of, 5, 6, 29, 126, 127, 222, 226, 227, 274–83, 316, 327, 398, 433
  balanced, 258
  machine-readable, 258
  multimodal, 268
  representative, 221, 258
  size, 29, 88, 239, 259, 270
  types, 63, 66, 85, 169, 220, 221, 222, 224, 274, 349
Corpus of Contemporary American English (COCA), 273
Corpus of Early English Correspondences, 221
Corpus of Historical American English (COHA), 273
corrected mean, 451 (also see input)
correlation, 303–7, 310, 317, 322, 326, 351, 352, 354, 454–5
  Kendall’s tau, 306, 307
  Pearson’s r, 306, 454
  Spearman’s rho, 306, 307, 454
counterbalancing, 120
  complete, 120
  partial, 120
courtroom discourse, 461–84
Critical Discourse Analysis, 461, 483–4
cross-fertilization, 1–2, 4–5, 6–7, 66, 84–5, 91, 169, 193, 197, 254, 375, 422, 458, 460, 484, 497
cross-sectional study, 90, 495–7, 500, 506, 508, 512
cross-tabulation, 288, 311

data analysis, 4, 7, 373, 498–8
data collection, 3–4, 7, 9, 180, 232
  multi-wave, 498, 504, 505
  one-wave, 498, 504
data gaps, 225, 228–9
data management, 24–5, 64–5, 68–9, 180, 209, 251, 268
data processing, 4, 7, 233
deception, 15–16
demographic information, 90, 98, 99, 104, 108, 111, 199, 237, 260
dependent variable (also see response, linguistic variable), 117, 119, 120, 122, 251, 303, 305, 320, 321, 322, 323, 325, 328, 331, 333, 362, 445, 450, 498
descriptive statistics, 288–315
digital manipulation, 99, 107, 375
digital signal processing, 170–2
digitization
  audio, 173
  text, 222, 223, 251
direct approach (to time) (also see indirect approach), 495, 496, 503, 511–12
discourse, 22, 62, 63, 127, 143, 158, 169, 176, 197, 212, 218, 236, 242–6, 270, 398, 422, 460–93
discourse analysis, 169, 243, 246, 460–93
dispersion, 276, 298–300, 308, 309, 310, 322
Distributed Morphology, 456
distribution, 43, 288, 290–5, 296, 297, 298, 299, 300, 301, 302, 306, 307, 309, 316, 319, 320, 322, 344, 348, 444
distribution, linguistic, 5, 404–8
dummy coding, 340, 341, 344, 350, 363
Dundee Corpus, 144
duration, 138, 337, 376, 378–9

Early ME Corpus, 225
ELAN, 249–50, 268
electroglottography (EGG), 187
electromagnetic midsagittal articulometry (EMMA), 191, 192
electromyography (EMG), 193
elicitation, 2, 3, 5, 6, 30, 57, 59–68, 88, 99–107, 127, 128, 179, 210
  direct, 99–102
  indirect, 100, 101–2, 103–4, 104–7
  limitations, 66, 102–3
  questionnaire-driven, 59–62
  text-driven, 63–4, 66
embedded design, 125
emic, 79, 196, 502
endoscopy, 188–9
envelope of variation (also see variable context), 442
ERP, 157–60
ethics, 5, 6, 11–26, 90, 131, 182, 198–9, 201, 209, 238
ethnography, 2, 5, 15, 16, 20, 23, 79–80, 81, 86, 195–215, 470, 471, 475
etic, 79, 80, 196
evolution of speech, 423, 429
example research projects, 5–6
experimental designs
  between-subjects (see between-subjects design)
  convergent parallel (see convergent parallel design)
  embedded (see embedded design)
  factorial (see factorial design)
  Latin squares (see Latin squares)
  mixed (see mixed design)
  pretest-posttest (see pretest-posttest design)
  switching replications (see switching replications design)
  time-series (see time-series design)
  within-subjects (see within-subjects design)
expert participant, 42, 111
explanatory models, 424, 425, 427, 428, 436
extraneous variables, 116, 120, 124
eye dialect, 240
eye movements, 142–3
eye-tracking, 135, 141, 143, 145, 154, 155, 181

factor analysis, 321, 455
factor group (also see independent variable, predictors), 445–54
  coding, 447–9, 450
  operationalization, 446–7
factor weight, 450, 451
factorial design, 38, 121–3
factors, 338
field notes, 55, 68, 206–7
field session, 56, 61, 64
fieldwork, 5, 19, 20, 23, 51–73, 76, 79, 80, 86, 98, 179, 184, 195–215, 471
Fieldworks Language Explorer (FLEx), 65, 250
first language acquisition, 6, 17, 29, 67, 83, 90, 126, 127, 131, 153, 169, 193, 249, 257, 280, 422, 430, 432, 434, 440, 494, 495, 496, 497, 498
Fisher-Yates exact test, 277, 327
fixed effect, 123, 453, 454
fMRI, 16, 126, 161
footing, 472
forced-choice task, 31–2, 325
formant, 177, 289, 293, 298, 302, 304, 377, 378, 379, 390, 391, 394, 395
  transitions, 392
formant tracks, 394, 395
frame analysis, 470
framework, 54, 81, 87, 242, 316, 399, 401, 409, 416, 417, 420, 423, 434, 446, 456, 479
Freiburg Brown Corpus (FROWN), 270
Freiburg LOB Corpus (FLOB), 270

GoldVarb, 449, 450
goodness-of-fit, 313, 320, 324, 325, 347, 352, 357, 453
gradience, 46–7, 88, 197
grammars
  descriptive, 51, 52, 55, 66, 70, 221
  sketch, 55
grammatical description, 51, 59–66
grammaticality judgment (also see acceptability judgment), 27, 60, 96, 97, 117, 118, 130, 322

harmonic, 387, 388, 389
Heritage Language Variation and Change (HLVC) project, 237, 250, 251, 252
histogram, 288, 292, 293, 294, 296, 298, 299, 307, 333
historical linguistics, 97, 98, 193, 216–32, 258, 273, 425, 436, 440, 494–518
historical sociolinguistics, 221–2
hypercorrection, 211
hypothesis testing, 319, 320

ideology, language, 12, 478, 483, 484
implicational scale, 313
independent variable (also see predictors, factor group), 117, 118, 119, 120, 121, 122, 123, 124, 290, 305, 307, 320, 321, 325, 328, 329, 331, 333, 336, 338, 440, 450, 451, 453, 454, 498, 499
indirect approach (to time) (also see direct approach), 496, 497, 500, 512
informant (also see consultant, native speaker), 13, 82, 196, 200, 203, 204, 207, 210, 211, 212
input (also see corrected mean), 451, 457
institutional review board, 14, 17, 18–19, 200
institutional talk, 464
instructions, 36–7, 131, 137, 211, 447–9
intensity, 375, 379, 385
intensity curve, 384–6
interaction, statistical, 78, 121, 320, 321, 338, 341, 342, 343, 345, 347, 359, 361, 364, 366, 452, 453
Interactional Sociolinguistics,
Recommended publications
  • FREQUENCY EFFECTS on ESL COMPOSITIONAL MULTI-WORD SEQUENCE PROCESSING by Sarut Supasiraprapa a DISSERTATION Submitted to Michiga
    Frequency Effects on ESL Compositional Multi-Word Sequence Processing. By Sarut Supasiraprapa. A dissertation submitted to Michigan State University in partial fulfillment of the requirements for the degree of Second Language Studies – Doctor of Philosophy, 2017. Abstract: The current study investigated whether adult native English speakers and English-as-a-second-language (ESL) learners exhibit sensitivity to compositional English multi-word sequences, which have a meaning derivable from word parts (e.g., don’t have to worry as opposed to sequences like He left the US for good, where for good cannot be taken apart to derive its meaning). In the current study, a multi-word sequence specifically referred to a word sequence beyond the bigram (two-word) level. The investigation was motivated by usage-based approaches to language acquisition, which predict that first (L1) and second (L2) speakers should process more frequent compositional phrases faster than less frequent ones (e.g., Bybee, 2010; Ellis, 2002; Gries & Ellis, 2015). This prediction differs from the prediction in the mainstream generative linguistics theory, according to which frequency effects should be observed from the processing of items stored in the mental lexicon (i.e., bound morphemes, single words, and idioms), but not from compositional phrases (e.g., Prasada & Pinker, 1993; Prasada, Pinker, & Snyder, 1990). The present study constituted the first attempt to investigate frequency effects on multi-word sequences in both language comprehension and production in the same L1 and L2 speakers. The study consisted of two experiments. In the first, participants completed a timed phrasal-decision task, in which they decided whether four-word target phrases were possible English word sequences.
  • The Field of Phonetics Has Experienced Two
    The field of phonetics has experienced two revolutions in the last century: the advent of the sound spectrograph in the 1950s and the application of computers beginning in the 1970s. Today, advances in digital multimedia, networking and mass storage are promising a third revolution: a movement from the study of small, individual datasets to the analysis of published corpora that are thousands of times larger. These new bodies of data are badly needed, to enable the field of phonetics to develop and test hypotheses across languages and across the many types of individual, social and contextual variation. Allied fields such as sociolinguistics and psycholinguistics ought to benefit even more. However, in contrast to speech technology research, speech science has so far taken relatively little advantage of this opportunity, because access to these resources for phonetics research requires tools and methods that are now incomplete, untested, and inaccessible to most researchers. Our research aims to fill this gap by integrating, adapting and improving techniques developed in speech technology research and database research.
    The intellectual merit: The most important innovation is robust forced alignment of digital audio with phonetic representations derived from orthographic transcripts, using HMM methods developed for speech recognition technology. Existing forced-alignment techniques must be improved and validated for robust application to phonetics research. There are three basic challenges to be met: orthographic ambiguity; pronunciation variation; and imperfect transcripts (especially the omission of disfluencies). Reliable confidence measures must be developed, so as to allow regions of bad alignment to be identified and eliminated or fixed. Researchers need an easy way to get a believable picture of the distribution of transcription and measurement errors, so as to estimate confidence intervals, and also to determine the extent of any bias that may be introduced.
  • (Or, the Raising of Baby Mondegreen) Dissertation
    Preserving Subsegmental Variation in Modeling Word Segmentation (Or, the Raising of Baby Mondegreen). Dissertation presented in partial fulfillment of the requirements for the degree Doctor of Philosophy in the Graduate School of The Ohio State University. By Christopher Anton Rytting, B.A. The Ohio State University, 2007. Dissertation Committee: Dr. Christopher H. Brew, Co-Advisor; Dr. Eric Fosler-Lussier, Co-Advisor; Dr. Mary Beckman; Dr. Brian Joseph. Graduate Program in Linguistics.
    Abstract: Many computational models have been developed to show how infants break apart utterances into words prior to building a vocabulary: the “word segmentation task.” Most models assume that infants, upon hearing an utterance, represent this input as a string of segments. One type of model uses statistical cues calculated from the distribution of segments within the child-directed speech to locate those points most likely to contain word boundaries. However, these models have been tested in relatively few languages, with little attention paid to how different phonological structures may affect the relative effectiveness of particular statistical heuristics. This dissertation addresses this issue by comparing the performance of two classes of distribution-based statistical cues on a corpus of Modern Greek, a language with a phonotactic structure significantly different from that of English, and shows how these differences change the relative effectiveness of these cues. Another fundamental issue critically examined in this dissertation is the practice of representing input as a string of segments. Such a representation implicitly assumes complete certainty as to the phonemic identity of each segment.
  • The Phonetic Analysis of Speech Corpora
    The Phonetic Analysis of Speech Corpora. Jonathan Harrington, Institute of Phonetics and Speech Processing, Ludwig-Maximilians University of Munich, Germany. email: [email protected]. Wiley-Blackwell.
    Contents
    Relationship between International and Machine Readable Phonetic Alphabet (Australian English)
    Relationship between International and Machine Readable Phonetic Alphabet (German)
    Downloadable speech databases used in this book
    Preface
    Notes of downloading software
    Chapter 1 Using speech corpora in phonetics research
      1.0 The place of corpora in the phonetic analysis of speech
      1.1 Existing speech corpora for phonetic analysis
      1.2 Designing your own corpus
        1.2.1 Speakers
        1.2.2 Materials
        1.2.3 Some further issues in experimental design
        1.2.4 Speaking style
        1.2.5 Recording setup
        1.2.6 Annotation
        1.2.7 Some conventions for naming files
      1.3 Summary and structure of the book
    Chapter 2 Some tools for building and querying labelling speech databases
      2.0 Overview
      2.1 Getting started with existing speech databases
      2.2 Interface between Praat and Emu
      2.3 Interface to R
      2.4 Creating a new speech database: from Praat to Emu to R
      2.5 A first look at the template file
      2.6 Summary
      2.7 Questions
    Chapter 3 Applying routines for speech signal processing
      3.0 Introduction
      3.1 Calculating, displaying, and correcting formants
      3.2 Reading the formants into R
      3.3 Summary
      3.4 Questions
      3.5 Answers
    Chapter 4 Querying annotation structures
      4.1 The Emu Query Tool, segment tiers and event tiers
      4.2 Extending the range of queries: annotations from the same
  • Pdf Field Stands on a Separate Tier
    ComputEL-2: Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages. March 6–7, 2017, Honolulu, Hawai‘i, USA. © 2017 The Association for Computational Linguistics. Order copies of this and other ACL proceedings from: Association for Computational Linguistics (ACL), 209 N. Eighth Street, Stroudsburg, PA 18360, USA. Tel: +1-570-476-8006. Fax: +1-570-476-0860. [email protected].
    Preface: These proceedings contain the papers presented at the 2nd Workshop on the Use of Computational Methods in the Study of Endangered languages held in Honolulu, March 6–7, 2017.
  • Research Methods in Linguistics
    Research Methods in Linguistics. A comprehensive guide to conducting research projects in linguistics, this book provides a complete training in state-of-the-art data collection, processing, and analysis techniques. The book follows the structure of a research project, guiding the reader through the steps involved in collecting and processing data, and providing a solid foundation for linguistic analysis. All major research methods are covered, each by a leading expert. Rather than focusing on narrow specializations, the text fosters interdisciplinarity, with many chapters focusing on shared methods such as sampling, experimental design, transcription, and constructing an argument. Highly practical, the book offers helpful tips on how and where to get started, depending on the nature of the research question. The only book that covers the full range of methods used across the field, this student-friendly text is also a helpful reference source for the more experienced researcher and current practitioner.
    Robert J. Podesva is an Assistant Professor in the Department of Linguistics at Stanford University. Devyani Sharma is a Senior Lecturer in Linguistics at Queen Mary University of London.
    Research Methods in Linguistics, edited by Robert J. Podesva (Stanford University) and Devyani Sharma (Queen Mary University of London). University Printing House, Cambridge CB2 8BS, United Kingdom. Published in the United States of America by Cambridge University Press, New York. Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence. www.cambridge.org. Information on this title: www.cambridge.org/9781107696358. © Cambridge University Press 2013. This publication is in copyright.
  • Download Book
    Adrian-Horia Dediu, Carlos Martín-Vide, Klára Vicsi (Eds.). Statistical Language and Speech Processing. Third International Conference, SLSP 2015, Budapest, Hungary, November 24–26, 2015, Proceedings. LNAI 9449.
    Lecture Notes in Artificial Intelligence 9449, Subseries of Lecture Notes in Computer Science. LNAI Series Editors: Randy Goebel, University of Alberta, Edmonton, Canada; Yuzuru Tanaka, Hokkaido University, Sapporo, Japan; Wolfgang Wahlster, DFKI and Saarland University, Saarbrücken, Germany. LNAI Founding Series Editor: Joerg Siekmann, DFKI and Saarland University, Saarbrücken, Germany. More information about this series at http://www.springer.com/series/1244
    Editors: Adrian-Horia Dediu, Research Group on Mathematical Linguistics, Rovira i Virgili University, Tarragona, Spain; Klára Vicsi, Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary; Carlos Martín-Vide, Research Group on Mathematical Linguistics, Rovira i Virgili University, Tarragona, Spain.
    ISSN 0302-9743, ISSN 1611-3349 (electronic). Lecture Notes in Artificial Intelligence. ISBN 978-3-319-25788-4, ISBN 978-3-319-25789-1 (eBook). DOI 10.1007/978-3-319-25789-1. Library of Congress Control Number: 2015952770. LNCS Sublibrary: SL7 – Artificial Intelligence. Springer Cham Heidelberg New York Dordrecht London. © Springer International Publishing Switzerland 2015. This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
  • Mining a Year of Speech
    Mining Years and Years of Speech: Final report of the Digging into Data project “Mining a Year of Speech”. John Coleman, Greg Kochanski, Sergio Grau, Ladan Baghai-Ravary, Lou Burnard (University of Oxford, Phonetics Laboratory); Mark Liberman, Jiahong Yuan, Chris Cieri (University of Pennsylvania, Linguistic Data Consortium).
    1. Introduction. In the “Mining a Year of Speech” project we assessed the challenges of working with very large digital audio collections of spoken language: an aggregation of large corpora of US and UK English held in the Linguistic Data Consortium (LDC), University of Pennsylvania, and the British Library Sound Archive. The UK English material is entirely from the audio recordings of the Spoken British National Corpus (BNC), which Oxford and the British Library have recently digitized. This is a 7½ million-word broad sample of mostly unscripted vernacular speech from many contexts, including meetings, radio phone-ins, oral history recordings, and a substantial portion of naturally-occurring everyday conversations captured by volunteers (Crowdy 1993). The US English material is drawn together from a variety of corpora held by the Linguistic Data Consortium, including conversational and telephone speech, audio books, news broadcasts, US Supreme Court oral arguments, political speeches/statements, and sociolinguistic interviews. Further details are given in section 2, below. Any one of these corpora by itself presents two interconnected challenges: (a) How does a researcher find audio segments containing material of interest? (b) How do providers or curators of spoken audio collections mark them up in order to facilitate searching and browsing? The scale of the corpora we are bringing together presents a third substantial problem: it is not cost-effective or practical to follow the standard model of access to data (i.e.
  • 39. Jahrestagung Der Deutschen Gesellschaft Für Sprachwissenschaft ---- 08.–10
    [Cover graphic: the words INFORMATION, &, SPRACHLICHE and KODIERUNG set in a field of binary digits.] 39.
  • Using the Audio Features of Bncweb
    ICAME Journal, Volume 45, 2021, DOI: 10.2478/icame-2021-0004. Better data for more researchers – using the audio features of BNCweb. Sebastian Hoffmann and Sabine Arndt-Lappe, Trier University.
    Abstract: In spite of the wide agreement among linguists as to the significance of spoken language data, actual speech data have not formed the basis of empirical work on English as much as one would think. The present paper is intended to contribute to changing this situation, on a theoretical and on a practical level. On a theoretical level, we discuss different research traditions within (English) linguistics. Whereas speech data have become increasingly important in various linguistic disciplines, major corpora of English developed within the corpus-linguistic community, carefully sampled to be representative of language usage, are usually restricted to orthographic transcriptions of spoken language. As a result, phonological phenomena have remained conspicuously understudied within traditional corpus linguistics. At the same time, work with current speech corpora often requires a considerable level of specialist knowledge and tailor-made solutions. On a practical level, we present a new feature of BNCweb (Hoffmann et al. 2008), a user-friendly interface to the British National Corpus, which gives users access to audio and phonemic transcriptions of more than five million words of spontaneous speech. With the help of a pilot study on the variability of intrusive r we illustrate the scope of the new possibilities.
    1 Introduction. The aim of this paper is threefold: First, on a rather basic level, the paper is intended to provide an overview of the functionality of a new feature of BNCweb (Hoffmann et al.