The Spoken British National Corpus 2014
Design, compilation and analysis

ROBBIE LOVE

ESRC Centre for Corpus Approaches to Social Science
Department of Linguistics and English Language
Lancaster University

A thesis submitted to Lancaster University for the degree of Doctor of Philosophy in Linguistics

September 2017

Contents

Contents
Abstract
Acknowledgements
List of tables
List of figures
1 Introduction
    1.1 Overview
    1.2 Research aims & structure
2 Literature review
    2.1 Introduction
    2.2 The Spoken British National Corpus 1994 & other spoken corpora
        2.2.1 The Spoken BNC1994
        2.2.2 Other British English corpora containing spoken data
        2.2.3 Summary and justification for the Spoken BNC2014
    2.3 Review of research based on spoken English corpora
        2.3.1 Introduction
        2.3.2 Corpus linguistics journals
        2.3.3 Grammars
    2.4 Summary
3 Corpus design
    3.1 Introduction
    3.2 Design & speaker recruitment
        3.2.1 ‘Demographic’ corpus data
        3.2.2 Design of the Spoken BNC1994DS
        3.2.3 Speaker recruitment in the Spoken BNC1994
        3.2.4 Design & recruitment in other spoken corpora
        3.2.5 General approach to design in the Spoken BNC2014
        3.2.6 Speaker recruitment in the Spoken BNC2014
    3.3 Speaker and text metadata
        3.3.1 Introduction
        3.3.2 Metadata in the Spoken BNC1994
        3.3.3 Metadata collection in the Spoken BNC2014: procedure and ethics
        3.3.4 Speaker & text metadata collection procedures in the Spoken BNC2014
        3.3.5 Age, linguistic region & socio-economic status: discussion
    3.4 Collection of audio data
        3.4.1 Introduction
        3.4.2 Audio recording in the Spoken BNC1994
        3.4.3 Audio recording in the Spoken BNC2014
    3.5 Chapter summary
4 Transcription
    4.1 Introduction
    4.2 Transcription: human vs. machine
    4.3 Approach to human transcription
    4.4 The Spoken BNC1994 transcription scheme
    4.5 The Spoken BNC2014 transcription scheme: main features
    4.6 The Spoken BNC2014 transcription process
    4.7 Chapter summary
5 Speaker identification
    5.1 Introduction
    5.2 Speaker identification
    5.3 Pilot study
        5.3.1 Pilot study (A): Certainty (pilot study recordings)
        5.3.2 Pilot study (B): Certainty (Spoken BNC2014 recording)
        5.3.3 Pilot study (C): Inter-rater agreement (Spoken BNC2014 recording)
    5.4 Research Questions
    5.5 Methodological approach
        5.5.1 Introduction
        5.5.2 Main study (A): a Spoken BNC2014 recording
        5.5.3 Main study (B): the gold standard recording
    5.6 Results
        5.6.1 Main study (A1): Certainty in a Spoken BNC2014 recording
        5.6.2 Main study (A2): Inter-rater agreement in a Spoken BNC2014 recording
        5.6.3 Main study (B1): Certainty in the gold standard recording
        5.6.4 Main study (B2): Inter-rater agreement in the gold standard recording
        5.6.5 Main study (B3): Accuracy in the gold standard recording
    5.7 Discussion: what does this mean for the Spoken BNC2014?
        5.7.1 Individual transcriber variation
        5.7.2 Automated speaker identification
        5.7.3 The use of phonetic expertise
    5.8 Affected texts and solutions implemented
    5.9 Chapter summary
6 Corpus processing and dissemination
    6.1 Introduction
    6.2 XML conversion
    6.3 Annotation
    6.4 Corpus dissemination