The Spoken British National Corpus 2014

The Spoken British National Corpus 2014
Design, compilation and analysis

ROBBIE LOVE

ESRC Centre for Corpus Approaches to Social Science
Department of Linguistics and English Language
Lancaster University

A thesis submitted to Lancaster University for the degree of Doctor of Philosophy in Linguistics

September 2017

Contents

Contents
Abstract
Acknowledgements
List of tables
List of figures

1 Introduction
  1.1 Overview
  1.2 Research aims & structure

2 Literature review
  2.1 Introduction
  2.2 The Spoken British National Corpus 1994 & other spoken corpora
    2.2.1 The Spoken BNC1994
    2.2.2 Other British English corpora containing spoken data
    2.2.3 Summary and justification for the Spoken BNC2014
  2.3 Review of research based on spoken English corpora
    2.3.1 Introduction
    2.3.2 Corpus linguistics journals
    2.3.3 Grammars
  2.4 Summary

3 Corpus design
  3.1 Introduction
  3.2 Design & speaker recruitment
    3.2.1 ‘Demographic’ corpus data
    3.2.2 Design of the Spoken BNC1994DS
    3.2.3 Speaker recruitment in the Spoken BNC1994
    3.2.4 Design & recruitment in other spoken corpora
    3.2.5 General approach to design in the Spoken BNC2014
    3.2.6 Speaker recruitment in the Spoken BNC2014
  3.3 Speaker and text metadata
    3.3.1 Introduction
    3.3.2 Metadata in the Spoken BNC1994
    3.3.3 Metadata collection in the Spoken BNC2014: procedure and ethics
    3.3.4 Speaker & text metadata collection procedures in the Spoken BNC2014
    3.3.5 Age, linguistic region & socio-economic status: discussion
  3.4 Collection of audio data
    3.4.1 Introduction
    3.4.2 Audio recording in the Spoken BNC1994
    3.4.3 Audio recording in the Spoken BNC2014
  3.5 Chapter summary

4 Transcription
  4.1 Introduction
  4.2 Transcription: human vs. machine
  4.3 Approach to human transcription
  4.4 The Spoken BNC1994 transcription scheme
  4.5 The Spoken BNC2014 transcription scheme: main features
  4.6 The Spoken BNC2014 transcription process
  4.7 Chapter summary

5 Speaker identification
  5.1 Introduction
  5.2 Speaker identification
  5.3 Pilot study
    5.3.1 Pilot study (A): Certainty (pilot study recordings)
    5.3.2 Pilot study (B): Certainty (Spoken BNC2014 recording)
    5.3.3 Pilot study (C): Inter-rater agreement (Spoken BNC2014 recording)
  5.4 Research Questions
  5.5 Methodological approach
    5.5.1 Introduction
    5.5.2 Main study (A): a Spoken BNC2014 recording
    5.5.3 Main study (B): the gold standard recording
  5.6 Results
    5.6.1 Main study (A1): Certainty in a Spoken BNC2014 recording
    5.6.2 Main study (A2): Inter-rater agreement in a Spoken BNC2014 recording
    5.6.3 Main study (B1): Certainty in the gold standard recording
    5.6.4 Main study (B2): Inter-rater agreement in the gold standard recording
    5.6.5 Main study (B3): Accuracy in the gold standard recording
  5.7 Discussion: what does this mean for the Spoken BNC2014?
    5.7.1 Individual transcriber variation
    5.7.2 Automated speaker identification
    5.7.3 The use of phonetic expertise
  5.8 Affected texts and solutions implemented
  5.9 Chapter summary

6 Corpus processing and dissemination
  6.1 Introduction
  6.2 XML conversion
  6.3 Annotation
  6.4 Corpus dissemination
