Introducing Corpora
Masaryk University
Faculty of Arts
Department of English and American Studies
English Language and Literature

Jaroslav Blecha

Compiling a Glossary of Terminology Used in Shoemaking

Bachelor’s Diploma Thesis

Supervisor: PhDr. Jarmila Fictumová
2009

I declare that I have worked on this thesis independently, using only the primary and secondary sources listed in the bibliography.

……………………………………………..
Author’s signature

Acknowledgement

I would like to express many thanks to my supervisor, PhDr. Jarmila Fictumová, for her kind and patient supervision, valuable comments, helpful suggestions and the time that she devoted to supervising my thesis.

Table of Contents

Introduction
1. Introducing Corpora
    1.1. Early history and pre-electronic era
    1.2. ‘The First Generation Corpora’
    1.3. ‘The Second Generation Corpora’
    1.4. The British National Corpus (BNC)
2. What is a corpus?
    2.1. Origins of the word
    2.2. Main features of a corpus
    2.3. Four criteria of a corpus
    2.4. Authenticity
    2.5. Interactiveness
    2.6. Size
    2.7. Representativeness
3. Types of corpora
    3.1. General [reference] corpora
    3.2. Monitor corpora
    3.3. Special purpose corpora
    3.4. Monolingual corpora
    3.5. Bi- and multilingual corpora
        3.5.1. Comparable corpora
        3.5.2. Parallel corpora
    3.6. Diachronic corpora
    3.7. Synchronic corpora
    3.8. Open corpora
    3.9. Closed corpora
    3.10. Written corpora
    3.11. Sample corpora and full text corpora
    3.12. Spoken corpora
    3.13. Learner corpora
4. Corpus processing tools
    4.1. Word lister
    4.2. Concordancer
    4.3. Other tools
        4.3.1. Lemmatizer
        4.3.2. Tagger
        4.3.3. Parser
        4.3.4. Aligner
5. Terminology
    5.1. Emergence of terminology
    5.2. What is terminology?
    5.3. Traditional approach
    5.4. Pragmatic approach
6. Brief history of footwear and shoemaking
    6.1. Early history
    6.2. First records
    6.3. The Middle Ages
    6.4. Modern era
    6.5. 1800-1945
    6.6. Situation today
7. Corpus design and compilation
    7.1. Size
    7.2. Authenticity
    7.3. Open or closed?
    7.4. Choice of sources
    7.5. Types of texts
    7.6. Obtaining the texts
    7.7. Conversion into electronic form
8. Compiling the glossary
    8.1. Choice of term candidates and their verification
    8.2. Term selection
    8.3. Terms in English
    8.4. Verification of terms in the corpus
    8.5. Adding the definitions
    8.6. Context
9. Corpus findings in selected terms
    9.1. Shoemaker – Cordwainer – Cobbler
    9.2. Goodyear welted shoes
    9.3. Veldtschoen – stitchdown – flexible shoes
    9.4. Leather
    9.5. Loafer
Conclusion
Sources
    Primary Sources
    Secondary Sources