Corpus Resources
Corpus Linguistics: Tools and Resources
IT Services Course, Hilary Term 2015

Tips

Please feel free to get in touch with Ylva Berglund Prytz ([email protected]) or Martin Wynne ([email protected]) with any questions.

Explore the Corpora mailing list (http://www.hit.uib.no/corpora/). You can sign up and ask a question on the list, or search the archive for past questions and answers.

AntConc (http://www.antlab.sci.waseda.ac.jp/) is a software application for doing corpus linguistics with texts and corpora on your own computer. It is free, and it is very simple to find, download and install. It provides the main functions (concordance, collocation, wordlists, etc.) and has built-in support for many languages and writing systems. There are versions for Windows, Mac and Linux.

RESOURCES

For modern European languages in particular, the Virtual Language Observatory at http://www.clarin.eu/vlo/ is increasingly becoming the one-stop shop, and it is constantly added to and kept up to date.

Here is a selection of corpora available online:

English

Brigham Young Corpora (BNC, American English, Time)
  http://corpora.byu.edu/
British National Corpus
  http://ota.oerc.ox.ac.uk/bncweb-cgi/BNCweb.pl/ (full access for Oxford users)
  http://www.natcorp.ox.ac.uk/
  http://bncweb.info/
The Compleat Lexical Tutor concordances
  http://www.lextutor.ca/conc/
ELISA (interviews on film + transcription)
  http://www.uni-tuebingen.de/elisa/
MICASE, Michigan Corpus of Academic Spoken English
  http://www.lsa.umich.edu/eli/micase/
Oxford English Corpus (more than 2 billion words and counting)
  http://dws-sketch.uk.oup.com/bonito/home.html (log-in required; ask Martin Wynne)
Phrases in English (multiword expressions in the BNC)
  http://phrasesinenglish.org/

Chinese

The Lancaster Corpus of Mandarin Chinese (download from OTA)
  http://www.ota.ox.ac.uk/headers/2474.xml

Czech

Czech National Corpus
  http://ucnk.ff.cuni.cz/

Finnish

Korp – access to various corpora
  https://korp.csc.fi/

French

ABU: la Bibliothèque Universelle (online texts)
  http://abu.cnam.fr/
Corpus français (Université de Leipzig)
  http://wortschatz.uni-leipzig.de/ws_fra/
Online Concordancers at The Compleat Lexical Tutor (French and English corpora with online concordancer)
  http://www.lextutor.ca/concordancers/

German

Das digitale Wörterbuch der deutschen Sprache
  http://www.dwds.de/
Institut für Deutsche Sprache
  http://corpora.ids-mannheim.de/

Italian

MultiSemCor (English and Italian parallel corpus)
  http://multisemcor.itc.it/

Portuguese

Corpus do Português
  http://www.corpusdoportugues.org/
COMPARA – parallel Portuguese-English
  http://www.linguateca.pt/COMPARA/

Russian

Russian National Corpus (Национальный корпус русского языка)
  http://ruscorpora.ru/

Swedish

Språkbanken (Swedish corpora)
  http://spraakbanken.gu.se/

Spanish

Corpus del Español
  http://www.corpusdelespanol.org/
SOL – Spanish Online (Concordancias españolas en la Web)
  http://spraakbanken.gu.se/lb/konk/rom2/

Multi-Lingual

Corpuseye (Danish project with resources in different languages)
  http://corp.hum.sdu.dk/
Intellitext (online interface to corpora in English, Chinese, Arabic, French, German, Italian, Japanese)
  http://corpus.leeds.ac.uk/it/
KWICfinder (makes concordances of webpages)
  http://www.kwicfinder.com/
SACODEYL (multi-media, teenagers)
  http://www.um.es/sacodeyl/
WebCorp (concordances from online texts)
  http://www.webcorp.org.uk/

ARCHIVES: TEXT, CORPORA, MEDIA

American Rhetoric project (text, audio and streaming video)
  http://www.americanrhetoric.com
Internet Archive (text, audio, video)
  http://www.archive.org
Oxford Text Archive
  http://ota.ox.ac.uk/ (see 'Catalogue' and 'Oxford' pages)
OxLip+ for electronic text collections
  http://oxlip-plus.bodleian.ox.ac.uk/