Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017
Piotr Bański, Marc Kupietz, Harald Lüngen, Paul Rayson, Hanno Biber, Evelyn Breiteneder, Simon Clematide, John Mariani, Mark Stevenson, Theresa Sick (editors)

Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017
including the papers from the Web-as-Corpus (WAC-XI) guest section

Birmingham, 24 July 2017

Challenges in the Management of Large Corpora and Big Data and Natural Language Processing 2017
Workshop Programme
24 July 2017

WAC-XI guest session (11:00-12:30)
Convenors: Adrien Barbaresi (ICLTT Vienna), Felix Bildhauer (IDS Mannheim), Roland Schäfer (FU Berlin)
Chair: Stefan Evert (Friedrich-Alexander-Universität Erlangen-Nürnberg)

Edyta Jurkiewicz-Rohrbacher, Zrinka Kolaković, Björn Hansen: Web Corpora – the best possible solution for tracking rare phenomena in underresourced languages – Clitics in Bosnian, Croatian and Serbian
Vladimir Benko: Are Web Corpora Inferior? The Case of Czech and Slovak
Vit Suchomel: Removing Spam from Web Corpora Through Supervised Learning Using FastText

Lunch (12:30-13:30)

CMLC-5+BigNLP main section (13:30-17:00)

Welcome and Introduction (13:30-13:40)

National Corpora Talks (13:40-15:40)

Dawn Knight, Tess Fitzpatrick, Steve Morris, Jeremy Evas, Paul Rayson, Irena Spasic, Mark Stonelake, Enlli Môn Thomas, Steven Neale, Jennifer Needs, Scott Piao, Mair Rees, Gareth Watkins, Laurence Anthony, Thomas Michael Cobb, Margaret Deuchar, Kevin Donnelly, Michael McCarthy, Kevin Scannell: Creating CorCenCC (Corpws Cenedlaethol