Best Practices for Speech Corpora in Linguistic Research

Best Practices for Speech Corpora in Linguistic Research Workshop Programme 21 May 2012 14:00 – Case Studies: Corpora & Methods Janne Bondi Johannessen, Øystein Alexander Vangsnes, Joel Priestley and Kristin Hagen: A linguistics-based speech corpus Adriana Slavcheva and Cordula Meißner: GeWiss – a comparable corpus of academic German, English and Polish Elena Grishina, Svetlana Savchuk and Dmitry Sichinava: Multimodal Parallel Russian Corpus (MultiPARC): Multimodal Parallel Russian Corpus (MultiPARC): Main Tasks and General Structure Sukriye Ruhi and E. Eda Isik Tas: Constructing General and Dialectal Corpora for Language Variation Research: Two Case Studies from Turkish Theodossia-Soula Pavlidou: The Corpus of Spoken Greek: goals, challenges, perspectives Ines Rehbein, Sören Schalowski and Heike Wiese: Annotating spoken language Seongsook Choi and Keith Richards: Turn-taking in interdisciplinary scientific research meetings: Using ‘R’ for a simple and flexible tagging system 16:30 – 17:00 Coffee break i 17:00 – Panel: Best Practices Pavel Skrelin and Daniil Kocharov Russian Speech Corpora Framework for Linguistic Purposes Peter M. Fischer and Andreas Witt Developing Solutions for Long-Term Archiving of Spoken Language Data at the Institut für Deutsche Sprache Christoph Draxler Using a Global Corpus Data Model for Linguistic and Phonetic Research Brian MacWhinney, Leonid Spektor, Franklin Chen and Yvan Rose TalkBank XML as a Best Practices Framework Christopher Cieri and Malcah Yaeger-Dror Toward the Harmonization of Metadata Practice for Spoken Languages Resources Sebastian Drude, Daan Broeder, Peter Wittenburg and Han Sloetjes Best practices in the design, creation and dissemination of speech corpora at The Language Archive 18:30 Final Discussion ii Editors Michael Haugh Griffith University, Australia Sukryie Ruhi Middle Easter Technical University, Ankara Thomas Schmidt Institut für Deutsche Sprache, Mannheim Kai Wörner Hamburger Zentrum für Sprachkorpora Workshop Organizers/Organizing Committee Michael Haugh Griffith University, Australia Sukryie Ruhi Middle Easter Technical University, Ankara Thomas Schmidt Institut für Deutsche Sprache, Mannheim Kai Wörner Hamburger Zentrum für Sprachkorpora Workshop Programme Committee Yeşim Aksan Mersin University Dawn Archer University of Central Lancashire Steve Cassidy Macquarie University, Sydney Chris Christie Loughborough University Arnulf Deppermann Institute for the German Language, Mannheim Ulrike Gut University of Münster Iris Hendrickx Linguistics Center of the University of Lisboa Alper Kanak Turkish Science and Technology Institute – TÜBİTAK Kemal Oflazer Carnegie Mellon at Qatar Antonio Pareja-Lora ILSA-UCM / ATLAS-UNED Petr Pořízka Univerzita Palackého |Jesus Romero-Trillo Universidad Autonoma de Madrid Yvan Rose Memorial University of Newfoundland Martina Schrader-Kniffki University of Bremen Deniz Zeyrek Middle East Technical University iii Table of contents Janne Bondi Johannessen, Øystein Alexander Vangsnes, Joel Priestley and Kristin Hagen A linguistics-based speech corpus ....................................................................................................... 1 Adriana Slavcheva and Cordula Meißner GeWiss – a comparable corpus of academic German, English and Polish .......................................... 7 Elena Grishina, Svetlana Savchuk and Dmitry Sichinava Multimodal Parallel Russian Corpus (MultiPARC): Multimodal Parallel Russian Corpus (MultiPARC): Main Tasks and General Structure ............................................................................. 13 Sukriye Ruhi and E. Eda Isik Tas Constructing General and Dialectal Corpora for Language Variation Research: Two Case Studies from Turkish ...................................................................................................................................... 17 Theodossia-Soula Pavlidou The Corpus of Spoken Greek: goals, challenges, perspectives .......................................................... 23 Ines Rehbein, Sören Schalowski and Heike Wiese Annotating spoken language .............................................................................................................. 29 Seongsook Choi and Keith Richards Turn-taking in interdisciplinary scientific research meetings: Using ‘R’ for a simple and flexible tagging system.................................................................................................................................... 37 Pavel Skrelin and Daniil Kocharov Russian Speech Corpora Framework for Linguistic Purposes ........................................................... 43 Peter M. Fischer and Andreas Witt Developing Solutions for Long-Term Archiving of Spoken Language Data at the Institut für Deutsche Sprache ............................................................................................................................... 47 Christoph Draxler Using a Global Corpus Data Model for Linguistic and Phonetic Research....................................... 51 Brian MacWhinney, Leonid Spektor, Franklin Chen and Yvan Rose TalkBank XML as a Best Practices Framework ................................................................................ 57 Christopher Cieri and Malcah Yaeger-Dror Toward the Harmonization of Metadata Practice for Spoken Languages Resources ........................ 61 Sebastian Drude, Daan Broeder, Peter Wittenburg and Han Sloetjes Best practices in the design, creation and dissemination of speech corpora at The Language Archive ............................................................................................................................................................ 67 iv Author Index Broeder, Daan .................................................................................................................................... 67 Chen, Franklin .................................................................................................................................... 57 Choi, Seongsook ................................................................................................................................ 37 Cieri, Christopher ............................................................................................................................... 61 Draxler, Christoph .............................................................................................................................. 51 Drude, Sebastian ................................................................................................................................ 67 Fischer, Peter M. ................................................................................................................................ 47 Grishina, Elena ................................................................................................................................... 13 Hagen, Kristin ...................................................................................................................................... 1 Isik Tas, E. Eda .................................................................................................................................. 17 Johannessen, Janne Bondi .................................................................................................................... 1 Kocharov, Daniil ................................................................................................................................ 43 MacWhinney, Brian ........................................................................................................................... 57 Meißner, Cordula ................................................................................................................................. 7 Pavlidou, Theodossia-Soula ............................................................................................................... 23 Priestley, Joel ....................................................................................................................................... 1 Rehbein, Ines ...................................................................................................................................... 29 Richards, Keith................................................................................................................................... 37 Rose, Yvan ......................................................................................................................................... 57 Ruhi, Sukriye ..................................................................................................................................... 17 Savchuk, Svetlana .............................................................................................................................. 13 Schalowski, Sören .............................................................................................................................. 29 Sichinava, Dmitry .............................................................................................................................. 13 Skrelin, Pavel ..................................................................................................................................... 43 Slavcheva, Adriana .............................................................................................................................. 7 Sloetjes, Han .....................................................................................................................................

Best Practices for Speech Corpora in Linguistic Research

Preparation and Exploitation of Bilingual Texts Dusko Vitas, Cvetana Krstev, Eric Laporte

Talk Bank: a Multimodal Database of Communicative Interaction

The Iafor European Conference Series 2014 Ece2014 Ecll2014 Ectc2014 Official Conference Proceedings ISSN: 2188-1138

Kernerman Kdictionaries.Com/Kdn DICTIONARY News the European Network of E-Lexicography (Enel) Tanneke Schoonheim

TANGO: Bilingual Collocational Concordancer

The Procedure of Lexico-Semantic Annotation of Składnica Treebank

Child Language

Multimedia Corpora (Media Encoding and Annotation) (Thomas Schmidt, Kjell Elenius, Paul Trilsbeek)

Conference Abstracts

E-Content Submission to INFLIBNET Subject Name Linguistics Paper Name Corpus Linguistics Paper Coordinator Name & Contact Mo

Conferenceabstracts

Workshopabstracts