FIN-CLARIN and CLARIN

Infrastructure for Digital Humanities

Krister Lindén, National Coordinator of FIN-CLARIN

CLARIN ERIC (www.clarin.eu)
• The European Research Infrastructure Consortium, founded on February 29, 2012
• International cooperation and sharing of resources

CLARIN
• DLU / NL
• NTU / Dutch CLARIN Center
• Kielipankki / FIN-CLARIN
• Språkbanken / SWE-CLARIN
• IDS / CLARIN.DE
• …
• UK
• USA / CMU

FIN-CLARIN partners (www.kielipankki.fi)

Coordinate the activity and provide access to large centrally acquired resources and tools:
• University of Helsinki
• CSC – IT Center for Science

Provide access to resources and tools developed locally by individual researchers or research groups:
• KOTUS – Institute for the Languages of Finland
• Aalto University
• University of Eastern Finland
• University of Jyväskylä
• University of Oulu
• University of Tampere
• University of Turku
• University of Vaasa

FIN-CLARIN Corpora for access or download

Gw = billion words, Mw = million words, h = hours

Resources (2017 / 2022)

Text
• Magazines and newspapers 1770- (NLF and Web publ.): 12 Gw / 20 Gw
• Social media and similar sources 2000- (Suomi24, Ylilauta, …): 4 Gw / 10 Gw
• Literature and manuscripts (Gutenberg, Fennica, archives): 60 Mw / 70 Mw

Speech
• News broadcasts (YLE): 10000 h
• Video sessions from the Finnish Parliament 2008-2016: 500 h / 1000 h
• Dialect and everyday speech (Kotus, Turku): 500 h / 1000 h
• Sign language resources (Aalto, Kuurojen liitto): 20 h / 500 h

Multilingual and Other Resources
• Multilingual Resources (EuroParl, laws, Bible, subtitles, …): 3 Gw / 10 Gw
• Learner’s resources (Oulu, Jyväskylä, Kotus, Aalto): 2 Mw / 5 Mw
• Open source lexicons and terminologies (Helsinki, Tromssa): 300 Kw / 400 Kw

Currently, FIN-CLARIN has approx. 19 Gw in >1400 databases.

Web Tools

Text, Speech and Lexical data

• Visualizations of search results
• Trend Diagrams
• Links to audio and video data
• Links to pictures and manuscripts
• Social Network Analysis

Training and Expertise

Training and Education

http://clarin.eu/content/knowledge-centres: the CLARIN Knowledge Sharing Infrastructures take care of the sharing of knowledge and expertise, education, training and dissemination, e.g.
• Phonogrammarchiv - Institute for audio-visual Research and Documentation (Austrian Academy of Sciences)
  • Audio-visual fieldwork and long-term preservation of audio-visual data
• CLARIN-SPEECH - KTH Royal Institute of Technology, Dept. of Speech, Music and Hearing
  • Technical advice on speech analysis
• Language Learning Analysis - TalkBank Project @ Carnegie Mellon University (USA)
  • Technical advice on tools and methods for the study of first and second language learning, and recovery from aphasia and other language disabilities
• Humanities Lab - Lund University
  • Advice on multimodal and sensor-based methods, including EEG, eye-tracking, articulography, virtual reality, motion capture, AV recording ...

Training and User Involvement

Community Engagement

https://www.clarin.eu/events: CLARIN PLUS regularly organizes expert seminars, workshops and researcher exchange programs

Past workshops:
• Exploring Spoken Word Data in Oral History Archives, 18-19 April 2016, Oxford (UK)
• Working with Digital Collections of Newspapers, 19-21 September 2016, Leuven
• Working with Parliamentary Records, 27-29 March 2017, Sofia (Bulgaria)
• Creation and Use of Social Media Resources, 18-19 May 2017, Kaunas (Lithuania)
• Workshop on interoperability of L2 resources and tools, 6-8 December, Gothenburg

Coming workshops:
• CLARIN Workshop on Translation memories, corpora, termbases: Bridges between translation studies and research infrastructures, 8-9 February 2018, Vienna (Austria)
• Parliamentary Records (ParlaCLARIN@LREC2018), 7 May 2018, Miyazaki (Japan)

FIN-CLARIN

Training and Education
• Basics of Text Analytics and Corpus Linguistics, 5 cr (Information retrieval with Korp)
• Basics of Speech Annotation and Analysis, 5 cr (Praat and ELAN)
• Corpus Clinic, 5 cr (Data management, Annotation methods and tools, RStudio)

FIN-CLARIN roadshows and events (https://www.kielipankki.fi/tapahtumat/)
• Speech annotation workshop / University of Turku
• FIN-CLARIN roadshow / Helsinki Collegium of Advanced Studies
• Presentations at XLIV Finnish Conference of Linguistics / University of Jyväskylä
• Demo and Presentation at Language Center Days / University of Eastern Finland
• Demo and Presentation for Principal Investigators / UHEL Arts and Humanities
• CLARIN PLUS Workshop on User Involvement / University of Helsinki
• Demos at Historical Network Research conference / University of Turku

Corpus and data-related advice: [email protected]
Technical support (servers, access rights, virtual workspace etc.): [email protected]

Project Support by Kielipankki and FIN-CLARIN

• Ancient Near East Center of Excellence – data access, methodology, web service hosting (language, social science and archeology)
• Helsinki Termbank of Arts and Science – access to data, web service hosting, co-operation on term mining (terminology)
• Citizens’ Mindscape: https://www.laaketutka.fi/ – data access and web service hosting (social sciences, National Institute for Health and Welfare)
• ComHis Turku – news archive access (history and language processing)
• DMA – Archive of digital morphology (social and geographical linguistics)
• Name archive: https://nimiarkisto.fi/ – GIS data, stories, pictures, ... (geography, social sciences, history, ...)
• ...