
Digital humanities at the University of Tartu: state of the art

Andres Kimber, Liina Lindström, Peeter Tinits

27.09.2019

Centre for Digital Humanities and Information Society of the University of Tartu

Founded in 2018; some activities had started earlier

Goal: develop interdisciplinary teaching and research in the field of DH and IS

Partners: all institutes in the Faculty of Arts and Humanities, Institute of Computer Science, Institute of Social Studies, Tartu University Library

Council

digihum.ut.ee

By the way, I discovered that the Cultural Evolution Seminar started in 2015 and that DH was a keyword in its first call. Basically a thematic event that took place at UT, but I am not sure whether it is related.

Staff

Head of the Centre: Liina Lindström (Institute of Estonian and General Linguistics)

Head of the Council: Andra Siibak (Institute of Social Studies)

Project manager: Andres Kimber (Ann Siiman from 1.10.)

Specialist in DH: Peeter Tinits

Lecturer in Computational Linguistics: Siim Orasmaa

Junior Researcher in Applied Dialectology: Maarja-Liisa Pilvik

Visiting lecturer in DH: Joshua Wilbur

Visiting lecturer in DH: Artjoms Šeļa

Learning and teaching Digital Humanities

Programme & courses

Elective module of DH for all MA programmes in the Faculty of Arts and Humanities since 2017

Minor in DH since 2019/20

Funding from HITSA

Elective courses by visiting lecturers

Summer Schools: Digital Methods in Humanities and Social Sciences 2018, 2019

Our main target group has been MA and PhD students

Guest lecturers

Visiting lecturers in DH since 2016

Funded by ASTRA

Lecturers with different disciplinary backgrounds in order to offer courses to students from different fields

List of lecturers:

David Lorenz (quantitative linguistics), Leandro Ezequiel Koile (phylogenetic & quantitative linguistics), Néhémie Strupler (computational archaeology), Artjoms Šeļa (literary studies), Kimmo Elo (social sciences), Joshua Wilbur (documentary linguistics/language technology), Timothy Tangherlini (computational folkloristics)

Activities

Linguistics (dialectology, historical linguistics, corpus linguistics, phonetics, documentary linguistics)

Archaeology (GIS, photogrammetry)

Literary Studies (stylometry)

Cultural Evolution (computational methods)

Social Science

NLP tools

Case studies: Corpus-based dialectology

Liina Lindström, Maarja-Liisa Pilvik et al.

Corpus of Estonian Dialects, compiled 1998-2018

Recordings (from 1960-1970s) → transcriptions → morphological tagging → data analysis & visualization on maps

Quantitative variation analysis; frequency & frequency maps; distribution of dialectal features etc.

Maps & GIS applications for dialect research: http://rurake.keeleressursid.ee/index.php/apps/


Simple frequency maps: 1sg pronoun omission rate in Estonian dialects
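As a minimal sketch of the computation behind such a frequency map, the per-dialect omission rate is the share of 1sg verb forms that occur without an overt pronoun. The data layout, the dialect labels and the function name `omission_rates` below are illustrative assumptions, not the format of the Corpus of Estonian Dialects.

```python
from collections import defaultdict

# Hypothetical annotated observations, one tuple per 1sg verb form:
# (dialect label, whether the 1sg pronoun is overtly expressed).
# Labels and values are made up for illustration only.
observations = [
    ("Dialect A", True),
    ("Dialect A", False),
    ("Dialect A", False),
    ("Dialect B", True),
    ("Dialect B", True),
    ("Dialect B", False),
    ("Dialect B", True),
]


def omission_rates(rows):
    """Return {dialect: share of 1sg verb forms without an overt pronoun}."""
    totals = defaultdict(int)
    omitted = defaultdict(int)
    for dialect, pronoun_expressed in rows:
        totals[dialect] += 1
        if not pronoun_expressed:
            omitted[dialect] += 1
    return {dialect: omitted[dialect] / totals[dialect] for dialect in totals}


rates = omission_rates(observations)
for dialect, rate in sorted(rates.items()):
    print(f"{dialect}: {rate:.2f}")
```

Each rate can then be attached to the dialect's area on a map; the same per-observation table is also the kind of input a regression model would take.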

Combined with methods used in variation studies: (mixed-effects) logistic regression models, conditional inference trees, random forests, etc.

Case studies: Communal court minute books

● Some materials from 1866 to 1890 were digitized in 2004
● HTML format
● 22 municipalities, 2867 files
● Server crash → rescued via web archives

Aigi Rahi-Tamm, Kadri Muischnek, Liina Lindström, Maarja-Liisa Pilvik, Siim Orasmaa, Gerth Jaanimäe et al., in collaboration with the Estonian National Archives

Cleaning and morphologically annotating the texts; Named Entity Recognition

EstNLTK tools (Python libraries for analyzing Estonian)

Highly varying language: North and (dialectal features); old and new spelling systems; handwritten texts

Annotated texts available in the Corpus of Old Literary Estonian
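The cleaning step for the rescued HTML files could look roughly like the sketch below, which uses only the Python standard library to reduce a page to plain text before any linguistic annotation. The class name `TextExtractor`, the function `html_to_text` and the sample markup are assumptions made for illustration; this is not the project's actual pipeline and it does not use EstNLTK.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Strip markup and collapse runs of whitespace into single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


# Made-up sample page, not taken from the minute books.
sample = "<html><body><h1>Protokoll</h1><p>Kohus   m&otilde;istis:</p></body></html>"
print(html_to_text(sample))
```

Joining text chunks with a space is a simplification: a word interrupted by inline markup would be split, so real files may need a more careful join.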

Crowdsourcing platform opened in 2019:

http://www.ra.ee/vallakohtud/

Case studies: Spelling variation and prescription

1880-1920 processes of standardization

Text corpora & dictionary advice

Case studies: Historical biographies

Metadata on culturally significant people (1800-1930) => database.

Useful for cultural history, linguistic analyses, etc.

Visualization link

Case studies: cooperation with GLAMs

Cooperation with the Estonian National Library on text corpora

Case studies: Cultural evolution of films

Film crews, 1910-2019

Growth in:

Size
Structure
Complexity

Case studies: unmasking academic “forgery”

- 1978: collection of poems by Gavriil Batenkov (1793-1863), published by literary scholar Aleksandr Iliushin
- ~40% of the texts in the collection have no confirmed manuscript source

Possible forgery?

- Extensive and close linguistic and formal examination led to inconclusive results (Shapir 2000)
- A possible solution: the “unmasking” method (Koppel et al. 2012) combined with multi-level features (lexical, morphological, versification: rhythm, rhyme)
- Unmasking basically asks: how does author A behave in relation to themselves? How in relation to others? Is the pseudo-author similar to the actual author A?

How do real same-author samples behave?

Grey lines: author vs. others

Red line: how each particular author classifies against themselves: FAST DECREASE, LOW PRECISION RATES

How does Pseudo-Batenkov behave vs. the “real” self?

Case studies: Archaeological data management

● Archaeological data in four different databases
● COST Action: SEADDA (Saving European Archaeological Data from the Digital Dark Age)

Estonian Museum Information System: muis.ee
National registry of cultural monuments: register.muinas.ee

● Digitisation of reports and local lore
● Working towards aggregating data in line with the FAIR principles

University of Tartu: arheoloogia.ee, tara.ut.ee

Distribution of Estonian archaeological sites

Case studies: 3D models of cultural heritage

● Photogrammetry, laser scanning & RTI
● Lacking official data management and metadata guidelines

Cellar of a 14th-century merchant house in Tartu
Mummy of a boy (4th-2nd century BC) at UT

Model by Andres Kimber / University of Tartu
Model by Ragnar Saage / University of Tartu

https://skfb.ly/6NG7O
https://skfb.ly/6HrZO

Case studies: GIS and spatial analysis in archaeology

● Mapping, fieldwork planning and reporting
● Analysing land use patterns and landscape perception

Total viewshed analysis of Rebala Heritage Reserve
Predictive model of settlement locations

Andres Kimber / University of Tartu
Model by Allar Haav / University of Tartu

Case studies: Documentary Linguistics and Language Technology (Pite Saami)

Automatic annotations:
● word
● lemma
● part of speech
● morphology
● English glosses

Case studies: Text mining cultural transitions

Deep Transitions in socio-technical systems (1900-2019)

4-year project in Social Science

Newspaper texts → social history

Events

Cultural Evolution Seminar (2015-2018)

DH-lab

Room to work in every Friday

Learning together, works in progress

Summer School: Digital Methods in Humanities and Social Sciences 2018, 2019

~100 students

4-5 days of workshops

Target group: PhD and MA students

4 graduate schools

DH conferences

In collaboration with the Estonian Digital Humanities Society and other institutions.

Thank you!