A Nordic seminar “How can we use corpora?”

12–13 December 2013, Copenhagen

A summary of the discussion held at the seminar

The seminar had over 40 participants from five Nordic countries. Only the Faroe Islands, Greenland and the Sami area were not represented. The purpose of the seminar was to discuss how sign language corpora can be used by people working in different occupations or areas of interest, such as research, language planning or guidance, lexicography, teaching and interpreting.

The seminar was arranged by the Corpus working group, which is one of the three working groups that were set up after the first Nordic meeting of sign language planners in Copenhagen in March 2012 http://nordiskateckensprak.wordpress.com/2012/06/.

The status of sign language corpus work in each Nordic country

Sweden

Sweden is the only Nordic country that already has a sign language corpus. The corpus work has been conducted by the Sign Language Section at Stockholm University. So far the largest corpus, 25 hours of signed dialogue, was gathered in a three-year project in 2009–2011. Of this material approximately five hours have been annotated with Swedish glosses and translations, and this material is at present freely available online (see information in Swedish http://www.ling.su.se/teckenspr%C3%A5kskorpus/information-om-korpusprojektet). In addition, the Sign Language Section currently has two smaller-scale corpus projects: one on tactile sign language (Signing space in Swedish Sign Language for persons with acquired deafblindness) and one on Swedish Sign Language as a second language (L2).

Finland

In Finland there are at the moment a few projects in which material for a corpus of Finnish Sign Language (FinSL) and Finland-Swedish Sign Language (FinSSL) is being gathered and produced. At the Sign Language Centre of the University of Jyväskylä, Professor Ritva Takkinen leads a project which aims to record up to 100 signers, of whom at least 10 are planned to be users of FinSSL. Ten FinSL users have been recorded already. Recordings are made in a studio with six cameras shooting from different angles. Both dialogue and monologue are being gathered, and each signer is given the same eight tasks to carry out during the recordings. In addition to Takkinen’s project, a smaller, special corpus will be compiled in Tommi Jantunen’s ProGram project 2013–2018. A third project in which recordings of FinSL signing are being made is a long-term collaborative study conducted by the Sign Language Centre and the Service Foundation for the Deaf. This project studies how children who use a cochlear implant adapt to and use their two languages, FinSL and Finnish. For ethical reasons the recordings made in this project cannot be made available to anybody outside the project.

In the Corpus and SignWiki Project (2013–2015), run by the Finnish Association of the Deaf and Humak University of Applied Sciences (Humak), new signed material is being recorded and already existing material will be prepared for the corpus. One main aim of the project is also to lay the foundations for different ways of producing material for the corpus, covering user licenses, copyright issues, video formats, metadata and different corpus tools, so that later, by following these procedures, anyone can contribute his or her material to the corpus.

In Finland the automatic recognition and annotation of sign language is also being developed, in the CoBaSil Project (2011–2014) at Aalto University http://research.ics.aalto.fi/cbir/cobasil/.

All the parties listed above co-operate with the FIN-CLARIN consortium https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/KielipankkiFrontpage and all language material they produce or acquire will be deposited in the Language Bank of Finland.

Norway

In Norway various material exists, but none of it has been placed in an archive or corpus where it could be easily accessed for different purposes. Examples of the existing material are the signed news, Tegnspråknytt, produced by NRK, the Norwegian Broadcasting Corporation http://tv.nrk.no/serie/tegnspraaknytt and the various material produced for the website of the National Association of the Deaf http://www.deafnet.no. The National Library of Norway (Nasjonalbiblioteket) is now willing to archive the sign language material and to act as its distributor. At present, however, there are no resources for the concrete work of acquiring the material (choosing the material, writing the metadata, clearing the user rights and copyright issues etc.). Funding for a corpus project has been applied for, but so far without success.

Iceland

In Iceland funding for sign language corpus work has not yet been found. What is interesting is that, compared to the bigger signing communities in Sweden, Denmark, Finland and Norway, in Iceland one could aim to compile a comprehensive corpus in the sense that every member of the signing community could be filmed. There are approximately 200 deaf signers in Iceland. At the moment only the SignWiki site http://is.signwiki.org offers open access material of Icelandic Sign Language. In addition to example sentences, longer signed texts on varied issues have been published on the site.

Denmark

In Denmark funding for sign language corpus work has been applied for several times, but so far no money has been granted. As in Sweden, Finland and Norway, material that could be included in a corpus exists but is scattered. Programs in sign language are being produced, e.g. by the Danish Broadcasting Corporation, DR http://www.dr.dk/DR1/Tegnsprog/, and varied signed material is also published on the website of the Danish Deaf Association http://www.deaf.dk/. The online dictionary of DSL http://tegnsprog.dk/ is exceptional among the world’s sign language dictionaries in the sense that the example sentences are not only translated into Danish but also annotated with Danish glosses. For every sign that appears as the head sign of a dictionary entry or is used in an example sentence, one can look up where else in the example sentences it is used (click the button “K” in an entry; K = concordance). With this feature the Danish dictionary is not only a dictionary but also a small corpus.
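The concordance idea described above can be illustrated with a minimal sketch. It assumes that each example sentence is stored as a list of glosses; the sentence ids, the glosses and the function names are invented for illustration and are not taken from the Danish dictionary or its software.

```python
from collections import defaultdict

# Hypothetical glossed example sentences (invented for illustration only).
EXAMPLES = {
    "ex1": ["INDEX-1", "BUY", "MILK"],
    "ex2": ["MILK", "COLD"],
    "ex3": ["INDEX-1", "LIKE", "COFFEE"],
}


def build_concordance(examples):
    """Map each gloss to the (sentence id, position) pairs where it occurs."""
    index = defaultdict(list)
    for sentence_id, glosses in examples.items():
        for position, gloss in enumerate(glosses):
            index[gloss].append((sentence_id, position))
    return dict(index)


def lookup(index, gloss):
    """Return all occurrences of a gloss, or an empty list if it never occurs."""
    return index.get(gloss, [])


if __name__ == "__main__":
    concordance = build_concordance(EXAMPLES)
    for sentence_id, position in lookup(concordance, "MILK"):
        print(sentence_id, position, " ".join(EXAMPLES[sentence_id]))
```

Looking up a gloss in this way returns every example sentence in which the sign is used, which is roughly what makes a glossed dictionary usable as a small corpus.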

Faroe Islands and Greenland

In the Faroe Islands and Greenland no sign language corpus work has yet been conducted. In addition to the lack of funding they also lack human resources, as there are no linguistically trained people who know these sign languages. Collecting and processing material for corpora of the sign languages of the Faroe Islands and Greenland may need a different kind of approach. Perhaps linguistic fieldwork carried out in collaboration with another Nordic country?

Nordic co-operation

The last seminars on linguistic issues concerning the Nordic sign languages were held at Stockholm University in 2007 and 2008. The theme of those meetings was dictionaries. Now, in this seminar, everybody agreed that it is time to refresh the Nordic co-operation, and it was decided to aim for a yearly meeting (seminar or workshop) with a changing theme, such as corpus work, dictionary work or the production of teaching material.

It was agreed that the next meeting will be held in 2014 at the University of Jyväskylä (the Sign Language Centre, Professor Ritva Takkinen). The practical arrangements will be shared between Stockholm University, Council and the University of Jyväskylä.

It was also suggested that, besides these larger meetings, we could have smaller-scale corpus co-operation going on between the Nordic corpus projects.

It was also asked whether it would be possible to train Nordic sign language interpreters, i.e. interpreters who could work at Nordic events.

Encouraging people to start recording their sign language on various occasions was raised several times during the discussion. We should not just wait for the big, nice corpus project to start and do all the work; we can all contribute right away. One requirement, though, is that one always gets written permission from the signers, defining how and by whom the collected material can be used. Otherwise it can never be put into an archive or corpus. Nordic co-operation was suggested for creating permission forms that laypeople can also use. The forms that have been used in sign language corpus projects in Sweden and in Finland can be used as a starting point, but one has to bear in mind that different countries may have differing regulations that have to be taken into account.

What could we gain from the Nordic co-operation?

It is well known that our Nordic sign languages are genetically related. Swedish, Finnish and Finland-Swedish Sign Language are related to each other, and so are Danish, Norwegian and Icelandic Sign Language, the sign language of the Faroe Islands and the one used in Greenland. In addition, deaf people in the Nordic countries have long traditions of collaboration. It would thus be very interesting to find out, e.g., how this history has left its mark on our sign languages. What kinds of differences and similarities are there? Further, in the linguistics of spoken languages many linguistic phenomena have been much better understood when the structures of different languages have been compared. The same applies of course to sign languages.

Onno Crasborn suggested that we use the web page of the Sign Linguistics Corpora Network http://www.ru.nl/slcn/ as a starting point when beginning the Nordic co-operation on corpus work.

The old 16 and 8 mm films

Probably in all Nordic countries various institutes and individuals have in their possession old 16 mm and 8 mm films from the 1950s, 1960s and 1970s containing signing. Many of these films may already be in bad condition (a sign of deterioration can be, e.g., a smell of vinegar or chlorine) and must be handled with care, which means that their digitization should be entrusted to a specialist.

In Finland the Museum of the Deaf has listed all the films that it has, and small samples of them have even been digitized. The list is based on what is written on the film labels. It is thus unfortunately impossible to know how much signing each film contains without watching the films.

In Denmark lots of old recordings exist as well, at least from 1970 onwards. 8 and 16 mm films exist too, but they may have been partly destroyed, because the cellar where they were stored was flooded.

One Nordic project could be to start saving this material in digital form in an archive. In co-operation we could first check whether there is material filmed at the same event in different countries, and perhaps even copies of the same film, in which case it would not be sensible to digitize it several times. Some films may also have been digitized by the national broadcasting companies, which is also worth checking.

The importance but slowness of annotation work

Only annotated video makes it possible to base research on real language usage. However, annotating sign language material is very time-consuming. Experienced sign language annotators estimate that the basic annotation, which contains the glosses and a spoken language translation, takes about 10 hours of work per minute of video.
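To give a feel for what this estimate implies, the following rough calculation may help. It is only a sketch: the figure of 10 working hours per minute of video is the estimate quoted above, while the 1,600 working hours per person-year is an assumption made here for illustration, and the function names are hypothetical.

```python
# Estimate quoted in the text: about 10 hours of work per 1 minute of video.
ANNOTATION_HOURS_PER_VIDEO_MINUTE = 10

# Assumption for illustration only: working hours in one person-year.
ASSUMED_WORK_HOURS_PER_YEAR = 1600


def annotation_hours(video_hours, hours_per_minute=ANNOTATION_HOURS_PER_VIDEO_MINUTE):
    """Working hours needed for basic annotation of the given hours of video."""
    return video_hours * 60 * hours_per_minute


def person_years(work_hours, hours_per_year=ASSUMED_WORK_HOURS_PER_YEAR):
    """Convert working hours into person-years under the assumed yearly hours."""
    return work_hours / hours_per_year


if __name__ == "__main__":
    for video_hours in (1, 5, 25):
        work = annotation_hours(video_hours)
        print(f"{video_hours} h of video: {work} h of work, "
              f"about {person_years(work):.1f} person-years")
```

Under these assumptions one hour of video needs 600 hours of annotation work, and a corpus of 25 hours, the size of the Swedish dialogue corpus mentioned earlier, would need 15,000 hours.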

So, one solution is to get more money for the annotation work. Easier said than done, but we have to continue applying for funding.

It was also discussed that in the future we will hopefully have software that can recognize signing and do the annotation for us. There are several projects going on in different countries in which automatic recognition of sign language is being developed.

What kind of material should be annotated? Where should we allocate our resources? In a colloquial dialogue we can see the language as it is used by deaf signers in everyday life, including all natural, individual and contextual variation. To be able to answer questions concerning, e.g., correct sentence structure in the well-planned signing used in TV news and the like, we nevertheless need a corpus of that particular kind of language usage. The FinSL corpus work aims to annotate both colloquial and well-planned signing in parallel for several reasons, e.g. in order to be able to compare these different styles of language usage with each other.

Archiving sign languages

As annotating is so slow, the sign language corpus people both in Sweden and in the Netherlands have begun to think that more effort could be put into simply archiving our sign languages. For the archived material we can try to acquire as wide user rights as possible, and in that way we can gradually build up a kind of digital sign language library, which can be used for various purposes.

It was also noted that small corpora are interesting too, and that they should not be disregarded.

It was also commented that one of the biggest challenges is the lack of lexicon. Our sign language dictionaries are still quite restricted in their content, and that hinders, e.g., corpus annotation.

The Language Archive (TLA) of the Max Planck Institute for Psycholinguistics (MPI) http://sign.let.ru.nl/groups/slcwikigroup/wiki/9e151/Depositing_your_sign_data_at_The_Language_Archive_TLA_of_the_Max_Planck_Institute_for_Psycholinguistics_MPI.html is willing to host whatever sign language material we have. Onno Crasborn suggested that we all use it for archiving. While each of us makes use of our own corpus, we could then also search all the corpora at the same time and compare the material in them more easily. Further, if we deposit our material there, people at MPI may get a better understanding of the needs of sign languages and begin to develop better corpus tools for us.

Funding the Nordic corpus work

It would probably be difficult to get Nordic money for corpus work in general, but we could apply for money for a parallel Nordic corpus. In a parallel corpus the same text, e.g. a story, would be signed in all Nordic sign languages, and the outputs would be glossed and translated both into the country’s native spoken language and into another, more widely understood spoken language, possibly English, if we want the corpus to be accessible internationally. It was noted that the application must be wide (i.e. realistic) enough, so that it is understood how slow the annotation work is.

Moreover, it was asked whether we could apply for money from the European Union, or for funding from businesses.

The worry about losing our sign languages and the threat of radical changes in their structure and lexicon

The worry about the disappearance of our Nordic sign languages was brought up by many during the discussion. As most deaf children today get a cochlear implant and go to school integrated in general education, they seldom have daily contact with other deaf children or adults. In this situation the children do not really learn the language, even if they and their families try hard.

The fact seems to be that, for whatever reason, some features of our sign languages are disappearing and the languages are changing, more or less radically. Because of that it is very important to collect as much signed material as possible now, even though we do not have the resources to annotate it. A corpus can serve either as a booster for the sign language or as an archive that people can use afterwards to revive the language.

One example of a disappearing feature is the mathematical signing that Päivi Rainò in Finland is collecting and studying. A short description of the phenomenon by Rainò: “Visual counting is carried out using fingers, both hands and the three-dimensional neutral space in front of the signer. Fingers, hands and space are used as buoys to retrieve, for example, subtotals in a regular, syntactically well-defined manner. The process is intelligible for native signers monitoring the calculation process, but strangely enough, not for hearing teachers working in deaf schools or hearing interpreters of . From day care to adult education, signed calculations made by sign language users are incorrectly interpreted as merely “counting on fingers”, that is perceived to hinder development of their mental arithmetic skills. Sign language calculations are not actually counting on fingers. Specific hand movements in front of the body resemble a working memory that presents relationships between numbers visually and actual calculations are performed mentally.”

There are also differences between the Nordic countries in their situation and in the threat of language disappearance. For example, in Sweden the (re)habilitation and education of both deaf and hard of hearing children view sign language more positively than on average in the other Nordic countries. It is thus somewhat easier for children and their families to acquire and maintain sign language skills in Sweden. And so it was optimistically commented by a Swedish participant that sign languages will survive, and that all this corpus and dictionary work, language planning etc. will partly secure that.

Deaf people’s involvement in corpus making

The issue of deaf people’s involvement in corpus making was discussed from different angles. The corpus itself consists of material signed by deaf people, and in many corpus projects at least some of the annotators are deaf as well. The in-depth research, however, is often done by hearing linguists whose mother tongue is not a sign language. Many deaf people also find it hard to access the web pages presenting linguistic and corpus work.

To begin with, we need money to be able to employ more deaf people in the annotation work. It was suggested that we start thinking about applying for Nordic money for the annotation.

We need skilled people to do the corpus work. Deaf people may not see themselves as suited for it. We should mentor them so that they become empowered. Working on a Nordic level would probably be one way to encourage deaf people in different countries.

We should also think more long-term and encourage deaf people to educate themselves in linguistics, language technology etc.

It was also noted that eliciting material on the basis of a corpus, e.g. for a dictionary or a grammar book, is something that only deaf signers can do.

Miscellaneous notes

Could we add the etymology of the signs to the corpus? E.g. MILK from Norway is used in SSL, and HUMAN-BEING from FinSL is used in northern Sweden.

Challenges and collaboration: How has the language of the deaf people who have moved back to Finland been affected by Swedish Sign Language?

Child language corpora, ethical issues. Perhaps we will later get technical solutions for anonymizing the signer.

Content of the corpora. By themes (carpentry, sport stories etc.), cultural issues, linguistic issues, things that interest the general public, church, mathematical signing etc.

- The summary was written by Leena Savolainen -