Survey on Swedish Language Resources



Kjell Elenius (1), Eva Forsbom (2), Beáta Megyesi (2)

(1) Speech, Music and Hearing, School of Computer Science and Communication, KTH, Sweden
(2) Department of Linguistics and Philology, Uppsala University, Sweden

2008-03-14

Abstract

Language resources, such as lexicons, databases, dictionaries, corpora, and tools to create and process these resources are necessary components in human language technology and natural language applications. In this survey, we describe the inventory process and the results of existing language resources for Swedish, and the need for Swedish language resources to be used in research and real-world applications in language technology as well as in linguistic research. The survey is based on an investigation sent to industry and academia, institutions and organizations, to experts involved in the development of Swedish language resources in Sweden, the Nordic countries and world-wide. This study is a result of the project called “An Infrastructure for Swedish language technology” supported by the Swedish Research Council's Committee for Research Infrastructures 2007-2008.

Table of Contents

1. Introduction
2. Method
  2.1 Questionnaire
  2.2 Survey
3. Results
  3.1 Information about the organizations
    3.1.1 Country of origin
    3.1.2 Type of organization
    3.1.3 No of employees
    3.1.4 Main activity
    3.1.5 Main language technology area
    3.1.6 Main products/services
    3.1.7 Language coverage
4. Information on existing language resources
  4.1 Written language resources
    4.1.1 Genres
    4.1.2 Annotation standards
    4.1.3 Resources for evaluation
    4.1.4 Tools for processing language data
  4.2 Spoken language resources
    4.2.1 Type of spoken resources
    4.2.2 Speakers
    4.2.3 Bandwidths
    4.2.4 Database Genres/Types
    4.2.5 Annotation standards
    4.2.6 Resources for evaluation
    4.2.7 Tools for processing speech data
  4.3 Multimodal language resources
    4.3.1 Database Genres/Types
    4.3.2 Speakers
    4.3.3 Annotation standards
    4.3.4 Resources for evaluation
    4.3.5 Tools for processing multimodal data
  4.4 Other language resources
  4.5 Production of language resources
  4.6 Validation of language resources
    4.6.1 When producing language resources do you follow specific guidelines?
    4.6.2 Do you follow specific standards?
    4.6.3 Do you use specific gold standards/benchmarks?
    4.6.4 Are your LRs validated?
5. Needs for language resources
  5.1 Needs for written language resources
    5.1.2 Needs for genres
    5.1.3 Needs for annotation standards
    5.1.4 Needs for evaluation resources
    5.1.5 Needs for tools for processing written language
  5.2 Needs for spoken language resources
    5.2.1 Needs for speech types
    5.2.2 Needs for speakers
    5.2.3 Needs for bandwidths
    5.2.4 Needs for databases/genres
    5.2.5 Needs for annotation standards
    5.2.6 Needs for evaluation
    5.2.7 Needs for tools for processing speech data
  5.3 Needs for multimodal language resources
    5.3.1 Needs for database genres/types
    5.3.2 Needs for speakers
    5.3.3 Needs for annotation standards
    5.3.4 Needs for evaluation
    5.3.5 Needs for tools for processing multimodal language data
  5.4 Needs for other language resources

Recommended publications
  • Is Spoken Danish Less Intelligible Than Swedish? Charlotte Gooskens, Vincent J
    Is spoken Danish less intelligible than Swedish? Charlotte Gooskens, Vincent J. van Heuven, Renée van Bezooijen, Jos J.A. Pacilly. To cite this version: Charlotte Gooskens, Vincent J. van Heuven, Renée van Bezooijen, Jos J.A. Pacilly. Is spoken Danish less intelligible than Swedish? Speech Communication, Elsevier: North-Holland, 2010, 52 (11-12), pp.1022. 10.1016/j.specom.2010.06.005. hal-00698848. HAL Id: hal-00698848, https://hal.archives-ouvertes.fr/hal-00698848. Submitted on 18 May 2012. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. Accepted Manuscript. PII: S0167-6393(10)00109-3. DOI: 10.1016/j.specom.2010.06.005. Reference: SPECOM 1901. To appear in: Speech Communication. Received Date: 3 August 2009. Revised Date: 31 May 2010. Accepted Date: 11 June 2010. Please cite this article as: Gooskens, C., van Heuven, V.J., van Bezooijen, R., Pacilly, J.J.A., Is spoken Danish less intelligible than Swedish?, Speech Communication (2010), doi: 10.1016/j.specom.2010.06.005. This is a PDF file of an unedited manuscript that has been accepted for publication.
  • Family Language Policy in Bilingual Finnish and Swedish Families in Finland
    Family Language Policy in Bilingual Finnish and Swedish Families in Finland. Austin Huhta. Master's Thesis, Applied Linguistics, Department of Language and Communication Studies, University of Jyväskylä, Fall 2020. University of Jyväskylä. Faculty: Humanities and Social Sciences. Department: Department of Language and Communication Studies. Author: Austin Huhta. Title: Family Language Policy in Bilingual Finnish and Swedish Families in Finland. Subject: Applied Language Studies. Level: Master's Thesis. Month and year: December 2020. Number of pages: 30. Abstract: In Finland, families are only allowed to choose one language to be their child's L1, even if the family is bilingual. With both Finnish and Swedish being national languages of Finland, this thesis looked into which language a family chose, why they chose it, and how they helped their child maintain it. Looking at their perspective on this can give us further insight into family language policy in Finland. The research method used here is a case study, with semi-structured interviews for the data collection and interpretive phenomenological analysis for the data analysis. This thesis interviewed a bilingual family with a Finn and a Swedish Swede and their one child. It found that while their initial language choice was Swedish, their family language policy was dynamic. Over time the child was switched from Swedish-medium education to Finnish-medium education; however, at home multiple family language policies worked together to help maintain his Swedish language skills. The findings demonstrated that the right combination of family language policies and more formal educational settings can work together to help children grow up to be bilingual even if the minority language is mainly used at home.
  • Baek, Chung & Chang Zhang (University of Warwick) Building
    Baek, Chung & Chang Zhang (University of Warwick). Building solidarity with constructive journalism: Mediating climate-induced migration. With the increasing frequency and severity of extreme environmental events, “climate-induced migration” (CIM) is gaining increasing traction in public discourse. In particular, mass media play a key role in shaping the global imagination about climate change and the mobility it drives. However, studies have found that the traditional media tend to frame CIM in a decontextualised, racialised and feminised manner, which generates detached compassion or aversion instead of action-mobilising solidarity. Thus, this paper aims to interrogate what kind of journalism promotes social justice and solidarity for those displaced by climate change, and then what types of media contribute to producing more socially relevant journalism for CIM. Drawing on different journalism theories such as interpretive, constructive and ethical, which mainly promote positive, solution-based and humanity-sensitive journalism, the authors build up a theoretical framework, responsible journalism, to capture the ethical mediation of CIM. The framework consists of six pillars: contextualization; identification of stakeholders; providing solutions; action-mobilising solidarity; deconstruction of ethnic, gender and postcolonial hierarchy; vocalization. This paper will contribute to building a framework that delivers a fairer flow of information, reduces the bias situated in CIM discourses, is migration-sensitive and leads to solidarity, building a friendly environment for understanding CIM and further promoting social responsibility for climate change.
    Belghazi, Marwa. On being a bridge: For a multi-lingual and floating support to newly arrived refugees. I have been coordinating support projects related to the refugee resettlement scheme for the past four years in different NGOs in London.
  • Fridtjof Nansens Institutt
    Olav Schram Stokke. Subregional Cooperation and Protection of the Arctic Marine Environment: The Barents Sea. POLOS Report No. 5/1997. Polar Oceans Reports. The Fridtjof Nansen Institute. ISBN 82-7613-235-9. ISSN 0808-3622. Polar Oceans Reports: a publication series from Polar Oceans and the Law of the Sea Project (POLOS). Fridtjof Nansens vei 17, Postboks 326, N-1324 Lysaker, Norway. Tel: 67111900. Fax: 67111910. E-mail: [email protected]. Bankgiro: 6222.05.06741. Postgiro: 5 08 36 47. © The Fridtjof Nansen Institute. Published by The Fridtjof Nansen Institute. Polar Oceans and the Law of the Sea Project (POLOS): POLOS is a three-year (1996-98) international research project in international law and international relations, initiated and coordinated by the Fridtjof Nansen Institute (FNI). The main focus of POLOS is the changing conditions in the contemporary international community influencing the Arctic and the Antarctic. The primary aim of the project is to analyze global and regional solutions in the law of the sea and ocean policy as these relate to the Arctic and Southern Oceans, as well as to explore the possible mutual relevance of the regional polar solutions, taking into consideration both similarities and differences of the two polar regions.
  • The Swedish Language (sharingsweden.se)
    FACTS ABOUT SWEDEN / THE SWEDISH LANGUAGE. sharingsweden.se. Photo: Cecilia Larsson Lantz/imagebank.sweden.se. Sweden is a multilingual country. However, Swedish is and has always been the majority language and the country’s main language. Here, Catharina Grünbaum paints a picture of the language from Viking times to the present day: its development, its peculiarities and its status.
    The national language of Sweden is Swedish. It is the mother tongue of approximately 8 million of the country’s total population of almost 10 million. Swedish is also spoken by around 300,000 Finland Swedes, 25,000 of whom live on the Swedish-speaking Åland islands. Swedish is one of the two national languages of Finland, along with Finnish, for historical reasons. Finland was part of Sweden until 1809.
    Despite the dominant status of Swedish, Sweden is not a monolingual country. The Sami in the north have always been a domestic minority, and the country has had a Finnish-speaking population ever since the Middle Ages. Finnish and Meänkieli (a Finnish dialect spoken in the Torne river valley in northern Sweden), spoken by a total of approximately 250,000 people in Sweden, and Sami all have legal status as
    Swedish and related languages. Swedish is a Nordic language, a Germanic branch of the Indo-European language tree. Danish and Norwegian are its siblings, while the other Nordic languages, Icelandic and Faroese, are more like half-siblings that have preserved more of their original features. Using this approach, English and German are almost cousins. The relationship with other Indo-
  • The Racist Legacy in Modern Swedish Saami Policy1
    The Racist Legacy in Modern Swedish Saami Policy. Roger Kvist, Department of Saami Studies, Umeå University, S-901 87 Umeå, Sweden. Abstract: The Swedish national state (1548-1846) did not treat the Saami any differently than the population at large. The Swedish nation state (1846-1971) in practice created a system of institutionalized racism towards the nomadic Saami. Saami organizations managed to force the Swedish welfare state to adopt a policy of ethnic tolerance beginning in 1971. The earlier racist policy, however, left a strong anti-Saami rights legacy among the non-Saami population of the North. The increasing willingness of both the left and the right of Swedish political life to take advantage of this racist legacy makes it unlikely that Saami self-determination will be realized within the foreseeable future. Introduction: In 1981 the Supreme Court of Sweden stated that the Saami right to reindeer herding, and adjacent rights to hunting and fishing, was a form of private property.
  • Role Language in Swedish a Study of the Comic Åsa Nisse
    Role Language in Swedish: A Study of the Comic Åsa Nisse. Marcus Zanteré, 2016. Contents: 1. Introduction; 1.1. Purpose; 1.2. Theory; 2. Method and Material; 2.1. Introduction of Primary Source; 3. Results; 3.1. Laughing in Swedish; 3.2. Eye Dialect; 3.3. Dialects; 4. Discussion and Future Studies; 5. Conclusion
  • Report of Sweden
    GEGN.2/2019/39/CRP.39, 18 March 2019, English. United Nations Group of Experts on Geographical Names, 2019 session, New York, 29 April – 3 May 2019. Item 5 (a) of the agenda: Reports by Governments on the situation in their countries and on the progress made in the standardization of geographical names. Report of Sweden. Submitted by Sweden. Summary: The national report of Sweden is divided into six sections. The first, on national standardization, provides a short overview of current legislation and of the main authorities involved in the standardization of geographical names. The second, on names in multilingual areas, contains information on minority language names and the responsible authorities. The third focuses on two ongoing committee reports concerning the Sami-speaking minority in the north of Sweden. The fourth includes information on an English online version of a booklet (published in Swedish in 2001 and revised in 2016) on good place-name practice. The fifth provides an updated presentation of two Swedish working groups, the Place-Name Advisory Board and the Geographical Names Network, that provide information and advice to different stakeholders. The sixth section contains a description of two research projects involving field collection of place names on the island of Öland and in the city of Uppsala, a rural and an urban landscape, respectively. The following resolutions adopted at the United Nations Conferences on the Standardization of Geographical Names are particularly relevant to the present work on name standardization in Sweden: 1972: resolution II/36 (E/CONF.61/4) on problems of minority languages; 2002: resolution VIII/9 (E/CONF.94/3) on geographical names as cultural heritage; 2007: resolution IX/4 (E/CONF.98/136) on geographical names as intangible cultural heritage; 2012: resolution X/4 (E/CONF.101/144) on discouraging the commercialization of geographical names.
  • University of Groningen an Acoustic Analysis of Vowel Pronunciation In
    University of Groningen. An Acoustic Analysis of Vowel Pronunciation in Swedish Dialects. Leinonen, Therese. Document Version: Publisher's PDF, also known as Version of record. Publication date: 2010. Citation for published version (APA): Leinonen, T. (2010). An Acoustic Analysis of Vowel Pronunciation in Swedish Dialects. s.n. Chapter 2: Background. In this chapter the linguistic and theoretical background for the thesis is presented.
  • Authentic Language
    Authentic Language: Övdalsk, metapragmatic exchange and the margins of Sweden’s linguistic market. David Karlander, Centre for Research on Bilingualism, Stockholm University. Doctoral dissertation, 2017. Centre for Research on Bilingualism, Stockholm University. Copyright © David Budyński Karlander. Printed and bound by Universitetsservice AB, Stockholm. Correspondence: SE 106 91 Stockholm, www.biling.su.se. ISBN 978-91-7649-946-7. ISSN 1400-5921. Acknowledgements: It would not have been possible to complete this work without the support and encouragement from a number of people. I owe them all my humble thanks.
  • Sweden PR Country Landscape 2013
    Sweden PR Country Landscape 2013. Global Alliance for Public Relations and Communication Management. Acknowledgments. Authors: Asia Holt, Ariel Sierra, Denise Vaughn, Eliza Winston, and Vanessa Copeland, students in the Strategic Public Relations master’s degree program at Virginia Commonwealth University. Supervised and guided by: Dr. Judy VanSlyke Turk, Ph.D., APR, Fellow PRSA, Professor. Read and signed off by: Jeanette Agnrud, Swedish Association of Communication Professionals, July 12, 2013. Read and approved by: Juan-Carlos Molleda, Ph.D., Project Coordinator and Professor, University of Florida. Date of completion: July 2013. Overview: The public relations industry in Sweden has grown significantly in the past two decades. The country’s technology-heavy industries and tech-savvy culture have led Swedish PR agencies to quickly adopt online tools in campaigns for clients. And that has, in turn, enabled Swedish PR practitioners to move ahead of their European competitors in online sophistication. In fact, Sweden is so interested in social media and building an online community that the country’s Twitter feed was lauded in a New York Times article (http://www.nytimes.com/2012/06/11/world/europe/many-voices-of-sweden-via-twitter.html?_r=1). A different citizen, ranging from 16 to 60 years old, manages Sweden’s official Twitter handle, @Sweden, each week. The program, called Curators of Sweden, was created to present the country to the rest of the world on Twitter. It shows how Swedish citizens embrace technology and online communities, using them to reach out to all parts of the globe. However, like many people who practice public relations today, Swedish practitioners struggle to find the best ways to measure the impact of their work.
  • CV Oscar Kjell, PhD in Psychology, Department of Psychology, Lund University
    CV. Oscar Kjell, PhD in Psychology, Department of Psychology, Lund University. Personal information: Nationality: Swedish. Date of birth: November 28, 1982. Phone: +46 70 8803292. Email: [email protected]. Website: www.oscarkjell.se; www.r-text.org. Twitter: @OscarKjell. Address: Department of Psychology, Box 117, 221 00 Lund University, Sweden. I develop and validate ways to measure psychological constructs using psychometrics and artificial intelligence (AI) such as natural language processing and machine learning. I am the developer of the R package text (www.r-text.org), which provides powerful functions tailored to analyse text data for testing research hypotheses in the social and behavioral sciences. Using AI, as compared with traditional rating scales, we can (more) accurately measure psychological constructs; differentiate between similar constructs (such as worry and depression); as well as depict the content individuals use to describe their states of mind. I am, for example, applying these methods to better understand subjective well-being and to compare harmony in life versus satisfaction with life. I am particularly interested in sustainable well-being: how pursuing different types of well-being might bring about different social and environmental consequences. I am a co-founder of WordDiagnostics, where we apply these AI methods in a decision support system for diagnosing psychiatric disorders. Education: 2012 – 2018, PhD in Psychology (Lund University, Sweden)