
Survey on Swedish Language Resources

Kjell Elenius (1), Eva Forsbom (2), Beáta Megyesi (2)

(1) Speech, Music and Hearing, School of Computer Science and Communication, KTH, Sweden
(2) Department of Linguistics and Philology, Uppsala University, Sweden

2008-03-14

Abstract

Language resources, such as lexicons, databases, dictionaries, corpora, and tools to create and process these resources are necessary components in human language technology and natural language applications. In this survey, we describe the inventory process and the results of existing language resources for Swedish, and the need for Swedish language resources to be used in research and real-world applications in language technology as well as in linguistic research. The survey is based on an investigation sent to industry and academia, institutions and organizations, to experts involved in the development of Swedish language resources in Sweden, the Nordic countries and world-wide. This study is a result of the project called “An Infrastructure for Swedish language technology” supported by the Swedish Research Council's Committee for Research Infrastructures 2007-2008.

Table of Contents

1. Introduction
2. Method
    2.1 Questionnaire
    2.2 Survey
3. Results
    3.1 Information about the organizations
        3.1.1 Country of origin
        3.1.2 Type of organization
        3.1.3 No of employees
        3.1.4 Main activity
        3.1.5 Main language technology area
        3.1.6 Main products/services
        3.1.7 Language coverage
4. Information on existing language resources
    4.1 Written language resources
        4.1.1 Genres
        4.1.2 Annotation standards
        4.1.3 Resources for evaluation
        4.1.4 Tools for processing language data
    4.2 Spoken language resources
        4.2.1 Type of spoken resources
        4.2.2 Speakers
        4.2.3 Bandwidths
        4.2.4 Database Genres/Types
        4.2.5 Annotation standards
        4.2.6 Resources for evaluation
        4.2.7 Tools for processing speech data
    4.3 Multimodal language resources
        4.3.1 Database Genres/Types
        4.3.2 Speakers
        4.3.3 Annotation standards
        4.3.4 Resources for evaluation
        4.3.5 Tools for processing multimodal data
    4.4 Other language resources
    4.5 Production of language resources
    4.6 Validation of language resources
        4.6.1 When producing language resources do you follow specific guidelines?
        4.6.2 Do you follow specific standards?
        4.6.3 Do you use specific gold standards/benchmarks?
        4.6.4 Are your LRs validated?
5. Needs for language resources
    5.1 Needs for written language resources
        5.1.2 Needs for genres
        5.1.3 Needs for annotation standards
        5.1.4 Needs for evaluation resources
        5.1.5 Needs for tools for processing written language
    5.2 Needs for spoken language resources
        5.2.1 Needs for speech types
        5.2.2 Needs for speakers
        5.2.3 Needs for bandwidths
        5.2.4 Needs for databases/genres
        5.2.5 Needs for annotation standards
        5.2.6 Needs for evaluation
        5.2.7 Needs for tools for processing speech data
    5.3 Needs for multimodal language resources
        5.3.1 Needs for database genres/types
        5.3.2 Needs for speakers
        5.3.3 Needs for annotation standards
        5.3.4 Needs for evaluation
        5.3.5 Needs for tools for processing multimodal language data
    5.4 Needs for other language resources