Survey on Swedish Language Resources

Kjell Elenius¹, Eva Forsbom², Beáta Megyesi²

¹ Speech, Music and Hearing, School of Computer Science and Communication, KTH, Sweden
² Department of Linguistics and Philology, Uppsala University, Sweden

2008-03-14

Abstract

Language resources, such as lexicons, databases, dictionaries, corpora, and tools to create and process these resources are necessary components in human language technology and natural language applications. In this survey, we describe the inventory process and the results of existing language resources for Swedish, and the need for Swedish language resources to be used in research and real-world applications in language technology as well as in linguistic research. The survey is based on an investigation sent to industry and academia, institutions and organizations, to experts involved in the development of Swedish language resources in Sweden, the Nordic countries and world-wide. This study is a result of the project called “An Infrastructure for Swedish language technology” supported by the Swedish Research Council's Committee for Research Infrastructures 2007-2008.

Table of Contents

1. Introduction
2. Method
  2.1 Questionnaire
  2.2 Survey
3. Results
  3.1 Information about the organizations
    3.1.1 Country of origin
    3.1.2 Type of organization
    3.1.3 No of employees
    3.1.4 Main activity
    3.1.5 Main language technology area
    3.1.6 Main products/services
    3.1.7 Language coverage
4. Information on existing language resources
  4.1 Written language resources
    4.1.1 Genres
    4.1.2 Annotation standards
    4.1.3 Resources for evaluation
    4.1.4 Tools for processing language data
  4.2 Spoken language resources
    4.2.1 Type of spoken resources
    4.2.2 Speakers
    4.2.3 Bandwidths
    4.2.4 Database Genres/Types
    4.2.5 Annotation standards
    4.2.6 Resources for evaluation
    4.2.7 Tools for processing speech data
  4.3 Multimodal language resources
    4.3.1 Database Genres/Types
    4.3.2 Speakers
    4.3.3 Annotation standards
    4.3.4 Resources for evaluation
    4.3.5 Tools for processing multimodal data
  4.4 Other language resources
  4.5 Production of language resources
  4.6 Validation of language resources
    4.6.1 When producing language resources do you follow specific guidelines?
    4.6.2 Do you follow specific standards?
    4.6.3 Do you use specific gold standards/benchmarks?
    4.6.4 Are your LRs validated?
5. Needs for language resources
  5.1 Needs for written language resources
    5.1.2 Needs for genres
    5.1.3 Needs for annotation standards
    5.1.4 Needs for evaluation resources
    5.1.5 Needs for tools for processing written language
  5.2 Needs for spoken language resources
    5.2.1 Needs for speech types
    5.2.2 Needs for speakers
    5.2.3 Needs for bandwidths
    5.2.4 Needs for databases/genres
    5.2.5 Needs for annotation standards
    5.2.6 Needs for evaluation
    5.2.7 Needs for tools for processing speech data
  5.3 Needs for multimodal language resources
    5.3.1 Needs for database genres/types
    5.3.2 Needs for speakers
    5.3.3 Needs for annotation standards
    5.3.4 Needs for evaluation
    5.3.5 Needs for tools for processing multimodal language data
  5.4 Needs for other language resources

Details

  • File Type
    pdf
  • Content Languages
    English
  • File Pages
    90 pages