DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2016 Named Entity Recognition for Search Queries in the Music Domain SANDRA LILJEQVIST KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION Named Entity Recognition for Search Queries in the Music Domain SANDRA LILJEQVIST
[email protected] Master’s Thesis in Computer Science School of Computer Science and Communication KTH Royal Institute of Technology Supervisor: Olov Engwall Examiner: Viggo Kann Principal: Spotify AB Stockholm - June 2016 Abstract This thesis addresses the problem of named entity recogni- tion (NER) in music-related search queries. NER is the task of identifying keywords in text and classifying them into predefined categories. Previous work in the field has mainly focused on longer documents of editorial texts. However, in recent years, the application of NER for queries has attracted increased attention. This task is, however, ac- knowledged to be challenging due to queries being short, ungrammatical and containing minimal linguistic context. The usage of NER for queries is especially useful for the implementation of natural language queries in domain- specific search applications. These applications are often backed by a database, where the query format otherwise is restricted to keyword search or the usage of a formal query language. In this thesis, two techniques for NER for music-related queries are evaluated; a conditional random field based so- lution and a probabilistic solution based on context words. As a baseline, the most elementary implementation of NER, commonly applied on editorial text, is used. Both of the evaluated approaches outperform the baseline and demon- strate an overall F1 score of 79.2% and 63.4% respectively.