Scientific paper Creation and Analysis of the Yugoslav Rock Song Lyrics Corpus from 1967 to 20031 UDC 811.163.41’322 DOI 10.18485/infotheca.2019.19.1.1 Ljudmila Petkovi´c ABSTRACT: The paper analyses the pro-
[email protected] cess of creation and processing of the Yu- University of Belgrade goslav rock song lyrics corpus from 1967 to Belgrade, Serbia 2003, from the theoretical and practical per- spective. The data have been obtained and XML-annotated using the Python program- ming language and the libraries lyricsmas- ter/yattag. The corpus has been preprocessed and basic statistical data have been gener- ated by the XSL transformation. The diacritic restoration has been carried out in the Slovo Majstor and LeXimir tools (the latter appli- cation has also been used for generating the frequency analysis). The extraction of socio- cultural topics has been performed using the Unitex software, whereas the prevailing top- ics have been visualised with the TreeCloud software. KEYWORDS: corpus linguistics, Yugoslav rock and roll, web scraping, natural language processing, text mining. PAPER SUBMITTED: 15 April 2019 PAPER ACCEPTED: 19 June 2019 1 This paper originates from the author’s Master’s thesis “Creation and Analysis of the Yugoslav Rock Song Lyrics Corpus from 1945 to 2003”, which was defended at the University of Belgrade on March 18, 2019. The thesis was conducted under the supervision of the Prof. Dr Ranka Stankovi´c,who contributed to the topic’s formulation, with the remark that the year of 1945 was replaced by the year of 1967 in this paper.