Knolwedge Acquision from

Music Digital Libraries

Sergio Oramas, Mohamed Sordo Movaon

• Musicological Knowledge is hidden between the lines • Machines don’t know how to read Why Knowledge Acquision?

• Obtain knowledge automacally • Make complex quesons • Visualize the informaon • Improve navigaon • Share knowledge

4 Musical Libraries Musical Libraries

digital recording, scan Musical Libraries

OCR, manual transcripon

digital recording, scan Musical Libraries

Informaon Extracon, semanc annotaon

OCR, manual transcripon

digital recording, scan Musical Libraries

Informaon Extracon, semanc annotaon

Current DL OCR, manual transcripon

digital recording, scan Musical Libraries

Web search

Informaon Extracon, semanc annotaon

Current DL OCR, manual transcripon

digital recording, scan Musical Libraries

Web search

Informaon Extracon, semanc annotaon

Current DL OCR, manual transcripon

digital recording, scan Semanc Web

• The Semanc Web aims at converng the current web, dominated by unstructured and semi-structured documents into a web of .

• Achievements useful for Digital Libraries – Common framework for data representaon and interconnecon (RDF, ontologies) – Semanc technologies to annotate texts (Enty Linking) – Language for complex queries (SPARQL) and DBpedia

- Digital Encyclopedia - - Unstructured - Structured - Keyword search - Query search

13 14 15 DBpedia

• Dbpedia example queries – Composers born in in XVIII Century – American jazz musicians that have wrien songs recorded by RCA Records • Dbpedia graph applicaons – Enty Relevance – Enty Similarity – Enty Recommendaon

16

17 Music Digital Libraries

• Current navigaon – Indexes – Keyword-based search

• From searchable repositories to knowledge environments – Complex queries – Informaon visualizaon – Recommendaon – Data Analycs – Q&A Music Digital Libraries

19 Music Digital Libraries

20 Dataset: The New Grove

• Encyclopedic diconary • One of the largest reference works in Western music • Arst biographies crawled from the Grove Music Online – 16,707 biographies (1st paragrahph) – From pre-medieval to contemporary Dataset: The New Grove

• What are the most relevant music schools? • What are the arsts most similar to Schoenberg? • Which are the most represented roles in the Grove? • Is there a migraon tendency in arsts? To which cies? • What is the best city to die for a musician?

22 Dataset: The New Grove

23 Informaon Extracon

Anton Webern (b Vienna, 3 Dec 1883; d Miersill,15 Sept 1945). Austrian composer and conductor. Webern, who was probably Schoenberg's first private pupil, and , who came to him a few weeks later […] Boulez and Stockhausen and other integral serialists of the Darmstadt School […]

24 Informaon Extracon

Anton Webern (b Vienna, 3 Dec 1883; d Miersill,15 Sept 1945). Austrian composer and conductor. Webern, who was probably Schoenberg's first private pupil, and Alban Berg, who came to him a few weeks later […] Boulez and Stockhausen and other integral serialists of the Darmstadt School […]

25 Informaon Extracon

Anton Webern (b Vienna, 3 Dec 1883; d Miersill,15 Sept 1945). Austrian composer and conductor. Webern, who was probably Schoenberg's first private pupil, and Alban Berg, who came to him a few weeks later […] Boulez and Stockhausen and other integral serialists of the Darmstadt School […]

26 Informaon Extracon

Anton Webern (b Vienna, 3 Dec 1883; d Miersill,15 Sept 1945). Austrian composer and conductor. Webern, who was probably Schoenberg's first private pupil, and Alban Berg, who came to him a few weeks later […] Boulez and Stockhausen and other integral serialists of the Darmstadt School […]

27 Informaon Extracon: Enty Linking

Anton Webern (b Vienna, 3 Dec 1883; d Miersill,15 Sept 1945). Austrian composer and conductor. Webern, who was probably Schoenberg's first private pupil, and Alban Berg, who came to him a few weeks later […] Boulez and Stockhausen and other integral serialists of the Darmstadt School […]

Domain Knowledge Base

28 Informaon Extracon: Relaon Extracon

• Webern, who was probably Schoenberg's first private pupil

29 Informaon Extracon: Relaon Extracon

• Webern, who was probably Schoenberg's first private pupil

Webern Schoenberg pupil_of

30 Methodology

31 Knowledge Graph

32 Knowledge Graph: Data Analycs

• 16,707 biographies Role Amount • 434 roles composer 2618 teacher 1065 • Graph conductor 968 – 47,367 nodes pianist 704 – 274,333 edges organist 676 singer 404 violinist 285 … musicologist 144 cric 133

33 Birth Year

Death Year

34 Birth Year

Death Year

35 Knowledge Graph: Data Analycs

Country Births Deaths Difference 2317 2094 -10% Italy 1616 1279 -21% Germany 1270 1292 2% France 991 1058 7% United Kingdom 882 877 -1%

City Births Deaths Difference London 322 507 57% Paris 304 720 137% 266 501 88% Vienna 177 292 65% Rome 159 256 61%

36 Knowledge Graph: Enty Relevance

• City – Paris, London, Vienna, Rome, Venice, Berlin, Paris, New York • Venue – Covent Garden Theatre, King's Theatre, Drury Lane, Carnegie Hall, Théâtre de la Monnaie, Stadheater, Theatre Royal, … • Educaonal Instuon – Paris Conservatoire, Moscow Conservatory, Juilliard School, St Petersburg Conservatory, Bmus, Conservatory, Leipzig Conservatory, Vienna Hochschule für Musik, ...

37 Knowledge Graph: Enty Relevance

• Biography subject – Haydn, Claude Debussy, , Robert Stevenson, Paul Hindemith, Giovanni Pierluigi da Palestrina, , Maurice Ravel, Jean-Philippe Rameau – Mozart?, Bach?, Wagner? • Genre – chamber music, cappella, jazz, folk music, avant garde, baroque music, electronic music, musical theatre, plainchant

38 Knowledge Graph: Enty Similarity

• PageRank algorithm and Maximal Common Subgraph

• Arnold Schoenberg: Anton Webern, Paul Hindemith, Gustav Mahler, Alban Berg, Claude Debussy • : Heinrich Jalowetz, Eusebius Mandyczewski, , Karl Weigl, Anton Wranitzky • Manuel de Falla: Ricardo Viñes, Juan Vicente Lecuna, Enrique Granados, Miguel Llobet Soles, Luigi Russolo • Miles Davis: Dizzy Gillespie, Herbie Hancock, Paul Chambers, Tony Williams, Cannonball Adderley

39 Relaons Graph: Informaon Visualizaon

40 Relaons Graph: Informaon Visualizaon

41 Mixing Different Data Sources

• Creaon and curaon of a library from Web content • FlaBase: Flamenco Knowledge Base

42 Mixing Different Data Sources

• Data Acquision – APIs – Web crawling – SPARQL endpoints

• Enty Resoluon – String similarity between labels – Graph similarity (context informaon)

43 FlaBase: Flamenco Knowledge Base

44 FlaBase: Flamenco Knowledge Base

• Data gathered – 1,174 Arsts (text biography) – 76 Palos (flamenco genres) – 2,913 Albums – 14,078 Tracks – 771 Andalusian locaons • Knowledge Extracted – Place of birth – Date of birth – Enty menons in text

45 FlaBase: Data Analycs

• Number of arsts by year of birth

46 FlaBase: Data Analycs

47 FlaBase: Arst Relevance

• Flamenco expert evaluaon

Precision Values

48 Conclusions

• Music Digital Libraries can benefit from semanc approaches

• Music Digital Libraries are sll in an early stage of development compared to the Web (Linked Open Data, Google Knowledge Graph)

• Knowledge acquision from Digital Libraries can help musicologists not only to search content, but also to discover new knowledge

49 Aknowledgments

• This work was partly funded by the COFLA2 research project (Proyectos de Excelencia de la Junta de Andalucía, FEDER P12- TIC-1362).

50 Bibliography

• Oramas S., Sordo M., Espinosa-Anke L., Serra X. (2015). A Semanc-based approach for Arst Similarity. Internaonal Society for Music Informaon Retrieval Conference ISMIR 2015. In Press. • Oramas S., Gomez F., Gomez E., Mora J. (2015). FlaBase: Towards the creaon of a Flamenco Music Knowledge Base. Internaonal Society for Music Informaon Retrieval Conference ISMIR 2015. In Press. • Sordo, M., Oramas S., & Espinosa-Anke L. (2015). Extracng Relaons from Unstructured Text Sources for Music Recommendaon. Internaonal Conference on Applicaons of Natural Language to Informaon Systems NLDB 2015. • Oramas S., Sordo M., Espinosa-Anke L. (2015). A Rule-based Approach to Extracng Relaons from Music Tidbits. 2nd Workshop on Knowledge Extracon from Text at WWW 2015. • Oramas S., Sordo M., Serra X. (2014). Automac Creaon of Knowledge Graphs from Digital Musical Document Libraries. Conference in Interdisciplinary Musicology CIM 2014.

51 Thanks!

Knolwedge Acquision from

Music Digital Libraries

Sergio Oramas, Mohamed Sordo

[email protected] @sergiooramas