Knowledge Acquisition from
Music Digital Libraries
Sergio Oramas, Mohamed Sordo

Motivation
• Musicological knowledge is hidden between the lines
• Machines don't know how to read

Why Knowledge Acquisition?
• Obtain knowledge automatically
• Ask complex questions
• Visualize the information
• Improve navigation
• Share knowledge

Musical Libraries
[Diagram, built up over several slides. Labels: Musical Libraries; digital recording, scan; OCR, manual transcription; Current DL; Information Extraction, semantic annotation; Web search]

Semantic Web
• The Semantic Web aims at converting the current web, dominated by unstructured and semi-structured documents, into a web of linked data.
• Achievements useful for Digital Libraries
  – Common framework for data representation and interconnection (RDF, ontologies)
  – Semantic technologies to annotate texts (Entity Linking)
  – Language for complex queries (SPARQL)

Wikipedia and DBpedia
• Wikipedia
  – Digital encyclopedia
  – Unstructured
  – Keyword search
• DBpedia
  – Knowledge base
  – Structured
  – Query search

DBpedia
• DBpedia example queries
  – Composers born in Vienna in the 18th century
  – American jazz musicians who have written songs recorded by RCA Records
• DBpedia graph applications
  – Entity relevance
  – Entity similarity
  – Entity recommendation
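
The first example query can be sketched in SPARQL. The snippet below only builds the query string in Python and sends nothing over the network. The `dbo:` and `dbr:` prefixes are DBpedia's standard namespaces, but modelling "composer" as `dbo:occupation dbr:Composer` is an assumption made for illustration, as is reading "18th century" as 1701 to 1800. The function name `composers_born_in` is hypothetical.

```python
def composers_born_in(city, first_year, last_year, limit=20):
    """Build a SPARQL query string for composers born in `city` in a year range."""
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?composer ?birthDate WHERE {{
  # Assumption: composers are marked with dbo:occupation dbr:Composer.
  ?composer dbo:occupation dbr:Composer ;
            dbo:birthPlace dbr:{city} ;
            dbo:birthDate  ?birthDate .
  FILTER (?birthDate >= "{first_year}-01-01"^^xsd:date &&
          ?birthDate <= "{last_year}-12-31"^^xsd:date)
}}
ORDER BY ?birthDate
LIMIT {limit}
""".strip()


# Composers born in Vienna in the 18th century (taken here as 1701-1800).
query = composers_born_in("Vienna", 1701, 1800)
print(query)
```

The resulting string could be pasted into any SPARQL endpoint that exposes DBpedia data. Whether it returns the expected composers depends on how the data is actually modelled there.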

Music Digital Libraries
• Current navigation
  – Indexes
  – Keyword-based search
• From searchable repositories to knowledge environments
  – Complex queries
  – Information visualization
  – Recommendation
  – Data analytics
  – Q&A

Dataset: The New Grove
• Encyclopedic dictionary
• One of the largest reference works in Western music
• Artist biographies crawled from Grove Music Online
  – 16,707 biographies (first paragraph)
  – From pre-medieval to contemporary

• What are the most relevant music schools?
• Which artists are most similar to Schoenberg?
• Which roles are most represented in the Grove?
• Is there a migration tendency among artists? To which cities?
• What is the best city for a musician to die in?

Information Extraction
Anton Webern (b Vienna, 3 Dec 1883; d Mittersill, 15 Sept 1945). Austrian composer and conductor. Webern, who was probably Schoenberg's first private pupil, and Alban Berg, who came to him a few weeks later […] Boulez and Stockhausen and other integral serialists of the Darmstadt School […]
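
The parenthesised header of the biography already carries structured facts (place and date of birth and death). A minimal sketch of pulling them out with a regular expression follows. It assumes the exact "(b place, date; d place, date)" shape shown in this one entry, so other entries may well need more patterns. `LIFE_DATES` and `parse_life_dates` are hypothetical names.

```python
import re

# Matches headers shaped like "(b Vienna, 3 Dec 1883; d Mittersill, 15 Sept 1945)".
LIFE_DATES = re.compile(
    r"\(b (?P<birth_place>[^,;]+),\s*(?P<birth_date>[^;]+);\s*"
    r"d (?P<death_place>[^,;]+),\s*(?P<death_date>[^)]+)\)"
)


def _year(date_text):
    """Return the trailing year of a date string, or None if it is not numeric."""
    last_token = date_text.split()[-1]
    return int(last_token) if last_token.isdigit() else None


def parse_life_dates(biography):
    """Extract birth/death place, date and year, or None if the header is absent."""
    match = LIFE_DATES.search(biography)
    if match is None:
        return None
    fields = match.groupdict()
    fields["birth_year"] = _year(fields["birth_date"])
    fields["death_year"] = _year(fields["death_date"])
    return fields


webern = parse_life_dates(
    "Anton Webern (b Vienna, 3 Dec 1883; d Mittersill, 15 Sept 1945). "
    "Austrian composer and conductor."
)
print(webern["birth_place"], webern["birth_year"])
print(webern["death_place"], webern["death_year"])
```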

Information Extraction: Entity Linking
Anton Webern (b Vienna, 3 Dec 1883; d Mittersill, 15 Sept 1945). Austrian composer and conductor. Webern, who was probably Schoenberg's first private pupil, and Alban Berg, who came to him a few weeks later […] Boulez and Stockhausen and other integral serialists of the Darmstadt School […]
Domain Knowledge Base
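
Entity linking maps mentions in the text to entries of a knowledge base. The sketch below is a deliberately naive dictionary lookup, not the linking method used in this work. The `kb:` identifiers and the contents of `KNOWLEDGE_BASE` are made up for illustration, and a real linker would also have to disambiguate mentions that share a surface form.

```python
import re

# Hypothetical miniature knowledge base: surface form -> entity identifier.
KNOWLEDGE_BASE = {
    "Anton Webern": "kb:Anton_Webern",
    "Webern": "kb:Anton_Webern",
    "Schoenberg": "kb:Arnold_Schoenberg",
    "Alban Berg": "kb:Alban_Berg",
    "Boulez": "kb:Pierre_Boulez",
    "Stockhausen": "kb:Karlheinz_Stockhausen",
    "Darmstadt School": "kb:Darmstadt_School",
    "Vienna": "kb:Vienna",
}


def link_entities(text, kb=KNOWLEDGE_BASE):
    """Return (offset, mention, entity_id) for every known surface form in text."""
    # Longest surface forms first, so "Anton Webern" wins over "Webern".
    forms = sorted(kb, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(f) for f in forms) + r")\b")
    return [(m.start(), m.group(1), kb[m.group(1)]) for m in pattern.finditer(text)]


sentence = (
    "Webern, who was probably Schoenberg's first private pupil, "
    "and Alban Berg, who came to him a few weeks later"
)
links = link_entities(sentence)
for offset, mention, entity in links:
    print(offset, mention, entity)
```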

Information Extraction: Relation Extraction
• Webern, who was probably Schoenberg's first private pupil
  – Extracted relation: Webern, pupil_of, Schoenberg
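
The step from the sentence to the `pupil_of` relation can be illustrated with a single hand-written pattern. This is a toy sketch that assumes one fixed phrasing and single-word names. It does not reproduce the extraction pipeline behind these slides, and `PUPIL_PATTERN` and `extract_pupil_relations` are hypothetical names.

```python
import re

# One hand-written pattern: "<Pupil>, who was (probably) <Teacher>'s ... pupil".
PUPIL_PATTERN = re.compile(
    r"(?P<pupil>[A-Z]\w+), who was (?:probably )?"
    r"(?P<teacher>[A-Z]\w+)['’]s (?:first )?(?:private )?pupil"
)


def extract_pupil_relations(sentence):
    """Return (subject, 'pupil_of', object) triples found in the sentence."""
    return [
        (m.group("pupil"), "pupil_of", m.group("teacher"))
        for m in PUPIL_PATTERN.finditer(sentence)
    ]


triples = extract_pupil_relations(
    "Webern, who was probably Schoenberg's first private pupil"
)
print(triples)
```

A pattern this narrow misses most paraphrases, which is why practical systems tend to rely on syntactic structure rather than surface strings.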

Methodology

Knowledge Graph

Knowledge Graph: Data Analytics
• 16,707 biographies
• 434 roles
• Graph
  – 47,367 nodes
  – 274,333 edges

Role           Amount
composer       2618
teacher        1065
conductor      968
pianist        704
organist       676
singer         404
violinist      285
…
musicologist   144
critic         133

[Figures: panels labeled Birth Year and Death Year]

Knowledge Graph: Data Analytics

Country          Births   Deaths   Difference
United States    2317     2094     -10%
Italy            1616     1279     -21%
Germany          1270     1292     2%
France           991      1058     7%
United Kingdom   882      877      -1%

City       Births   Deaths   Difference
London     322      507      57%
Paris      304      720      137%
New York   266      501      88%
Vienna     177      292      65%
Rome       159      256      61%
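
The Difference column is consistent with the relative change from births to deaths, (deaths − births) / births, rounded to the nearest percent. The slides do not state the formula, so that reading is an assumption, but it reproduces every row. A short check on the city table (the function name is hypothetical):

```python
def percent_difference(births, deaths):
    """Relative change from births to deaths, rounded to a whole percent."""
    return round(100 * (deaths - births) / births)


# (births, deaths) per city, copied from the table above.
cities = {
    "London": (322, 507),
    "Paris": (304, 720),
    "New York": (266, 501),
    "Vienna": (177, 292),
    "Rome": (159, 256),
}

for name, (births, deaths) in cities.items():
    print(f"{name}: {percent_difference(births, deaths)}%")
```

Read this way, a positive value means more artists died in the place than were born there, which is the migration signal the slide points at.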

Knowledge Graph: Entity Relevance
• City
  – Paris, London, Vienna, Rome, Venice, Berlin, Paris, New York
• Venue
  – Covent Garden Theatre, King's Theatre, Drury Lane, Carnegie Hall, Théâtre de la Monnaie, Stadttheater, Theatre Royal, …
• Educational Institution
  – Paris Conservatoire, Moscow Conservatory, Juilliard School, St Petersburg Conservatory, Bmus, Prague Conservatory, Leipzig Conservatory, Vienna Hochschule für Musik, ...
• Biography subject
  – Haydn, Claude Debussy, Arnold Schoenberg, Robert Stevenson, Paul Hindemith, Giovanni Pierluigi da Palestrina, Gustav Mahler, Maurice Ravel, Jean-Philippe Rameau
  – Mozart?, Bach?, Wagner?
• Genre
  – chamber music, cappella, jazz, folk music, avant garde, baroque music, electronic music, musical theatre, plainchant

Knowledge Graph: Entity Similarity
• PageRank algorithm and Maximal Common Subgraph
• Arnold Schoenberg: Anton Webern, Paul Hindemith, Gustav Mahler, Alban Berg, Claude Debussy
• Guido Adler: Heinrich Jalowetz, Eusebius Mandyczewski, Robert Fuchs, Karl Weigl, Anton Wranitzky
• Manuel de Falla: Ricardo Viñes, Juan Vicente Lecuna, Enrique Granados, Miguel Llobet Soles, Luigi Russolo
• Miles Davis: Dizzy Gillespie, Herbie Hancock, Paul Chambers, Tony Williams, Cannonball Adderley
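
PageRank is named above as one ingredient. The slides do not say how it is combined with the Maximal Common Subgraph, so the following only sketches plain PageRank by power iteration on a toy graph. The three edges are illustrative, and the damping factor of 0.85 is the conventional default rather than a value taken from this work.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain PageRank by power iteration over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Toy graph (illustrative edges only, not the real knowledge graph).
toy_edges = [
    ("Webern", "Schoenberg"),  # pupil_of
    ("Berg", "Schoenberg"),    # pupil_of
    ("Webern", "Vienna"),      # born_in
]
ranks = pagerank(toy_edges)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

In this toy graph Schoenberg receives the highest score because two nodes point to it, which is the intuition behind using such scores for entity relevance.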

Relations Graph: Information Visualization

Mixing Different Data Sources
• Creation and curation of a library from Web content
• FlaBase: Flamenco Knowledge Base
• Data Acquisition
  – APIs
  – Web crawling
  – SPARQL endpoints
• Entity Resolution
  – String similarity between labels
  – Graph similarity (context information)
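
The "string similarity between labels" step can be sketched with Python's standard `difflib`. The normalisation, the 0.9 threshold and the example labels are assumptions made for illustration, and the slides do not say which similarity measure FlaBase uses.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase, strip accents and collapse whitespace before comparing labels."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def label_similarity(a, b):
    """Similarity in [0, 1] between two labels after normalisation."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve(label, candidates, threshold=0.9):
    """Return the best-matching candidate label, or None if none is close enough."""
    best = max(candidates, key=lambda c: label_similarity(label, c), default=None)
    if best is not None and label_similarity(label, best) >= threshold:
        return best
    return None


# Illustrative labels, as if coming from two different sources.
source_a = ["Paco de Lucía", "Camarón de la Isla"]
print(resolve("paco de lucia", source_a))
print(resolve("Enrique Morente", source_a))
```

Labels alone can be ambiguous, which is presumably why the slide pairs this step with graph similarity over context information.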

FlaBase: Flamenco Knowledge Base
• Data gathered
  – 1,174 artists (text biography)
  – 76 palos (flamenco genres)
  – 2,913 albums
  – 14,078 tracks
  – 771 Andalusian locations
• Knowledge extracted
  – Place of birth
  – Date of birth
  – Entity mentions in text

FlaBase: Data Analytics
• Number of artists by year of birth

FlaBase: Artist Relevance
• Flamenco expert evaluation
[Figure: Precision Values]
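
The slide does not spell out how precision was computed. One common choice for a ranked list judged by an expert is precision at k, sketched here under that assumption. The artist identifiers and judgements are placeholders, not data from the evaluation.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that the expert marked as relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    return sum(1 for item in top_k if item in relevant) / k


# Placeholder ranking and expert judgements (not real evaluation data).
ranking = ["artist_a", "artist_b", "artist_c", "artist_d"]
expert_relevant = {"artist_a", "artist_c"}

print(precision_at_k(ranking, expert_relevant, 2))
print(precision_at_k(ranking, expert_relevant, 4))
```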

Conclusions
• Music Digital Libraries can benefit from semantic approaches
• Music Digital Libraries are still at an early stage of development compared to the Web (Linked Open Data, Google Knowledge Graph)
• Knowledge acquisition from Digital Libraries can help musicologists not only to search content, but also to discover new knowledge

Acknowledgments
• This work was partly funded by the COFLA2 research project (Proyectos de Excelencia de la Junta de Andalucía, FEDER P12-TIC-1362).

Bibliography
• Oramas S., Sordo M., Espinosa-Anke L., Serra X. (2015). A Semantic-based Approach for Artist Similarity. International Society for Music Information Retrieval Conference (ISMIR 2015). In press.
• Oramas S., Gomez F., Gomez E., Mora J. (2015). FlaBase: Towards the Creation of a Flamenco Music Knowledge Base. International Society for Music Information Retrieval Conference (ISMIR 2015). In press.
• Sordo M., Oramas S., Espinosa-Anke L. (2015). Extracting Relations from Unstructured Text Sources for Music Recommendation. International Conference on Applications of Natural Language to Information Systems (NLDB 2015).
• Oramas S., Sordo M., Espinosa-Anke L. (2015). A Rule-based Approach to Extracting Relations from Music Tidbits. 2nd Workshop on Knowledge Extraction from Text at WWW 2015.
• Oramas S., Sordo M., Serra X. (2014). Automatic Creation of Knowledge Graphs from Digital Musical Document Libraries. Conference in Interdisciplinary Musicology (CIM 2014).

Thanks!
@sergiooramas