Knolwedge Acquisi$On from Music Digital Libraries
Total Page:16
File Type:pdf, Size:1020Kb
Knolwedge Acquision from Music Digital Libraries Sergio Oramas, Mohamed Sordo Mo0vaon • Musicological Knowledge is hidden between the lines • Machines don’t know how to read Why Knowledge Acquisi0on? • Obtain knowledge automacally • Make complex ques0ons • Visualize the informaon • Improve navigaon • Share knowledge 4 Musical Libraries Musical Libraries digital recording, scan Musical Libraries OCR, manual transcrip0on digital recording, scan Musical Libraries Informaon Extrac0on, seman0c annotaon OCR, manual transcrip0on digital recording, scan Musical Libraries Informaon Extrac0on, seman0c annotaon Current DL OCR, manual transcrip0on digital recording, scan Musical Libraries Web search Informaon Extrac0on, seman0c annotaon Current DL OCR, manual transcrip0on digital recording, scan Musical Libraries Web search Informaon Extrac0on, seman0c annotaon Current DL OCR, manual transcrip0on digital recording, scan Seman0c Web • The Semanc Web aims at conver0ng the current web, dominated by unstructured and semi-structured documents into a web of linked data. • Achievements useful for Digital Libraries – Common framework for data representaon and interconnec0on (RDF, ontologies) – Seman0c technologies to annotate texts (En0ty Linking) – Language for complex queries (SPARQL) Wikipedia and DBpedia - Digital Encyclopedia - Knowledge Base - Unstructured - Structured - Keyword search - Query search 13 14 15 DBpedia • Dbpedia example queries – Composers born in Vienna in XVIII Century – American jazz musicians that have wri\en songs recorded by RCA Records • Dbpedia graph applicaons – En0ty Relevance – En0ty Similarity – En0ty Recommendaon 16 Google Knowledge Graph 17 Music Digital Libraries • Current navigaon – Indexes – Keyword-based search • From searchable repositories to knowledge environments – Complex queries – Informaon visualizaon – Recommendaon – Data Analy0cs – Q&A Music Digital Libraries 19 Music Digital Libraries 20 Dataset: The New Grove • Encyclopedic dic0onary • One of the largest reference works in Western music • Ar0st biographies crawled from the Grove Music Online – 16,707 biographies (1st paragrahph) – From pre-medieval to contemporary Dataset: The New Grove • What are the most relevant music schools? • What are the ar0sts most similar to Schoenberg? • Which are the most represented roles in the Grove? • Is there a migraon tendency in ar0sts? To which ci0es? • What is the best city to die for a musician? 22 Dataset: The New Grove 23 Informaon Extrac0on Anton Webern (b Vienna, 3 Dec 1883; d Mi\ersill,15 Sept 1945). Austrian composer and conductor. Webern, who was probably Schoenberg's first private pupil, and Alban Berg, who came to him a few weeks later […] Boulez and Stockhausen and other integral serialists of the Darmstadt School […] 24 Informaon Extrac0on Anton Webern (b Vienna, 3 Dec 1883; d Mi\ersill,15 Sept 1945). Austrian composer and conductor. Webern, who was probably Schoenberg's first private pupil, and Alban Berg, who came to him a few weeks later […] Boulez and Stockhausen and other integral serialists of the Darmstadt School […] 25 Informaon Extrac0on Anton Webern (b Vienna, 3 Dec 1883; d Mi\ersill,15 Sept 1945). Austrian composer and conductor. Webern, who was probably Schoenberg's first private pupil, and Alban Berg, who came to him a few weeks later […] Boulez and Stockhausen and other integral serialists of the Darmstadt School […] 26 Informaon Extrac0on Anton Webern (b Vienna, 3 Dec 1883; d Mi\ersill,15 Sept 1945). Austrian composer and conductor. Webern, who was probably Schoenberg's first private pupil, and Alban Berg, who came to him a few weeks later […] Boulez and Stockhausen and other integral serialists of the Darmstadt School […] 27 Informaon Extrac0on: En0ty Linking Anton Webern (b Vienna, 3 Dec 1883; d Mi\ersill,15 Sept 1945). Austrian composer and conductor. Webern, who was probably Schoenberg's first private pupil, and Alban Berg, who came to him a few weeks later […] Boulez and Stockhausen and other integral serialists of the Darmstadt School […] Domain Knowledge Base 28 Informaon Extrac0on: Relaon Extrac0on • Webern, who was probably Schoenberg's first private pupil 29 Informaon Extrac0on: Relaon Extrac0on • Webern, who was probably Schoenberg's first private pupil Webern Schoenberg pupil_of 30 Methodology 31 Knowledge Graph 32 Knowledge Graph: Data Analy0cs • 16,707 biographies Role Amount • 434 roles composer 2618 teacher 1065 • Graph conductor 968 – 47,367 nodes pianist 704 – 274,333 edges organist 676 singer 404 violinist 285 … musicologist 144 cric 133 33 Birth Year Death Year 34 Birth Year Death Year 35 Knowledge Graph: Data Analy0cs Country Births Deaths Difference United States 2317 2094 -10% Italy 1616 1279 -21% Germany 1270 1292 2% France 991 1058 7% United Kingdom 882 877 -1% City Births Deaths Difference London 322 507 57% Paris 304 720 137% New York 266 501 88% Vienna 177 292 65% Rome 159 256 61% 36 Knowledge Graph: En0ty Relevance • City – Paris, London, Vienna, Rome, Venice, Berlin, Paris, New York • Venue – Covent Garden Theatre, King's Theatre, Drury Lane, Carnegie Hall, Théâtre de la Monnaie, Stad\heater, Theatre Royal, … • Educaonal Ins0tu0on – Paris Conservatoire, Moscow Conservatory, Juilliard School, St Petersburg Conservatory, Bmus, Prague Conservatory, Leipzig Conservatory, Vienna Hochschule für Musik, ... 37 Knowledge Graph: En0ty Relevance • Biography subject – Haydn, Claude Debussy, Arnold Schoenberg, Robert Stevenson, Paul Hindemith, Giovanni Pierluigi da Palestrina, Gustav Mahler, Maurice Ravel, Jean-Philippe Rameau – Mozart?, Bach?, Wagner? • Genre – chamber music, cappella, jazz, folk music, avant garde, baroque music, electronic music, musical theatre, plainchant 38 Knowledge Graph: En0ty Similarity • PageRank algorithm and Maximal Common Subgraph • Arnold Schoenberg: Anton Webern, Paul Hindemith, Gustav Mahler, Alban Berg, Claude Debussy • Guido Adler: Heinrich Jalowetz, Eusebius Mandyczewski, Robert Fuchs, Karl Weigl, Anton Wranitzky • Manuel de Falla: Ricardo Viñes, Juan Vicente Lecuna, Enrique Granados, Miguel Llobet Soles, Luigi Russolo • Miles Davis: Dizzy Gillespie, Herbie Hancock, Paul Chambers, Tony Williams, Cannonball Adderley 39 Relaons Graph: Informaon Visualizaon 40 Relaons Graph: Informaon Visualizaon 41 Mixing Different Data Sources • Creaon and curaon of a library from Web content • FlaBase: Flamenco Knowledge Base 42 Mixing Different Data Sources • Data Acquisi0on – APIs – Web crawling – SPARQL endpoints • En0ty Resolu0on – String similarity between labels – Graph similarity (context informaon) 43 FlaBase: Flamenco Knowledge Base 44 FlaBase: Flamenco Knowledge Base • Data gathered – 1,174 Ar0sts (text biography) – 76 Palos (flamenco genres) – 2,913 Albums – 14,078 Tracks – 771 Andalusian locaons • Knowledge Extracted – Place of birth – Date of birth – En0ty men0ons in text 45 FlaBase: Data Analy0cs • Number of ar0sts by year of birth 46 FlaBase: Data Analy0cs 47 FlaBase: Ar0st Relevance • Flamenco expert evaluaon Precision Values 48 Conclusions • Music Digital Libraries can benefit from seman0c approaches • Music Digital Libraries are s0ll in an early stage of development compared to the Web (Linked Open Data, Google Knowledge Graph) • Knowledge acquisi0on from Digital Libraries can help musicologists not only to search content, but also to discover new knowledge 49 Aknowledgments • This work was partly funded by the COFLA2 research project (Proyectos de Excelencia de la Junta de Andalucía, FEDER P12- TIC-1362). 50 Bibliography • Oramas S., Sordo M., Espinosa-Anke L., Serra X. (2015). A Seman,c-based approach for Ar,st Similarity. Internaonal Society for Music Informaon Retrieval Conference ISMIR 2015. In Press. • Oramas S., Gomez F., Gomez E., Mora J. (2015). FlaBase: Towards the crea,on of a Flamenco Music Knowledge Base. Internaonal Society for Music Informaon Retrieval Conference ISMIR 2015. In Press. • Sordo, M., Oramas S., & Espinosa-Anke L. (2015). Extracng Rela,ons from Unstructured Text Sources for Music Recommenda,on. Internaonal Conference on Applicaons of Natural Language to Informaon Systems NLDB 2015. • Oramas S., Sordo M., Espinosa-Anke L. (2015). A Rule-based Approach to Extracng Rela,ons from Music Tidbits. 2nd Workshop on Knowledge Extrac0on from Text at WWW 2015. • Oramas S., Sordo M., Serra X. (2014). Automa,c Crea,on of Knowledge Graphs from Digital Musical Document Libraries. Conference in Interdisciplinary Musicology CIM 2014. 51 Thanks! Knolwedge Acquisi0on from Music Digital Libraries Sergio Oramas, Mohamed Sordo [email protected] @sergiooramas .