Optical Music Recognition for Dummies
A Gentle Introduction to Optical Music Recognition
[Title image by Yngve Bakken Nilsen, Flickr]

Speakers: Jorge Calvo-Zaragoza, Jan Hajič jr., Alexander Pacha, Ichiro Fujinaga

Introduction: The History of OMR
Introduction to Optical Music Recognition
Ichiro Fujinaga, Music Technology Area, Schulich School of Music, McGill University

Introduction
❖ What is Optical Music Recognition (OMR)?
❖ Why is OMR important?
❖ History of OMR research
❖ Obstacles and challenges
ISMIR 2018 Fujinaga 5/45

Optical Music Recognition (OMR)
A process of converting images of music scores into a symbolic computer representation, such as MIDI, MusicXML, or MEI (Music Encoding Initiative).

Steps Involved in OMR
1. Image Preprocessing: binarization, noise removal, image segmentation
2. Music Symbol Recognition: staves processing, symbol segmentation, symbol classification
3. Music Notation Reconstruction: symbol combination, semantic assignment (pitch, value), structural analysis, musical structure reconstruction
4. Final Output: digitized score

Why is OMR important?
❖ Automatic playback
❖ Rearrangements
❖ Transpositions
❖ Change of mode (major/minor)
❖ Symbolic music analysis
❖ Searching
❖ Data mining
❖ Distant reading
❖ Reprints (for publishers)
❖ Braille output
❖ Score following

History of OMR Research: The Pioneers
❖ Dennis Pruslin: Automatic recognition of sheet music (1966). Sc.D. dissertation, MIT
❖ David Prerau: Computer pattern recognition of standard engraved music notation (1970). Ph.D. dissertation, MIT
[Photo captions: Dennis Pruslin; David Prerau "AKA The Tool"; Dennis Pruslin with grandson Kevin, 2010/11; Baker House, c. 1959]

The first published digital scan of music (Prerau 1970)

1984: Wabot-2
https://www.scaruffi.com/mind/ai/wabot.jpg

OMR Theses
❖ 1966 Dennis Pruslin (MIT)
❖ 1970 David Prerau (MIT)
❖ 1988 Ichiro Fujinaga (McGill: MA)
❖ 1989 Nicholas Carter (University of Surrey)
❖ 1996 Ichiro Fujinaga (McGill)
❖ 1996 Kia Ng (University of Leeds)
❖ 1996 Bertrand Coüasnon (Université de Rennes)
❖ 1997 David Bainbridge (University of Edinburgh)
❖ 2006 Laurent Pugin (Université de Genève)
❖ 2009 Alicia Fornés (Universitat Autònoma de Barcelona)
❖ 2012 Ana Rebelo (Universidade do Porto)
❖ 2014 Andrew Hankinson (McGill)
❖ 2016 Jorge Calvo-Zaragoza (Universitat d'Alacant)

Scanner for U.S. Census data (c. 1960)

Flatbed Scanner from Fujitsu (1983)

How I did OMR without a scanner in 1983
1. Made a photocopy of a score onto a transparency
2. Bought lots of graph paper
3. Found a large classroom (at night)
4. Taped the graph paper to the blackboard
5. Put the transparency of the score on an overhead projector
6. Projected the score onto the blackboard (with the graph paper)
7. Manually blacked out the graph paper wherever there were black pixels
8. Punched cards using a run-length encoding method
9. Wrote an OMR program in Pascal and ran it on a mainframe computer

Desktop scanners
❖ Datacopy 90 (1984): $9,945 (2018: $24,000)
❖ Datacopy 700 (1985): $3,950 (2018: $9,200)
❖ Datacopy 730 (1987): $1,800 (2018: $4,000)
❖ 200 DPI, 8-bit greyscale
❖ Canon LiDE 120 (2018): $60; 2400 DPI, 24-bit color

Notes on Image File Sizes
❖ Scanner: 200 DPI (dots per inch) @ 8 bits (1 byte) per pixel
❖ 8.5 x 11 in. paper (~A4): 8.5 x 200 x 11 x 200 x 1 byte = ~3.7 MB
❖ Projector: 800 x 600 @ 24 bits/pixel: 480,000 x 3 bytes = ~1.4 MB
❖ iPhone 6: 750 x 1334 @ 24 bits/pixel: 1,000,500 x 3 bytes = ~3 MB
❖ Color scanner: 600 DPI @ 24 bits/pixel
❖ 8.5 x 11 in. paper (~A4): 8.5 x 600 x 11 x 600 x 3 bytes = ~100 MB

Size of RAM on PCs in the 1980s
❖ IBM PC XT: 256 KB (1985), 640 KB (1986), 10 MB hard disk
❖ Apple Macintosh: 512 KB (1984), 1 MB (1986), 20 MB external hard disk
❖ Scanner: 200 DPI @ 1 byte/pixel; 8.5 x 11 in. paper (~A4): 8.5 x 200 x 11 x 200 x 1 byte = ~3.7 MB
❖ 640,000 / (200 DPI x 6 inches) = 533 pixels (~2.5 in.): only a narrow strip of the page fit in memory at once

2000: Gamera
❖ Framework for the creation of structured document recognition systems
❖ Designed for domain experts
❖ Image processing tools (filters, binarizations, etc.)
❖ Document segmentation and analysis
❖ Symbol segmentation and classification
❖ Portable, extensible, simple, open-source, GUI, batch
(Generalized Algorithms and Methods for Enhancement and Restoration of Archives)

Birth of Gamera
❖ Christened Gamera on 1 April 2001 (17th anniversary!)
❖ Based on Fujinaga's Adaptive Optical Music Recognition algorithms
❖ First public mention of Gamera at the 1st Joint Conference on Digital Libraries (JCDL, June 2001) in Roanoke, VA
❖ First paper presented at the 2nd International Symposium on Music Information Retrieval (ISMIR, October 2001) in Bloomington, IN

Early Gamera Screenshot (Linux), ca. June 2002

Some Features of Gamera, c. 2008

Preprocessing: Brightness Enhancement
Preprocessing: Binarization
Staffline Removal
Staffline Removal: four-line hand-drawn staff example
Staffline Removal: difficult cases
Staffline Removal: lute tablature
Symbol classifier / Gamera
Lute tablature symbol recognition

2008: Other Applications
Optical Recognition of Psaltic Byzantine Chant Notation — Christoph Dalitz · Georgios K. Michalakis · Christine Pranzas

2009: Other Applications
Optical Recognition of Lute Tablature — Christoph Dalitz · Thomas Karsten

2002: Aruspix
❖ Developed by Laurent Pugin
❖ Specialized for typographic music
❖ Uses HMMs (Hidden Markov Models)
❖ Does not remove staff lines

2011: Liber Usualis Project
Full-text search of 2,000 pages of Latin text and square notation
❖ Preprocessing: Aruspix
❖ Music recognition: Gamera
❖ Text recognition: Ocropus
❖ Pitch correction: Aruspix
❖ Web interface: Diva.js

2012: Rodan (Andrew Hankinson)
❖ Remote Online Document Analysis Network (RODAN)
❖ Workflow management system for large-scale optical music recognition processes
❖ File management
❖ Cropping
❖ Noise removal
❖ Binarization
❖ Staff recognition and segmentation
❖ Symbol recognition
❖ Music reconstruction / semantic assignment
❖ Corrections
❖ Building indices for searching

Rodan: OMR Workflow Management System

2016: Cantus Ultimus
[Pipeline diagram: greyscale image → binarization → border removal → lyric removal → staff removal → shape classification (e.g., punctum, C clef) → music reconstruction → shape/image alignment; recognized pitch string: "d d c dc cb c dedcd dfd edc"]

Conclusions
❖ What is Optical Music Recognition (OMR)?
❖ Why is OMR important?
❖ History of OMR research

Challenges of OMR
What Makes Music Notation Complicated?
Lossy Knowledge Representation
● Any knowledge representation of the real world that the computer can handle has limits and inevitable bias
● The MIDI representation is very limited
● Common Western Music Notation (CWMN) is extremely complex
● The rules of CWMN are often violated
○ Minor violations and simplifications (e.g., omitted 3's in tuplets)
○ Major violations (e.g., invalid meter)
○ See Donald Byrd's Gallery for more examples
[Example: "The Rite of Spring" by Igor Stravinsky]

Understanding the Complexity
● Graphical complexity (e.g., dense scores, artifacts from the image-capturing procedure, overlapping symbols, ambiguities)
● Structural complexity:
○ Inherently complicated content
○ Syntactic and semantic rules
○ Violations of these rules
[Example: Nocturne (Op. 15, no. 2) by Frédéric Chopin]
➢ Natural limit of OMR: where musicians disagree
➢ Building a rule-based system is nearly impossible

Complexity in Other Notation Systems
● Many historical and modern music notation systems exist
● Mensural notation: interpretation of the same glyph depends on the period
[Images: mensural notation (https://en.wikipedia.org/wiki/Mensural_notation, by Bitethesilverbullet) and Guqin notation, from Wikipedia]

Comparing OMR to OCR
OMR is often compared to Optical Character Recognition (OCR):
● Image of text → OCR → text (e.g., Wikipedia: "Die freie Enzyklopädie")
● Image of music scores → OMR → encoded music

Challenges of OMR: The OMR Domain
One of the key points for understanding OMR as a research field is its heterogeneity regarding the input domain and the search for a suitable output representation.
[Icons by Freepik from Flaticon (Pack: Interaction)]

Input Signal
● Offline: image of the sheet
● Online: signal captured by an electronic pen

Engraving Mechanism
● Typeset by a machine ("printed")
● Handwritten (typically over printed staff lines)

Notational Type
● Common Western Music Notation
● Preceding notations (mensural, neumes, ...)
● Instrument-specific notation (tablature, drums, ...)
● Others (Numbered, Yogyakarta, Surakarta, ...)
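The three input-domain dimensions just listed (input signal, engraving mechanism, notational type) can be captured in a small data model. This is only an illustrative sketch; none of the class or field names come from the tutorial or from any OMR library:

```python
from dataclasses import dataclass
from enum import Enum

class InputSignal(Enum):
    OFFLINE = "image of the sheet"
    ONLINE = "signal captured by an electronic pen"

class Engraving(Enum):
    TYPESET = "printed by a machine"
    HANDWRITTEN = "handwritten"

class NotationType(Enum):
    CWMN = "Common Western Music Notation"
    MENSURAL = "mensural"
    NEUMES = "neumes"
    TABLATURE = "instrument-specific tablature"
    OTHER = "other (Numbered, Yogyakarta, Surakarta, ...)"

@dataclass(frozen=True)
class OMRTask:
    """One point in the OMR input domain; each combination of the three
    dimensions is effectively a distinct recognition problem."""
    signal: InputSignal
    engraving: Engraving
    notation: NotationType

# The classic setting: a scan of printed common Western notation.
task = OMRTask(InputSignal.OFFLINE, Engraving.TYPESET, NotationType.CWMN)
print(task.notation.value)  # → Common Western Music Notation
```

The point of the model is that the dimensions are independent: a handwritten mensural manuscript and a printed guitar tablature are both "OMR", yet they share almost no recognition machinery.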
Graphical Complexity
● Ideal conditions
● Camera-based scenario (blurring, skewing, 3D distortions, low resolution, ...)
● Degraded documents (heterogeneous background, inkblots, mold, ...)

Structural Complexity
● Single-staff, single-voice
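Binarization, the first preprocessing step named throughout these slides (Gamera, Rodan, Cantus Ultimus), is what turns these variably degraded images into the clean black-and-white input the later stages expect. A minimal global-threshold sketch, assuming an 8-bit grayscale image stored as nested lists; real systems use adaptive methods that cope with heterogeneous backgrounds:

```python
def binarize(gray, threshold=128):
    """Global-threshold binarization: map an 8-bit grayscale image
    (a list of rows of 0-255 ints) to 1 = ink (dark), 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

# A tiny 2x4 "image": dark ink pixels on a light background.
gray = [[20, 30, 200, 220],
        [25, 15, 210, 240]]
print(binarize(gray))  # → [[1, 1, 0, 0], [1, 1, 0, 0]]
```

A single global threshold fails exactly in the degraded cases listed above (mold, inkblots, uneven lighting), which is why per-region or adaptive thresholds are the norm in practice.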