Computational Auditory Scene Induction

NORTHWESTERN UNIVERSITY

Computational Auditory Scene Induction

A DISSERTATION SUBMITTED TO THE GRADUATE SCHOOL AND THE DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE OF NORTHWESTERN UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

for the degree

DOCTOR OF PHILOSOPHY

Field of Computer Science

By Jinyu Han

EVANSTON, ILLINOIS
August 2012

© Copyright by Jinyu Han 2012. All Rights Reserved.

ABSTRACT

Computational Auditory Scene Induction

Jinyu Han

Real-world sound is a mixture of different sources. The sound scene of a busy coffeehouse, for example, usually consists of several conversations, music playing, laughter and maybe a baby crying, the door being slammed, different machines operating in the background and more. When humans are confronted with these sounds, they rapidly and automatically adjust to this complex sound environment, paying attention to the sound source of interest. This ability is known in psychoacoustics as Auditory Scene Analysis (ASA).

The counterpart to ASA in machine listening is called Computational Auditory Scene Analysis (CASA): the effort to build computer models that perform auditory scene analysis. Research on CASA has led to great advances in machine systems capable of analyzing complex sound scenes, such as audio source separation and multiple pitch estimation. Such systems often fail in the presence of corrupted or incomplete sound scenes. In a real-world sound scene, different sounds overlap in time and frequency, interfering with and canceling each other. Sometimes the sound of interest may have some critical information missing entirely; examples include an old recording from a scratched CD or a band-limited telephone speech signal.
In a real world filled with incomplete sounds, the human auditory system has the ability, known as Auditory Scene Induction (ASI), to estimate the missing parts of a continuous auditory scene briefly covered by noise or other interference, and to perceptually resynthesize them. Since humans are able to infer the missing elements in an auditory scene, it is important for machine systems to have the same function. However, there have been very few efforts in computer audition to computationally realize this ability.

This thesis focuses on the computational realization of auditory scene induction: Computational Auditory Scene Induction (CASI). More specifically, the goal of my research is to build computer models that are capable of resynthesizing the missing information of an audio scene. Building upon existing statistical models (NMF, PLCA, HMM and N-HMM) for audio representation, I will formulate this ability as a model-based spectrogram analysis and inference problem under the expectation-maximization (EM) framework with missing data in the observation. Various sources of information, including the spectral and temporal structure of audio and top-down knowledge about speech, are incorporated into the proposed models to produce accurate reconstruction of the missing information in an audio scene.

The effectiveness of the proposed machine systems is demonstrated on three audio signal processing tasks: singing melody extraction, audio imputation and audio bandwidth expansion. Each system is assessed through experiments on real-world audio data and compared to the state of the art. Although far from perfect, the proposed systems have shown many advantages and significant improvement over the existing systems. In addition, this thesis has shown that different applications related to missing audio data can be considered under the unified framework of CASI. This opened a new avenue of research in the Computer Audition community.
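The idea of inferring missing spectrogram entries from a learned low-rank model can be illustrated with a small sketch. This is not the thesis's own algorithm (which builds on PLCA and N-HMM under EM); it is a plain masked NMF with Kullback-Leibler multiplicative updates, used here only as a closely related stand-in. The function name `masked_nmf_impute`, its parameters, and the binary `mask` convention are assumptions made for the illustration.

```python
import numpy as np

def masked_nmf_impute(V, mask, rank=4, n_iter=200, seed=0, eps=1e-9):
    """Fill in unobserved entries of a non-negative matrix V (hypothetical helper).

    V    : (F, T) magnitude spectrogram; entries where mask == 0 are ignored.
    mask : (F, T) array, 1 where V is observed and 0 where it is missing.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps    # spectral basis vectors
    H = rng.random((rank, n_frames)) + eps  # per-frame activations
    Vm = np.where(mask > 0, V, 0.0)         # zero out the missing bins

    for _ in range(n_iter):
        # KL-divergence multiplicative updates restricted to observed bins
        WH = W @ H + eps
        W *= ((Vm / WH) @ H.T) / (mask @ H.T + eps)
        WH = W @ H + eps
        H *= (W.T @ (Vm / WH)) / (W.T @ mask + eps)

    V_hat = W @ H
    # Keep observed bins as they are; use the model only where data is missing
    return np.where(mask > 0, V, V_hat)
```

Only the observed bins contribute to the updates, so the factors are fitted to the available data and the product `W @ H` supplies an estimate for the gaps. The rank and iteration count are arbitrary defaults chosen for the sketch.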
Acknowledgements

First and foremost, I would like to thank my advisor, Professor Bryan Pardo, for creating the group in which I was able to do this work, for inviting me to join his little ensemble back in 2007, and for supporting me since then. Bryan opened the door for me to a whole new world of knowledge and practice. Without his unabated trust and unwavering commitment to providing me a creative and protected environment, this work would not have been accomplished. His passion for scientific exploration and his philosophy of research will continue to inspire me in the future.

I owe an immense amount of gratitude to Gautham J. Mysore, who has been an excellent mentor and collaborator over the last year. He sets an example as a scholar and taught me the qualities a researcher should possess, for which I am particularly grateful. He has taught me a great deal about research, from general approaches to problem solving to specifics about machine learning and signal processing.

Special thanks go to my thesis readers, Jorge Nocedal and Thrasyvoulos N. Pappas, for serving on my dissertation committee and for providing valuable feedback on this dissertation. Their insightful reading of my original proposal and their suggestions on it have greatly improved the final work. I thank Professor Thrasyvoulos N. Pappas for his enjoyable class on Digital Signal Processing, which built the foundations of my thesis work. I thank Professor Jorge Nocedal for his excellent lectures, from which I learned a great deal about optimization and machine learning. I am also grateful to Professor Doug Downey for participating in my PhD qualifying exam.

I would like to thank all of the members of the Media Technology Lab, Gracenote. I am extremely grateful to Markus Cremer and Bob Coovor for their inspiration and encouragement in research and my personal life. Special thanks go to Ching-Wei Chen, with whom the collaboration has been a great joy.
I would like to thank my wonderful former and present labmates who make the Interactive Audio (IA) Lab a pleasant place to work. Particular honors go to Zhiyao Duan, Zafar Rafii, Mark Cartwright, David Little, and Michael Skalak, with whom I have had particularly enlightening discussions and fruitful collaborations. Without John Woodruff's foundational work, my research would have been much more difficult. Many thanks also go to Arefin Huq, Rui Jiang, Sara Laupp, Anda Bereczky and Dominik Kaeser for making my time at the IA Lab particularly enjoyable.

I would like to thank Prof. Yuan Dong for giving me my first opportunity to conduct research and encouraging me to pursue graduate study. It was at his lab at the Orange Labs (Beijing), France Telecom, that I discovered my love and passion for audio-related research.

I would also like to acknowledge the financial support provided to me through two NSF grants (IIS-0812314 and IIS-0643752).

I dedicate this thesis to Jiayi Han, Feng Li and Jin Xu

Table of Contents

ABSTRACT
Acknowledgements
List of Tables
List of Figures
Chapter 1. Introduction
    1.1. Contribution
    1.2. Outline
    1.3. Structure in Audio
    1.4. Auditory Scene Analysis and Induction
    1.5. Motivation
Chapter 2. Singing Melody Extraction
    2.1. Related work
    2.2. Modeling of Audio
    2.3. System description
    2.4. Illustrative example
    2.5. Experiment
    2.6. Contributions and Conclusion
Chapter 3. Audio Imputation
    3.1. Related work
    3.2. Non-negative Hidden Markov Model
    3.3. Audio Imputation by Non-negative Spectrogram Factorization
    3.4. System description
    3.5. Experiment
    3.6. Contribution and Conclusion
Chapter 4. Language Informed Audio Bandwidth Expansion
    4.1. Related work
    4.2. System Overview
    4.3. Word Models
    4.4. Speaker Level Model
    4.5. Estimation of incomplete data
    4.6. Experimental results
    4.7. Contribution and Conclusion
Chapter 5. Conclusion and Future Research
    5.1. Future Directions
References

List of Tables

2.1 The expectation-maximization (EM) algorithm of PLCA learning
2.2 Performance comparison of the proposed algorithm against DHP and LW, averaged across 9 songs of 270 seconds from the MIREX melody extraction dataset.
3.1 The parameters of the Non-negative Hidden Markov Model. These parameters can be estimated using the Expectation-Maximization algorithm. q and z range over the sets of spectral component indices and dictionary indices respectively. f ranges over the set of analysis frequencies in the FFT.
3.2 The generative process of an audio spectrogram using N-HMM.
3.3 The EM process of N-HMM Learning
3.4 Algorithm I for Audio Imputation
3.5 Algorithm II for Audio Bandwidth Expansion
3.6 Audio excerpts dataset used for Evaluations
3.7 Performance of the Audio Imputation results by the proposed Algorithm I and PLCA. There is no statistically significant difference between the two methods at the 0.05 significance level (p-value 0.76).
3.8 Performance of the Audio Bandwidth Expansion results by the proposed Algorithm II and PLCA. There is a statistically significant difference between the two methods at the 0.05 significance level (p-value 0.01).
4.1 Algorithm III for Language Informed Speech Bandwidth Expansion
4.2 Scale of Mean Opinion Score used by the objective measure OVRL.
4.3 Performance of audio BWE results by the proposed method and PLCA in Con-A. Numbers in bold font indicate that the difference between the proposed method and PLCA is statistically significant by a Student's t-test at the 5% significance level.
4.4 Performance of audio BWE results by the proposed method and PLCA in Con-B. Numbers in bold font indicate that the difference between the proposed method and PLCA is statistically significant by a Student's t-test at the 5% significance level.

List of Figures

1.1 Illustration of the (a) waveform and (b) spectrogram of an audio clip of a male speaker saying, "She had your dark suit in greasy wash water all year". The level of the signal at a given time-frequency bin is indicated by a color value as explained in the (c) colorbar.