A Parametric Sound Object Model for Sound Texture Synthesis

File type: PDF · 16 pages · 1020 KB

A Parametric Sound Object Model for Sound Texture Synthesis

Daniel Möhlmann

Dissertation submitted for the degree of Doktor der Ingenieurwissenschaften (Dr.-Ing.) in Fachbereich 3 (Mathematik und Informatik) of the Universität Bremen, June 2011. Reviewers: Prof. Dr. Otthein Herzog (Universität Bremen) and Prof. Dr. Jörn Loviscach (Fachhochschule Bielefeld).

Abstract

This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided of the acoustic and perceptual principles of textural acoustic scenes, and the technical challenges of analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed.

A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown.

An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high-quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations.
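The spline-based envelope coding described in the abstract can be illustrated with a small sketch. This is not the thesis code: the order, knot layout and control-point values are assumptions. It only shows how a clamped cubic B-spline turns a handful of control points into a smooth envelope curve over a normalized frequency axis:

```python
def bspline_basis(i, k, u, knots):
    """Cox-de Boor recursion: value of the i-th B-spline basis of order k at u."""
    if k == 1:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = 0.0
    denom = knots[i + k - 1] - knots[i]
    if denom > 0:
        left = (u - knots[i]) / denom * bspline_basis(i, k - 1, u, knots)
    right = 0.0
    denom = knots[i + k] - knots[i + 1]
    if denom > 0:
        right = (knots[i + k] - u) / denom * bspline_basis(i + 1, k - 1, u, knots)
    return left + right

def spline_envelope(ctrl, u):
    """Evaluate a clamped cubic B-spline with the given control points at u in [0, 1]."""
    k = 4                                   # order 4 = cubic
    n = len(ctrl)
    # clamped uniform knot vector: k repeated knots at each end
    interior = [j / (n - k + 1) for j in range(1, n - k + 1)]
    knots = [0.0] * k + interior + [1.0] * k
    u = min(u, 1.0 - 1e-9)                  # keep u inside the half-open last span
    return sum(c * bspline_basis(i, k, u, knots) for i, c in enumerate(ctrl))
```

Because the knot vector is clamped, the curve interpolates the first and last control points, and with all control points equal the curve is constant (partition of unity), which gives a quick sanity check for the basis recursion.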
In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.

Acknowledgements

First and foremost, I wish to thank my advisor, Prof. Dr. Otthein Herzog, for his support, his advice and his encouragement. Our discussions about music, acoustics, methodology and machine learning have always been inspiring for me, and his constructive criticism has helped to sharpen the focus of this work. I also wish to express my gratitude to the Artificial Intelligence Research Group as a whole, for being helpful and generous in many different ways and giving me the opportunity to travel. In particular, I want to acknowledge the help of Dr. Thomas Wagner, for encouraging me to visit conferences and for his advice regarding a number of machine learning and AI issues. Furthermore, I want to thank my second supervisor, Prof. Dr. Jörn Loviscach, for his support and for his help with some rather complicated technical problems. I was lucky to profit from his extensive knowledge in audio processing and his inventiveness. I owe my deepest gratitude to the Klaus Tschira Foundation for its generous support, and to the Graduate School Advances in Digital Media for providing the kind of creative environment and the helpful assistance that is invaluable for a work like this. Finally, I want to thank my parents for their support, and I want to thank Silke Gennies for being so infinitely encouraging and patient during all of this.

Contents

1 Introduction and Motivation
  1.1 Goals and Research Questions
  1.2 Definitions
  1.3 Applications for Adaptive Sound Textures
  1.4 Overview of this Thesis
2 Foundations of Sound Production and Perception
  2.1 Sound Sources and Acoustics
  2.2 Human Hearing
  2.3 Temporal Patterns and Structure
3 Sound Synthesis and Manipulation
  3.1 Model Requirements for Different Applications
  3.2 An Overview of Sound Coding Concepts
  3.3 Additive Synthesis
  3.4 Spectral Modeling
  3.5 Spectral Envelope Coding
  3.6 Sound Source Separation
4 Sound Texture Analysis and Synthesis Algorithms
  4.1 Requirements for Sound Texture Synthesis
  4.2 Texture Algorithms in Computer Graphics
  4.3 Manually Constructed Sound Textures
  4.4 Automatic Synthesis-by-Analysis Methods
  4.5 Comparison of the Methods
  4.6 Chapter Conclusions
5 Proposed Sound Texture Description Model and Workflow
  5.1 Technical Aspects of the Sound Texture Problem
  5.2 A Hierarchical Description Model for Sound Textures
6 Implementation of the Sound Element Analysis and Synthesis
  6.1 Overview of the Parametric Sound Model
  6.2 B-Spline Coding
  6.3 Encoding of the Fundamental Frequency Trajectory
  6.4 The Spectral Envelope
  6.5 Re-Synthesis
  6.6 Choice of the Model Resolution
  6.7 Possible Extensions and Additional Parameters
  6.8 Limitations of the Sound Element Model
7 Evaluation
  7.1 Inter-Subjective Evaluation Methods
  7.2 Evaluation Methods of Audio Coding Quality
  7.3 Sound Element Test Method
  7.4 Survey Results
  7.5 Compression Performance
8 Conclusion
  8.1 Contributions of this Work
  8.2 Observations About Sound Modeling
  8.3 Future Work
  8.4 Final Thoughts
References
A Sound Texture Definitions
B Spectrograms of Sounds in the Evaluation
C PCA-Based Dimensionality Reduction for Sound Elements

List of Figures

1.1 Creature design in Spore
2.1 Different types of wave propagation
2.2 Frequency, amplitude and phase parameters of a sinusoid
2.3 Spectrogram of a plucked guitar string
2.4 Harmonics of a guitar string
2.5 The first six vibrational modes of a pitched drum
2.6 Cross-section of the human vocal tract
2.7 Cross-section of the human ear
3.1 Screenshot of the GeneSynth software
3.2 Illustration of the sinusoid tracking procedure
3.3 Illustration of parabolic interpolation
3.4 Interpolation behavior of the LPC model
3.5 Interpolation behavior of the MFCC model
3.6 Interpolation behavior of the spline model
5.1 UML component diagram of the sound texture model
5.2 Example structure of a sound texture with three simultaneous tracks
6.1 Illustration of the effect of the spline order on basis functions
6.2 Knot placement in a B-spline with fixed start and end points
6.3 Comparison of the B-spline model and the Bézier spline model
6.4 Screenshot of the implemented tracking and conversion software
6.5 Correspondence between the envelope coefficient matrix and the spectral envelope model
6.6 Computation of the partial peak amplitudes
6.7 Illustration of the conversion from observed peaks to virtual peaks
6.8 Different methods for improved transient modeling
6.9 Illustration of the correspondence problem
7.1 Screenshot from the online evaluation (participant data)
7.2 Screenshot from the online evaluation (test sample presentation)
7.3 Statistical properties of the group of participants
7.4 Evaluation results for directly converted sounds
7.5 Evaluation results for sounds created from two originals
B.1 Spectrograms of the original and synthetic "bee" sound
B.2 Spectrograms of the original and synthetic "hawk" sound
B.3 Spectrograms of the original and synthetic "cow" sound
B.4 Spectrograms of the original and synthetic "dog bark" sound
B.5 Spectrograms of the original and synthetic "pony" sound
B.6 Spectrograms of the original and synthetic "waterdrop" sound
B.7 Spectrograms of the original and synthetic "cuckoo" sound
B.8 Spectrograms of the original and synthetic "gunshot" sound
B.9 Spectrograms of the original and synthetic "creak" sound
B.10 Spectrograms of the original and synthetic "warbler" sound
B.11 Spectrograms of the original and synthetic "flute" sound
B.12 Spectrograms of the original and synthetic "guitar" sound
B.13 Spectrograms of the original and synthetic "large splash" sound
B.14 Spectrograms of the original and synthetic "rain" sound
B.15 Spectrograms of the original and synthetic "songthrush" sound
B.16 Spectrograms of the original and synthetic "singing" sound
B.17 Spectrograms of the original and synthetic "piano" sound
B.18 Spectrograms of the original and synthetic "traffic" sound
B.19 Spectrograms of the original and synthetic "thunder" sound
B.20 Spectrograms of the original and synthetic "gravel" sound
B.21 Spectrograms of the original and synthetic "rooster" sound
C.1 Illustration of the principal component analysis

List of Tables

2.1 Frequency bands on the Bark scale
4.1 Comparison of sound texture algorithms and software
7.1 List of test samples in the evaluation for directly parametrized sounds
7.2 List of test samples in the evaluation for morphed parametrized sounds
7.3 Influence of the equipment quality in the evaluation
A.1 Definitions of the term "sound texture" in existing publications

Chapter 1: Introduction and Motivation

In audio processing, music and speech have always been at the center of attention (Athineos & Ellis, 2003). Their relevance for communication and entertainment is immediately obvious.
However, in recent years a new discipline has developed that directs its attention to the domain of so-called sound textures, the chaotic multitude of atmospheric weather sounds, traffic noises, singing birds and babbling crowds that make up a great deal of everyone's daily life.
Recommended publications
  • Art to Commerce: the Trajectory of Popular Music Criticism
    Art to Commerce: The Trajectory of Popular Music Criticism. Thomas Conner and Steve Jones, University of Illinois at Chicago. [email protected] / [email protected]

    Abstract: This article reports the results of a content and textual analysis of popular music criticism from the 1960s to the 2000s, to discern the extent to which criticism has shifted focus from matters of music to matters of business. In part, we believe such a shift is likely due to increased awareness among journalists and fans of the industrial nature of popular music production, distribution and consumption, and to the disruption of the music industry that began in the late 1990s with the widespread use of the Internet for file sharing. Searching and sorting the Rock's Backpages database of over 22,000 pieces of music journalism for keywords associated with the business, economics and commercial aspects of popular music, we found several periods during which popular music criticism's focus on business-related concerns seemed to have increased. The article discusses possible reasons for the increases, as well as methods for analyzing a large corpus of popular music criticism texts. Keywords: music journalism, popular music criticism, rock criticism, Rock's Backpages

    Though scant scholarship directly addresses this subject, music journalists and bloggers have identified a trend in recent years toward commerce-specific framing when writing about artists, recording and performance. Most music journalists, according to Willoughby (2011), "are writing quasi shareholder reports that chart the movements of artists' commercial careers" instead of artistic criticism. There may be many reasons for such a trend, such as the Internet's rise to prominence not only as a medium for distributing music but also as a medium for distributing information about music; but can such a trend actually be discerned?
    Our goal with the research reported here was to determine empirically whether such a trend exists and, if so, to what extent.
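The keyword-counting step this abstract describes can be sketched as a toy content analysis. The articles, keyword list and decade labels below are invented for illustration; the actual study searched the Rock's Backpages database.

```python
# Toy sketch of keyword-based content analysis over a corpus of reviews.
BUSINESS_KEYWORDS = {"sales", "label", "chart", "deal", "market", "streaming"}

articles = [
    ("1960s", "a raw, vital record that rewrites the rules of rock"),
    ("1990s", "the label hopes the single will climb the chart"),
    ("2000s", "streaming numbers drove the deal with the new label"),
]

def business_mentions_by_period(corpus):
    """Count business-keyword hits per period label."""
    counts = {}
    for period, text in corpus:
        hits = sum(1 for w in text.lower().split() if w in BUSINESS_KEYWORDS)
        counts[period] = counts.get(period, 0) + hits
    return counts

print(business_mentions_by_period(articles))  # → {'1960s': 0, '1990s': 2, '2000s': 3}
```

A real analysis would of course normalize by article count per period and handle tokenization and inflected forms more carefully.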
  • Brazilian/American Trio São Paulo Underground Expands Psycho-Tropicalia Into New Dimensions on Cantos Invisíveis, a Global Tapestry That Transcends Place & Time
    Bio information: SÃO PAULO UNDERGROUND
    Title: CANTOS INVISÍVEIS (Cuneiform Rune 423)
    Format: CD / Digital
    Cuneiform Promotion Dept: (301) 589-8894 / Fax (301) 589-1819
    Press and world radio: [email protected] | North American and world radio: [email protected]
    www.cuneiformrecords.com
    File under: Jazz / Tropicalia / Electronic / World / Psychedelic / Post-Jazz
    Release date: October 14, 2016

    Brazilian/American Trio São Paulo Underground Expands Psycho-Tropicalia into New Dimensions on Cantos Invisíveis, a Global Tapestry that Transcends Place & Time

    Cantos Invisíveis is a wondrous album, a startling slab of 21st century trans-global music that mesmerizes, exhilarates and transports the listener to surreal dreamlands astride the equator. Never before has the fearless post-jazz, trans-continental trio São Paulo Underground sounded more confident than here on their fifth album and third release for Cuneiform. Weaving together a borderless electro-acoustic tapestry of North and South American, African and Asian, traditional folk and modern jazz, rock and electronica, the trio create music at once intimate and universal. On Cantos Invisíveis, nine tracks celebrate humanity by evoking lost haunts, enduring love, and the sheer delirious joy of making music together. São Paulo Underground fully manifests its expansive vision of a universal global music, one that blurs edges, transcends genres, defies national and temporal borders, and embraces humankind in its myriad physical and spiritual dimensions. Featuring three multi-instrumentalists, São Paulo Underground is the creation of Chicago-reared polymath Rob Mazurek (cornet, Mellotron, modular synthesizer, Moog Paraphonic, OP-1, percussion and voice) and two Brazilian masters of modern psycho-Tropicalia: Mauricio Takara (drums, cavaquinho, electronics, Moog Werkstatt, percussion and voice) and Guilherme Granado (keyboards, synthesizers, sampler, percussion and voice).
  • Separation of Vocal and Non-Vocal Components from Audio Clip Using Correlated Repeated Mask (CRM)
    University of New Orleans, ScholarWorks@UNO, University of New Orleans Theses and Dissertations, Summer 8-9-2017. Separation of Vocal and Non-Vocal Components from Audio Clip Using Correlated Repeated Mask (CRM). Mohan Kumar Kanuri, [email protected]

    Recommended citation: Kanuri, Mohan Kumar, "Separation of Vocal and Non-Vocal Components from Audio Clip Using Correlated Repeated Mask (CRM)" (2017). University of New Orleans Theses and Dissertations. 2381. https://scholarworks.uno.edu/td/2381

    A thesis submitted to the Graduate Faculty of the University of New Orleans in partial fulfillment of the requirements for the degree of Master of Science in Engineering – Electrical, by Mohan Kumar Kanuri, B.Tech., Jawaharlal Nehru Technological University, 2014. August 2017. This thesis is dedicated to my parents, Mr.
  • Masterarbeit (Master's thesis)
    Masterarbeit (Master's thesis): Creation of a speech database and of a program for its analysis, in the context of speech synthesis with spectral models. Submitted for the academic degree Master of Science to the Fachbereich Mathematik, Naturwissenschaften und Informatik of the Technische Hochschule Mittelhessen by Tobias Platen, August 2014. Referent: Prof. Dr. Erdmuthe Meyer zu Bexten; Korreferent: Prof. Dr. Keywan Sohrabi.

    Statutory declaration: I hereby affirm that I produced this work independently and used no literature or aids other than those stated. The work has not previously been submitted in the same or a similar form to any other examination authority, nor has it been published.

    Contents: 1 Introduction (1.1 Motivation; 1.2 Goals; 1.3 Historical speech synthesizers: the speaking machine, the Vocoder and the Voder, linear predictive coding; 1.4 Modern speech synthesis algorithms: formant synthesis, concatenative synthesis). 2 Spectral models for speech synthesis (2.1 Convolution, Fourier transform and vocoder; 2.2 Phase vocoder; 2.3 Spectral Model Synthesis: harmonic trajectories, shape invariance).
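The "harmonic trajectories" idea from the spectral-model chapter (one amplitude trajectory per harmonic, all partials locked to a shared fundamental) can be sketched in a few lines. This is an illustrative sketch, not the thesis implementation; the sample rate, trajectory shapes and three-harmonic example are invented.

```python
import numpy as np

def harmonic_trajectories(f0_track, amp_tracks, sr=16000):
    """Additive re-synthesis from harmonic trajectories.

    f0_track:   fundamental frequency per sample (Hz)
    amp_tracks: one per-sample amplitude envelope per harmonic
    Each harmonic h follows the trajectory h * f0; the phase is the running
    integral of instantaneous frequency, which keeps the partials smooth
    when f0 varies over time.
    """
    phase0 = 2.0 * np.pi * np.cumsum(f0_track) / sr  # integrated fundamental phase
    out = np.zeros(len(f0_track))
    for h, amps in enumerate(amp_tracks, start=1):
        out += amps * np.sin(h * phase0)
    return out

# a 0.5 s tone gliding from 220 Hz to 440 Hz with three decaying harmonics
n = 8000
f0 = np.linspace(220.0, 440.0, n)
amps = [np.linspace(1.0 / h, 0.0, n) for h in (1, 2, 3)]
signal = harmonic_trajectories(f0, amps)
```

Shape-invariant processing, which the chapter also covers, additionally constrains the per-partial phases around pitch marks; that is deliberately left out of this sketch.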
  • One Direction Infection: Media Representations of Boy Bands and Their Fans
    One Direction Infection: Media Representations of Boy Bands and their Fans. Annie Lyons. TC 660H, Plan II Honors Program, The University of Texas at Austin, December 2020. Supervising professor: Renita Coleman, Ph.D., Department of Journalism. Second reader: Hannah Lewis, Ph.D., Department of Musicology.

    Abstract: Boy bands have long been disparaged in music journalism, largely because of their close association with hordes of screaming teenage and prepubescent girls. As rock journalism evolved in the 1960s and 1970s, so did two dismissive and misogynistic stereotypes about female fans: groupies and teenyboppers (Coates, 2003). While groupies were scorned in rock circles for their perceived hypersexuality, teenyboppers, an umbrella term that includes boy band fanbases, were defined by a lack of sexuality, viewed as shallow, immature and prone to hysteria, and ridiculed as hallmarks of bad taste, despite being driving forces in commercial markets (Ewens, 2020; Sherman, 2020). Similarly, boy bands have been disdained for their perceived femininity and viewed as inauthentic compared to "real" artists, namely hypermasculine male rock artists. While the boy band genre has evolved through different eras, media depictions of both the bands and their fans have stagnated, relying on these old stereotypes (Duffett, 2012). This paper investigates to what extent modern boy bands are portrayed differently from non-boy bands in music journalism, through a quantitative content analysis coding articles for certain tropes and themes.
  • Psychological Measurement for Sound Description and Evaluation
    Psychological measurement for sound description and evaluation. Patrick Susini and Guillaume Lemaitre (Institut de Recherche et de Coordination Acoustique/Musique, Paris, France) and Stephen McAdams (CIRMMT, Schulich School of Music, McGill University, Montréal, Québec, Canada).

    11.1 Introduction. Several domains of application require one to measure quantities that are representative of what a human listener perceives. Sound quality evaluation, for instance, studies how users perceive the quality of the sounds of industrial objects (cars, electrical appliances, electronic devices, etc.) and establishes specifications for the design of these sounds. It reflects the fact that the sounds produced by an object or product are not only evaluated in terms of annoyance or pleasantness, but are also important in people's interactions with the object. Practitioners of sound quality evaluation therefore need experimental methods to assess, or automatic tools to predict, what users perceive and how they evaluate the sounds. Other applications require such measurement as well: evaluation of the quality of audio algorithms, management (organization, retrieval) of sound databases, and so on. For example, sound-database retrieval systems often require measurements of relevant perceptual qualities; the search is performed automatically using similarity metrics based on relevant descriptors stored as metadata with the sounds in the database. The "perceptual" qualities of sounds are called auditory attributes, defined as percepts that can be ordered on a magnitude scale. Historically, the notion of an auditory attribute is grounded in the framework of psychoacoustics. Psychoacoustical research aims to establish quantitative relationships between the physical properties of a sound (i.e., the properties measured by the methods and instruments of the natural sciences) and its perceived properties, the auditory attributes.
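The retrieval scenario described here (similarity metrics over perceptual descriptors stored as metadata) can be sketched minimally. The descriptor names, values and sound names below are invented for illustration; real systems use validated auditory attributes.

```python
import math

# Hypothetical perceptual descriptors stored with each sound:
# (brightness, roughness, loudness), all on comparable 0-1 scales.
database = {
    "door_slam":  (0.2, 0.8, 0.9),
    "wind_chime": (0.9, 0.1, 0.3),
    "engine":     (0.3, 0.7, 0.8),
}

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, db):
    """Rank database sounds by descriptor distance to the query, closest first."""
    return sorted(db, key=lambda name: euclidean(db[name], query))

ranking = retrieve((0.22, 0.78, 0.88), database)  # → door_slam first
```

In practice the descriptors would be perceptually scaled (and possibly weighted) before the distance is taken, which is exactly where the measurement methods surveyed in the chapter come in.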
  • A Unit Selection Text-to-Speech-and-Singing Synthesis Framework from Neutral Speech: Proof of Concept
    Adding expressiveness to unit selection speech synthesis and to numerical voice production. Marc Freixes Guerreiro, Universitat Ramon Llull. http://hdl.handle.net/10803/672066
  • Design of a Protocol for the Measurement of Physiological and Emotional Responses to Sound Stimuli
    Design of a Protocol for the Measurement of Physiological and Emotional Responses to Sound Stimuli. Andrés Felipe Macía Arango. A thesis submitted in partial fulfillment of the requirements for the degree of Sound Engineer. Adviser: Jonathan Ochoa Villegas, Sound Engineer. Universidad de San Buenaventura Medellín, Facultad de Ingenierías, Ingeniería de Sonido, Medellín, 2017.

    Table of contents: Abstract; Introduction; 1. Goals; 2. State of the Art; 3. Reference Framework; 3.1 Noise
  • Comparison of Frequency-Warped Filter Banks in Relation to Robust Features for Speaker Identification
    Recent Advances in Electrical Engineering

    Comparison of Frequency-Warped Filter Banks in Relation to Robust Features for Speaker Identification. Sharada V. Chougule (Electronics and Telecomm. Engg. Dept., Finolex Academy of Management and Technology, Ratnagiri, Maharashtra, India) and Mahesh S. Chavan (Electronics Engg. Department, KIT's College of Engineering, Kolhapur, Maharashtra, India). [email protected] / [email protected]

    Abstract: Psycho-acoustically motivated warping such as mel-scale warping, first applied in speech recognition, is common in speaker recognition tasks. Mel-warped cepstral coefficients (MFCCs) are used in state-of-the-art speaker recognition systems as a standard acoustic feature set. Alternative frequency warpings such as the Bark and ERB-rate scales can perform comparably to mel-scale warping. In this paper, the performance of acoustic features generated using filter banks with Bark and ERB-rate warping is investigated in relation to robust features for speaker identification. For this purpose, a sensor-mismatched database is used for closed-set text-dependent and text-independent cases. Because MFCCs are highly sensitive to mismatched conditions (any mismatch between the data used for training and for evaluation), spectral subtraction is performed on the mismatched speech data to reduce additive noise, and feature vectors are normalized over each frame to compensate for channel mismatch. Experimental analysis shows that the identification rate for the text-dependent case is comparable for mel-, Bark- and ERB-warped filter banks under mismatched conditions. For text-independent speaker identification, however, ERB-rate-warped filter-bank features outperform mel- and Bark-warped features under the same sensor mismatch.
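The three warpings the abstract compares are all simple closed-form curves. The sketch below uses standard published approximations (O'Shaughnessy's mel formula, Traunmüller's Bark formula, and the Glasberg-Moore ERB-rate scale); the paper itself may use different variants, of which several exist in the literature.

```python
import math

def hz_to_mel(f):
    """O'Shaughnessy's mel-scale approximation."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def hz_to_bark(f):
    """Traunmueller's Bark-scale approximation (critical-band rate)."""
    return 26.81 * f / (1960.0 + f) - 0.53

def hz_to_erb_rate(f):
    """Glasberg-Moore ERB-rate scale (number of ERBs below f)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)
```

A warped filter bank then simply places its band edges uniformly on the chosen scale and maps them back to Hz, which is what makes the three feature sets directly comparable.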
  • Fundamentals of Psychoacoustics: Psychophysical Experimentation
    Chapter 6: Fundamentals of Psychoacoustics (M. Karjalainen)

    • Psychoacoustics = auditory psychophysics
    • Sound events vs. auditory events: sound stimulus types, psychophysical experiments; psychophysical functions
    • Basic phenomena and concepts: masking (spectral and temporal); pitch perception and pitch scales (different pitch phenomena and scales); loudness formation (static and dynamic loudness); timbre as a multidimensional perceptual attribute; subjective duration of sound

    Psychophysical experimentation
    • Sound events (s_i) = physical (objective) events
    • Auditory events (h_i) = the subject's internal events, which must be studied indirectly through reactions (b_i)
    • Psychophysical function h = f(s); reaction function b = f(h)

    Sound events: stimulus signals
    • Elementary sounds: sinusoidal tones; amplitude- and frequency-modulated tones; sinusoidal bursts; sine-wave sweeps, chirps and warble tones; single impulses and pulses, pulse trains; noise (white, pink, uniform masking noise); modulated noise, noise bursts; tone combinations (consisting of partials)
    • Complex sounds: combination tones, noise and pulses; speech sounds (natural, synthetic); musical sounds (natural, synthetic); reverberant sounds; environmental sounds (nature, man-made noise)

    Sound generation and experiment environment
    • Reproduction techniques: natural acoustic sounds (repeatability problems); loudspeaker reproduction; headphone reproduction
    • Reproduction environment: not critical in headphone
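Several of the elementary stimuli listed above are straightforward to generate digitally. A minimal numpy sketch follows; the sample rate and durations are arbitrary choices, not values from the slides.

```python
import numpy as np

SR = 16000  # sample rate in Hz (illustrative choice)

def sine_tone(freq, dur):
    """Pure sinusoidal tone of the given frequency (Hz) and duration (s)."""
    t = np.arange(int(SR * dur)) / SR
    return np.sin(2 * np.pi * freq * t)

def linear_chirp(f_start, f_end, dur):
    """Sine sweep whose instantaneous frequency rises linearly."""
    t = np.arange(int(SR * dur)) / SR
    # phase is the integral of the instantaneous frequency
    phase = 2 * np.pi * (f_start * t + 0.5 * (f_end - f_start) * t**2 / dur)
    return np.sin(phase)

def noise_burst(dur, seed=0):
    """White (Gaussian) noise burst; a fixed seed makes trials repeatable."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(int(SR * dur))
```

Repeatability is exactly why such synthetic stimuli are preferred over natural acoustic sounds in psychophysical experiments.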
  • Comparing Vowel Formant Normalisation Procedures
    COMPARING VOWEL FORMANT NORMALISATION PROCEDURES. Nicholas Flynn.

    Abstract: This article compares 20 methods of vowel formant normalisation. Procedures were evaluated on their effectiveness at neutralising the variation in formant data due to inter-speaker physiological and anatomical differences. This was measured by assessing the ability of each method to equalise and align the vowel space areas of different speakers. The equalisation of vowel spaces was quantified through the SCV of vowel space areas calculated under each method of normalisation, while the alignment of vowel spaces was judged through the intersection and overlap of scale-drawn vowel space areas. An extensive dataset was used, consisting of large numbers of tokens from a wide range of vowels from 20 speakers, both male and female, of two different age groups. The results showed that vowel-extrinsic, formant-intrinsic, speaker-intrinsic methods performed best at equalising and aligning speakers' vowel spaces, while vowel-intrinsic scaling transformations performed poorly overall at these two tasks.

    1. Introduction. The process of normalising vowel formant data to permit accurate cross-speaker comparisons of vowel space layout, change and variation has grown in importance in sociolinguistics in recent years. A plethora of methods and formulae for this purpose have been proposed. Thomas & Kendall (2007) provide an online normalisation tool, "NORM", a useful resource which has opened the viability of normalising to a greater number of researchers. However, there is still a lack of agreement over which available algorithm is best to use.
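A well-known member of the class the article finds most effective (vowel-extrinsic, formant-intrinsic, speaker-intrinsic) is Lobanov's z-score normalisation, sketched here on hypothetical formant data; the article compares many such procedures, and this is only one of them.

```python
from statistics import mean, stdev

def lobanov(formant_values):
    """Lobanov normalisation: z-score each formant within one speaker.

    formant_values: all of one speaker's measurements of a single formant
    (e.g. every F1 token), pooled across vowels ("vowel-extrinsic"), one
    formant at a time ("formant-intrinsic"), per speaker ("speaker-intrinsic").
    """
    mu, sigma = mean(formant_values), stdev(formant_values)
    return [(v - mu) / sigma for v in formant_values]

# hypothetical F1 tokens (Hz) from one speaker
f1 = [310.0, 340.0, 600.0, 720.0, 430.0]
f1_norm = lobanov(f1)
```

After normalisation every speaker's formant distribution has mean 0 and unit variance, which is what allows vowel spaces of physiologically different speakers to be overlaid and compared.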
  • Perceptual Atomic Noise
    PERCEPTUAL ATOMIC NOISE. Kristoffer Jensen, University of Aalborg Esbjerg, Niels Bohrsvej 6, DK-6700 Esbjerg. [email protected]

    Abstract: A noise synthesis method with no external connotation is proposed. By creating atoms with random width, onset-time and frequency, most external connotations are avoided. The further addition of a frequency distribution corresponding to the perceptual Bark frequency, and of a spectrum corresponding to the equal-loudness contour for a given phon level, further removes the synthesis from the common signal point-of-view. The perceptual frequency distribution is obtained by creating a probability density function from the Bark scale, and the equal-loudness contour (ELC) spectrum is created by filtering the atoms with a filter obtained in the warped frequency domain by fitting the filter to a simple ELC model. An additional voiced-quality parameter allows the harmonicity to be varied. The resulting sound can be used in everything from loud noise music and contemporary compositions to meditation.

    … signal point-of-view that creates sound with a uniform distribution and spectrum, or the physical point-of-view that creates a noise similar to existing sounds. Instead, an attempt is made at rendering the atoms perceptually white: they are distributed along the perceptual frequency axis using a probability function obtained from the Bark scale, and given a magnitude corresponding to the equal-loudness contour by filtering them with a warped filter fitted to the equal-loudness contour for a given phon level. This gives a perceptually white resulting spectrum with a perceptually uniform frequency distribution.

    2. NOISE. Noise has been an important component since the beginning of music, and recently noise music has evolved as an independent music style, sometimes avoiding toned components altogether.
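The Bark-uniform frequency distribution described above can be sketched by drawing atom frequencies uniformly on the Bark axis and mapping them back to Hz. This is not the paper's code; it assumes Traunmüller's Bark approximation and leaves out the equal-loudness weighting and the atoms themselves.

```python
import random

def hz_to_bark(f):
    """Traunmueller's approximation of the Bark scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    """Algebraic inverse of the formula above."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

def atom_frequencies(n, f_lo=50.0, f_hi=15000.0, seed=1):
    """Draw atom center frequencies uniformly on the Bark axis, so that
    every critical band receives, on average, the same number of atoms."""
    rng = random.Random(seed)
    z_lo, z_hi = hz_to_bark(f_lo), hz_to_bark(f_hi)
    return [bark_to_hz(rng.uniform(z_lo, z_hi)) for _ in range(n)]
```

Compared with sampling frequencies uniformly in Hz, this concentrates atoms at low frequencies in exactly the proportion needed for the result to sound, rather than measure, white.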