Progress in LPC-Based Frequency-Domain Audio Coding Takehiro Moriya, Ryosuke Sugiura, Yutaka Kamamoto, Hirokazu Kameoka and Noboru Harada
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
The Science of String Instruments
The Science of String Instruments Thomas D. Rossing Editor The Science of String Instruments Editor Thomas D. Rossing Stanford University Center for Computer Research in Music and Acoustics (CCRMA) Stanford, CA 94302-8180, USA [email protected] ISBN 978-1-4419-7109-8 e-ISBN 978-1-4419-7110-4 DOI 10.1007/978-1-4419-7110-4 Springer New York Dordrecht Heidelberg London # Springer Science+Business Media, LLC 2010 All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed on acid-free paper Springer is part of Springer ScienceþBusiness Media (www.springer.com) Contents 1 Introduction............................................................... 1 Thomas D. Rossing 2 Plucked Strings ........................................................... 11 Thomas D. Rossing 3 Guitars and Lutes ........................................................ 19 Thomas D. Rossing and Graham Caldersmith 4 Portuguese Guitar ........................................................ 47 Octavio Inacio 5 Banjo ...................................................................... 59 James Rae 6 Mandolin Family Instruments........................................... 77 David J. Cohen and Thomas D. Rossing 7 Psalteries and Zithers .................................................... 99 Andres Peekna and Thomas D. -
Surround Sound Processed by Opus Codec: a Perceptual Quality Assessment
28. Konferenz Elektronische Sprachsignalverarbeitung 2017, Saarbrücken SURROUND SOUND PROCESSED BY OPUS CODEC: APERCEPTUAL QUALITY ASSESSMENT Franziska Trojahn, Martin Meszaros, Michael Maruschke and Oliver Jokisch Hochschule für Telekommunikation Leipzig, Germany [email protected] Abstract: The article describes the first perceptual quality study of 5.1 surround sound that has been processed by the Opus codec standardised by the Internet Engineering Task Force (IETF). All listening sessions with up to five subjects took place in a slightly sound absorbing laboratory – simulating living room conditions. For the assessment we conducted a Degradation Category Rating (DCR) listening- opinion test according to ITU-T P.800 recommendation with stimuli for six channels at total bitrates between 96 kbit/s and 192 kbit/s as well as hidden references. A group of 27 naive listeners compared a total of 20 sound samples. The differences between uncompressed and degraded sound samples were rated on a five-point degradation category scale resulting in Degradation Mean Opinion Score (DMOS). The overall results show that the average quality correlates with the bitrates. The quality diverges for the individual test stimuli depending on the music characteristics. Under most circumstances, a bitrate of 128 kbit/s is sufficient to achieve acceptable quality. 1 Introduction Nowadays, a high number of different speech and audio codecs are implemented in several kinds of multimedia applications; including audio / video entertainment, broadcasting and gaming. In recent years the demand for low delay and high quality audio applications, such as remote real-time jamming and cloud gaming, has been increasing. Therefore, current research objectives do not only include close to natural speech or audio quality, but also the requirements of low bitrates and a minimum latency. -
Vapar Synth--A Variational Parametric Model for Audio Synthesis
VAPAR SYNTH - A VARIATIONAL PARAMETRIC MODEL FOR AUDIO SYNTHESIS Krishna Subramani , Preeti Rao Alexandre D’Hooge Indian Institute of Technology Bombay ENS Paris-Saclay [email protected] [email protected] ABSTRACT control over the perceptual timbre of synthesized instruments. With the advent of data-driven statistical modeling and abun- With the similar aim of meaningful interpolation of timbre in dant computing power, researchers are turning increasingly to audio morphing, Engel et al. [4] replaced the basic spectral deep learning for audio synthesis. These methods try to model autoencoder by a WaveNet [5] autoencoder. audio signals directly in the time or frequency domain. In the In the present work, rather than generating new timbres, interest of more flexible control over the generated sound, it we consider the problem of synthesis of a given instrument’s could be more useful to work with a parametric representation sound with flexible control over the pitch. Wyse [6] had the of the signal which corresponds more directly to the musical similar goal in providing additional information like pitch, ve- attributes such as pitch, dynamics and timbre. We present Va- locity and instrument class to a Recurrent Neural Network to Par Synth - a Variational Parametric Synthesizer which utilizes predict waveform samples more accurately. A limitation of a conditional variational autoencoder (CVAE) trained on a suit- his model was the inability to generalize to notes with pitches able parametric representation. We demonstrate1 our proposed the network has not seen before. Defossez´ et al. [7] also model’s capabilities via the reconstruction and generation of approached the task in a similar fashion, but proposed frame- instrumental tones with flexible control over their pitch. -
A Comparison of Viola Strings with Harmonic Frequency Analysis
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Student Research, Creative Activity, and Performance - School of Music Music, School of 5-2011 A Comparison of Viola Strings with Harmonic Frequency Analysis Jonathan Paul Crosmer University of Nebraska-Lincoln, [email protected] Follow this and additional works at: https://digitalcommons.unl.edu/musicstudent Part of the Music Commons Crosmer, Jonathan Paul, "A Comparison of Viola Strings with Harmonic Frequency Analysis" (2011). Student Research, Creative Activity, and Performance - School of Music. 33. https://digitalcommons.unl.edu/musicstudent/33 This Article is brought to you for free and open access by the Music, School of at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in Student Research, Creative Activity, and Performance - School of Music by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln. A COMPARISON OF VIOLA STRINGS WITH HARMONIC FREQUENCY ANALYSIS by Jonathan P. Crosmer A DOCTORAL DOCUMENT Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Doctor of Musical Arts Major: Music Under the Supervision of Professor Clark E. Potter Lincoln, Nebraska May, 2011 A COMPARISON OF VIOLA STRINGS WITH HARMONIC FREQUENCY ANALYSIS Jonathan P. Crosmer, D.M.A. University of Nebraska, 2011 Adviser: Clark E. Potter Many brands of viola strings are available today. Different materials used result in varying timbres. This study compares 12 popular brands of strings. Each set of strings was tested and recorded on four violas. We allowed two weeks after installation for each string set to settle, and we were careful to control as many factors as possible in the recording process. -
Timbre Perception
HST.725 Music Perception and Cognition, Spring 2009 Harvard-MIT Division of Health Sciences and Technology Course Director: Dr. Peter Cariani Timbre perception www.cariani.com Friday, March 13, 2009 overview Roadmap functions of music sound, ear loudness & pitch basic qualities of notes timbre consonance, scales & tuning interactions between notes melody & harmony patterns of pitches time, rhythm, and motion patterns of events grouping, expectation, meaning interpretations music & language Friday, March 13, 2009 Wikipedia on timbre In music, timbre (pronounced /ˈtæm-bər'/, /tɪm.bər/ like tamber, or / ˈtæm(brə)/,[1] from Fr. timbre tɛ̃bʁ) is the quality of a musical note or sound or tone that distinguishes different types of sound production, such as voices or musical instruments. The physical characteristics of sound that mediate the perception of timbre include spectrum and envelope. Timbre is also known in psychoacoustics as tone quality or tone color. For example, timbre is what, with a little practice, people use to distinguish the saxophone from the trumpet in a jazz group, even if both instruments are playing notes at the same pitch and loudness. Timbre has been called a "wastebasket" attribute[2] or category,[3] or "the psychoacoustician's multidimensional wastebasket category for everything that cannot be qualified as pitch or loudness."[4] 3 Friday, March 13, 2009 Timbre ~ sonic texture, tone color Paul Cezanne. "Apples, Peaches, Pears and Grapes." Courtesy of the iBilio.org WebMuseum. Paul Cezanne, Apples, Peaches, Pears, and Grapes c. 1879-80); Oil on canvas, 38.5 x 46.5 cm; The Hermitage, St. Petersburg Friday, March 13, 2009 Timbre ~ sonic texture, tone color "Stilleben" ("Still Life"), by Floris van Dyck, 1613. -
Tr 126 959 V15.0.0 (2018-07)
ETSI TR 126 959 V15.0.0 (2018-07) TECHNICAL REPORT 5G; Study on enhanced Voice over LTE (VoLTE) performance (3GPP TR 26.959 version 15.0.0 Release 15) 3GPP TR 26.959 version 15.0.0 Release 15 1 ETSI TR 126 959 V15.0.0 (2018-07) Reference DTR/TSGS-0426959vf00 Keywords 5G ETSI 650 Route des Lucioles F-06921 Sophia Antipolis Cedex - FRANCE Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16 Siret N° 348 623 562 00017 - NAF 742 C Association à but non lucratif enregistrée à la Sous-Préfecture de Grasse (06) N° 7803/88 Important notice The present document can be downloaded from: http://www.etsi.org/standards-search The present document may be made available in electronic versions and/or in print. The content of any electronic and/or print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat. Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx If you find errors in the present document, please send your comment to one of the following services: https://portal.etsi.org/People/CommiteeSupportStaff.aspx Copyright Notification No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm except as authorized by written permission of ETSI. -
Enhanced Voice Services (EVS) Codec Management of the Institute Until Now, Telephone Services Have Generally Failed to Offer a High-Quality Audio Experience Prof
TECHNICAL PAPER Fraunhofer Institute for Integrated Circuits IIS ENHANCED VOICE SERVICES (EVS) CODEC Management of the institute Until now, telephone services have generally failed to offer a high-quality audio experience Prof. Dr.-Ing. Albert Heuberger due to limitations such as very low audio bandwidth and poor performance on non- (executive) speech signals. However, recent developments in speech and audio coding now promise a Dr.-Ing. Bernhard Grill significant quality boost in conversational services, providing the full audio bandwidth for Am Wolfsmantel 33 a more natural experience, improved speech intelligibility and listening comfort. 91058 Erlangen www.iis.fraunhofer.de The recently standardized Enhanced Voice Service (EVS) codec is the first 3GPP commu- nication codec providing super-wideband (SWB) audio bandwidth for improved speech Contact quality already at 9.6 kbps. At the same time, the codec’s performance on other signals, Matthias Rose like music or mixed content, is comparable to modern audio codecs. The key technology Phone +49 9131 776-6175 of the codec is a flexible switching scheme between specialized coding modes for speech [email protected] and music signals. The codec was jointly developed by the following companies, repre- senting operators, terminal, infrastructure and chipset vendors, as well as leading speech Contact USA and audio coding experts: Ericsson, Fraunhofer IIS, Huawei Technologies Co. Ltd, NOKIA Fraunhofer USA, Inc. Corporation, NTT, NTT DOCOMO INC., ORANGE, Panasonic Corporation, Qualcomm Digital Media Technologies* Incorporated, Samsung Electronics Co. Ltd, VoiceAge and ZTE Corporation. Phone +1 408 573 9900 [email protected] The objective of this paper is to provide a brief overview of the landscape of commu- nication systems with special focus on the EVS codec. -
Parametric Coding for Spatial Audio
Master Thesis : Parametric Coding for Spatial Audio Author : Bertrand Fatus Supervisor at Orange : St´ephane Ragot Supervisor at KTH : Sten Ternstr¨om July-December 2015 1 2 Abstract This thesis presents a stereo coding technique used as an extension for the Enhanced Voice Services (EVS) codec [10] [8]. EVS is an audio codec recently standardized by the 3rd Generation Partnership Project (3GPP) for compressing mono signals at chosen rates from 7.2 to 128 kbit/s (for fixed bit rate) and around 5.9 kbit/s (for variable bit rate). The main goal of the thesis is to present the architecture of a parametric stereo codec and how the stereo extension of EVS may be built. Parametric stereo coding relies on the transmission of a downmixed signal, sum of left and right channels, and the necessary audible cues to synthesize back the stereo image from it at the decoding end. The codec has been implemented in MATLAB with use of the existing EVS codec. An important part of the thesis is dedicated to the description of the implementation of a robust downmixing technique. The remaining parts present the parametric coding architecture that has been adapted and used to develop the EVS stereo extension at 24.4 and 32 kbit/s and other open researches that have been conducted for more specific situ- ations such as spatial coding for stereo or binaural applications. Whereas the downmixing algorithm quality has been confronted to subjective testing and proven to be more efficient than any other existing techniques, the stereo extension has been tested less extensively. -
Automatic Transcription of Bass Guitar Tracks Applied for Music Genre Classification and Sound Synthesis
Automatic Transcription of Bass Guitar Tracks applied for Music Genre Classification and Sound Synthesis Dissertation zur Erlangung des akademischen Grades Doktoringenieur (Dr.-Ing.) vorlelegt der Fakultät für Elektrotechnik und Informationstechnik der Technischen Universität Ilmenau von Dipl.-Ing. Jakob Abeßer geboren am 3. Mai 1983 in Jena Gutachter: Prof. Dr.-Ing. Gerald Schuller Prof. Dr. Meinard Müller Dr. Tech. Anssi Klapuri Tag der Einreichung: 05.12.2013 Tag der wissenschaftlichen Aussprache: 18.09.2014 urn:nbn:de:gbv:ilm1-2014000294 ii Acknowledgments I am grateful to many people who supported me in the last five years during the preparation of this thesis. First of all, I would like to thank Prof. Dr.-Ing. Gerald Schuller for being my supervisor and for the inspiring discussions that improved my understanding of good scientific practice. My gratitude also goes to Prof. Dr. Meinard Müller and Dr. Anssi Klapuri for being available as reviewers. Thank you for the valuable comments that helped me to improve my thesis. I would like to thank my former and current colleagues and fellow PhD students at the Semantic Music Technologies Group at the Fraunhofer IDMT for the very pleasant and motivating working atmosphere. Thank you Christian, Holger, Sascha, Hanna, Patrick, Christof, Daniel, Anna, and especially Alex and Estefanía for all the tea-time conversations, discussions, last-minute proof readings, and assistance of any kind. Thank you Paul for providing your musicological expertise and perspective in the genre classification experiments. I also thank Prof. Petri Toiviainen, Dr. Olivier Lartillot, and all the collegues at the Finnish Centre of Excellence in Interdisciplinary Music Research at the University of Jyväskylä for a very inspiring research stay in 2010. -
Advances in Perceptual Stereo Audio Coding Using Linear Prediction Techniques
Advances in Perceptual Stereo Audio Coding Using Linear Prediction Techniques PROEFSCHRIFT ter verkrijging van de graad van doctor aan de Technische Universiteit Eindhoven, op gezag van de Rector Magnificus, prof.dr.ir. C.J. van Duijn, voor een commissie aangewezen door het College voor Promoties in het openbaar te verdedigen op dinsdag 15 mei 2007 om 16.00 uur door Arijit Biswas geboren te Calcutta, India Dit proefschrift is goedgekeurd door de promotoren: prof.dr. R.J. Sluijter en prof.Dr. A.G. Kohlrausch Copromotor: dr.ir. A.C. den Brinker °c Arijit Biswas, 2007. All rights are reserved. Reproduction in whole or in part is prohibited without the written consent of the copyright owner. The work described in this thesis has been carried out under the auspices of Philips Research Europe - Eindhoven, The Netherlands. CIP-DATA LIBRARY TECHNISCHE UNIVERSITEIT EINDHOVEN Biswas, Arijit Advances in perceptual stereo audio coding using linear prediction techniques / by Arijit Biswas. - Eindhoven : Technische Universiteit Eindhoven, 2007. Proefschrift. - ISBN 978-90-386-2023-7 NUR 959 Trefw.: signaalcodering / signaalverwerking / digitale geluidstechniek / spraakcodering. Subject headings: audio coding / linear predictive coding / signal processing / speech coding. Samenstelling promotiecommissie: prof.dr. R.J. Sluijter, promotor Technische Universiteit Eindhoven, The Netherlands prof.Dr. A.G. Kohlrausch, promotor Technische Universiteit Eindhoven, The Netherlands dr.ir. A.C. den Brinker, copromotor Philips Research Europe - Eindhoven, The Netherlands prof.dr. S.H. Jensen, extern lid Aalborg Universitet, Denmark Dr.-Ing. G.D.T. Schuller, extern lid Fraunhofer Institute for Digital Media Technology, Ilmenau, Germany dr.ir. S.J.L. van Eijndhoven, lid TU/e Technische Universiteit Eindhoven, The Netherlands prof.dr.ir. -
Expressive Sampling Synthesis
i THESE` DE DOCTORAT DE l’UNIVERSITE´ PIERRE ET MARIE CURIE Ecole´ doctorale Informatique, T´el´ecommunications et Electronique´ (EDITE) Expressive Sampling Synthesis Learning Extended Source–Filter Models from Instrument Sound Databases for Expressive Sample Manipulations Presented by Henrik Hahn September 2015 Submitted in partial fulfillment of the requirements for the degree of DOCTEUR de l’UNIVERSITE´ PIERRE ET MARIE CURIE Supervisor Axel R¨obel Analysis/Synthesis group, IRCAM, Paris, France Reviewer Xavier Serra MTG, Universitat Pompeu Fabra, Barcelona, Spain Vesa V¨alim¨aki Aalto University, Espoo, Finland Examiner Sylvain Marchand Universit´ede La Rochelle, France Bertrand David T´el´ecom ParisTech, Paris, France Jean-Luc Zarader UPMC Paris VI, Paris, France This page is intentionally left blank. Abstract This thesis addresses imitative digital sound synthesis of acoustically viable in- struments with support of expressive, high-level control parameters. A general model is provided for quasi-harmonic instruments that reacts coherently with its acoustical equivalent when control parameters are varied. The approach builds upon recording-based methods and uses signal trans- formation techniques to manipulate instrument sound signals in a manner that resembles the behavior of their acoustical equivalents using the fundamental control parameters intensity and pitch. The method preserves the inherent quality of discretized recordings of a sound of acoustic instruments and in- troduces a transformation method that retains the coherency with its timbral variations when control parameters are modified. It is thus meant to introduce parametric control for sampling sound synthesis. The objective of this thesis is to introduce a new general model repre- senting the timbre variations of quasi-harmonic music instruments regarding a parameter space determined by the control parameters pitch as well as global and instantaneous intensity. -
NTT Technical Review, Vol. 14, No. 11, Nov. 2016
Feature Articles: Basic Research Envisioning Future Communication Transmission of High-quality Sound via Networks Using Speech/Audio Codecs Yutaka Kamamoto, Takehiro Moriya, and Noboru Harada Abstract This article describes two recent advances in speech and audio codecs. One is EVS (Enhanced Voice Service), the new standard by 3GPP (3rd Generation Partnership Project) for speech codecs, which is capable of transmitting speech signals, music, and even the ambient sound on the speaker’s side. This codec has been adopted in a new VoLTE (voice over Long-Term Evolution) service with enhanced high- definition voice (HD+), which provides us with clearer and more natural conversations than conven- tional telephony services such as with fixed-line/land-line and 3G mobile phones. The other is MPEG-4 Audio Lossless Coding (ALS) standardized by the Moving Picture Experts Group (MPEG), which makes it possible to transmit studio-quality audio content to the home. ALS is expected to be used by some broadcasters, including IPTV (Internet protocol television) companies, in their broadcasts in the near future. Keywords: audio/speech coding, data compression, international standards 1. Introduction speech codecs use a human phonation model, so they are not suitable for music. When clean speech items Many audio and speech codecs are available, and are coded by audio codecs at low bit rates, we get the we can select the most suitable one for different usage impression that a machine is talking. Speech and scenarios ranging from those requiring reasonable audio compression schemes have these kinds of quality with low bit rates to ones demanding original trade-offs.