(12) United States Patent. (10) Patent No.: US 9,626,845 B2. Eagleman et al.

Total Pages: 16

File Type: PDF, Size: 1020 KB

(12) United States Patent
(10) Patent No.: US 9,626,845 B2
Eagleman et al.
(45) Date of Patent: Apr. 18, 2017

(54) PROVIDING INFORMATION TO A USER THROUGH SOMATOSENSORY FEEDBACK

(71) Applicants: Baylor College of Medicine, Houston, TX (US); William Marsh Rice University, Houston, TX (US)
(72) Inventors: David M. Eagleman, Houston, TX (US); Scott Novich, Dallas, TX (US)
(73) Assignees: Baylor College of Medicine; William Marsh Rice University, Houston, TX (US)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
(21) Appl. No.: 14/750,626
(22) Filed: Jun. 25, 2015
(65) Prior Publication Data: US 2016/0012688 A1, Jan. 14, 2016
Related U.S. Application Data: (60) Provisional application No. 62/022,478, filed on Jul. 9, 2014.
(51) Int. Cl.: H04B 3/36 (2006.01); G08B 6/00 (2006.01); G06F 3/01 (2006.01); G09B 21/00 (2006.01); A61F 11/04 (2006.01)
(52) U.S. Cl.: CPC G08B 6/00 (2013.01); A61F 11/04 (2013.01); G06F 3/016 (2013.01); G09B 21/003 (2013.01)
(58) Field of Classification Search: CPC G08B 6/00; G06F 3/016; G09B 21/003; A61F 11/04. USPC 340/407.1, 691.1. See application file for complete search history.

(56) References Cited

U.S. Patent Documents:
4,926,879 A * 5/1990 Sevrain (G08B 6/00; 340/407.1)
7,979,146 B2 * 7/2011 Ullrich (G06F 3/016; 340/407.1)
8,068,025 B2 * 11/2011 Devenyi (H04R 1/1041; 340/407.1)
8,754,757 B1 * 6/2014 Ullrich (G06F 3/016; 340/407.1)
9,019,087 B2 * 4/2015 Bakircioglu (G06F 3/016; 340/407.1)
9,298,260 B2 * 3/2016 Karaoguz (G06F 3/016)
2007/0041600 A1 2/2007 Zachman
2011/0063208 A1 * 3/2011 Van Den Eerenbeemd (G06F 3/016; 340/407.1)
2013/0218456 A1 * 8/2013 Zelek (G01C 21/00; 701/411)
* cited by examiner

Foreign Patent Documents:
WO 2008106698 A1, 9/2008
WO 2012069429 A1, 5/2012

Other Publications:
Plant, Geoff, "Training in the use of the Tactaid VII: A case study", KTH Computer Science and Communication (STL-QPSR), 1994, vol. 35, no. 1, pp. 091-102.

Primary Examiner: John A. Tweel, Jr.
(74) Attorney, Agent, or Firm: Jeffrey Schox; Ivan Wong

(57) ABSTRACT

A hearing device may provide hearing-to-touch sensory substitution as a therapeutic approach to deafness. Through use of signal processing on received signals, the hearing device may provide better accuracy with the hearing-to-touch sensory substitution by extending beyond the simple filtering of an incoming audio stream as found in previous tactile hearing aids. The signal processing may include low-bitrate audio compression algorithms, such as linear predictive coding, mathematical transforms, such as Fourier transforms, and/or wavelet algorithms. The processed signals may activate tactile interface devices that provide touch sensation to a user. For example, the tactile interface devices may be vibrating devices attached to a vest, which is worn by the user. The vest may also provide other types of information to the user.

28 Claims, 8 Drawing Sheets

[Drawing sheets: FIG. 1 and FIG. 2 show a vest (100) carrying an array of tactile interface devices (108). FIG. 3 is a block diagram (300) with an input device (302), a sensory data processing module (310) containing recording, computational, and tactile interface control modules, a power supply (322), a tactile interface controller (326), and the tactile interface devices (108). FIG. 4 details the computational module (314): a transformation module (422), an encoding module (424), and tactile interface commands (426).

FIG. 5 (method 500): buffer data samples from a data signal (502); transform the samples from the time domain to the space domain (504); compress the transformed samples to reduce their dimensionality (506); map the compressed samples to the tactile interface device array (508). FIG. 6 shows a frequency mapping over 0 to 4000 Hz.

FIG. 7 (method 700): receive a data set of recorded speech (702); produce N-dimensional samples of data from the received data set (704); determine statistical correlations between the produced samples (706); form a correlation matrix from the determined statistical correlations, in which the element in row i, column j of the matrix is a statistical dependency of the i-th and j-th dimensions (708); produce a mapping of the dimensions into a 2D space based on the correlation matrix (710).

FIG. 8 (method 800): collect N samples of buffered data (802); perform a discrete cosine transform (DCT) on each of the N samples of buffered data to obtain N coefficient bins (804); compress the N coefficient bins for the buffered data to Y coefficient bins corresponding to Y tactile interface devices (806); generate control signals for the Y tactile interface devices based on the Y coefficient bins (808).

FIG. 9 (method 900): collect N samples of buffered data (902); derive a digital finite impulse response (FIR) filter for each of the N samples having coefficients that approximate the buffered data, in which the coefficients are static for the duration of the N samples (904); convert the FIR coefficients into a frequency-domain representation of line spectral pairs (LSPs) (906); generate control signals for Y tactile interface devices based on an LSP frequency value for a plurality of frequency locations mapped to the Y tactile interface devices (908). FIG. 10 (sheet 8 of 8) is a further block diagram.]

PROVIDING INFORMATION TO A USER THROUGH SOMATOSENSORY FEEDBACK

CROSS-REFERENCE TO OTHER PATENT APPLICATIONS

This application claims the benefit of priority to U.S. Provisional Patent Application No. 62/022,478 to Eagleman et al., filed on Jul. 9, 2014 and entitled "PROVIDING INFORMATION TO A USER THROUGH SOMATOSENSORY FEEDBACK," which is incorporated by reference herein.

GOVERNMENT LICENSE RIGHTS

This invention was made with government support under Grant #5T32EB006350-05 awarded by the National Institutes of Health (NIH). The government has certain rights in the invention.

FIELD OF THE DISCLOSURE

The instant disclosure relates to providing information through touch sensation. More specifically, this disclosure relates to high-information-rate substitution devices (such as for hearing).

BACKGROUND

There are at least 2 million functionally deaf individuals in the United States, and an estimated 53 million worldwide. [...] of some of these band-passed channels, leading to aliasing noise. Thus, the touch representation of received audio sounds is inaccurate, and insufficiently accurate to provide a substitute for hearing the audio sounds. Furthermore, the limited bandwidth available for conveying information restricts the application of these prior sound-to-touch substitution devices to only low-bandwidth audio applications, without the ability to convey high-throughput data.

SUMMARY

A somatosensation feedback device may provide information to a user by transforming and/or compressing the information and mapping the information to an array of devices in the somatosensation feedback device. In one embodiment, the somatosensation feedback device may be a wearable item or embedded in wearable clothing, such as a vest. The vest may include an array of feedback devices that provide somatosensation feedback, such as vibration or motion. The vibration or motion may convey the transformed and/or compressed information to the user.

In one embodiment, the vest may be used as a hearing device to provide audio-to-touch transformation of data. The hearing device may provide hearing-to-touch sensory substitution as a therapeutic approach to deafness. Through use of signal processing on received signals, the hearing device may provide better accuracy with the hearing-to-touch sensory substitution by extending beyond the simple filtering of an incoming audio stream as found in previous tactile [...]
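The DCT-based mapping outlined in the patent's FIG. 8 (buffer a frame, transform it, pool the N coefficient bins down to Y bins for Y tactile devices, emit control signals) can be sketched as follows. This is a minimal illustration, not the patented implementation: the frame length (512), device count (Y = 32), mean-absolute pooling, and peak normalization are all assumptions made for the sketch.

```python
import numpy as np

def dct_ii(frame: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II of a 1-D frame, built directly from its definition."""
    n = len(frame)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * t + 1) / (2 * n))
    coeffs = basis @ frame * np.sqrt(2.0 / n)
    coeffs[0] /= np.sqrt(2.0)
    return coeffs

def frame_to_control_signals(frame: np.ndarray, n_devices: int = 32) -> np.ndarray:
    """Map one buffered audio frame to per-device drive levels in [0, 1].

    Follows the FIG. 8 outline: DCT the frame, pool the N coefficient bins
    down to Y bins (one per tactile device), then scale to control signals.
    The pooling (mean absolute value) and normalization are illustrative.
    """
    coeffs = np.abs(dct_ii(frame))
    groups = np.array_split(coeffs, n_devices)   # N bins -> Y bins
    energy = np.array([g.mean() for g in groups])
    peak = energy.max()
    return energy / peak if peak > 0 else energy

rng = np.random.default_rng(0)
frame = rng.standard_normal(512)                 # stand-in for buffered audio
signals = frame_to_control_signals(frame)
print(signals.shape)                             # one drive level per device: (32,)
```

In a real device each drive level would then be quantized and sent to the corresponding vibration motor by the tactile interface controller.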
Recommended publications
  • Relating Optical Speech to Speech Acoustics and Visual Speech Perception
    UNIVERSITY OF CALIFORNIA, Los Angeles. "Relating Optical Speech to Speech Acoustics and Visual Speech Perception." A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical Engineering by Jintao Jiang, 2003. © Copyright by Jintao Jiang 2003. The dissertation of Jintao Jiang is approved: Kung Yao, Lieven Vandenberghe, Patricia A. Keating, Lynne E. Bernstein, Abeer Alwan (Committee Chair). Table of Contents: Chapter 1, Introduction (1.1 Audio-Visual Speech Processing; 1.2 How to Examine the Relationship between Data Sets; 1.3 The Relationship between Articulatory Movements and Speech Acoustics; 1.4 The Relationship between Visual Speech Perception and Physical Measures; 1.5 Outline of This Dissertation); Chapter 2, Data Collection and Pre-Processing (2.1 Introduction; 2.2 Background; 2.3 Recording a Database for the Correlation Analysis...)
  • Improving Opus Low Bit Rate Quality with Neural Speech Synthesis
    Improving Opus Low Bit Rate Quality with Neural Speech Synthesis. Jan Skoglund (Google, San Francisco, CA, USA), Jean-Marc Valin (Amazon, Palo Alto, CA, USA). Abstract: The voice mode of the Opus audio coder can compress wideband speech at bit rates ranging from 6 kb/s to 40 kb/s. However, Opus is at its core a waveform matching coder, and as the rate drops below 10 kb/s, quality degrades quickly. As the rate reduces even further, parametric coders tend to perform better than waveform coders. In this paper we propose a backward-compatible way of improving low bit rate Opus quality by resynthesizing speech from the decoded parameters. We compare two different neural generative models, WaveNet and LPCNet. WaveNet is a powerful, high-complexity, and high-latency architecture that is not feasible for a practical system, yet provides a best known achievable quality with generative models. A typical WaveNet configuration requires a very high algorithmic complexity, in the order of hundreds of GFLOPS, along with a high memory usage to hold the millions of model parameters. Combined with the high latency, in the hundreds of milliseconds, this renders WaveNet impractical for a real-time implementation. Replacing the dilated convolutional networks with recurrent networks improved memory efficiency in SampleRNN, which was shown to be useful for speech coding. WaveRNN also demonstrated possibilities for synthesizing at lower complexities compared to WaveNet. Even lower complexity and real-time operation was recently reported using LPCNet.
  • Lecture 16: Linear Prediction-Based Representations
    LECTURE 16: LINEAR PREDICTION-BASED REPRESENTATIONS. Objectives: introduce analysis/synthesis systems; discuss error analysis and gain-matching; relate linear prediction coefficients to other spectral representations; introduce reflection and prediction coefficient recursions; present lattice/ladder filter implementations. Section outline: System View: Analysis/Synthesis; The Error Signal; Gain Matching; Transformations: Levinson-Durbin; Reflection Coefficients; Transformations; The Burg Method; Summary: Signal Modeling; Typical Front End. There is a classic textbook on this subject: J.D. Markel and A.H. Gray, Linear Prediction of Speech, Springer-Verlag, New York, ISBN 0-13-007444-6, 1976. This lecture also includes material from two other textbooks: J. Deller et al., Discrete-Time Processing of Speech Signals, MacMillan, ISBN 0-7803-5386-2, 2000; and L.R. Rabiner and B.W. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Upper Saddle River, New Jersey, ISBN 0-13-015157-2, 1993. From ECE 8463: Fundamentals of Speech Recognition, Professor Joseph Picone, Department of Electrical and Computer Engineering, Mississippi State University. Modern speech understanding systems merge interdisciplinary technologies from Signal Processing, Pattern Recognition, Natural Language, and Linguistics into a unified statistical framework. These systems, which have applications in a wide range of signal processing problems, represent a revolution in Digital Signal Processing (DSP).
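The Levinson-Durbin recursion covered in this lecture converts a signal's autocorrelation sequence into linear prediction coefficients and reflection coefficients. A compact sketch (the function name and the toy autocorrelation sequence are illustrative):

```python
import numpy as np

def levinson_durbin(r: np.ndarray, order: int):
    """Solve the LP normal equations by the Levinson-Durbin recursion.

    r: autocorrelation values r[0..order].
    Returns (lpc, reflection, prediction_error), where lpc[0] == 1.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    k = np.zeros(order)
    for i in range(1, order + 1):
        # Forward prediction error using the order-(i-1) coefficients.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k[i - 1] = -acc / err
        # Order update: a_new[j] = a[j] + k_i * a[i - j], j = 1..i.
        a[1:i + 1] = a[1:i + 1] + k[i - 1] * a[i - 1::-1][:i]
        err *= (1.0 - k[i - 1] ** 2)
    return a, k, err

# Autocorrelation of an AR(1)-like signal: r[m] = 0.5**m.
r = np.array([1.0, 0.5, 0.25])
a, k, err = levinson_durbin(r, order=2)
print(a)    # recovers LPC coefficients [1, -0.5, 0] for this sequence
```

The reflection coefficients returned alongside the LPC coefficients are exactly the lattice-filter parameters the lecture's lattice/ladder implementations use.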
  • Source-Channel Coding for CELP Speech Coders
    Source-Channel Coding for CELP Speech Coders. J.A. Asenstorfer, University of Adelaide, Department of Electrical and Electronic Engineering. Thesis submitted for the degree of Doctor of Philosophy, September 26, 1994. Contents: 1 Speech Coders for Noisy Channels (1.1 Introduction; 1.2 Thesis Goals and Historical Perspective). 2 Linear Predictive Coders Background (2.1 Introduction; 2.2 Linear Predictive Models; 2.3 Autocorrelation of Speech; 2.4 Generic Code Excited Linear Prediction). 3 Pitch Estimation Based on Chaos Theory (3.1 Introduction; 3.1.1 A Novel Approach to Pitch Estimation; 3.2 The Non-Linear Dynamical Model; 3.3 Determining the Delay; 3.4 Speech Database; 3.5 Initial Findings; 3.5.1 Poincaré Sections; 3.5.2 Poincaré Section in Time; 3.6 Speech Classification; 3.6.1 The Zero Set; 3.6.2 The Devil's Staircase; 3.6.3 Orbit Direction Change Indicator; 3.7 Voicing Estimator; 3.8 Pitch Tracking; 3.9 The Pitch Estimation Algorithm; 3.10 Results from the Pitch Estimator; 3.11 Waveform Substitution; 3.11.1 Replication of the Last Pitch Period; 3.11.2 Unvoiced Substitution; 3.11.3 Splicing; 3.11.4 Memory Considerations for CELP Coders; 3.12 Conclusion). 4 Vector Quantiser and the Missing Data Problem (4.1 Introduction; 4.2 Why Use Vector Quantisation?; 4.2.1 Generation of a Vector Quantiser Using a Training Set; 4.2.2 Some Useful Optimal Vector Quantisation Results; 4.3 Heuristic Argument) ...
  • Samia Dawood Shakir
    MODELLING TALKING HUMAN FACES. Thesis submitted to Cardiff University in candidature for the degree of Doctor of Philosophy. Samia Dawood Shakir, School of Engineering, Cardiff University, 2019. DECLARATION: This work has not been submitted in substance for any other degree or award at this or any other university or place of learning, nor is being submitted concurrently in candidature for any degree or other award. STATEMENT 1: This thesis is being submitted in partial fulfillment of the requirements for the degree of PhD. STATEMENT 2: This thesis is the result of my own independent work/investigation, except where otherwise stated, and the thesis has not been edited by a third party beyond what is permitted by Cardiff University's Policy on the Use of Third Party Editors by Research Degree Students. Other sources are acknowledged by explicit references. The views expressed are my own. STATEMENT 3: I hereby give consent for my thesis, if accepted, to be available online in the University's Open Access repository and for inter-library loan, and for the title and summary to be made available to outside organisations. ABSTRACT: This thesis investigates a number of new approaches for visual speech synthesis using data-driven methods to implement a talking face. The main contributions in this thesis are the following. The accuracy of the shared Gaussian process latent variable model (SGPLVM) built using the active appearance model (AAM) and relative spectral transform-perceptual linear prediction (RASTA-PLP) features is improved by employing a more accurate AAM.
  • End-To-End Video-To-Speech Synthesis Using Generative Adversarial Networks
    End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks. Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis (Member, IEEE), Björn W. Schuller (Fellow, IEEE), Maja Pantic (Fellow, IEEE). Video-to-speech is the process of reconstructing the audio speech from a video of a spoken utterance. Previous approaches to this task have relied on a two-step process where an intermediate representation is inferred from the video, and is then decoded into waveform audio using a vocoder or a waveform reconstruction algorithm. In this work, we propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs) which translates spoken video to waveform end-to-end without using any intermediate representation or separate waveform synthesis algorithm. Our model consists of an encoder-decoder architecture that receives raw video as input and generates speech, which is then fed to a waveform critic and a power critic. The use of an adversarial loss based on these two critics enables the direct synthesis of raw audio waveform and ensures its realism. In addition, the use of our three comparative losses helps establish direct correspondence between the generated audio and the input video. We show that this model is able to reconstruct speech with remarkable realism for constrained datasets such as GRID, and that it is the first end-to-end model to produce intelligible speech for LRW (Lip Reading in the Wild), featuring hundreds of speakers recorded entirely 'in the wild'. We evaluate the generated samples in two different scenarios (seen and unseen speakers) using four objective metrics which measure the quality and intelligibility of artificial speech.
  • Linear Predictive Modelling of Speech - Constraints and Line Spectrum Pair Decomposition
    LINEAR PREDICTIVE MODELLING OF SPEECH: CONSTRAINTS AND LINE SPECTRUM PAIR DECOMPOSITION. Tom Bäckström. Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Espoo 2004, Report 71. Dissertation for the degree of Doctor of Science in Technology, to be presented with due permission for public examination and debate in Auditorium S4, Department of Electrical and Communications Engineering, Helsinki University of Technology, Espoo, Finland, on the 5th of March, 2004, at 12 o'clock noon. ISBN 951-22-6946-5, ISSN 1456-6303. Field of research: Speech processing. Opponent: Associate Professor Peter Händel. Supervisor: Professor Paavo Alku. Abstract: In an exploration of the spectral modelling of speech, this thesis presents theory and applications of constrained linear predictive (LP) models.
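The line spectrum pair decomposition studied in this dissertation splits an LP polynomial A(z) into a symmetric polynomial P(z) and an antisymmetric polynomial Q(z), whose unit-circle root angles are the line spectral frequencies. A small numerical sketch (the example LP coefficients are arbitrary):

```python
import numpy as np

def lsp_polynomials(a: np.ndarray):
    """Split an order-m LP polynomial A(z) (with a[0] == 1) into
    P(z) = A(z) + z^-(m+1) A(z^-1)   (symmetric)
    Q(z) = A(z) - z^-(m+1) A(z^-1)   (antisymmetric)
    """
    a_ext = np.concatenate([a, [0.0]])   # raise degree to m + 1
    flipped = a_ext[::-1]                # coefficients of z^-(m+1) A(z^-1)
    return a_ext + flipped, a_ext - flipped

def lsp_frequencies(a: np.ndarray) -> np.ndarray:
    """Line spectral frequencies: angles in (0, pi) of the roots of P and Q."""
    p, q = lsp_polynomials(a)
    freqs = []
    for poly in (p, q):
        angles = np.angle(np.roots(poly))
        freqs.extend(w for w in angles if 0.0 < w < np.pi)
    return np.sort(np.array(freqs))

a = np.array([1.0, -1.2, 0.6])           # arbitrary stable LP polynomial
print(lsp_frequencies(a))                 # two frequencies: one from P, one from Q
```

For a minimum-phase A(z) the P and Q frequencies interleave on the unit circle, which is what makes LSPs attractive for quantization: small perturbations keep the reconstructed filter stable.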
  • Audio Processing and Loudness Estimation Algorithms with Ios Simulations
    Audio Processing and Loudness Estimation Algorithms with iOS Simulations by Girish Kalyanasundaram A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science Approved September 2013 by the Graduate Supervisory Committee: Andreas Spanias, Chair Cihan Tepedelenlioglu Visar Berisha ARIZONA STATE UNIVERSITY December 2013 ABSTRACT The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters from human auditory models, such as auditory patterns and loudness, involves computationally intensive operations which can strain device resources. Hence, strategies for implementing computationally efficient human auditory models for loudness estimation have been studied in this thesis. Existing algorithms for reducing computations in auditory pattern and loudness estimation have been examined and improved algorithms have been proposed to overcome limitations of these methods. In addition, real-time applications such as perceptual loudness estimation and loudness equalization using auditory models have also been implemented. A software implementation of loudness estimation on iOS devices is also reported in this thesis. In addition to the loudness estimation algorithms and software, in this thesis project we also created new illustrations of speech and audio processing concepts for research and education. As a result, a new suite of speech/audio DSP functions was developed and integrated as part of the award-winning educational iOS App 'iJDSP.” These functions are described in detail in this thesis. 
Several enhancements in the architecture of the application have also been introduced for providing the supporting framework for speech/audio processing.
  • Code Excited Linear Prediction with Multi-Pulse Codebooks
    CODE EXCITED LINEAR PREDICTION WITH MULTI-PULSE CODEBOOKS. Lei Zhang, B.A.Sc., University of British Columbia. A thesis submitted in partial fulfillment of the requirements for the degree in the School of Engineering Science, Simon Fraser University, 1997. All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without the permission of the author. National Library of Canada licence page: The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats. The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
  • A Review of Physical and Perceptual Feature Extraction Techniques for Speech, Music and Environmental Sounds
    applied sciences Review A Review of Physical and Perceptual Feature Extraction Techniques for Speech, Music and Environmental Sounds Francesc Alías *, Joan Claudi Socoró and Xavier Sevillano GTM - Grup de recerca en Tecnologies Mèdia, La Salle-Universitat Ramon Llull, Quatre Camins, 30, 08022 Barcelona, Spain; [email protected] (J.C.S.); [email protected] (X.S.) * Correspondence: [email protected]; Tel.: +34-93-290-24-40 Academic Editor: Vesa Välimäki Received: 15 March 2016; Accepted: 28 April 2016; Published: 12 May 2016 Abstract: Endowing machines with sensing capabilities similar to those of humans is a prevalent quest in engineering and computer science. In the pursuit of making computers sense their surroundings, a huge effort has been conducted to allow machines and computers to acquire, process, analyze and understand their environment in a human-like way. Focusing on the sense of hearing, the ability of computers to sense their acoustic environment as humans do goes by the name of machine hearing. To achieve this ambitious aim, the representation of the audio signal is of paramount importance. In this paper, we present an up-to-date review of the most relevant audio feature extraction techniques developed to analyze the most usual audio signals: speech, music and environmental sounds. Besides revisiting classic approaches for completeness, we include the latest advances in the field based on new domains of analysis together with novel bio-inspired proposals. These approaches are described following a taxonomy that organizes them according to their physical or perceptual basis, being subsequently divided depending on the domain of computation (time, frequency, wavelet, image-based, cepstral, or other domains).
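Two of the classic physical features surveyed in reviews like this one, the zero-crossing rate (time domain) and the spectral centroid (frequency domain), can be computed in a few lines. The frame length, sample rate, and test tone below are illustrative choices, not values from the paper:

```python
import numpy as np

def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ (time-domain feature)."""
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))

def spectral_centroid(frame: np.ndarray, sample_rate: float) -> float:
    """Magnitude-weighted mean frequency of the frame's spectrum (frequency-domain feature)."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = spectrum.sum()
    return float((freqs * spectrum).sum() / total) if total > 0 else 0.0

sr = 16000
t = np.arange(1024) / sr
tone = np.sin(2 * np.pi * 1000 * t)          # pure 1 kHz sine
print(spectral_centroid(tone, sr))           # close to 1000 Hz for a pure tone
print(zero_crossing_rate(tone))              # roughly 2 crossings per 16-sample period
```

Both features are cheap single-number summaries; the taxonomies in such reviews build up from these toward cepstral, wavelet, and perceptually motivated representations.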
  • CELP and Speech Enhancement Ian Mcloughlin
    CELP and Speech Enhancement Ian McLoughlin PhD Thesis School of Electronic and Electrical Engineering Faculty of Engineering University of Birmingham October 1997 University of Birmingham Research Archive e-theses repository This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by The Copyright Designs and Patents Act 1988 or as modified by any successor legislation. Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder. Synopsis This thesis addresses the intelligibility enhancement of speech that is heard within an acoustically noisy environment. In particular, a realistic target situation of a police vehicle interior, with speech generated from a CELP (codebook-excited linear prediction) speech compression-based communication system, is adopted. The research has centred on the role of the CELP speech compression algorithm, and its transmission parameters. In particular, novel methods of LSP-based (line spectral pair) speech analysis and speech modification are developed and described. CELP parameters have been utilised in the analysis and processing stages of a speech intelligibility enhancement system to minimise additional computational complexity over existing CELP coder requirements. Details are given of the CELP analysis process and its effects on speech, the development of speech analysis and alteration algorithms coexisting with a CELP system, their effects and performance. Both objective and subjective tests have been used to characterize the effectiveness of the analysis and processing methods.
  • Harmonic Coding of Speech at Low Bit Rates
    HARMONIC CODING OF SPEECH AT LOW BIT RATES. Peter Lupini, B.A.Sc., University of British Columbia, 1985. A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the School of Engineering Science. © Peter Lupini 1995, Simon Fraser University, September 1995. All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without the permission of the author. APPROVAL. Name: Peter Lupini. Degree: Doctor of Philosophy. Title of thesis: Harmonic Coding of Speech at Low Bit Rates. Examining Committee: Dr. J. Cavers, Chairman; Dr. V. Cuperman, Senior Supervisor; Dr. Paul K.M. Ho, Supervisor; Dr. J. Vaisey, Supervisor; Dr. S. Hardy, Internal Examiner; Dr. K. Rose, Professor, UCSB, External Examiner. PARTIAL COPYRIGHT LICENSE: I hereby grant to Simon Fraser University the right to lend my thesis, project or extended essay (the title of which is shown below) to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users. I further agree that permission for multiple copying of this work for scholarly purposes may be granted by me or the Dean of Graduate Studies. It is understood that copying or publication of this work for financial gain shall not be allowed without my written permission. Title of Thesis/Project/Extended Essay: "Harmonic Coding of Speech at Low Bit Rates". Author: September 15.