ECE438 - Digital Signal Processing with Applications 1
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Enhance Your DSP Course with These Interesting Projects
AC 2012-3836: ENHANCE YOUR DSP COURSE WITH THESE INTER- ESTING PROJECTS Dr. Joseph P. Hoffbeck, University of Portland Joseph P. Hoffbeck is an Associate Professor of electrical engineering at the University of Portland in Portland, Ore. He has a Ph.D. from Purdue University, West Lafayette, Ind. He previously worked with digital cell phone systems at Lucent Technologies (formerly AT&T Bell Labs) in Whippany, N.J. His technical interests include communication systems, digital signal processing, and remote sensing. Page 25.566.1 Page c American Society for Engineering Education, 2012 Enhance your DSP Course with these Interesting Projects Abstract Students are often more interested learning technical material if they can see useful applications for it, and in digital signal processing (DSP) it is possible to develop homework assignments, projects, or lab exercises to show how the techniques can be used in realistic situations. This paper presents six simple, yet interesting projects that are used in the author’s undergraduate digital signal processing course with the objective of motivating the students to learn how to use the Fast Fourier Transform (FFT) and how to design digital filters. Four of the projects are based on the FFT, including a simple voice recognition algorithm that determines if an audio recording contains “yes” or “no”, a program to decode dual-tone multi-frequency (DTMF) signals, a project to determine which note is played by a musical instrument and if it is sharp or flat, and a project to check the claim that cars honk in the tone of F. Two of the projects involve designing filters to eliminate noise from audio recordings, including designing a lowpass filter to remove a truck backup beeper from a recording of an owl hooting and designing a highpass filter to remove jet engine noise from a recording of birds chirping. -
Speech Signal Basics
Speech Signal Basics Nimrod Peleg Updated: Feb. 2010 Course Objectives • To get familiar with: – Speech coding in general – Speech coding for communication (military, cellular) • Means: – Introduction the basics of speech processing – Presenting an overview of speech production and hearing systems – Focusing on speech coding: LPC codecs: • Basic principles • Discussing of related standards • Listening and reviewing different codecs. הנחיות לשימוש נכון בטלפון, פלשתינה-א"י, 1925 What is Speech ??? • Meaning: Text, Identity, Punctuation, Emotion –prosody: rhythm, pitch, intensity • Statistics: sampled speech is a string of numbers – Distribution (Gamma, Laplace) Quantization • Information Theory: statistical redundancy – ZIP, ARJ, compress, Shorten, Entropy coding… ...1001100010100110010... • Perceptual redundancy – Temporal masking , frequency masking (mp3, Dolby) • Speech production Model – Physical system: Linear Prediction The Speech Signal • Created at the Vocal cords, Travels through the Vocal tract, and Produced at speakers mouth • Gets to the listeners ear as a pressure wave • Non-Stationary, but can be divided to sound segments which have some common acoustic properties for a short time interval • Two Major classes: Vowels and Consonants Speech Production • A sound source excites a (vocal tract) filter – Voiced: Periodic source, created by vocal cords – UnVoiced: Aperiodic and noisy source •The Pitch is the fundamental frequency of the vocal cords vibration (also called F0) followed by 4-5 Formants (F1 -F5) at higher frequencies. -
Benchmarking Audio Signal Representation Techniques for Classification with Convolutional Neural Networks
sensors Article Benchmarking Audio Signal Representation Techniques for Classification with Convolutional Neural Networks Roneel V. Sharan * , Hao Xiong and Shlomo Berkovsky Australian Institute of Health Innovation, Macquarie University, Sydney, NSW 2109, Australia; [email protected] (H.X.); [email protected] (S.B.) * Correspondence: [email protected] Abstract: Audio signal classification finds various applications in detecting and monitoring health conditions in healthcare. Convolutional neural networks (CNN) have produced state-of-the-art results in image classification and are being increasingly used in other tasks, including signal classification. However, audio signal classification using CNN presents various challenges. In image classification tasks, raw images of equal dimensions can be used as a direct input to CNN. Raw time- domain signals, on the other hand, can be of varying dimensions. In addition, the temporal signal often has to be transformed to frequency-domain to reveal unique spectral characteristics, therefore requiring signal transformation. In this work, we overview and benchmark various audio signal representation techniques for classification using CNN, including approaches that deal with signals of different lengths and combine multiple representations to improve the classification accuracy. Hence, this work surfaces important empirical evidence that may guide future works deploying CNN for audio signal classification purposes. Citation: Sharan, R.V.; Xiong, H.; Berkovsky, S. Benchmarking Audio Keywords: convolutional neural networks; fusion; interpolation; machine learning; spectrogram; Signal Representation Techniques for time-frequency image Classification with Convolutional Neural Networks. Sensors 2021, 21, 3434. https://doi.org/10.3390/ s21103434 1. Introduction Sensing technologies find applications in detecting and monitoring health conditions. -
4 – Synthesis and Analysis of Complex Waves; Fourier Spectra
4 – Synthesis and Analysis of Complex Waves; Fourier spectra Complex waves Many physical systems (such as music instruments) allow existence of a particular set of standing waves with the frequency spectrum consisting of the fundamental (minimal allowed) frequency f1 and the overtones or harmonics fn that are multiples of f1, that is, fn = n f1, n=2,3,…The amplitudes of the harmonics An depend on how exactly the system is excited (plucking a guitar string at the middle or near an end). The sound emitted by the system and registered by our ears is thus a complex wave (or, more precisely, a complex oscillation ), or just signal, that has the general form nmax = π +ϕ = 1 W (t) ∑ An sin( 2 ntf n ), periodic with period T n=1 f1 Plots of W(t) that can look complicated are called wave forms . The maximal number of harmonics nmax is actually infinite but we can take a finite number of harmonics to synthesize a complex wave. ϕ The phases n influence the wave form a lot, visually, but the human ear is insensitive to them (the psychoacoustical Ohm‘s law). What the ear hears is the fundamental frequency (even if it is missing in the wave form, A1=0 !) and the amplitudes An that determine the sound quality or timbre . 1 Examples of synthesis of complex waves 1 1. Triangular wave: A = for n odd and zero for n even (plotted for f1=500) n n2 W W nmax =1, pure sinusoidal 1 nmax =3 1 0.5 0.5 t 0.001 0.002 0.003 0.004 t 0.001 0.002 0.003 0.004 £ 0.5 ¢ 0.5 £ 1 ¢ 1 W W n =11, cos ¤ sin 1 nmax =11 max 0.75 0.5 0.5 0.25 t 0.001 0.002 0.003 0.004 t 0.001 0.002 0.003 0.004 ¡ 0.5 0.25 0.5 ¡ 1 Acoustically equivalent 2 0.75 1 1. -
Designing Speech and Multimodal Interactions for Mobile, Wearable, and Pervasive Applications
Designing Speech and Multimodal Interactions for Mobile, Wearable, and Pervasive Applications Cosmin Munteanu Nikhil Sharma Abstract University of Toronto Google, Inc. Traditional interfaces are continuously being replaced Mississauga [email protected] by mobile, wearable, or pervasive interfaces. Yet when [email protected] Frank Rudzicz it comes to the input and output modalities enabling Pourang Irani Toronto Rehabilitation Institute our interactions, we have yet to fully embrace some of University of Manitoba University of Toronto the most natural forms of communication and [email protected] [email protected] information processing that humans possess: speech, Sharon Oviatt Randy Gomez language, gestures, thoughts. Very little HCI attention Incaa Designs Honda Research Institute has been dedicated to designing and developing spoken [email protected] [email protected] language and multimodal interaction techniques, Matthew Aylett Keisuke Nakamura especially for mobile and wearable devices. In addition CereProc Honda Research Institute to the enormous, recent engineering progress in [email protected] [email protected] processing such modalities, there is now sufficient Gerald Penn Kazuhiro Nakadai evidence that many real-life applications do not require University of Toronto Honda Research Institute 100% accuracy of processing multimodal input to be [email protected] [email protected] useful, particularly if such modalities complement each Shimei Pan other. This multidisciplinary, two-day workshop -
Lecture 1-10: Spectrograms
Lecture 1-10: Spectrograms Overview 1. Spectra of dynamic signals: like many real world signals, speech changes in quality with time. But so far the only spectral analysis we have performed has assumed that the signal is stationary: that it has constant quality. We need a new kind of analysis for dynamic signals. A suitable analogy is that we have spectral ‘snapshots’ when what we really want is a spectral ‘movie’. Just like a real movie is made up from a series of still frames, we can display the spectral properties of a changing signal through a series of spectral snapshots. 2. The spectrogram: a spectrogram is built from a sequence of spectra by stacking them together in time and by compressing the amplitude axis into a 'contour map' drawn in a grey scale. The final graph has time along the horizontal axis, frequency along the vertical axis, and the amplitude of the signal at any given time and frequency is shown as a grey level. Conventionally, black is used to signal the most energy, while white is used to signal the least. 3. Spectra of short and long waveform sections: we will get a different type of movie if we choose each of our snapshots to be of relatively short sections of the signal, as opposed to relatively long sections. This is because the spectrum of a section of speech signal that is less than one pitch period long will tend to show formant peaks; while the spectrum of a longer section encompassing several pitch periods will show the individual harmonics, see figure 1-10.1. -
Analysis of EEG Signal Processing Techniques Based on Spectrograms
ISSN 1870-4069 Analysis of EEG Signal Processing Techniques based on Spectrograms Ricardo Ramos-Aguilar, J. Arturo Olvera-López, Ivan Olmos-Pineda Benemérita Universidad Autónoma de Puebla, Faculty of Computer Science, Puebla, Mexico [email protected],{aolvera, iolmos}@cs.buap.mx Abstract. Current approaches for the processing and analysis of EEG signals consist mainly of three phases: preprocessing, feature extraction, and classifica- tion. The analysis of EEG signals is through different domains: time, frequency, or time-frequency; the former is the most common, while the latter shows com- petitive results, implementing different techniques with several advantages in analysis. This paper aims to present a general description of works and method- ologies of EEG signal analysis in time-frequency, using Short Time Fourier Transform (STFT) as a representation or a spectrogram to analyze EEG signals. Keywords. EEG signals, spectrogram, short time Fourier transform. 1 Introduction The human brain is one of the most complex organs in the human body. It is considered the center of the human nervous system and controls different organs and functions, such as the pumping of the heart, the secretion of glands, breathing, and internal tem- perature. Cells called neurons are the basic units of the brain, which send electrical signals to control the human body and can be measured using Electroencephalography (EEG). Electroencephalography measures the electrical activity of the brain by recording via electrodes placed either on the scalp or on the cortex. These time-varying records produce potential differences because of the electrical cerebral activity. The signal generated by this electrical activity is a complex random signal and is non-stationary [1]. -
Introduction to Digital Speech Processing Schafer Rabiner and Ronald W
SIGv1n1-2.qxd 11/20/2007 3:02 PM Page 1 FnT SIG 1:1-2 Foundations and Trends® in Signal Processing Introduction to Digital Speech Processing Processing to Digital Speech Introduction R. Lawrence W. Rabiner and Ronald Schafer 1:1-2 (2007) Lawrence R. Rabiner and Ronald W. Schafer Introduction to Digital Speech Processing highlights the central role of DSP techniques in modern speech communication research and applications. It presents a comprehensive overview of digital speech processing that ranges from the basic nature of the speech signal, through a variety of methods of representing speech in digital form, to applications in voice communication and automatic synthesis and recognition of speech. Introduction to Digital Introduction to Digital Speech Processing provides the reader with a practical introduction to Speech Processing the wide range of important concepts that comprise the field of digital speech processing. It serves as an invaluable reference for students embarking on speech research as well as the experienced researcher already working in the field, who can utilize the book as a Lawrence R. Rabiner and Ronald W. Schafer reference guide. This book is originally published as Foundations and Trends® in Signal Processing Volume 1 Issue 1-2 (2007), ISSN: 1932-8346. now now the essence of knowledge Introduction to Digital Speech Processing Introduction to Digital Speech Processing Lawrence R. Rabiner Rutgers University and University of California Santa Barbara USA [email protected] Ronald W. Schafer Hewlett-Packard Laboratories Palo Alto, CA USA Boston – Delft Foundations and Trends R in Signal Processing Published, sold and distributed by: now Publishers Inc. -
Improved Spectrograms Using the Discrete Fractional Fourier Transform
IMPROVED SPECTROGRAMS USING THE DISCRETE FRACTIONAL FOURIER TRANSFORM Oktay Agcaoglu, Balu Santhanam, and Majeed Hayat Department of Electrical and Computer Engineering University of New Mexico, Albuquerque, New Mexico 87131 oktay, [email protected], [email protected] ABSTRACT in this plane, while the FrFT can generate signal representations at The conventional spectrogram is a commonly employed, time- any angle of rotation in the plane [4]. The eigenfunctions of the FrFT frequency tool for stationary and sinusoidal signal analysis. How- are Hermite-Gauss functions, which result in a kernel composed of ever, it is unsuitable for general non-stationary signal analysis [1]. chirps. When The FrFT is applied on a chirp signal, an impulse in In recent work [2], a slanted spectrogram that is based on the the chirp rate-frequency plane is produced [5]. The coordinates of discrete Fractional Fourier transform was proposed for multicompo- this impulse gives us the center frequency and the chirp rate of the nent chirp analysis, when the components are harmonically related. signal. In this paper, we extend the slanted spectrogram framework to Discrete versions of the Fractional Fourier transform (DFrFT) non-harmonic chirp components using both piece-wise linear and have been developed by several researchers and the general form of polynomial fitted methods. Simulation results on synthetic chirps, the transform is given by [3]: SAR-related chirps, and natural signals such as the bat echolocation 2α 2α H Xα = W π x = VΛ π V x (1) and bird song signals, indicate that these generalized slanted spectro- grams provide sharper features when the chirps are not harmonically where W is a DFT matrix, V is a matrix of DFT eigenvectors and related. -
Models of Speech Synthesis ROLF CARLSON Department of Speech Communication and Music Acoustics, Royal Institute of Technology, S-100 44 Stockholm, Sweden
Proc. Natl. Acad. Sci. USA Vol. 92, pp. 9932-9937, October 1995 Colloquium Paper This paper was presented at a colloquium entitled "Human-Machine Communication by Voice," organized by Lawrence R. Rabiner, held by the National Academy of Sciences at The Arnold and Mabel Beckman Center in Irvine, CA, February 8-9,1993. Models of speech synthesis ROLF CARLSON Department of Speech Communication and Music Acoustics, Royal Institute of Technology, S-100 44 Stockholm, Sweden ABSTRACT The term "speech synthesis" has been used need large amounts of speech data. Models working close to for diverse technical approaches. In this paper, some of the the waveform are now typically making use of increased unit approaches used to generate synthetic speech in a text-to- sizes while still modeling prosody by rule. In the middle of the speech system are reviewed, and some of the basic motivations scale, "formant synthesis" is moving toward the articulatory for choosing one method over another are discussed. It is models by looking for "higher-level parameters" or to larger important to keep in mind, however, that speech synthesis prestored units. Articulatory synthesis, hampered by lack of models are needed not just for speech generation but to help data, still has some way to go but is yielding improved quality, us understand how speech is created, or even how articulation due mostly to advanced analysis-synthesis techniques. can explain language structure. General issues such as the synthesis of different voices, accents, and multiple languages Flexibility and Technical Dimensions are discussed as special challenges facing the speech synthesis community. -
Audio Processing Laboratory
AC 2010-1594: A GRADUATE LEVEL COURSE: AUDIO PROCESSING LABORATORY Buket Barkana, University of Bridgeport Page 15.35.1 Page © American Society for Engineering Education, 2010 A Graduate Level Course: Audio Processing Laboratory Abstract Audio signal processing is a part of the digital signal processing (DSP) field in science and engineering that has developed rapidly over the past years. Expertise in audio signal processing - including speech signal processing- is becoming increasingly important for working engineers from diverse disciplines. Audio signals are stored, processed, and transmitted using digital techniques. Creating these technologies requires engineers that understand real-time application issues in the related areas. With this motivation, we designed a graduate level laboratory course which is Audio Processing Laboratory in the electrical engineering department in our school two years ago. This paper presents the status of the audio processing laboratory course in our school. We report the challenges that we have faced during the development of this course in our school and discuss the instructor’s and students’ assessments and recommendations in this real-time signal-processing laboratory course. 1. Introduction Many DSP laboratory courses are developed for undergraduate and graduate level engineering education. These courses mostly focus on the general signal processing techniques such as quantization, filter design, Fast Fourier Transformation (FFT), and spectral analysis [1-3]. Because of the increasing popularity of Web-based education and the advancements in streaming media applications, several web-based DSP Laboratory courses have been designed for distance education [4-8]. An internet-based signal processing laboratory that provides hands-on learning experiences in distributed learning environments has been developed by Spanias et.al[6] . -
Attention, I'm Trying to Speak Cs224n Project: Speech Synthesis
Attention, I’m Trying to Speak CS224n Project: Speech Synthesis Akash Mahajan Management Science & Engineering Stanford University [email protected] Abstract We implement an end-to-end parametric text-to-speech synthesis model that pro- duces audio from a sequence of input characters, and demonstrate that it is possi- ble to build a convolutional sequence to sequence model with reasonably natural voice and pronunciation from scratch in well under $75. We observe training the attention to be a bottleneck and experiment with 2 modifications. We also note interesting model behavior and insights during our training process. Code for this project is available on: https://github.com/akashmjn/cs224n-gpu-that-talks. 1 Introduction We have come a long way from the ominous robotic sounding voices used in the Radiohead classic 1. If we are to build significant voice interfaces, we need to also have good feedback that can com- municate with us clearly, that can be setup cheaply for different languages and dialects. There has recently also been an increased interest in generative models for audio [6] that could have applica- tions beyond speech to music and other signal-like data. As discussed in [13] it is fascinating that is is possible at all for an end-to-end model to convert a highly ”compressed” source - text - into a substantially more ”decompressed” form - audio. This is a testament to the sheer power and flexibility of deep learning models, and has been an interesting and surprising insight in itself. Recently, results from Tachibana et. al. [11] reportedly produce reasonable-quality speech without requiring as large computational resources as Tacotron [13] and Wavenet [7].