A Visualization Scheme for Complex Spectrograms with Complete Amplitude and Phase Information

2016 3rd International Conference on Smart Materials and Nanotechnology in Engineering (SMNE 2016) ISBN: 978-1-60595-338-0 A Visualization Scheme for Complex Spectrograms with Complete Amplitude and Phase Information Caixia Zheng School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China. School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China. Guangyan Li School of Physics, Northeast Normal University, Changchun 130024, China Tingfa Xu Photoelectric Imaging and Information Engineering Institute, Beijing Institute of Technology, Beijing 100081, China Shili Liang School of Physics, Northeast Normal University, Changchun 130024, China Xiaolin Cao College of Automotive Engineering, Jilin University, Changchun 130025, China Shuangwei Wang * School of Physics, Northeast Normal University, Changchun 130024, China *Corresponding author: [email protected] ABSTRACT: We present a novel visualization scheme for complex spectrograms of speech and other non-stationary signals, in which both the amplitude and phase information is visualized simultaneously using the conventional RGB (red, green, blue) color model. The three-layer image matrix is constructed in such a way that the absolute values of the real and imaginary parts of the time-frequency analysis of speech are used to fill the first (red) and third (blue) layers, respectively, and an encoding scheme is adopted to represent both signs of the real and imaginary parts and the computed values based on this encoding scheme are used to fill the second (green) layer. The importance of phase in spectrogram applications has been gradually recognized, and imaging processing techniques are increasingly used in spectral analysis of non-stationary one-dimensional (1D) signals such as speech signal. The ability to present complete spectrogram information (both the amplitude and phase information) on a single RGB image is potentially useful, especially for those applications in which phase information plays a complementary role. 1 INTRODUCTION instrument called sound spectrograph for mapping a voice onto a graph (Koenig et al., 1946). The Spectrogram visualizes the spectrum of frequencies instrument was initially used to identify enemy in a sound or other signals as they vary with time voices on telephones and radios, and has gradually (Cohen, 1989). The most common format is a become an indispensable tool in speech analysis and gray-scale or pseudo-color two dimensional (2D) phonetics studies. image with vertical axis representing frequency and As a visual representation of sound, spectrogram horizontal axis representing time. The gray or color simultaneously displays the characteristics of voice scale of each point represents the amplitude of a information in the closely related time and particular frequency at a particular time. Sound frequency domains, and obviously, spectrogram is spectrogram was first introduced during World War compact and efficient, and shows more pattern II for military purposes. The Bell laboratories in information than a time-domain or 1941 successfully developed a voice visualization frequency-domain signal does alone. Speech 289 waveforms in the time-domain vary greatly between With the advances of modern sensor and digital utterances but the frequency-domain representations signal processing (DSP) technology, spectrograms usually show very consistent patterns. By tracking can be efficiently generated using short time Fourier the frequency content of a sound over time, speech transform (STFT), and this greatly expands the spectrogram additionally provides the characteristics concept of spectrogram. Firstly, the frequency range of speech as well as its speaker. Spectrogram that spectrograms represent is extended from audible reading involves many linguistic aspects such as sound to infrasonic and ultrasonic frequencies. articulatory and acoustic phonetics, recognition of Secondly, spectrogram analysis goes beyond speech various phonemes, and thorough understanding and and other audio signals. As a matter of fact, any knowledge of speech (Zue and Lamel, 1986). temporally or spatially varying signal can be Human beings excel at extracting features from analyzed from the spectrogram approach. images. With intensive training spectrogram experts Traditionally spectrograms have been widely used in can reach impressive recognition rates. In the past the development of the fields of audio, speech, few decades, researchers began using advanced music, sonar, radar, and so on. In addition, they have digital signal and image processing techniques for also found uses in other fields. Spectrograms as a removing noise and extracting useful features in complementary tool was used to study ultrashort spectrograms, and noticeable progresses have been pulse lasers (Kane and Trebino, 1993). With proper made in speech and speaker recognition. mapping of character strings to numerical sequences, Conventionally only peak amplitude or power is genomic sequences can be studied using a variety of required to construct speech spectrograms, and modern DSP techniques including color phase information has long been considered of little spectrograms (Anastassiou, 2001). Indeed, use and was often conveniently neglected in audio spectrograms are a useful tool for studying 1D and speech signal processing applications (Mowlaee non-stationary signals. et al., 2014) (Hegde et al., 2007; Oppenheim and The complex Fourier transform results can be Lim, 1981; Wang and Lim, 1982). Although phase represented either by phase/magnitude or separately information is usually not required to describe the by real and imaginary parts. To our best knowledge, time-frequency characteristics of a non-stationary there is no report in the literature that attempt to signal, it found uses in some situations that phase display the complete amplitude and phase information plays an important role. Both phase and information in a spectrogram. Xu et al in their power spectrograms were used as a diagnostics tool modified nonlocal means (NL-means) algorithm to detect cracks in vibrational beams, and the phase considered the real and imaginary parts of the information was also graphically presented in the complex speech spectrogram each as a separate study (Léonard, 2007). Three-dimensional (3D) image but with the sign (or partial phase) phase and frequency spectrograms were constructed information discarded (Xu et al., 2008). Such to analyze specific small slope linear chirp, phase spectrograms do not contain the complete jump and small frequency jump in musical signals information of the source signals. Because phase can (Navarro et al., 2008). Even in the traditional audio play a complementary role in a variety of and speech processing applications, the importance spectrogram applications, a proper way to preserve of phase information has been recognized. The and present phase information graphically in a usefulness of phase in human speech perception was spectrogram may be desired. examined to show that the phase spectrum contributes to speech intelligibility (Paliwal and Alsteris, 2003). A novel method for using phase 2 VISUALIZATION OF COMPLEX information was proposed to demonstrate the SPECTROGRAM potential of phase spectra in spectral subtractive enhancement applications (Kleinschmidt et al., We propose a visualization method to display both 2011). The discontinuity problems of phase was the amplitude and phase characteristics of the overcome and the phase information was used as an time-frequency analysis of speech signals on a same additional feature stream to achieve significant image using the conventional RGB color model. In improvement in audio retrieval and classification the proposed method, the complete amplitude and (Paraskevas and Chilton, 2004). The phase spectrum phase information is preserved and displayed information was used to improve speech recognition simultaneously. The absolute values of the real part performance (Schluter and Ney, 2001). These of the time-frequency analysis of speech signal fill examples illustrate that phase can play a the first (red) layer matrix; the absolute values of the complementary role in a broad range of spectrogram imaginary part fill the third (blue) layer matrix; the applications. signs of both the real and imaginary parts are In the early years, spectrograms were created combined and encoded to make a sign (green) using a series of bandpass analog filters, and such matrix. This three-layer image matrix is used to laborious process limited the use of spectrograms. drive the RGB color model for visualizing the 290 time-frequency analysis of speech signals. In this Xik= R ik+ jI ik (4) visualization scheme, a pair of unsigned complex . data in the time-frequency domain is uniquely i =1,2, ⋅⋅⋅⋅⋅⋅ ,N ; k = 1,2, ⋅⋅⋅⋅⋅⋅ , M mapped to a point in the red-blue (RB) color space and, the partial phase information, or the To properly display the data in the RGB color combination of the signs of both the real and space, the two parts of X need to be normalized. Let imaginary parts, is shown in the shades of green. d be the maximum absolute value of all elements in R and I, normalization of R and I is realized using 2.1 Frame size and window function the following formula, To conduct time-frequency analysis, the original 1 (5) R = abs( R), audio recording is divided into windowed frames of n d 10~20 ms long. A proper filter window is added before the STFT analysis of each frame (Schilling 1 (6) and

A Visualization Scheme for Complex Spectrograms with Complete Amplitude and Phase Information

Download Article (PDF)

Relations Between Speech Production and Speech Perception: Some Behavioral and Neurological Observations

Colored-Speech Synaesthesia Is Triggered by Multisensory, Not Unisensory, Perception Gary Bargary,1,2,3 Kylie J

Perception and Awareness in Phonological Processing: the Case of the Phoneme

Theories of Speech Perception

Effects of Language on Visual Perception

How Infant Speech Perception Contributes to Language Acquisition Judit Gervain* and Janet F

Speech Perception

Localising Foreign Accents in Speech Perception, Storage and Production

The Lexical Restructuring Hypothesis: Two Claims of PA Evaluated Michelle E

Prosody in the Production and Processing of L2 Spoken Language and Implications for Assessment

Speech Perception