Temporal Derivative-Based Spectrum and Mel-Cepstrum Audio Steganalysis — Qingzhong Liu, Andrew H. Sung, and Mengyu Qiao

Total Pages: 16

File Type: pdf, Size: 1020 KB

Temporal Derivative-Based Spectrum and Mel-Cepstrum Audio Steganalysis
Qingzhong Liu, Andrew H. Sung, and Mengyu Qiao
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 4, NO. 3, SEPTEMBER 2009, p. 359

Abstract—To improve a recently developed mel-cepstrum audio steganalysis method, we present in this paper a method based on Fourier spectrum statistics and mel-cepstrum coefficients, derived from the second-order derivative of the audio signal. Specifically, the statistics of the high-frequency spectrum and the mel-cepstrum coefficients of the second-order derivative are extracted for use in detecting audio steganography. We also design a wavelet-based spectrum and mel-cepstrum audio steganalysis. By applying support vector machines to these features, unadulterated carrier signals (without hidden data) and steganograms (carrying covert data) are successfully discriminated. Experimental results show that the proposed derivative-based and wavelet-based approaches remarkably improve the detection accuracy. Between the two new methods, the derivative-based approach generally delivers a better performance.

Index Terms—Audio, mel-cepstrum, second-order derivative, spectrum, steganalysis, support vector machine (SVM), wavelet.

Manuscript received December 04, 2008; revised May 04, 2009. First published June 10, 2009; current version published August 14, 2009. This work was supported by the Institute for Complex Additive Systems Analysis (ICASA), a research division of New Mexico Tech. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Jessica J. Fridrich. Q. Liu and A. H. Sung are with the Department of Computer Science and Engineering and the Institute for Complex Additive Systems Analysis, New Mexico Tech, Socorro, NM 87801 USA (e-mail: [email protected]; [email protected]). M. Qiao is with the Department of Computer Science and Engineering, New Mexico Tech, Socorro, NM 87801 USA (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIFS.2009.2024718

I. INTRODUCTION

STEGANOGRAPHY is the art and science of hiding data in digital media such as image, audio, and video files. To the contrary, steganalysis is the art and science of detecting information-hiding behaviors in digital media.

In recent years, many steganalysis methods have been designed for detecting information hiding in multiple steganography systems. Most of these methods are focused on detecting digital image steganography. For example, one of the well-known detectors, histogram characteristic function center of mass (HCFCOM), was successful in detecting noise-adding steganography [1]. Another well-known method is to construct a high-order moment statistical model in the multiscale decomposition using a wavelet-like transform and then to apply a learning classifier to the high-order feature set [2]. Shi et al. proposed a Markov-process-based approach to detect information-hiding behaviors in JPEG images [3]. Based on the Markov approach, Liu et al. expanded the Markov features to the interbands of the discrete cosine transform (DCT) domains, combined the expanded features with a polynomial fitting of the histogram of the DCT coefficients, and successfully improved the steganalysis performance on multiple JPEG images [4]. Other works on image steganalysis have been done by Fridrich [5], Pevny and Fridrich [6], Lyu and Farid [7], Liu and Sung [8], and Liu et al. [9]–[11].

Due to the different characteristics of audio signals and images, methods developed for image steganalysis are not directly suitable for detecting information hiding in audio streams, and many research groups have investigated audio steganalysis. Ru et al. presented a method that measures features between the signal under detection and a self-generated reference signal obtained via linear predictive coding [12], [13], but the detection performance is poor. Avcibas designed a feature set of content-independent distortion measures for classifier design [14]. Ozer et al. constructed a detector based on the characteristics of the denoised residuals of the audio file [15]. Johnson et al. set up a statistical model by building a linear basis that captures certain statistical properties of audio signals [16]. Craver et al. employed cepstral analysis to estimate a stego-signal's probability density function in audio signals [17]. Kraetzer and Dittmann recently proposed a mel-cepstrum-based analysis to detect embedded hidden messages [18], [19]. By expanding the Markov approach proposed by Shi et al. for image steganalysis [3], Liu et al. designed expanded Markov features for audio steganalysis [20]. Additionally, Zeng et al. presented new algorithms to detect phase-coding steganography based on analysis of the phase discontinuities [21] and to detect echo steganography based on statistical moments of peak frequency [22]. Among all these methods, Kraetzer and Dittmann's mel-cepstrum audio analysis is particularly noticeable, because it is the first time that mel-frequency cepstral coefficients (MFCCs), which are widely used in speech recognition, were utilized for audio steganalysis.

In this paper, we propose an audio steganalysis method based on spectrum analysis and mel-cepstrum analysis of the second-order derivative of the audio signal. In the spectrum analysis, the statistics of the high-frequency spectrum of the second-order derivative are extracted as spectrum features. To improve Kraetzer and Dittmann's work [18], we design mel-cepstrum coefficient features that are derived from the second-order derivative. Additionally, for comparison with the second-order derivative-based approach, a wavelet-based spectrum and mel-cepstrum method is also designed. Support vector machines (SVMs) with radial basis function (RBF) kernels [35] are employed to detect and differentiate steganograms from innocent signals. Results show that our derivative-based and wavelet-based methods are very promising and possess a remarkable advantage over Kraetzer and Dittmann's work.

The rest of the paper is organized as follows: Section II presents the second-order derivative for audio steganalysis and the Fourier analysis; Section III introduces Kraetzer and Dittmann's mel-cepstrum analysis and describes the improved mel-cepstrum methods; Section IV presents experiments, followed by discussion in Section V and the conclusion in Section VI.
II. TEMPORAL DERIVATIVE AND SPECTRUM ANALYSIS

In image processing, the second-order derivative is widely employed for detecting isolated points, edges, etc. [23]. Fig. 1 shows an example of edge detection using the second-order derivative. With this in mind, we developed a scheme based on the second-order derivative for audio steganalysis, the details of which are described as follows.

[Fig. 1. Example of edge detection using derivatives [23].]

A digital audio signal is denoted as x(t), t = 1, 2, \ldots, N. The second derivative of x(t) is D^2 x(t), defined as

    D^2 x(t) = x(t+1) - 2x(t) + x(t-1).                                  (1)

The stego-signal is denoted s(t), which is modeled by adding a noise or error signal e(t) into the original signal x(t):

    s(t) = x(t) + e(t).                                                  (2)

The second-order derivatives of the error term e(t) and the signal x(t) are denoted as D^2 e(t) and D^2 x(t), respectively. Thus,

    D^2 s(t) = D^2 x(t) + D^2 e(t).                                      (3)

The discrete Fourier transforms of D^2 s(t), D^2 x(t), and D^2 e(t) are denoted as S(f), X(f), and E(f), respectively, so that

    S(f) = X(f) + E(f).                                                  (4)
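As a quick numerical check (a sketch with synthetic signals, not the paper's test data), the linearity used in (1)-(4) can be verified directly:

import numpy as np

def d2(x):
    # Eq. (1): D^2 x(t) = x(t+1) - 2 x(t) + x(t-1)
    return x[2:] - 2 * x[1:-1] + x[:-2]

rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(1024))      # smooth-ish "audio" cover
e = rng.choice([-1.0, 0.0, 1.0], size=x.size, p=[0.25, 0.5, 0.25])
s = x + e                                     # Eq. (2): stego = cover + error

assert np.allclose(d2(s), d2(x) + d2(e))      # Eq. (3): second difference is linear
S, X, E = (np.fft.fft(d2(v)) for v in (s, x, e))
assert np.allclose(S, X + E)                  # Eq. (4): so is the DFT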
The DFT of each second derivative is taken over its M samples, and the power spectrum follows:

    S(f) = \sum_{t} D^2 s(t)\, e^{-j 2\pi f t / M},                      (5)

    |S(f)|^2 = S(f)\, \overline{S(f)},                                   (6)

where f = 0, 1, \ldots, M-1 and M is the number of samples of the derivatives. We have

    |S(f)|^2 = |X(f) + E(f)|^2.                                          (7)

Assume that \theta is the angle between the vectors X(f) and E(f); then

    |S(f)|^2 = |X(f)|^2 + |E(f)|^2 + 2\,|X(f)|\,|E(f)| \cos\theta.       (8)

Since \theta is arbitrary, the expected value of |S(f)|^2 is calculated as follows:

    \mathbb{E}\big[|S(f)|^2\big] = |X(f)|^2 + |E(f)|^2.                  (9)

Dividing both sides by |X(f)|^2,

    \mathbb{E}\big[|S(f)|^2\big] / |X(f)|^2 = 1 + |E(f)|^2 / |X(f)|^2.   (10)

Generally speaking, |E(f)|^2 is far smaller than |X(f)|^2 at the low-frequency and middle-frequency components, where the modification caused by the addition of hidden data is negligible. However, the situation changes at the high-frequency components. Digital audio signals are generally band-limited: the power spectral density is zero or very close to zero above a certain finite frequency. The error term, on the other hand, is assumed to be broadband; in such cases the modification caused by the addition of hidden data is not negligible in the high-frequency components.

Assume the error to be a random signal with an expected value of zero. Its spectrum is approximately depicted by a Gaussian-like distribution [24] (see http://mathworld.wolfram.com/FourierTransformGaussian.html). The power is zero at the lowest frequency; as the frequency increases, the spectrum increases. Fig. 2(a) shows a simulated error signal consisting of 25% −1s, 50% 0s, and 25% 1s. In this example, we assume the sampling rate is 1000 Hz. Fig. 2(b) is the spectrum distribution of the second-order derivative (only half the values are plotted due to data symmetry). It demonstrates that the energy of the derivative is concentrated at high frequency.

[Fig. 2. (a) Random error signal consisting of 25% −1s, 50% 0s, and 25% 1s; (b) spectrum of the second-order derivative.]
[Fig. 3. Spectra of the second derivatives of a cover signal (left) and the stego-signal (right).]

Regarding the second-order derivative: at the low- and middle-frequency components, the power spectrum of an audio signal is normally much stronger than the power spectrum of the error term caused by data hiding; in other words, |E(f)|^2 / |X(f)|^2 is almost equal to zero, and, based on (10), the difference of the spectrum between a cover and the stego-signal is suppressed at the low- and middle-frequency components. However, the situation is very different at the high-frequency components. In comparing the signal spectrum to the derivative spectrum, we also observe that Fig.
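The behavior shown in Fig. 2 is easy to reproduce. The sketch below follows the example above (±1 error values with probabilities 0.25/0.5/0.25, 1000 Hz sampling, synthetic data) and confirms that the second derivative of the error concentrates its energy at high frequencies:

import numpy as np

rng = np.random.default_rng(2)
fs = 1000                                        # sampling rate from the example above
e = rng.choice([-1.0, 0.0, 1.0], size=fs, p=[0.25, 0.5, 0.25])
d2e = e[2:] - 2 * e[1:-1] + e[:-2]               # second-order derivative, eq. (1)
spec = np.abs(np.fft.rfft(d2e)) ** 2             # half-spectrum (symmetry)
freqs = np.fft.rfftfreq(d2e.size, d=1 / fs)
lo = spec[freqs < 250].sum()
hi = spec[freqs >= 250].sum()
print(f"low-band energy {lo:.1f} vs high-band energy {hi:.1f}")  # hi >> lo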
Recommended publications
  • Cepstrum Analysis and Gearbox Fault Diagnosis
Cepstrum Analysis and Gearbox Fault Diagnosis (13-150), by R.B. Randall, B.Tech., B.A. INTRODUCTION Cepstrum analysis is a tool for the detection of periodicity in a frequency spectrum, and seems so far to have been used mainly in speech analysis for voice pitch determination and related questions (Refs. 1, 2). In that case the periodicity in the spectrum is given by the many harmonics of the fundamental voice frequency, but another form of periodicity which can also be detected by cepstrum analysis is the presence of sidebands spaced at equal intervals around one or a number of carrier frequencies. The presence of such sidebands is of interest in the analysis of gearbox vibration signals, since a number of faults tend to cause modulation of the vibration pattern resulting from tooth meshing, and this modulation (either amplitude or frequency modulation) gives rise to sidebands in the frequency spectrum. The sidebands are grouped around the tooth-meshing frequency and its harmonics, spaced at multiples of the modulating frequencies (Refs. 3, 4, 5), and determination of these modulation frequencies can be very useful in diagnosing the fault. The cepstrum is defined (Refs. 6, 1) as the power spectrum of the logarithmic power spectrum (i.e., in dB amplitude form), and is thus related to the autocorrelation function, which can be obtained by inverse Fourier transformation of the power spectrum with linear ordinates. The advantages of using the cepstrum instead of the autocorrelation function in certain circumstances can be interpreted in two different ways.
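The definition above (the power spectrum of the dB power spectrum) is easy to try on a synthetic gearbox-like signal. In this sketch (illustrative parameters, not from the paper), 10 Hz amplitude modulation of a 200 Hz "tooth-meshing" tone produces sidebands whose spacing shows up as a cepstral peak near 0.1 s quefrency:

import numpy as np

fs, T = 2000, 4.0
t = np.arange(int(fs * T)) / fs
mesh = np.sin(2 * np.pi * 200 * t)                  # tooth-meshing carrier
x = (1 + 0.5 * np.sin(2 * np.pi * 10 * t)) * mesh   # 10 Hz amplitude modulation

log_ps = 10 * np.log10(np.abs(np.fft.rfft(x)) ** 2 + 1e-12)  # dB power spectrum
cep = np.abs(np.fft.rfft(log_ps - log_ps.mean())) ** 2       # cepstrum of the dB spectrum
quef = np.fft.rfftfreq(log_ps.size, d=fs / x.size)           # quefrency axis (seconds)
mask = quef > 0.02                                           # skip the low-quefrency region
print(quef[mask][np.argmax(cep[mask])])                      # ~0.1 s -> 10 Hz sideband spacing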
  • A Thesis Entitled Nocturnal Bird Call Recognition System for Wind Farm
A Thesis entitled Nocturnal Bird Call Recognition System for Wind Farm Applications, by Selin A. Bastas, submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Electrical Engineering. Committee: Dr. Mohsin M. Jamali (Chair), Dr. Junghwan Kim, Dr. Sonmez Sahutoglu; Dr. Patricia R. Komuniecki, Dean, College of Graduate Studies. The University of Toledo, December 2011. Copyright 2011, Selin A. Bastas. This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author. Abstract: Interaction of birds with wind turbines has become an important public policy issue. Acoustic monitoring of birds in the vicinity of wind turbines can address this important public policy issue. The identification of nocturnal bird flight calls is also important for various applications such as ornithological studies and acoustic monitoring to prevent the negative effects of wind farms, human-made structures, and devices on birds. Wind turbines may have a negative impact on bird populations; therefore, the development of an acoustic monitoring system is critical for the study of bird behavior. This work can be employed by wildlife biologists for developing mitigation techniques for both on-shore/off-shore wind farm applications and to address bird-strike issues at airports.
  • Cepstral Peak Prominence: a Comprehensive Analysis
Cepstral peak prominence: A comprehensive analysis. Ruben Fraile*, Juan Ignacio Godino-Llorente. Circuits & Systems Engineering Department, ETSIS Telecomunicación, Universidad Politécnica de Madrid, Campus Sur, Carretera de Valencia Km. 7, 28031 Madrid, Spain. ABSTRACT An analytical study of cepstral peak prominence (CPP) is presented, intended to provide an insight into its meaning and relation with voice perturbation parameters. To carry out this analysis, a parametric approach is adopted in which voice production is modelled using the traditional source-filter model and the first cepstral peak is assumed to have Gaussian shape. It is concluded that the meaning of CPP is very similar to that of the first rahmonic, and some insights are provided on its dependence on fundamental frequency and vocal tract resonances. It is further shown that CPP integrates measures of voice waveform and periodicity perturbations, be they amplitude, frequency, or noise. 1. Introduction Cepstral peak prominence (CPP) is an acoustic measure of voice quality that has been qualified as the most promising and perhaps most robust acoustic measure of dysphonia severity [1]. Such a definite statement, made by Maryn et al., is based on a meta-analysis that considered previous results by Wolfe and Martin [2], Wolfe et al. ... Merk et al. proposed the variability in CPP to be used as a cue to detect neurogenic voice disorders [16]; Rosa et al. reported that the combination of CPP with other acoustic cues is relevant for the detection of laryngeal disorders [17]; Hartl et al. [18], [19] and Balasubramanium et al. [20] concluded that patients suffering from unilateral vocal fold paralysis exhibited significantly lower values
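For orientation, the quantity being analyzed can be computed along these lines. This is a minimal sketch under common assumptions (window choice, dB scaling, and the peak-search range vary across CPP implementations; it is not the authors' parametric model): CPP is the height of the cepstral peak in the expected pitch range above a straight-line fit to the cepstrum.

import numpy as np

def cpp(frame, fs, f0_lo=60.0, f0_hi=330.0):
    # Log-magnitude spectrum, then inverse FFT -> real cepstrum.
    spec_db = 20 * np.log10(np.abs(np.fft.rfft(frame * np.hanning(frame.size))) + 1e-12)
    cep = np.fft.irfft(spec_db)
    quef = np.arange(cep.size) / fs               # quefrency in seconds
    lo, hi = int(fs / f0_hi), int(fs / f0_lo)     # search band for the first rahmonic
    peak = lo + np.argmax(cep[lo:hi])
    a, b = np.polyfit(quef[lo:hi], cep[lo:hi], 1) # straight-line trend (simple choice)
    return cep[peak] - (a * quef[peak] + b)       # prominence above the trend

fs = 16000
t = np.arange(2048) / fs
voiced = np.sign(np.sin(2 * np.pi * 120 * t))     # crude 120 Hz pulse-train stand-in
print(cpp(voiced, fs))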
  • Exploiting the Symmetry of Integral Transforms for Featuring Anuran Calls
Symmetry Article: Exploiting the Symmetry of Integral Transforms for Featuring Anuran Calls. Amalia Luque 1,*, Jesús Gómez-Bellido 1, Alejandro Carrasco 2 and Julio Barbancho 3. 1 Ingeniería del Diseño, Escuela Politécnica Superior, Universidad de Sevilla, 41004 Sevilla, Spain; [email protected] 2 Tecnología Electrónica, Escuela Ingeniería Informática, Universidad de Sevilla, 41004 Sevilla, Spain; [email protected] 3 Tecnología Electrónica, Escuela Politécnica Superior, Universidad de Sevilla, 41004 Sevilla, Spain; [email protected] * Correspondence: [email protected]; Tel.: +34-955-420-187. Received: 23 February 2019; Accepted: 18 March 2019; Published: 20 March 2019. Abstract: The application of machine learning techniques to sound signals requires the previous characterization of said signals. In many cases, their description is made using cepstral coefficients that represent the sound spectra. In this paper, the performance in obtaining cepstral coefficients by two integral transforms, the Discrete Fourier Transform (DFT) and the Discrete Cosine Transform (DCT), is compared in the context of processing anuran calls. Due to the symmetry of sound spectra, it is shown that DCT clearly outperforms DFT, and decreases the error representing the spectrum by more than 30%. Additionally, it is demonstrated that DCT-based cepstral coefficients are less correlated than their DFT-based counterparts, which leads to a significant advantage for DCT-based cepstral coefficients if these features are later used in classification algorithms. Since the DCT superiority is based on the symmetry of sound spectra and not on any intrinsic advantage of the algorithm, the conclusions of this research can definitely be extrapolated to include any sound signal. Keywords: spectrum symmetry; DCT; MFCC; audio features; anuran calls 1.
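The core comparison can be reproduced in a few lines. This sketch uses a synthetic signal, an arbitrary coefficient count K, and scipy's dct/fft (assumptions on our part, not the paper's anuran data or exact protocol): each transform of a log power spectrum is truncated to its first K coefficients, and the reconstruction error is compared.

import numpy as np
from scipy.fft import dct, idct, fft, ifft

rng = np.random.default_rng(3)
x = np.cumsum(rng.standard_normal(4096))                 # signal with a smooth spectrum
log_spec = np.log(np.abs(np.fft.rfft(x)) ** 2 + 1e-12)
K = 20                                                   # coefficients kept

c_dct = dct(log_spec, norm="ortho")
rec_dct = idct(np.where(np.arange(log_spec.size) < K, c_dct, 0.0), norm="ortho")

c_dft = fft(log_spec)
trunc = np.zeros_like(c_dft)
trunc[:K] = c_dft[:K]
trunc[-(K - 1):] = c_dft[-(K - 1):]                      # conjugate partners of bins 1..K-1
rec_dft = ifft(trunc).real

err = lambda r: np.mean((log_spec - r) ** 2)
print(err(rec_dct), err(rec_dft))                        # DCT error is markedly lower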
  • The Articulatory and Acoustic Characteristics of Polish Sibilants and Their Consequences for Diachronic Change
The articulatory and acoustic characteristics of Polish sibilants and their consequences for diachronic change. Véronique Bukmaier & Jonathan Harrington, Institute of Phonetics and Speech Processing, University of Munich, Germany. [email protected] [email protected] The study is concerned with the relative synchronic stability of three contrastive sibilant fricatives /s ʂ ɕ/ in Polish. Tongue movement data were collected from nine first-language Polish speakers producing symmetrical real and non-word CVCV sequences in three vowel contexts. A Gaussian model was used to classify the sibilants from spectral information in the noise and from formant frequencies at vowel onset. The physiological analysis showed an almost complete separation between /s ʂ ɕ/ on tongue-tip parameters. The acoustic analysis showed that the greater energy at higher frequencies distinguished /s/ in the fricative noise from the other two sibilant categories. The most salient information at vowel onset was for /ɕ/, which also had a strong palatalizing effect on the following vowel. Whereas either the noise or vowel onset was largely sufficient for the identification of /s ɕ/ respectively, both sets of cues were necessary to separate /ʂ/ from /s ɕ/. The greater synchronic instability of /ʂ/ may derive from its high articulatory complexity coupled with its comparatively low acoustic salience. The data also suggest that the relatively late stage of /ʂ/ acquisition by children may come about because of the weak acoustic information in the vowel for its distinction from /s/. 1 Introduction While there have been many studies in the last 30 years on the acoustic (Evers, Reetz & Lahiri 1998, Jongman, Wayland & Wong 2000, Nowak 2006, Shadle 2006, Cheon & Anderson 2008, Maniwa, Jongman & Wade 2009), perceptual (McGuire 2007, Cheon & Anderson 2008, Li et al.
  • A Processing Method for Pitch Smoothing Based on Autocorrelation and Cepstral F0 Detection Approaches
A Processing Method for Pitch Smoothing Based on Autocorrelation and Cepstral F0 Detection Approaches. *Xufang Zhao, Douglas O'Shaughnessy, Nguyen Minh-Quang. Institut National de la Recherche Scientifique, Université du Québec. Abstract—Chinese is known as a syllabic and tonal language, and tone recognition plays an important role and provides very strong discriminative information for Chinese speech recognition [1]. Usually, tone classification is based on F0 (fundamental frequency) contours [2]. It is possible to infer a speaker's gender, age, and emotion from his/her pitch, regardless of what is said. Meanwhile, the same sequence of words can convey very different meanings with variations in intonation. However, accurate pitch detection is difficult, partly because tracked pitch contours are not ideal smooth curves. In this paper, we present a new smoothing algorithm for detected pitch contours. The effectiveness of the proposed method was shown using the HUB4-NE [3] natural speech corpus. Index Terms—Acoustic signal analysis, acoustic signal detection. I. INTRODUCTION OF PREVIOUS WORKS Traditionally, detailed tone features such as the entire pitch curve are used for tone recognition. Various pitch tracking algorithms based on different approaches have been developed for fast and reliable estimation of fundamental frequency. Generally, pitch analyses are performed in the time domain, frequency domain, or a combination of both domains [4]. [5] compared the performance of several pitch detection [Fig. 1: An illustration of the triangular smoothing and proposed pitch smoothing approaches; the left column (a) uses the triangular smoothing method, and the right column (b) is an example using the proposed pitch smoothing method in this paper.]
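For contrast with the proposed method, a baseline smoother of the kind Fig. 1 mentions can be sketched as follows. This is an illustrative median-plus-triangular smoother under our own assumptions, not the authors' algorithm:

import numpy as np
from scipy.signal import medfilt

def smooth_f0(f0, tri_len=5):
    f0 = medfilt(f0, kernel_size=5)              # knock out isolated (e.g., octave-error) spikes
    w = np.bartlett(tri_len + 2)[1:-1]           # triangular weights
    w /= w.sum()
    return np.convolve(f0, w, mode="same")       # triangular (weighted moving-average) pass

f0 = 200 + 5 * np.sin(np.linspace(0, 6, 100))    # synthetic pitch contour (Hz)
f0[40] = 400                                      # an octave-doubling error
print(smooth_f0(f0)[38:43].round(1))              # spike removed, contour smooth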
  • Detection and Classification of Whale Acoustic Signals
Detection and Classification of Whale Acoustic Signals, by Yin Xian. Department of Electrical and Computer Engineering, Duke University, 2016. Committee: Loren Nolte (Supervisor), Douglas Nowacek (Co-supervisor), Robert Calderbank (Co-supervisor), Xiaobai Sun, Ingrid Daubechies, Galen Reeves. Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Electrical and Computer Engineering in the Graduate School of Duke University. Copyright © 2016 by Yin Xian. All rights reserved except the rights granted by the Creative Commons Attribution-Noncommercial Licence. Abstract This dissertation focuses on two vital challenges in relation to whale acoustic signals: detection and classification. In detection, we evaluated the influence of the uncertain ocean environment on the spectrogram-based detector, and derived the likelihood ratio of the proposed Short Time Fourier Transform detector. Experimental results showed that the proposed detector outperforms detectors based on the spectrogram. The proposed detector is more sensitive to environmental changes because it includes phase information. In classification, our focus is on finding a robust and sparse representation of whale vocalizations. Because whale vocalizations can be modeled as polynomial phase signals, we can represent the whale calls by their polynomial phase coefficients.
  • Speech Signal Representation
Speech Signal Representation
• Fourier Analysis – Discrete-time Fourier transform – Short-time Fourier transform – Discrete Fourier transform
• Cepstral Analysis – The complex cepstrum and the cepstrum – Computational considerations – Cepstral analysis of speech – Applications to speech recognition – Mel-frequency cepstral representation
• Performance Comparison of Various Representations

Discrete-Time Fourier Transform

    X(e^{j\omega}) = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j\omega n}, \qquad x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega

• Sufficient condition for convergence: \sum_{n=-\infty}^{+\infty} |x[n]| < +\infty.
• Although x[n] is discrete, X(e^{j\omega}) is continuous and periodic with period 2\pi.
• Convolution/multiplication duality:

    y[n] = x[n] * h[n] \;\Leftrightarrow\; Y(e^{j\omega}) = X(e^{j\omega}) H(e^{j\omega})

    y[n] = x[n]\, w[n] \;\Leftrightarrow\; Y(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(e^{j\theta})\, X(e^{j(\omega-\theta)})\, d\theta

Short-Time Fourier Analysis (Time-Dependent Fourier Transform)

[Figure: windows w[50−m], w[100−m], w[200−m] sliding along x[m], for analysis instants n = 50, 100, 200.]

    X_n(e^{j\omega}) = \sum_{m=-\infty}^{+\infty} w[n-m]\, x[m]\, e^{-j\omega m}

• If n is fixed, then it can be shown that:

    X_n(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(e^{j\theta})\, e^{j\theta n}\, X(e^{j(\omega+\theta)})\, d\theta

• The above equation is meaningful only if we assume that X(e^{j\omega}) represents the Fourier transform of a signal whose properties continue outside the window, or simply that the signal is zero outside the window.
• In order for X_n(e^{j\omega}) to correspond to X(e^{j\omega}), W(e^{j\omega}) must resemble an impulse with respect to X(e^{j\omega}).
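Numerically, X_n(e^{j\omega}) is just the DFT of a windowed segment. A short sketch with assumed parameters:

import numpy as np

fs, N = 8000, 256
x = np.sin(2 * np.pi * 440 * np.arange(2048) / fs)   # a 440 Hz tone
n = 1000                                             # analysis instant
w = np.hamming(N)
frame = x[n:n + N] * w                               # w[n-m] x[m] over the window support
X_n = np.fft.rfft(frame)                             # samples of X_n(e^{jw}), up to a linear
                                                     # phase from shifting the time origin
print(np.fft.rfftfreq(N, 1 / fs)[np.argmax(np.abs(X_n))])  # ~440 Hz (within one bin)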
  • 'Specmurt Anasylis' of Multi-Pitch Signals
‘Specmurt Anasylis’ of Multi-Pitch Signals. Shigeki Sagayama, Hirokazu Kameoka, Shoichiro Saito and Takuya Nishimoto. The University of Tokyo, Hongo, Bunkyo-ku, Tokyo 113-8656 Japan. E-mail: {sagayama, kameoka, saito, nishimoto}[email protected]. Abstract—In this paper, we discuss a new concept of specmurt to analyze multi-pitch signals. It is applied to polyphonic music signals to extract fundamental frequencies, to produce a piano-roll-like display, and to convert the sound into MIDI data. In contrast with the cepstrum, which is the inverse Fourier transform of the log-scaled power spectrum with linear frequency, specmurt is defined as the inverse Fourier transform of the linear power spectrum with log-scaled frequency. If all tones in a polyphonic sound have a common harmonic pattern, the sound can be regarded as a sum of frequency-stretched common harmonic structures. In the log-frequency domain, it is formulated as the convolution of the distribution density of the fundamental frequencies of multiple tones and the common harmonic structure. The fundamental frequency [...] [8], [9], [10], [11], [12], [13]. Reliable determination of the number of sound sources is discussed only recently [14]. As for spectral analysis, the wavelet transform using the Gabor function is one of the popular approaches to derive the short-time power spectrum of music signals along a logarithmically-scaled frequency axis that appropriately suits musical pitch scaling. The spectrogram, i.e., a 2-dimensional time-frequency display of the sequence of short-time spectra, however, is apparently messed up because of the existence of many overtones (i.e., harmonic components of multiple fundamental frequencies), which often prevents us from discovering musical notes.
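The convolution model is easy to demonstrate in one dimension. In this toy sketch (bin counts and harmonic weights are made-up assumptions, not the paper's data), a log-frequency spectrum is generated as the convolution of a fundamental-frequency distribution with a common harmonic structure, and the fundamentals are recovered by inverse filtering in the transform domain:

import numpy as np

bins = 600                                        # log-frequency axis, 100 bins/octave (assumed)
h = np.zeros(bins)
for k, a in zip([1, 2, 3, 4], [1.0, 0.6, 0.4, 0.3]):
    h[round(100 * np.log2(k))] = a                # common harmonic structure on the log-f axis

u = np.zeros(bins)
u[[120, 155]] = 1.0                               # two fundamental frequencies
v = np.fft.ifft(np.fft.fft(u) * np.fft.fft(h)).real   # observed multi-pitch log-f spectrum

U = np.fft.fft(v) / (np.fft.fft(h) + 1e-8)        # deconvolution by division (regularized)
est = np.fft.ifft(U).real
print(sorted(np.argsort(est)[-2:]))               # [120, 155]: fundamentals recovered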
  • The Exit-Wave Power-Cepstrum Transform for Scanning Nanobeam Electron Diffraction: Robust Strain Mapping at Subnanometer Resolution and Subpicometer Precision
    The Exit-Wave Power-Cepstrum Transform for Scanning Nanobeam Electron Diffraction: Robust Strain Mapping at Subnanometer Resolution and Subpicometer Precision Elliot Padgett1, Megan E. Holtz1,2, Paul Cueva1, Yu-Tsun Shao1, Eric Langenberg2, Darrell G. Schlom2,3, David A. Muller1,3 1 School of Applied and Engineering Physics, Cornell University, Ithaca, New York 14853, United States 2 Department of Materials Science and Engineering, Cornell University, Ithaca, New York 14853, United States 3 Kavli Institute at Cornell for Nanoscale Science, Ithaca, New York 14853, United States Abstract Scanning nanobeam electron diffraction (NBED) with fast pixelated detectors is a valuable technique for rapid, spatially resolved mapping of lattice structure over a wide range of length scales. However, intensity variations caused by dynamical diffraction and sample mistilts can hinder the measurement of diffracted disk centers as necessary for quantification. Robust data processing techniques are needed to provide accurate and precise measurements for complex samples and non-ideal conditions. Here we present an approach to address these challenges using a transform, called the exit wave power cepstrum (EWPC), inspired by cepstral analysis in audio signal processing. The EWPC transforms NBED patterns into real-space patterns with sharp peaks corresponding to inter-atomic spacings. We describe a simple analytical model for interpretation of these patterns that cleanly decouples lattice information from the intensity variations in NBED patterns caused by tilt and thickness. By tracking the inter-atomic spacing peaks in EWPC patterns, strain mapping is demonstrated for two practical applications: mapping of ferroelectric domains in epitaxially strained PbTiO3 films and mapping of strain profiles in arbitrarily oriented core-shell Pt-Co nanoparticle fuel-cell catalysts.
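In one dimension the idea reduces to a familiar cepstral trick: the Fourier transform of the log of a diffraction-like intensity turns a set of reciprocal-space Bragg peaks into a sharp peak at the lattice period, insensitive to the per-peak intensity variations that hinder direct disk tracking. A toy sketch with synthetic peaks and assumed geometry, not the authors' 2-D EWPC pipeline:

import numpy as np

n = 512
intensity = np.full(n, 1e-3)                      # diffuse background
rng = np.random.default_rng(4)
for k in range(1, 6):
    intensity[64 * k] = rng.uniform(0.2, 1.0)     # Bragg peaks, "dynamical" intensities
ewpc = np.abs(np.fft.rfft(np.log(intensity)))     # power-cepstrum-like transform
print(np.argmax(ewpc[4:]) + 4)                    # ~8 = n/64: the lattice period survives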
  • Music Information Retrieval Using Social Tags and Audio
A Survey of Audio-Based Music Classification and Annotation. Zhouyu Fu, Guojun Lu, Kai Ming Ting, and Dengsheng Zhang. IEEE Trans. on Multimedia, vol. 13, no. 2, April 2011. Presenter: Yin-Tzu Lin (阿孜孜^.^), 2011/08.

Types of Music Representation
• Music Notation (symbolic) – Scores – Like text with formatting
• Time-stamped events – E.g., MIDI – Like unformatted text
• Audio – E.g., CD, MP3 – Like speech
[Image from: http://en.wikipedia.org/wiki/Graphic_notation. Slides inspired by Prof. Shigeki Sagayama's talk and Donald Byrd's slide.]

[Diagram: intra-song information retrieval — tasks connecting the composition, score, MIDI, and audio levels, including transcription, melody extraction, structural segmentation, key/chord detection, rhythm pattern and tempo/beat extraction, onset detection, and source separation.]

Inter-Song Info Retrieval
• Generic-similar – Music Classification (genre, artist, mood, emotion, ...) – Tag Classification (Music Annotation) – Recommendation
• Specific-similar – Query by Singing/Humming – Cover Song Identification – Score Following

Classification Tasks
• Genre Classification
• Mood Classification
• Artist Identification
• Instrument Recognition
• Music Annotation

Paper Outline
• Audio Features – Low-level features – Middle-level features – Song-level feature representations
• Classifiers Learning
• Classification Task
• Future Research Issues

Audio Features
Low-level Features
  • Study Paper for Timbre Identification in Sound
International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, Vol. 2, Issue 10, October 2013. Study Paper for Timbre Identification in Sound. Abhilasha, Preety Goswami, Prof. Makarand Velankar — Cummins College of Engineering, Pune.

Abstract Acoustically, all kinds of sounds are similar, but they are fundamentally different. This is partly because they encode information in fundamentally different ways. Timbre is the quality of sound which differentiates different sounds. These differences are related to some interesting but difficult questions about timbre. Studies have been done to distinguish between the sounds of different musical instruments and other sounds; this requires analysis of the fundamental properties of sound. We have explored the timbre identification methods used by researchers in music information retrieval and observed that the method most widely used is MFCC analysis. We cover two methods, MFCC and formant analysis, in more detail and compare them.

... movement of the tongue along the horizontal and vertical axes. Well-specified positions of the mouth and the tongue can produce the standardized sounds of the speech vowels, and are also capable of imitating many natural sounds. This paper attempts to analyze different methods for timbre identification, and a detailed study of two methods is done. MFCC (Mel Frequency Cepstrum Coefficients) and formant analysis are the two main methods focused on in this paper. A comparative study of these methods throws more light on the possible outcomes of using them. We have also explored formant analysis using the tool PRAAT and its possible use for timbre identification.

2. Properties of sounds for timbre
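As a pointer for the MFCC route discussed above, here is a minimal sketch using librosa (our assumption; the paper does not prescribe a toolkit, and any framing + mel filterbank + DCT pipeline is equivalent):

import librosa

sr = 22050
y = librosa.tone(440, sr=sr, duration=1.0)         # stand-in for an instrument note
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13) # mel-scaled log spectra, then a DCT
print(mfcc.shape)                                  # (13, n_frames): one vector per frame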