Additive Noise Detection and Its Application to Audio Forensics
Rui Yang∗
∗School of Information Management, Sun Yat-sen University, Guangzhou, P.R.C. E-mail: [email protected]
978-616-361-823-8 © 2014 APSIPA, APSIPA 2014

Abstract: Digital audio recordings can easily be manipulated with widely available audio editing software. A forgery is often more than naive splicing: post-processing is part of the tampering, because it can eliminate the obvious traces of forgery. Noise can cover audible evidence of forgery and destroy traces of other tampering operations, so the detection of additive noise in an audio signal is a useful tool for audio forensics. In this paper, we investigate the effect of additive noise on audio signals and propose a feature named "sign change rate" for detecting additive noise. Theoretical analysis and extensive experiments show that the proposed feature is effective for additive noise detection. The method is also a potential tool for forgery localization in digital audio.

I. INTRODUCTION

Digital audio forensics has recently become a widely studied stream of research in multimedia security. Audio forgery is often more than naive splicing: post-processing is performed after tampering, because otherwise an audible trace of the forgery would remain. Adding noise is a common post-processing step after audio forgery. Frequently used audio editing software, such as CoolEdit, GoldWave and Audacity, provides a function for adding noise. As shown in Fig. 1, the widely used software Audacity contains a function for adding white noise. Nowadays even a user without any knowledge of audio processing can add weak noise to an audio recording, and the perceptual quality of the audio is hardly degraded by it. Additive noise may be applied to audio not only to cover audible evidence of forgery, but also in an attempt to destroy traces of other tampering operations. The detection of additive noise in an audio signal is therefore significant for establishing the authenticity of the audio and its content.

Fig. 1. Adding white noise is a function of Audacity.

The reported work on digital audio forensics focuses on forgery detection [1], [2], [3], recorder identification [4], [5], reverberation [6] and compression history analysis [7], [8], but there is no work on post-processing detection for digital audio. Work on the detection of additive noise in audio has also not been reported. In image forensics, however, a lot of work on post-processing detection has been reported, such as detection of filtering [9] and detection of sharpening [10]. Since adding noise is not a good way to hide the forgery trace in an image, detection of additive noise is of small value in digital image forensics. The case is different for audio, since weak noise is inherent in audio recordings and does not influence the perceptual quality of the audio much.

In this paper we focus on additive noise and propose an additive noise detection method for audio signals. The key idea of the proposed method is that if an audio signal is processed twice, the modification introduced by the second process is smaller than that introduced by the first. Adding noise is one process of this kind. We introduce a feature named "sign change rate" to measure the modification.

The rest of this paper is organized as follows. In Section II, we investigate the effect of additive noise on audio signals and show how it eliminates the traces of forgery. We then propose the sign-change-rate feature and the additive noise detection method in Section III. The experimental results are shown in Section IV. Finally, we conclude the paper with a discussion and future work in Section V.

II. EFFECT OF ADDITIVE NOISE

The forgery trace in an audio signal is easily covered by weak noise. Figure 2 shows an example of adding noise to cover evidence of forgery. Several samples of the original audio are removed, and an obvious change appears at the splicing point. After weak noise is added to the samples around the splicing point, the splicing point is no longer perceptible, whether by listening or by viewing the waveform.

Fig. 2. Example of adding noise to cover evidence of forgery.

Without a reference signal, it is very difficult to determine from the waveform whether a speech signal contains additive noise. Since the speech signal is short-time stationary, the values of neighboring samples fluctuate only slightly. After weak noise is added, neighboring samples are offset by different values, which enlarges the difference between them. This means that the variance of the differential signal becomes larger after adding noise. In Fig. 3, the first column shows the differential signal of the original speech and of the speech with additive noise. In order to detect additive noise in speech without a reference signal, we actively add white noise to both kinds of speech and investigate its effect on the differential signal, as shown in the second column of Fig. 3. Since the added white noise flips the sign of some values of the differential signal, we compute the element-wise product of the differential signal and its noise version. We find that there are many more negative samples in the product for the original speech, as shown in the fourth column of Fig. 3. Few samples changing sign after the actively added noise is therefore a strong indication that noise was added previously. In the next section, we introduce the sign change rate to measure the influence of adding noise.

Fig. 3. Effect of additive noise on differential signal; the case of original speech is at the top, and the case of the noise version is at the bottom. Panels, top row: y1, y2, y1*y2, sort(y1*y2); bottom row: y3, y4, y3*y4, sort(y3*y4).

III. SIGN CHANGE RATE AND PROPOSED METHOD

A. Sign change rate

Given a sequence X of length L and its processed version Y = f(X), the number of sign changes K is the number of elements in $\{\, i \mid x_i y_i < 0,\ i = 1, \ldots, L \,\}$. The sign change rate θ is defined as θ = K/L. We use the sign change rate θ to measure the effect of additive noise on the audio signal. To illustrate the sign change rate, we randomly select 100 samples of the differential signal of the original speech and of the noise version, respectively, and observe the sign of the element-wise product of the differential signal and the differential signal with actively added noise. As shown in Fig. 4, the sign change rate of the original speech is clearly larger than that of the noise version.

Fig. 4. Sign change of audio signal after adding noise; the case of original speech is at the top, and the case of the noise version is at the bottom. Panels: (a) Differential signal of x, with curves Δx, Δy, sign(Δx*Δy); (b) Differential signal of x', with curves Δx', Δy', sign(Δx'*Δy').

B. Theoretical Proof

Because speech signals vary widely, theoretical analysis of the general relation between the speech and its noise version is highly non-trivial. For this reason, it is often assumed that the input speech samples are i.i.d. In the equations below the samples are modeled as zero-mean Gaussian with variance $\sigma_0^2$; $n_1$ is the previously added noise with variance $\sigma_1^2$ and $n_2$ is the actively added noise with variance $\sigma_2^2$. The difference of two independent samples has twice the variance of a single sample. We denote the differential signals of $x$ and $x + n_2$ as $y_1$ and $y_2$, respectively, that is, $y_1 = \Delta(x)$ and $y_2 = \Delta(x + n_2)$. The case of a signal without additive noise:

$$x \sim N(0, \sigma_0^2) \Rightarrow y_1 \sim N(0, 2\sigma_0^2) \tag{1}$$

$$x + n_2 \sim N(0, \sigma_0^2 + \sigma_2^2) \Rightarrow y_2 \sim N(0, 2\sigma_0^2 + 2\sigma_2^2) \tag{2}$$

We denote the differential signals of $x + n_1$ and $x + n_1 + n_2$ as $y_3$ and $y_4$, respectively, that is, $y_3 = \Delta(x + n_1)$ and $y_4 = \Delta(x + n_1 + n_2)$. The case of a signal with additive noise:

$$x + n_1 \sim N(0, \sigma_0^2 + \sigma_1^2) \Rightarrow y_3 \sim N(0, 2\sigma_0^2 + 2\sigma_1^2) \tag{3}$$

$$x + n_1 + n_2 \sim N(0, \sigma_0^2 + \sigma_1^2 + \sigma_2^2) \Rightarrow y_4 \sim N(0, 2\sigma_0^2 + 2\sigma_1^2 + 2\sigma_2^2) \tag{4}$$

$$E(\theta_1) = \int_{0}^{+\infty}\!\int_{-\infty}^{0} p(y_1, y_2)\, dy_2\, dy_1 + \int_{-\infty}^{0}\!\int_{0}^{+\infty} p(y_1, y_2)\, dy_2\, dy_1 \tag{5}$$

$$p(y_1, y_2) = \frac{1}{2\pi\sqrt{(2\sigma_0^2)(2\sigma_2^2)}} \exp\!\left( \frac{-y_1^2}{2\sigma_0^2} + \frac{-(y_2 - y_1)^2}{2\sigma_2^2} \right) \tag{6}$$

$$E(\theta_1) = \frac{1}{2} - \frac{\arctan\!\left( \sigma_0^2 / 2\sigma_2^2 \right)}{\pi} \tag{7}$$

$$E(\theta_2) = \int_{0}^{+\infty}\!\int_{-\infty}^{0} p(y_3, y_4)\, dy_4\, dy_3 + \int_{-\infty}^{0}\!\int_{0}^{+\infty} p(y_3, y_4)\, dy_4\, dy_3 \tag{8}$$

$$p(y_3, y_4) = \frac{1}{2\pi\sqrt{(2\sigma_0^2 + 2\sigma_1^2)(2\sigma_2^2)}} \exp\!\left( \frac{-y_3^2}{2\sigma_0^2 + 2\sigma_1^2} + \frac{-(y_4 - y_3)^2}{2\sigma_2^2} \right) \tag{9}$$

$$E(\theta_2) = \frac{1}{2} - \frac{\arctan\!\left( (\sigma_0^2 + \sigma_1^2) / 2\sigma_2^2 \right)}{\pi} \tag{10}$$

data has a significant difference.
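The sign change rate of Section III-A and the active-noise probe of Section II can be illustrated with a short simulation. The following is a minimal sketch, not the authors' implementation: it assumes i.i.d. zero-mean Gaussian samples as a stand-in for speech, and the helper names (`diff`, `sign_change_rate`, `add_noise`, `probe`) and all standard deviations are hypothetical values chosen only to make the effect visible.

```python
import random


def diff(signal):
    """First-order difference: d[i] = signal[i + 1] - signal[i]."""
    return [b - a for a, b in zip(signal, signal[1:])]


def sign_change_rate(x, y):
    """theta = K / L, where K counts the indices i with x[i] * y[i] < 0."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    k = sum(1 for a, b in zip(x, y) if a * b < 0)
    return k / len(x)


def add_noise(signal, sigma, rng):
    """Add zero-mean white Gaussian noise with standard deviation sigma."""
    return [s + rng.gauss(0.0, sigma) for s in signal]


def probe(signal, sigma_probe, rng):
    """Actively add noise, then measure how often the differential
    signal flips sign relative to the unprobed differential signal."""
    probed = add_noise(signal, sigma_probe, rng)
    return sign_change_rate(diff(signal), diff(probed))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumption: i.i.d. Gaussian samples stand in for speech (sigma_0 = 0.1).
clean = [rng.gauss(0.0, 0.1) for _ in range(20000)]
# Assumption: the "previously noised" signal uses sigma_1 = 0.1.
noisy = add_noise(clean, 0.1, rng)

# Assumption: the actively added probe noise uses sigma_2 = 0.05.
theta_clean = probe(clean, 0.05, rng)
theta_noisy = probe(noisy, 0.05, rng)
print("theta (no prior noise):  ", theta_clean)
print("theta (with prior noise):", theta_noisy)
```

Under these assumptions the signal without prior noise should show the larger sign change rate, in line with the observation reported for Fig. 4. Real speech is not i.i.d., so the absolute values from this sketch are not expected to match measurements on recordings.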