applied sciences

Article A Novel Robust Audio Watermarking Algorithm by Modifying the Average Amplitude in Transform Domain

Qiuling Wu 1,2,* and Meng Wu 1 1 College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; [email protected] 2 College of Zijin, Nanjing University of Science and Technology, Nanjing 210046, China * Correspondence: [email protected]; Tel.: +86-139-5103-2810

 Received: 18 April 2018; Accepted: 1 May 2018; Published: 4 May 2018 

Featured Application: This algorithm embeds a binary image into an audio signal as a marker to prove the ownership of this audio signal. With large payload capacity and strong robustness against common signal processing attacks, it can be used for copyright protection, broadcast monitoring, fingerprinting, data authentication, and medical safety.

Abstract: In order to improve the robustness and imperceptibility in practical application, a novel audio watermarking algorithm with strong robustness is proposed by exploring the multi-resolution characteristic of discrete transform (DWT) and the energy compaction capability of discrete cosine transform (DCT). The human auditory system is insensitive to the minor changes in the frequency components of the audio signal, so the watermarks can be embedded by slightly modifying the frequency components of the audio signal. The audio fragments segmented from the cover audio signal are decomposed by DWT to obtain several groups of wavelet coefficients with different frequency bands, and then the fourth level detail coefficient is selected to be divided into the former packet and the latter packet, which are executed for DCT to get two sets of transform domain coefficients (TDC) respectively. Finally, the average amplitudes of the two sets of TDC are modified to embed the binary image watermark according to the special embedding rule. The watermark extraction is blind without the carrier audio signal. Experimental results confirm that the proposed algorithm has good imperceptibility, large payload capacity and strong robustness when resisting against various attacks such as MP3 compression, low-pass filtering, re-sampling, re-quantization, amplitude scaling, echo addition and noise corruption.

Keywords: audio watermarking; robustness; blind extraction; discrete wavelet transform; Discrete cosine transform

1. Introduction With the rapid development of the Internet, multimedia data stored in digital form can be easily replicated and destroyed by illegal users, so protection against intellectual property infringement increasingly becomes an important issue. There are two primary methods to overcome the above problems, which are digital signature [1,2] and digital watermarking [3,4]. Digital signature is a kind of number string which can be used as the secret key for both senders and receivers [5], and it easily stimulates the desire of illegal users to destroy the multimedia data. Digital watermarking technology conceals the watermarks into the multimedia data and later extracts such watermarks to prove the owner of multimedia data, so it is an efficient approach to protect the media contents, widely used for copyright protection, broadcast monitoring, fingerprinting, data authentication, and medical

Appl. Sci. 2018, 8, 723; doi:10.3390/app8050723 www.mdpi.com/journal/applsci Appl. Sci. 2018, 8, 723 2 of 17 safety, and it has become a hot topic in the field of communication and information security in recent years [6–8]. According to the different application carriers, digital watermarking technology can be divided into image watermarking technology [9,10], watermark technology [11], audio watermark technology, and so on. Compared with the image watermarking technology, it is harder to develop the audio watermarking technology, mainly because the human auditory system is more sensitive than the visual system. With the widespread presence of audio media on the Internet, more and more people are beginning to pay attention to the research about audio watermarking technology, and more and more research algorithms have appeared. An audio watermarking algorithm generally takes into consideration four aspects including imperceptibility, robustness, security and payload capacity [12]. These four indexes are in conflict with each other. The increase of one index may cause a decrease in other indexes. An excellent audio watermarking algorithm should not only guarantee that the watermarked audio has good imperceptibility, but also provide enough payload capacity to accommodate necessary information. In addition, it should have a strong robustness to resist various signal processing attacks in practical application, so as to ensure the security of the extracted watermark. At present, most audio watermarking algorithms have some shortcomings, such as low capacity, poor robustness, and serious decline of the carrier audio quality. The audio watermarking algorithm for copyright protection must have good imperceptibility and strong robustness in resisting most common signal processing attacks that the audio may suffer during the transmission process to ensure that the watermark can be extracted accurately. This paper presents an audio watermarking algorithm in the discrete wavelet transform (DWT) and discrete cosine transform (DCT) domains. The multi-resolution analysis characteristic of DWT renders excellent robustness against most attacks, and DCT has the merits of high energy compaction in low-frequency coefficients, which makes it suitable for compression applications, so these two methods are widely used in signal processing [13–15]. In this study, our aim is to explore all useful properties of the DWT and DCT for audio watermarking so that the issues of robustness, imperceptibility, security and payload capacity can be improved as much as possible. In this algorithm, the carrier audio signal is first segmented into multiple audio fragments (the number of fragments should be larger than the length of watermark), and then each audio fragment is followed by DWT and DCT in order to obtain the two sets of TDC which are modified to embed the binary watermark. The major contributions of this algorithm are as follows. Firstly, a novel audio watermarking algorithm based on DWT and DCT is presented by modifying the average amplitudes of the transform coefficients to embed the watermark. Secondly, the algorithm has strong robustness when resisting against various common signal processing attacks, so it can be used to prove the owner of the audio media. Thirdly, the payload capacity of the algorithm reaches 172.27 bps which is higher than most related algorithms; thus, more watermarking information can be stored in the audio media. Finally, the algorithm does not require the participation of the original audio signal when extracting the watermark, which is very convenient for practical applications. The remaining parts of this paper are organized as follows: We review some related works about audio watermarking algorithm in Section2. Section3 describes the principle of the proposed watermarking algorithm in detail. This section is divided into three subjects, including the principle of watermark embedding, the principle of watermark extracting and the impact of the embedding depth on algorithm performance. The implementation of the proposed algorithm is described in Section4, including the process of embedding and extracting. Section5 evaluates the performance of this algorithm, including imperceptibility, capacity, and robustness, and then compares such experimental results with other algorithms in recent years. Finally, Section6 draws up the conclusions and gives the possible future research task. Appl. Sci. 2018, 8, 723 3 of 17

2. Related Works In this section, we recall some previous related works on audio watermarking algorithms. Over the past decades, many audio watermarking algorithms have appeared. Audio watermarking technology can be generally implemented in either the time domain [16–18] or transform domains. Lei [16] proposed an audio algorithm by modifying the group amplitude. This algorithm had low payload capacity because it utilized three fragments to present a one-bit watermark. Erfani [17] presented an audio watermarking method with less robustness based on the time spread echo. Basia [18] presented an audio watermarking algorithm for copyright protection by modifying the amplitude of each audio sample. In general, the time-domain algorithms can be implemented easily and require less computation, but are usually less robust to many kinds of digital signal processing attacks [12]. Compared with the time-domain algorithms, transform-domain algorithms are more robust because they take advantage of the audio signal characteristics and human auditory properties [19]. There are many transform domain algorithms, such as discrete (DFT) [20–22], DCT [6,23], DWT [8,19,24–28] and singular value decomposition (SVD) [29]. Asmara [30] compared the characteristics of DFT, DCT and DWT when they were applied to watermarking algorithm. Natgunanathan [20] presented a patchwork-based watermarking algorithm for stereo audio signals by exploiting the similarity of the two audio channels of stereo signals in the DFT domain. This algorithm had good robustness against several conventional attacks, but the payload capacity was not high. Megias [21] presented a blind watermarking algorithm for audio signal to resist against self-synchronization by using fast Fourier transform. The algorithm embedded synchronization signals in the time domain and watermarks in the . However, the self-synchronization code in time domain was vulnerable to some attacks, which would lead to the watermark being unable to survive. Tewari [22] proposed a digital audio watermarking algorithm which modified the middle frequency-band of DCT coefficients to embed the watermarks. Natgunanathan [6] designed another patchwork-based audio watermarking method which embedded and extracted watermark bits in a multilayer framework by modifying the mean values of selected fragments in the DCT domain. The payload capacity was higher than that in paper [20]. Hu [23] presented a large capacity audio watermarking algorithm by developing perceptual masking in the DCT domain. The authors claimed that the payload capacity of their algorithm reached 848.08 bps because they embedded the watermarks into three DCT coefficients respectively. However, they did not verify the overall performance of the algorithm when embedding the watermarks into those three DCT coefficients simultaneously. Due to the multi-resolution characteristics of DWT, many audio watermarking algorithms used DWT to analyze the frequency components of the audio signal in order to improve the performance of the algorithm. A variable-dimensional vector modulation (VDVM) algorithm was presented in paper [8], this algorithm maximized the efficiency of the -space DWT-based audio watermarking algorithm to achieve higher payload capacity, but its robustness was not very satisfactory. Kumsawat [19] used a genetic algorithm to search the optimal quantization step in order to improve both audio quality and robustness. This approach achieved good robustness against most of the attacks except for low-pass filtering, but its payload capacity was very low. Li [24] proposed a content-dependent localized audio watermarking algorithm to combat random cropping and time-scale modification. The watermark bits were embedded into the steady high-energy local regions to improve the robustness. This algorithm could resist synchronization attacks, but it had poor robustness against conventional signal processing attacks, such as equalization, re-sample and echo. Chen [25] proposed an adaptive method by modifying the average values of the wavelet-based entropy to embed the watermarks, but the robustness to re-sampling and low-pass-filtering attacks was quite low. An audio watermarking algorithm was proposed based on DWT in paper [26]. The algorithm changed the energy values of the former and the latter part of each audio fragment to hide confidential information. It had good robustness against several common attacks, but the author did not verify the robust performance when resisting MP3 compression which was the most common format for audio media. Wu [27] presented a self-synchronized audio watermarking algorithm based on DWT by Appl. Sci. 2018, 8, 723 4 of 17 embedding the watermarks into the low frequency-band. The algorithm had large payload capacity, but its robustness was poor against MP3 compression and noise corruption. Wu [28] proposed a self-synchronized audio watermarking method in which the synchronization code and the watermarks are embedded with the low-frequency sub-band in the DWT domain, but this algorithm has a high bit error rate (BER) against MP3 compression. Abd [29] utilized a twofold strategy to embed the image watermark into audio signal based on SVD. The algorithm blended the watermark with the diagonal matrix holding singular values and then performed the second SVD on the modified matrix after applying the first SVD to a 2-D matrix. The matrices which contained left- and right-singular vectors must be conserved in order to extract the watermark. All the above algorithms were designed in a single transform domain. In recent years, there have been many audio watermarking algorithms in multiple transform domains. An audio watermarking algorithm against desynchronization attacks was proposed based on support vector regression in paper [31]. The algorithm used the support vector machines (SVM) theory to locate the optimal embedding positions, and embedded the watermarks into the statistical average value of low-frequency components in DWT and DCT domains. Wang [32] proposed an audio watermarking algorithm according to the multi-resolution characteristic of DWT and the energy compression capability of DCT, but its robustness was poor against low-pass filtering and amplitude scaling. Vivekananda [33] utilized DWT and SVD to propose an adaptive audio watermarking by applying a quantization index modulation (QIM) process on the SVD values in the DWT domain. Bhat [34] presented a SVD–DWT blind watermarking algorithm to embed watermark into the audio signals in which the quantization steps were determined by the statistical properties of the involved DWT coefficients. Lei [35] attempted to embed the watermark into the high frequency-band of the SVD–DCT block. They claimed the performance generally better than the previous SVD-based methods. Hu [12] integrated discrete wavelet packet transformation (DWPT), SVD and QIM to achieve an approach for blind audio watermarking. The SVD was employed to analyze the matrix formed by the DWPT coefficients and embedded the watermarks by controlling singular values subject to perceptual criteria. Vivekananda [36] proposed a robust and blind audio watermarking algorithm based on SVD and QIM. Apart from the above algorithms, there are other audio watermarking algorithms in papers [4,7]. Wang [4] integrated exponent moments (EMs) and QIM to achieve an audio watermarking algorithm which used EMs to improve the robustness and QIM to realize the blind extraction of watermark, but this algorithm was not robust enough to resist amplitude scaling. Xiang [7] presented a reversible audio hiding scheme by using non-causal prediction. The scheme used the minimum error power method to calculate the optimum order and the prediction correlation of the audio data which could be used to embed the watermarks. It can be seen from the above introduction of the related works that the performance of the algorithms is not only related to the transform domain of signal processing, but also related to the embedding rules. Even though the algorithms use the same transform domain, their performance varies greatly due to the different embedding rules. The proposed algorithm in this paper combines the characteristics of DWT and DCT to process audio signal, and uses special embedding rules to embed the watermarks, which provides this algorithm with good robustness.

3. Principle of the Watermarking Algorithm in Transform Domain

3.1. Principle of Watermark Embedding The multi-resolution analysis characteristic of DWT renders excellent robustness against most attacks, so DWT can be used in audio watermarking algorithm to decompose the audio signal into wavelet coefficients with different frequency bands used for carrying watermarks. The energy compression capability of DCT can concentrate the main energy of the audio signal on the low frequency coefficient of DCT. This study will take advantage of these two methods to develop a novel robust audio watermarking algorithm. Since the human auditory system is insensitive to minor Appl. Sci. 2018, 8, 723 5 of 17 changes in the high-frequency components of the audio signal, the watermark information can be hidden in these high-frequency components obtained by DWT and DCT on the carrier audio signal. Suppose that the carrier audio signal is A, which has K sample points, and it can be expressed by the following formula: A = {a(k), 1 ≤ k ≤ K} , (1) where a(k) is the kth sample value of this audio signal. A is divided into M audio fragments Al (1 ≤ l ≤ M) with N sample points, and then the r-level DWT is performed on Al to obtain the wavelet coefficient Dl showed in Formula (2), including the approximation coefficient Ce(r) and the detail coefficients De(i) (i = 1, 2, ··· r):

Dl = DWT(Al) = Ce(r) ⊕ De(r) ⊕ De(r − 1) · · · ⊕ De(2) ⊕ De(1), (2) where Ce(r) is the rth level approximation coefficient decomposed by DWT, containing the lowest frequency component of the audio signal. The minor changes in Ce(r) will cause a significant drop in the audio quality, so usually watermarks cannot be embedded into this frequency-band. De(i) (i = 1, 2, ··· r) is the ith level detail coefficient. The smaller i is, the higher the frequency component contained in De(i) will be, and the smaller the impact of minor changes of De(i) on the audio quality will be. Therefore, the watermarks may be embedded into De(i). However, the high-frequency components are vulnerable to malicious attacks, so the detail components near the approximate components can be chosen to conceal watermark, which not only has little influence on the audio quality, but also can resist malicious attacks. In this study, the rth level detail coefficient De(r, n) (n = 1, 2, ... , N/2r) can be selected as the embedding frequency band. Divide De(r, n) into two packets respectively according to Formulas (3) and (4), including the former packet De1(r, j) and the r+1 latter packet De2(r, j) with the length of N/2 :

r+1 De1(r, j) = De(r, j) j = 1, 2, . . . , N/2 , (3)

N r+1 D (r, j) = De(r, + j) j = 1, 2, . . . , N/2 . (4) e2 2r+1

Perform DCT on De1(r, j) and De2(r, j) to obtain two transform-domain coefficients C1(r, j) and r+1 C2(r, j) with the length of N/2 , and then connect C1(r, j) and C2(r, j) to form an array C(r, n) with r the length of N/2 . Calculate the average amplitudes of |C(r, n)|, |C1(r, j)| and C2(r, j) according to Formulas (5)–(7). The average amplitude of |C(r, n)| is

r N/2r 2 r Mc = ∑ |C(r, n)|, n = 1, 2, . . . , N/2 . (5) N n = 1

The average amplitude of the former packet C1(r, j) is

r+1 N/2r+1 2 r+1 Mc1 = ∑ |C1(r, j)|, j = 1, 2, . . . , N/2 . (6) N j = 1

The average amplitude of the latter packet C2(r, j) is

r+1 N/2r+1 2 r+1 Mc2 = ∑ |C2(r, j)|, j = 1, 2, . . . , N/2 . (7) N j = 1 Suppose that the binary image watermark to be embedded is W = {w(q), 1 ≤ q ≤ L}, where w(q) ∈ {0, 1} , L is the length of the watermarks, L ≤ M. The average amplitudes of the two packets are modified to embed the watermark. The embedding rules are as follows: Appl. Sci. 2018, 8, 723 6 of 17

If w(q) = 1, modify C1(r, j) and C2(r, j) according to the following Formulas (8) and (9):

0 (1 + λ)Mc C1(r, j) = C1(r, j) × , (8) Mc1

0 (1 − λ)Mc C2(r, j) = C2(r, j) × . (9) Mc2

If w(q) = 0, modify C1(r, j) and C2(r, j) according to the following Formulas (10) and (11):

0 (1 − λ)Mc C1(r, j) = C1(r, j) × , (10) Mc1

0 (1 + λ)Mc C2(r, j) = C2(r, j) × , (11) Mc2 0 where λ is the embedding depth and its span is within the interval of (0,1). C1(r, j) is the modified 0 coefficient of the former packet, and C2(r, j) is the modified coefficient of the latter packet. Perform the 0 0 0 0 inverse DCT on C1(r, j) and C2(r, j) to obtain De1(r, j) and De2(r, j) respectively, and then recombine 0 0 0 them into De (r, n). Replace De(r) with De (r, n) in Formula (2) to get the watermarked coefficient Dl. 0 0 Finally, perform the inverse DWT on Dl to reconstruct the watermarked audio fragment Al and then recombine the watermarked audio signal A0.

3.2. Principle of Watermark Extracting A0 M A0 When extracting the watermark, the watermarked audio is divided into audio fragments l (1 ≤ l ≤ M) with N sample points, and then perform r-level DWT on each audio fragment to obtain Ce0(r) De0(i) i = ··· r De0(r) De0 ( ) the wavelet coefficient and ( 1, 2, ). Divide into the former packet 1 r, j De0 ( ) C0 (r j) C0 (r j) and the latter packet 2 r, j . Perform DCT on the two packets to obtain 1 , and 2 , and then calculate their average amplitudes respectively according to Formulas (6) and (7). w(q) C0 (r j) If = 1, according to Formulas (6)–(9), the average amplitude of 1 , is

r+1 r+1 N/2 M0 = 2 |C0 (r j)| c1 N ∑ 1 , j = 0

r+1 r+1 N/2 ( + ) = 2 |C (r, j) × 1 λ Mc | N ∑ 1 Mc j = 0 1

r+1 (12) ( + ) r+1 N/2 = 1 λ Mc 2 |C (r, j)| Mc N ∑ 1 1 j = 0 ( + ) = 1 λ Mc M Mc1 c1 r+1 = (1 + λ)Mc, j = 1, 2, . . . , N/2 . 0 The average amplitude of C2(r, j) is

N/2r+1 0 2r+1 0 Mc2 = N ∑ |C2(r, j)| j = 0 r+1 r+1 N/2 ( − ) = 2 |C (r, j) × 1 λ Mc | N ∑ 2 Mc j = 0 2 r+1 (13) ( − ) r+1 N/2 = 1 λ Mc 2 |C (r, j)| Mc N ∑ 2 2 j = 0 ( − ) = 1 λ Mc M Mc2 c2 r+1 = (1 − λ)Mc, j = 1, 2, . . . , N/2 . Appl. Sci. 2018, 8, 723 7 of 17

0 0 According to Formulas (12) and (13), when λ > 0, Mc1 ≥ Mc2. Similar to the above analysis process, if w(q) = 0, according to Formulas (10) and (11), the average C0 (r j) amplitude of 1 , is

r+1 2r+1 N/2 M0 = |C0 (r, j)| = (1 − λ)M , j = 1, 2, . . . , N/2r+1 (14) c1 ∑ 1 c N j = 0 0 The average amplitude of C2(r, j) is

r+1 N/2r+1 0 2 0 r+1 Mc2 = ∑ |C2(r, j)| = (1 + λ)Mc, j = 1, 2, . . . , N/2 (15) N j = 0

0 0 It can be seen from Formula (14) and (15), when λ > 0, Mc1 ≤ Mc2. Based on the above analysis, A0 the binary watermark can be extracted from the audio fragment l according to Formula (16):

( 0 0 0 1 , if Mc1 ≥ Mc2 w (q) = 0 0 , q = 1, 2, . . . , L (16) 0 , if Mc1 < Mc2

3.3. Impact of the Embedding Depth on Algorithm Performance The principle of watermark embedding in Section 3.1 shows that a watermark is embedded by C (r j) C (r j) modifying the amplitudes of 1 , and 2 , ; the larger the variation of the amplitude is, the worse the quality of the carrier audio is. Otherwise, the watermark cannot be accurately extracted if the variation is too small. Therefore, the variation of the amplitude should be maintained within a certain range, which not only guarantees the imperceptibility of the algorithm, but also maintains good C0 (r j) C0 (r j) robustness. It can be seen from Formulas (8)–(11) that the modified 1 , and 2 , are related to the embedding depth, so the following experiment tests the impact of embedding depth on the performance of the algorithm. The tested carrier audio signal is a song downloaded from the Internet, lasting for 60 s approximately, sampled at 44,100Hz and 16-bit quantization. The watermark bits were a of random binary string, only including 1 and 0, and long enough to cover the entire carrier signal. In order to objectively evaluate the performance of this watermarking algorithm, the quality of the watermarked audio can be determined using signal-to-noise ratio (SNR) as the performance index formulated as  K   2   ∑ A   =  SNR(A, A0) = 10lg k 1 , (17) K  ( 0 − )2   ∑ A A  k = 1 where A and A0 denote the carrier audio signal and the watermarked audio signal respectively. The larger the SNR is, the smaller the decrease of the audio quality will be, and the better the imperceptibility of algorithm will be. Usually, when SNR is over 20 dB, the audio quality is good. The robustness of the proposed algorithm to resist various attacks is evaluated using the bit error rate (BER), which is defined as

L 0 ∑ w(q) ⊕ w(q) q = 1 BER(w, w0) = × 100%, (18) L 0 where w(q) and w(q) denote the original watermark and the extracted watermark respectively, ⊕ stands for the exclusive-OR operator, and L is the length of the watermark. Generally, the smaller value of BER implies that the algorithm has good robustness against attacks. Appl. Sci. 2018, 8, x FOR PEER REVIEW 8 of 18

K 2 ∑ A ' k =1 SNR(AA , )= 10lg K , (17) ∑()AA'2− k =1

where A and A' denote the carrier audio signal and the watermarked audio signal respectively. The larger the SNR is, the smaller the decrease of the audio quality will be, and the better the imperceptibility of algorithm will be. Usually, when SNR is over 20 dB, the audio quality is good. The robustness of the proposed algorithm to resist various attacks is evaluated using the bit error rate (BER), which is defined as

L ∑ wq()⊕ wq ()' q=1 (18) BER(ww ,' ) = ×100% , L

' Appl. Sci.where2018 ,w8(,q 723) and w(q) denote the original watermark and the extracted watermark respectively, ⊕8 of 17 stands for the exclusive-OR operator, and L is the length of the watermark. Generally, the smaller value of BER implies that the algorithm has good robustness against attacks. NormalizedNormalized correlation correlation (NC) (NC) coefficient coefficient can can be usedbe used to compare to compare the similaritythe similarity between between the the original watermarkoriginal and watermark the extracted and the watermark extracted watermark represented represent as Formulaed as (19).Formula If NC (19). is I closef NC tois close 1, w( qto) is1, very 0 0 similarw( toq) wis( veryq) . Onsimilar the otherto w(q) hand,' . On thew( qother) and hand,w(q )wwill(q) and be veryw(q)' differentwill be very when different NC iswhen close NC to zero:is close to zero: L ∑L w(q) × w0(q) q = 1w(q)× w'(q) NC(w, w0) = ∑ . (19) s qL=1 L NC(w, w') = 2 . 2 (19) L∑ w(q)L ∑ w0(q) q∑= w1(q)2 ∑qw='(q1)2 q=1 q=1 When λ changes from 0 to 1, the experimental results of SNR and BER are shown in Figure1. When λ changes from 0 to 1, the experimental results of SNR and BER are shown in Figure 1. Figure1a shows that the larger the λ is, the smaller the SNR of the watermarked audio signal is, Figure 1a shows that the larger the λ is, the smaller the SNR of the watermarked audio signal is, and the worse the imperceptibility of the algorithm is. Figure1b shows that the larger the λ is, and the worse the imperceptibility of the algorithm is. Figure 1b shows that the larger the λ is, the the smallersmaller thethe BERBER ofof thethe extracted watermark watermark is, is,and and the the better better the therobustness robustness of the of algorithm the algorithm is. is. Thus,Thu imperceptibilitys, imperceptibility and and robustness robustness areare contradictory,contradictory, a alarger larger λ λ cancan be beselected selected to improve to improve the the robustnessrobustness of the of algorithmthe algorithm under under the the premise premise ofof ensuringensuring tha thatt the the SNR SNR is isgreater greater than than 20 dB. 20 dB.

FigureFigure 1. Performance 1. Performance evaluation evaluation under different different embedding embedding depth: depth: (a) evaluation (a) evaluation about SNR; about (b) SNR; evaluation about BER. (b) evaluation about BER.

4. Implementation of the Proposed Algorithm In Section3, the watermark embedding principle and extraction principle of the proposed algorithm are described. The detailed implementation steps are described in this section including two parts: embedding watermark and extracting watermark.

4.1. Procedure for Embedding Watermark The watermark embedding diagram of this proposed algorithm is showed in Figure2. The detailed embedding steps are described as follows:

Step 1: Convert the image watermark into the binary bit stream W = {w(q), 1 ≤ q ≤ L} with the length of L.

Step 2: The carrier audio A is divided into M audio fragments Al (1 ≤ l ≤ M) with the length of N after low-pass filtering. M is the number of the audio fragments, M ≥ L.

Step 3: When l changes from 1 to M, perform the r-level DWT on each fragment Al to obtain the wavelet coefficients, and select De(r) as the embedding frequency-band.

Step 4: Divide De(r) into the former packet De1(r, j) and the latter packet De2(r, j) with the length of N/2r+1 according to Formulas (3) and (4).

Step 5: Perform DCT on De1(r, j) and De2(r, j) to obtain C1(r, j) and C2(r, j) respectively. r Step 6: Connect C1(r, j) and C2(r, j) to form an array C(r, n) with the length of N/2 . Step 7: Calculate the average amplitude of |C(r, n)|, |C1(r, j)| and |C2(r, j)| according to Formulas (5)–(7). Appl. Sci. 2018, 8, 723 9 of 17

Step 8: If w(q) = 1, embed one-bit watermark into De(r) according to Formulas (8) and (9). If w(q) = 0, embed one-bit watermark into De(r) according to Formulas (10) and (11). C0 ( ) C0 ( ) De0 (r ) De0 (r j) Step 9: Perform the inverse DCT on 1 r, j and 2 r, j to obtain 1 , j and 2 , . De0 (r ) De0 (r j) De0( ) Step 10: Recombine 1 , j and 2 , into r, n and perform the inverse DWT to reconstruct A0 the watermarked audio fragment l . Step 11: Repeat Step 3 to Step 10 until all watermarks are embedded. 0 0 Step 12: Recombine Al(1 ≤ l ≤ M) as the watermarked audio signal A . Appl. Sci. 2018, 8, x FOR PEER REVIEW 10 of 18

Original audio signal

Segment into M fragments

Apply DWT on Al to obtain De (, r n )

Divide De (, r n ) into

De 1 (, r j ) and De2 (, r j )

C11(, r j )= DCT ( De (, r j ))

C22(, r j )= DCT ( De (, r j ))

Crn(, )= [ Crj12 (, ) C (, rj )]

Calculate , M c M and c1 M c2

The binary image Yes No If wl () = 1 ?

Embedding watermark Embedding watermark according to formula according to formula Binary bit stream (8) and (9) (10) and (11)

De''(, r j )= IDCT ( C (, r j )) 11 Watermark embedding De''(, r j )= IDCT ( C (, r j )) 22 De'(, r n )= [ De '' (, r j ) De (, r j )] 12

Apply IDWT to reconstruct A l

End of No fragments ?

Yes

Watermarked audio fragments combination

Watermarked audio signal Figure 2. FigureThe watermark 2. The watermark embedding embedding diagram diagram ofof this proposed proposed algorithm algorithm.. DWT: DWT:discrete discretewavelet wavelet transform; DCT: discrete cosine transform; IDWT: inverse discrete wavelet transforms. transform; DCT: discrete cosine transform; IDWT: inverse discrete wavelet transforms. Appl. Sci. 2018, 8, 723 10 of 17

4.2. Procedure for Extracting Watermark The watermark extracting diagram of this proposed algorithm is showed in Figure3. The detailed extracting steps are described as follows: 0 0 Step 1: Segment the watermarked audio signal A into M audio fragments Al with the length of N. 0 0 Step 2: Perform r-level DWT on Al to obtain the wavelet coefficients De (r). De0(r) De0 ( ) De0 ( ) N r+1 Step 3: Divide into 1 r, j and 2 r, j with the length of /2 . De0 ( ) De0 ( ) C0 ( ) C0 ( ) Step 4: Perform DCT on 1 r, j and 2 r, j to obtain 1 r, j and 2 r, j respectively. |C ( )| |C ( )| M0 M0 Step 5: Calculate the average amplitudes of 1 r, j and 2 r, j to obtain c1 and c2 according to Formulas (6) and (7). 0 0 Step 6: If Mc1 > Mc2, the extracted binary information is ‘1’, otherwise, it is ‘0’. Step 7: Repeat Step 2 to Step 6 until all binary watermarks are extracted. Step 8: Convert the extracted binary stream into binary image watermark. Appl. Sci. 2018, 8, x FOR PEER REVIEW 11 of 18

Watermarked audio signal

Segment A ' into M fragments

' Apply DWT on Al to obtain De ' (, r n )

Divide De ' (, r n ) into ' and ' De1(, r j ) De2 (, r j )

'' C11(, r j )= DCT ( De (, r j )) '' C22(, r j )= DCT ( De (, r j ))

' ' Calculate M c 1 and M c2

Yes '' No MM > ? cc12

The extracted binary The extracted binary bit is ‘1’ bit is ‘0’

Watermark extracting No

End of fragments ?

Yes

Binary bit stream

The extracted image

FigureFigure 3. The 3. The watermark watermark extracting extracting diagram diagram of ofthis this proposed proposed algorithm algorithm..

5. Performance Evaluation This section will use a large number of experiments to evaluate the performance of the proposed algorithm. The detailed experimental environment is described as follows: (1) Computer system: Microsoft Windows XP Professional; (2) Programming Language: MATLAB 6.5; (3) Software for processing audio signals: Cool Edit Pro V2.1. Experimental parameters are as follows: (1) The tested carrier audio signals consist of 20 songs downloaded from the Internet, sampled at 44,100 Hz and 16-bit quantization; (2) Four binary images are used as the watermark shown in Figure 4 respectively; (3) Perform four-level DWT on each audio fragment; (4) Embed the watermark into the Appl. Sci. 2018, 8, 723 11 of 17

5. Performance Evaluation This section will use a large number of experiments to evaluate the performance of the proposed algorithm. The detailed experimental environment is described as follows: (1) Computer system: Microsoft Windows XP Professional; (2) Programming Language: MATLAB 6.5; (3) Software for processing audio signals: Cool Edit Pro V2.1. Experimental parameters are as follows: (1) The tested carrier audio signals consist of 20 songs downloaded from the Internet, sampled at 44,100 Hz and Appl.16-bit Sci. quantization; 2018, 8, x FOR PEER (2) Four REVIEW binary images are used as the watermark shown in Figure4 respectively;12 of 18 (3) Perform four-level DWT on each audio fragment; (4) Embed the watermark into the wavelet waveletcoefficient coefficientDe(4); (5) De The(4) ; length (5) The of length the audio of the fragment audio isfragment 256; (6) Theis 256 embedding; (6) The embedding depth is 0.4. depth is 0.4.

(a) (b)

(c) (d)

FigureFigure 4. Four binary binary image imagess to to be embedded into the the carrier audio signals signals:: ( (aa)) The The first first image image with with dimensionsdimensions of of 100 × 9696;; ( (bb)) The The second second image image with with dimensions dimensions of of 200 × 5050;; ( (cc)) The The third third image image with with dimensionsdimensions of of 200 200 ×× 50;50; (d (d) )The The fourth fourth image image with with dimensions dimensions of of 100 100 ×× 100.100.

55.1..1. Imperceptibility Imperceptibility and and Payload Payload Capacity Capacity TheThe test test of of each each audio audio signal signal was was repeat repeateded 1 100 time times,s, s soo it it takes takes 200 200 experiments to to test test all all the the audioaudio signals. Based on thethe principleprinciple ofofwatermark watermark embedding embedding in in Section Section3, one-bit3, one-bit watermark watermark can can be beembedded embedded for for each each audio audio fragment. fragment. Since Since the length the length of the of audio the fragmentaudio fragment is 256, theis 256, payload the payload capacity capacityof this algorithm of this algorithm is 44,100/256 is 44,100/256 = 172.27 = bps. 172.27 bps. TThehe averageaverage resultsresults about about the the SNR SNR of the of audiothe audio signal, signa the NCl, the and NC BER and of the BER extracted of the watermarkextracted watermarkand the payload and the capacity payload are capacity listed inare Table listed1. in The Table experimental 1. The experimental results showed results that showed the payload that the payloadcapacity capacity of the algorithm of the algorithm is the same is the as same that ofas paperthat of [25paper], higher [25], higher than that than of that paper of [pape4,12],r [ and4] and far [higher12], and than far thathigher of paperthan that [16 ,of19 ].paper [16] and [19].

TableTable 1.1. ExperimentalExperimental resultsresults of of the the signal-to-noise signal-to-noise ratio ratio (SNR), (SNR normalized), normalized correlation correlation (NC), (NC bit) error, bit errorrate (BER) rate (BER and) payload and payload capacity capacity (no attack). (no attack). Items Proposed Paper [4] Paper [16] Paper [25] Paper [12] Paper [19] Items Proposed Paper [4] Paper [16] Paper [25] Paper [12] Paper [19] SNR (dB) 23.49 N/A 21.37 18.42 20.32 26.79 SNRCapacity (dB)(bps) 23.49 172.27 N/A125 21.3743.07 172.27 18.42 139.97 20.32 34.14 26.79 Capacity(bps) 172.27 125 43.07 172.27 139.97 34.14 NC 1 N/A N/A N/A N/A 1 NC 1 N/A N/A N/A N/A 1 BER (%)BER (%) 0.000.00 0.000.00 N/AN/A N/A N/A 0.12 0.12 0.00 0.00 NoteNote that that N/A N/A means means no no report report is found inin the the selected selected algorithm. algorithm.

The average SNR of this proposed algorithm is 23.49 dB, which is higher than that in papers The average SNR of this proposed algorithm is 23.49 dB, which is higher than that in papers [12,16,25] [12,16,25] but not paper [19], while the payload capacity of this proposed algorithm is five times that but not paper [19], while the payload capacity of this proposed algorithm is five times that of the of the paper [19]. paper [19]. The images extracted without any attack by this algorithm are very similar to the original images because NC equals 1 and BER equals 0 as shown in Table 1. Figure 5 shows the waveform comparison of the carrier audio and the watermarked audio (only show a short fragment lasting about three seconds) without performing any attack. It can be seen from Figure 5 that two waveform figures have no obvious changes before and after the watermark was embedded into the carrier audio. Figure 6 shows that two figures are slightly different at high frequency band, mainly because the watermarks are embedded into the high frequency band of the watermarked audio in Figure 6b, but human ears are not sensitive to these minor changes of the high frequency component. The experimental result of SNR in Table 1, Figures 5 and 6 all indicate the excellent imperceptibility of this algorithm. Appl. Sci. 2018, 8, 723 12 of 17

The images extracted without any attack by this algorithm are very similar to the original images because NC equals 1 and BER equals 0 as shown in Table1. Figure5 shows the waveform comparison of the carrier audio and the watermarked audio (only show a short fragment lasting about three seconds) without performing any attack. It can be seen from Figure5 that two waveform figures have no obvious changes before and after the watermark was embedded into the carrier audio. Figure6 shows that two spectrogram figures are slightly different at high frequency band, mainly because the watermarks are embedded into the high frequency band of the watermarked audio in Figure6b, but human ears are not sensitive to these minor changes of the high frequency component. The experimental result of

SNR inAppl. Table Sci. 20181, Figures, 8, x FOR 5PEER and REVIEW6 all indicate the excellent imperceptibility of this algorithm. 13 of 18 Appl. Sci. 2018, 8, x FOR PEER REVIEW 13 of 18

FigureFigure 5. Waveform 5. Waveform comparison comparison of of the the carrier carrier audio audio andand the watermarked watermarked audio: audio: (a) (originala) original audio audio Figuresignal; 5.(b Waveform) watermarked comparison audio signal. of the carrier audio and the watermarked audio: (a) original audio signal;signal; (b) watermarked (b) watermarked audio audio signal. signal.

Figure 6. Spectrogram comparison of the carrier audio and the watermarked audio: (a) original FigureFigureaudio 6. Spectrogram signal;6. Spectrogram (b) watermarked comparison comparison audio of theof signal. the carrier carrier audio audio and and the the watermarked watermarked audio: audio: (a ()a original) original audio signal;audio (b) watermarkedsignal; (b) watermarked audio signal. audio signal. 5.2. Robustness 5.2. Robustness 5.2. RobustnessRobustness is an important index for evaluating the performance of the watermarking algorithm.Robustness This studyis an examinesimportant the index NC and for BERevaluating between the the performancecarrier watermark of the and watermarking the extracted Robustnessalgorithm.watermark This to is assess anstudy important the examines robustness index the NC foragainst evaluatingand variousBER between theattacks. performance the The carrier attack watermark of types the watermarkingconsidered and the inextracted the algorithm. test This studywatermarkare as examinesfollows: to assess the the NC robustness and BER betweenagainst various the carrier attacks. watermark The attack and types the considered extracted in watermark the test to assessare the as robustnessfollows: against various attacks. The attack types considered in the test are as follows: A. Low-pass filtering: applying low-pass filter with cutoff frequency of four kilohertz. A. Low-pass filtering: applying low-pass filter with cutoff frequency of four kilohertz. A. B.Low-pass Amplitude filtering: scaling: applying scaling the low-pass amplitude filter of the with watermarked cutoff frequency audio signal of four by kilohertz. 0.8. B. Amplitude scaling: scaling the amplitude of the watermarked audio signal by 0.8. B. C.Amplitude Amplitude scaling: scaling: scaling scaling the the amplitudeamplitude of of the the watermarked watermarked audio audio signal signal by 1.2. by 0.8. C.D. AmplitudeNoise corruption: scaling: addingscaling zerothe amplitude-mean Gaussian of the watermarkednoise to the watermarked audio signal byaudio 1.2. signal with 20 C. Amplitude scaling: scaling the amplitude of the watermarked audio signal by 1.2. D. NoisedB. corruption: adding zero-mean Gaussian noise to the watermarked audio signal with 20 D. E.Noise dB.Noise corruption: corruption: adding adding zero-meanzero-mean Gaussian Gaussian noise noise to the to thewatermarked watermarked audio audiosignal signalwith 30 with E.20 dB.NoisedB. corruption: adding zero-mean Gaussian noise to the watermarked audio signal with 30 E. F.Noise dB.Noise corruption: corruption: adding adding zero-meanzero-mean Gaussian Gaussian noise noise to the to thewatermarked watermarked audio audiosignal signalwith 35 with F. Noise corruption: adding zero-mean Gaussian noise to the watermarked audio signal with 35 30 dB.dB. dB. F. G.Noise MP3 corruption: compression: adding applying zero-mean MP3 compression Gaussian with noise 64 kbps to theto the watermarked watermarked audioaudio signal. signal with G.H. MP3MP3 compression: compression: applying applying MP3MP3 compression compression withwith 64128 kbps kbps to to the the watermarked watermarked audio audio signal. signal. 35 dB. H.I. MP3Re-sampling: compression: droppin applyingg the MP3sampling compression rate of the with watermarked 128 kbps to theaudio watermarked signal from audio 44,100 signal. Hz to I. Re22-,sampling:050 Hz and droppin then roseg the back sampling to 44,100 rate Hz. of the watermarked audio signal from 44,100 Hz to J. 22Re,050-quantization: Hz and then quantizing rose back theto 44 watermarked,100 Hz. audio signal from 16-bit/sample to 8-bit/sample J. Reand-quantization: then back to quantizing16-bit/sample. the watermarked audio signal from 16-bit/sample to 8-bit/sample K. andEcho then addition: back to addi 16-bngit/sample. an echo signal with a delay of 50 ms and a decay of five percent to the K. Echowatermarked addition: audio adding signal. an echo signal with a delay of 50 ms and a decay of five percent to the watermarked audio signal. Appl. Sci. 2018, 8, 723 13 of 17

G. MP3 compression: applying MP3 compression with 64 kbps to the watermarked audio signal. H. MP3 compression: applying MP3 compression with 128 kbps to the watermarked audio signal. I. Re-sampling: dropping the sampling rate of the watermarked audio signal from 44,100 Hz to 22,050 Hz and then rose back to 44,100 Hz. J. Re-quantization: quantizing the watermarked audio signal from 16-bit/sample to 8-bit/sample and then back to 16-bit/sample. K. Echo addition: adding an echo signal with a delay of 50 ms and a decay of five percent to the watermarked audio signal.

Appl. Sci. 2018, 8, x FOR PEER REVIEW 14 of 18 The extracted images and the average experimental results of NC are shown in Figures7–10 respectively.The extracted images and the average experimental results of NC are shown in Figures 7–10 Appl.respectively. Sci. 2018, 8 , x FOR PEER REVIEW 14 of 18

The extracted images and the average experimental results of NC are shown in Figures 7–10 respectively.

(a) (b) (c) (d)

(a) (b) (c) (d)

(e) (f) (g) (h)

(e) (f) (g) (h)

(i) (j) (k) (l)

Figure 7. The extracted pictures of the first image under different attacks: (a) Low-pass filtering, NC Figure 7. The= extracted1; (b) Amplitude pictures scaling by of 0.8, the NC first= 1; (c) image Amplitude under scalingdifferent by 1.2, NC = attacks:1; (d) Noise( corruptiona) Low-pass filtering, NC = 1; (b) Amplitude(20 dB), NC = 0.9665; scaling(i) (e) N byoise 0.8,corruption NC(j) with = 1; 30 (dcB,) N AmplitudeC =( k0).9965 ; (f) N scalingoise corruption(l) by 1.2, with NC35 dB, = 1; (d) Noise NC = 0.9992; (g) MP3 compression with 64 kbps, NC = 0.9998; (h) MP3 compression with 128 kbps, corruption (20Figure dB), 7. NC The =extracted 0.9665; pictures (e) Noise of the first corruption image under withdifferent 30 attacks: dB, NC (a) Low = 0.9965;-pass filtering, (f) NoiseNC corruption =N 1C; (=b 1) ;Amplitude (i) Re-sampling, scaling N byC =0.8, 1; (Nj)C Re = -1quantization,; (c) Amplitude NC scaling = 0.9982; by (1.2,k) ENchoC = addition, 1; (d) Noise NC corruption = 1; (l) no with 35 dB, NC(2attack,0 = dB), 0.9992; N NCC = =1 .0 .9665; (g) MP3 (e) Noise compression corruption with with 30 dB, 64 N kbps,C = 0.9965 NC; (f) =Noise 0.9998; corruption (h) MP3with 35 compression dB, with 128 kbps, NC =NC 1; = (0i.9992) Re-sampling,; (g) MP3 compression NC = with 1; ( j64) Re-quantization,kbps, NC = 0.9998; (h) NCMP3 =compression 0.9982; (withk) Echo 128 kbps, addition, NC = 1; (l) no attack, NCNC = 1 1.; (i) Re-sampling, NC = 1; (j) Re-quantization, NC = 0.9982; (k) Echo addition, NC = 1; (l) no attack, NC = 1.

(a) (b) (c)

(da)) ((be) (cf)

(d) (e) (f) (g) (h) (i)

(g) (h) (i) (j) (k) (l)

Figure 8. The extracted pictures of the second image under different attacks: (a) Low-pass filtering, NC = 1; (b) Amplitude scaling by 0.8, NC = 1; (c) Amplitude scaling by 1.2, NC = 1; (d) Noise corruption( j(2) 0 dB), NC = 0.9954; (e) Noise corruption(k) with 30 dB, NC = 0.9996; (f) Noise(l) corruption Figurewith 35 8. d B,The N Cextracted = 0.9999 ;pictures (g) MP3 of compression the second withimage 64 under kbps, differentNC = 0.9999; attacks: (h) MP3 (a) Low compression-pass filtering, with Figure 8. TheN12 extractedC8 k=bps, 1; ( Nb)C Amplitude = pictures 1; (i) Re-sampling, scaling of the by Nsecond C0.8, = 1 N; (jC) Re= image -1quantization,; (c) Amplitude under NC different scaling= 0.9998 by; (k )1.2, attacks:Echo N Caddition, = 1;( a(d) )N Low-passNoiseC = 1; filtering, NC = 1; (b) Amplitudecorruption(l) no attack, (2 scaling N0 CdB), = 1 N. C by = 0 0.8,.9954; NC (e) Noise = 1; (corruptionc) Amplitude with 30 scalingdB, NC = by0.9996 1.2,; (f NC) Noise = 1;corruption (d) Noise corruption with 35 dB, NC = 0.9999; (g) MP3 compression with 64 kbps, NC = 0.9999; (h) MP3 compression with e f (20 dB), NC =12 0.9954;8 kbps, N (C )= Noise1; (i) Re-sampling, corruption NC = 1 with; (j) Re- 30quantization, dB, NC N =C 0.9996;= 0.9998; (k () )Echo Noise addition, corruption NC = 1; with 35 dB, NC = 0.9999; ((gl) )no MP3 attack, compression NC = 1. with 64 kbps, NC = 0.9999; (h) MP3 compression with 128 kbps, NC = 1; (i) Re-sampling, NC = 1; (j) Re-quantization, NC = 0.9998; (k) Echo addition, NC = 1; (l) no attack, NC = 1. Appl. Sci. 2018, 8, 723 14 of 17 Appl. Sci. 2018, 8, x FOR PEER REVIEW 15 of 18 Appl. Sci. 2018, 8, x FOR PEER REVIEW 15 of 18

(a) (b) (c) (a) (b) (c)

(d) (e) (f) (d) (e) (f)

(g) (h) (i) (g) (h) (i)

(j) (k) (l) (j) (k) (l) Figure 9. The extracted pictures of the second image under different attacks: (a) Low-pass filtering, Figure 9. The extracted pictures of the second image under different attacks: ( a) Low Low-pass-pass filtering, filtering, NC = 1; (b) Amplitude scaling by 0.8, NC = 1; (c) Amplitude scaling by 1.2, NC = 1; (d) Noise NCNC == 1;1; ( b()b Amplitude) Amplitude scaling scaling by 0.8,by NC0.8, =N 1;C (c=) Amplitude1; (c) Amplitude scaling scaling by 1.2, NCby =1.2, 1; (dN)C Noise = 1; corruption(d) Noise corruption (20 dB), NC = 0.9911; (e) Noise corruption with 30 dB, NC = 0.9975; (f) Noise corruption (20corruption dB), NC (2 =0 0.9911;dB), N (Ce )= Noise0.9911; corruption (e) Noise withcorruption 30 dB, with NC =3 0.9975;0 dB, N (Cf) = Noise 0.9975 corruption; (f) Noise withcorruption 35 dB, with 35 dB, NC = 0.9992; (g) MP3 compression with 64 kbps, NC = 0.9998; (h) MP3 compression with NCwith = 3 0.9992;5 dB, N (Cg )= MP30.9992 compression; (g) MP3 compression with 64 kbps, with NC 64 =kbps, 0.9998; NC ( h=) 0 MP3.9998; compression (h) MP3 compression with 128 kbps,with 128 kbps, NC = 1; (i) Re-sampling, NC = 1; (j) Re-quantization, NC = 0.9986; (k) Echo addition, NC128 =kbps, 1; (i )N Re-sampling,C = 1; (i) Re NC-sampling, = 1; (j) Re-quantization,NC = 1; (j) Re-quantization, NC = 0.9986; N (kC) = Echo 0.9986 addition,; (k) Echo NC addition, = 0.9999; lNC = 0.9999; (l) no attack, NC = 1. (N)C no = attack,0.9999; NC(l) n =o 1.attack, NC = 1.

(a) (b) (c) (d) (a) (b) (c) (d)

(e) (f) (g) (h) (e) (f) (g) (h)

(i) (j) (k) (l) (i) (j) (k) (l) Figure 10. The extracted pictures of the first image under different attacks: (a) Low-pass filtering, NC Figure 10. The extracted pictures of the first image under different attacks: (a) Low-pass filtering, NC Figure= 1; (b 10.) AmplitudeThe extracted scaling pictures by 0.8, ofNC the = 1 first; (c) imageAmplitude under scaling different by 1.2, attacks: NC = (a1); ( Low-passd) Noise corruption filtering, = 1; (b) Amplitude scaling by 0.8, NC = 1; (c) Amplitude scaling by 1.2, NC = 1; (d) Noise corruption NC(20 = d 1;B), (b N) AmplitudeC = 0.9856; scaling (e) Noise by 0.8,corruption NC = 1; (withc) Amplitude 30 dB, N scalingC = 0.9991;( by 1.2,f) N NCoise = 1;corruption (d) Noise corruptionwith 35 dB, (20 dB), NC = 0.9856; (e) Noise corruption with 30 dB, NC = 0.9991;(f) Noise corruption with 35 dB, (20NC dB), = 0 NC.9998;( = 0.9856;g) MP3 ( ecompression) Noise corruption with 64 with kbps, 30 N dB,C NC= 0.9998; = 0.9991;( (h) Mf)P3 Noise compression corruption with with 12 358 k dB,bps, NC = 0.9998;(g) MP3 compression with 64 kbps, NC = 0.9998; (h) MP3 compression with 128 kbps, NCNC = = 0.9998;( 1; (i) Rge-)sampling, MP3 compression NC = 0.9998; with (j 64) Re kbps,-quanti NCzation, = 0.9998; NC (=h 0).9990; MP3 compression(k) Echo addition, with 128NC kbps,= 1; (l) NC = 1; (i) Re-sampling, NC = 0.9998; (j) Re-quantization, NC = 0.9990; (k) Echo addition, NC = 1; (l) NCno =attack, 1; (i) Re-sampling,NC = 1. NC = 0.9998; (j) Re-quantization, NC = 0.9990; (k) Echo addition, NC = 1; (nlo) noattack, attack, NC NC = 1 =. 1. It can be seen from Figures 7–10 that the extracted image watermarks are very similar to the It can be seen from Figures 7–10 that the extracted image watermarks are very similar to the original image watermarks when resisting against low-pass filter with cutoff frequency 4 kHz, original image watermarks when resisting against low-pass filter with cutoff frequency 4 kHz, amplitude scaling by 0.8 and 1.2, MP3 compression with 128 kbps and 64 kbps, re-sampling, amplitude scaling by 0.8 and 1.2, MP3 compression with 128 kbps and 64 kbps, re-sampling, re-quantization, echo addition with a delay of 50 ms and a decay of 5%, noise corruption in 30 dB re-quantization, echo addition with a delay of 50 ms and a decay of 5%, noise corruption in 30 dB Appl. Sci. 2018, 8, 723 15 of 17

It can be seen from Figures7–10 that the extracted image watermarks are very similar to the original image watermarks when resisting against low-pass filter with cutoff frequency 4 kHz, amplitude scaling by 0.8 and 1.2, MP3 compression with 128 kbps and 64 kbps, re-sampling, re-quantization, echo addition with a delay of 50 ms and a decay of 5%, noise corruption in 30 dB and 35 dB. The extracted image watermarks shown in Figures7d,8d,9d and 10d are relatively obscure when suffering attack from noise corruption in 20 dB, but with the decrease of white noise, they become more and more clear, as shown in Figure7e,f Figure8e,f Figure9e,f and Figure 10e,f. The average BER values of the extracted watermarks under different attacks are listed in Table2. It can be seen that this proposed algorithm has excellent robustness against low-pass filter with cutoff frequency four kilohertz, amplitude scaling by 0.8, amplitude scaling by 1.2, MP3 compression with 128 kbps, MP3 compression with 64 kbps, re-sampling, re-quantization, echo addition with a delay of 50 ms and a decay of five percent, noise corruption in 30 dB and noise corruption in 35 dB, so it is far superior to the algorithms proposed in papers [4,16,25]. In particular, when this algorithm resists low-pass filter, BER is only 0.01%, which is much better than 21.975% in paper [16], 6.93% in paper [19], 28.250% in paper [25], 0.12% in paper [12] and 0.39% in paper [4].

Table 2. Average BER values (%) of the extracted watermarks under different attacks.

Attack Proposed Paper [4] Paper [16] Paper [25] Paper [12] Paper [19] A 0.01 0.39 21.97 28.25 0.12 6.93 B 0.01 2.87 0.50 0.30 0.12 N/A C 0.01 17.92 0.47 0.35 N/A N/A D 2.27 N/A N/A N/A 1.29 N/A E 0.22 N/A N/A N/A 0.31 N/A F 0.07 0.78 N/A N/A N/A 0.00 G 0.08 1.95 2.45 6.85 0.12 0.00 H 0.01 N/A 1.12 4.97 1.61 0.00 I 0.01 0.00 1.00 6.45 0.12 0.00 J 0.14 0.78 N/A N/A 0.12 0.00 K 0.01 N/A N/A N/A 0.84 N/A Note that N/A means no report is found in the selected algorithm.

The robustness is slightly inferior to that in paper [19] when resisting re-quantization, and also slightly inferior to that in paper [12] when resisting noise corruption in 20 dB. When suffering attack from noise corruption, the quality of the extracted watermark is slightly poor, which is mainly because the algorithm is achieved by comparing two sets of TDC obtained from the fourth level detail coefficient. When the additional noise is very loud, the fourth level wavelet coefficient will be affected by noise, thus reducing the accuracy of the extracted watermark. As the noise becomes smaller, the quality of watermark is significantly improved under the noise attack of 30 and 35 dB.

6. Conclusions In this paper, a novel and blind audio watermarking algorithm with strong robustness is proposed according to the multi-resolution characteristic of DWT and the energy compression capability of DCT. The cover audio signal is first segmented into audio fragments, and then each audio fragment is performed by DWT and DCT in order to obtain two sets of TDC which are modified to embed the binary watermark. This proposed algorithm can realize blind extraction of digital watermark without the participation of carrier audio signal when extracting watermark, which is convenient for the practical application. The embedding depth is the key factor to determine the performance of the algorithm. In the case that the algorithm has good imperceptibility, the embedding depth should be increased to improve the robustness. The experimental results show that the average SNR of the carrier audio signals reaches 23.49 dB in the case of the payload capacity of 172.27 bps, which indicates that this proposed algorithm has large capacity and good imperceptibility. In addition, this proposed Appl. Sci. 2018, 8, 723 16 of 17 algorithm has excellent robustness against white noise, low-pass filtering, re-sampling, re-quantization, echo addition, amplitude scaling and MP3 compression compared with other audio watermarking algorithms. Without synchronization signal, the watermark extraction begins with the first audio fragment, so this algorithm cannot combat the desynchronization attack, although it can effectively resist the conventional signal processing attacks generated during the use of audio media. In the next study, we will focus on the issue of how to add synchronization signals in the carrier audio, as well as the problem of combating other types of attacks, such as desynchronization attack, collusion attack type II and strong noise corruption.

Author Contributions: Q.W. conceived the algorithm and designed the experiments; Q.W. performed the experiments; Q.W. and M.W. analyzed the results and adjusted the experimental scheme; Q.W. drafted the manuscript; Q.W. and M.W. revised the manuscript. All authors read and approved the final manuscript. Acknowledgments: This research work was supported by the National Natural Science Foundation of China for Youth, China (Grant No. 61602263), Research and Innovation Project for Graduate Students of Jiangsu Province, China (Grant No. KYLX_0815), the University Natural Science Foundation of Jiangsu Province, China (Grant No. 16KJB510015). Conflicts of Interest: The authors declare no conflict of interest.

References

1. Shiva, M.G.; D’Souza, R.J.; Varaprasad, G. Digital signature-based secure node disjoint multipath routing protocol for wireless sensor networks. IEEE Sens. J. 2012, 12, 2941–2949. [CrossRef] 2. Kai, C.; Kuo, W.C. A new digital signature algorithm based on chaotic maps. Nonlinear Dyn. 2013, 74, 1003–1012. 3. Zhou, X.; Zhang, H.; Wang, C.Y. A robust image watermarking technique based on DWT, APDCBT, and SVD. Symmetry 2018, 10, 77. [CrossRef] 4. Wang, X.Y.; Shi, Q.L.; Wang, S.M.; Yang, H.Y. A blind robust digital watermarking using invariant exponent moments. Int. J. Electron. Commun. 2016, 70, 416–426. [CrossRef] 5. Wang, C.Y.; Zhang, Y.P.; Zhou, X. Robust image watermarking algorithm based on ASIFT against geometric attacks. Appl. Sci. 2018, 8, 410. [CrossRef] 6. Natgunanathan, I.; Xiang, Y.; Hua, G.; Beliakov, G.; Yearwood, J. Patchwork-Based multi-layer audio watermarking. IEEE Trans. Audio Speech Lang. Process. 2017, 25, 2176–2187. [CrossRef] 7. Xiang, S.J.; Li, Z.H. Reversible audio data hiding algorithm using noncausal prediction of alterable orders. EURASIP J. Audio Speech Music Process. 2017, 4.[CrossRef] 8. Hu, H.T.; Hsu, L.Y.; Chou, H.H. Variable-dimensional vector modulation for perceptual-based DWT blind audio watermarking with adjustable payload capacity. Digit. Signal Process. 2014, 31, 115–123. [CrossRef] 9. Singh, D.; Singh, S.K. DWT-SVD and DCT based robust and blind watermarking scheme for copyright protection. Multimed. Tools Appl. 2017, 76, 13001–13024. [CrossRef] 10. Lutovac, B.; Dakovi´c,M.; Stankovi´c,S.; Orovi´c,I. An algorithm for robust image watermarking based on the DCT and Zernike moments. Multimed. Tools Appl. 2017, 76, 23333–23352. [CrossRef] 11. Shukla, D.; Sharma, M. A new approach for scene-based digital video watermarking using discrete wavelet transforms. Int. J. Adv. Appl. Sci. 2018, 5, 148–160. [CrossRef] 12. Hu, H.T.; Chou, H.H.; Yu, C.; Hsu, L.Y. Incorporation of perceptually adaptive QIM with singular value decomposition for blind audio watermarking. EURASIP J. Adv. Signal Process. 2014, 12, 1–12. [CrossRef] 13. Glowacz, A. DC Motor Fault Analysis with the Use of Acoustic Signals, Coiflet Wavelet Transform, and K-Nearest Neighbor Classifier. Arch. Acoust. 2015, 40, 321–327. [CrossRef] 14. Li, Z.X.; Jiang, Y.; Hu, C.; Peng, Z. Recent progress on decoupling diagnosis of hybrid failures in gear transmission systems using vibration sensor signal: A review. Measurement 2016, 90, 4–19. [CrossRef] 15. Glowacz, A. Diagnostics of Direct Current machine based on analysis of acoustic signals with the use of symlet wavelet transform and modified classifier based on words. Eksploat. Niezawodn. Maint. Reliab. 2014, 16, 554–558. 16. Lei, W.N.; Chang, L.C. Robust and high-quality time-domain audio watermarking based on low-frequency amplitude modification. IEEE Trans. Multimed. 2006, 8, 46–59. Appl. Sci. 2018, 8, 723 17 of 17

17. Erfani, Y.; Siahpoush, S. Robust audio watermarking using improved TS echo hiding. Digit. Signal Process. 2009, 19, 809–814. [CrossRef] 18. Basia, P.; Pitas, I.; Nikolaidis, N. Robust audio watermarking in the time domain. IEEE Trans. Multimed. 1998, 3, 232–241. [CrossRef] 19. Kumsawat, P. A genetic algorithm optimization technique for multiwavelet-based digital audio watermarking. EURASIP J. Adv. Signal Process. 2010, 1, 1–10. [CrossRef] 20. Natgunanathan, I.; Xiang, Y.; Rong, Y.; Peng, D. Robust patchwork-based watermarking method for stereo audio signals. Multimed Tools Appl. 2014, 72, 1387–1410. [CrossRef] 21. Megias, D.; Serra-Ruiz, J.; Fallahpour, M. Efficient self-synchronized blind audio watermarking system based on time domain and FFT amplitude modification. Signal Process. 2010, 90, 3078–3092. [CrossRef] 22. Tewari, T.K.; Saxena, V.; Gupta, J.P. A digital audio watermarking algorithm using selective mid band DCT coefficients and energy threshold. Int. J. Audio Technol. 2014, 17, 365–371. 23. Hu, H.T.; Hsu, L.Y. Robust transparent and high capacity audio watermarking in DCT domain. Signal Process. 2015, 109, 226–235. [CrossRef] 24. Li, W.; Xue, X.; Lu, P. Localized audio watermarking technique robust against time-scale modification. IEEE Trans. Multimed. 2006, 8, 60–90. [CrossRef] 25. Chen, S.T.; Huang, H.N.; Chen, C.J.; Tseng, K.K.; Tu, S.Y. Adaptive audio watermarking via the optimization point of view on the wavelet-based entropy. Digit. Signal Process. 2013, 23, 971–980. [CrossRef] 26. Wu, Q.L.; Wu, M. Novel Audio information hiding algorithm based on Wavelet Transform. J. Electron. Inf. Technol. 2016, 38, 834–840. 27. Wu, S.; Huang, J.; Huang, D.; Shi, Y.Q. Efficiently self-synchronized audio watermarking for assured audio data transmission. IEEE Trans. Broadcast. 2005, 51, 69–76. [CrossRef] 28. Chen, S.T.; Wu, G.D.; Huang, H.N. Wavelet-domain audio watermarking algorithm using optimisation-based quantisation. IET Signal Process. 2010, 4, 720–727. [CrossRef] 29. Abd, F.; Samie, E. An efficient singular value decomposition algorithm for digital audio watermarking. Int. J. Audio Technol. 2009, 12, 27–45. 30. Asmara, R.A.; Agustina, R.; Hidayatulloh. Comparison of Discrete Cosine Transforms (DCT), Discrete Fourier Transforms (DFT), and Discrete Wavelet Transforms (DWT) in Digital Image Watermarking. Int. J. Adv. Comput. Sci. Appl. 2017, 8, 245–249. 31. Wang, X.; Qi, W.; Liu, P. A new adaptive digital audio watermarking based on support vector regression. J. Netw. Comput. Appl. 2008, 31, 735–749. [CrossRef] 32. Wang, X.Y.; Zhao, H. A novel synchronization invariant audio watermarking algorithm based on DWT and DCT. IEEE Trans. Signal Process. 2006, 54, 4835–4840. [CrossRef] 33. Vivekananda, B.K.; Sengupta, I.; Das, A. An adaptive audio watermarking based on the singular value decomposition in the wavelet domain. Digit. Signal Process. 2010, 20, 1547–1558. 34. Dhar, P.K.; Shimamure, T. Blind audio watermarking in transform domain based on singular value decomposition and exponential-log operations. Radioengineering 2017, 26, 552–561. [CrossRef] 35. Lei, B.Y.; Soon, I.Y.; Li, Z. Blind and robust audio watermarking algorithm based on SVD-DCT. Signal Process. 2011, 91, 1973–1984. [CrossRef] 36. Vivekananda, B.K.; Sengupta, I.; Das, A. A new audio watermarking scheme based on singular value decomposition and quantization. Circuits Syst. Signal Process. 2011, 30, 915–927.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).