
Article

Synthesis of Normal Heart Sounds Using Generative Adversarial Networks and Empirical Wavelet Transform

Pedro Narváez * and Winston S. Percybrooks

Department of Electrical and Electronics Engineering, Universidad del Norte, Barranquilla 081001, Colombia; [email protected] * Correspondence: [email protected]

 Received: 31 July 2020; Accepted: 12 September 2020; Published: 8 October 2020 

Abstract: Currently, there are many works in the literature focused on the analysis of heart sounds, specifically on the development of intelligent systems for the classification of normal and abnormal heart sounds. However, the available heart sound databases are not yet large enough to train generalized machine learning models. Therefore, there is interest in the development of algorithms capable of generating heart sounds that could augment current databases. In this article, we propose a model based on generative adversarial networks (GANs) to generate normal synthetic heart sounds. Additionally, a denoising algorithm is implemented using the empirical wavelet transform (EWT), allowing a decrease in the number of epochs and the computational cost that the GAN model requires. A distortion metric (mel–cepstral distortion) was used to objectively assess the quality of the synthetic heart sounds. The proposed method compared favorably with a state-of-the-art mathematical model based on the morphology of the phonocardiography (PCG) signal. Additionally, different state-of-the-art heart sound classification models were used to test their performance when the GAN-generated synthetic signals were used as the test dataset. In this experiment, good accuracy results were obtained with most of the implemented models, suggesting that the GAN-generated sounds correctly capture the characteristics of natural heart sounds.

Keywords: generative adversarial network; heart sound classification; EWT; sound synthesis; machine learning; e-health

1. Introduction

Cardiovascular diseases are one of the leading causes of death in the world. According to recent reports from the World Health Organization and the American Heart Association, more than 17 million people die each year from these diseases. Most of these deaths (about 80%) occur in low- and middle-income countries [1,2]. Tobacco use, unhealthy eating, and lack of physical activity are the main causes of heart disease [1]. Currently, there are sophisticated equipment and tests for diagnosing heart disease, such as: electrocardiogram, Holter monitoring, echocardiogram, stress test, computed tomography scan, and magnetic resonance imaging [3]. However, most of this equipment is very expensive, and must be used by specialized technicians and medical doctors, which limits its availability in rural and urban areas that do not have the necessary financial resources [4]. Therefore, even today, it is common in such scenarios for non-specialized medical personnel to rely on basic auscultation with a stethoscope as a primary screening tool for the detection of many cardiac abnormalities and heart diseases [5]. However, to be effective, this method requires a sufficiently trained ear to identify cardiac conditions. Unfortunately, the literature suggests that in recent years such auscultation training has been in decline [6–8].

This situation has motivated the development of computer classification models to support the identification of normal and abnormal heart sounds by non-specialized health professionals. To date, many investigations related to the analysis and synthesis of heart sounds (HSs) have been published, obtaining good results especially in the classification of normal and abnormal heart sounds [9–25].

Heart sounds are closely related to both the vibration of the entire myocardial structure and the vibration of the heart valves during closure and opening. A recording of heart sounds is composed of a series of cardiac cycles. A normal cardiac cycle, as shown in Figure 1, is composed of the S1 sound (generated by the closing of the atrioventricular valve), the S2 sound (generated by the closing of the semilunar valve), the systole (range between S1 and S2), and the diastole (range included between S2 and S1 of the next cycle) [26]. Abnormalities are represented by murmurs that usually occur in the systolic or diastolic interval [27]. Health professionals use different attributes of heart murmurs for their classification, the most common being: timing, cadence, duration, pitch, and shape of the murmur [28,29]. Therefore, the examiner must have an ear sufficiently trained to identify each of these attributes.

Figure 1. Example signal for a normal heart sound.

Despite the amount of work related to the classification of heart sounds, it is still difficult to statistically evaluate the robustness of these algorithms, since the number of samples used in training is not sufficient to guarantee a generalized model. Similarly, there have been no significant advances in the classification of specific types of heart murmurs, due to the limited availability of corresponding labels in current public databases [30,31]. In this sense, a model for the generation of synthetic sounds, capable of outputting varied synthetic heart sounds indistinguishable from natural ones by medical personnel, could be used to augment existing databases for training robust machine learning models. However, heart sound signals are highly non-stationary, and their level of complexity makes obtaining good generative models very challenging.

In the literature, there are several publications related to the generation of synthetic heart sounds [16–25]. All these works are based on mathematical models to generate the S1 and S2 sections of a cardiac cycle. On the other hand, the systolic and diastolic intervals of the cardiac cycle are not adequately modeled, and as a result, do not present the variability recorded in natural normal heart sounds. Therefore, these synthetic models are not suitable to train HS classification models. Additionally, a basic time–frequency analysis of these synthetic signals shows that they are very different from natural signals. Table 1 presents a comparison of the different heart sound generative methods found in the Web of Science, Scopus, and IEEE Xplore databases.

Table 1. Timeline of methods for the generation of synthetic heart sounds.

Year | Author | Previous Method
1992 | Tang et al. [16] | Exponentially damped sinusoidal model
1995 | Trang et al. [17] | S1 and S2 as transient–linear chirp signals
1998 | Zhang et al. [18] | Sum of Gaussian-modulated sinusoids
2000 | Xu et al. [19] | S1 and S2 as transient–nonlinear chirp signals
2009 | Toncharoen et al. [20] | A heart-sound-like chaotic attractor
2011 | Almasi et al. [21] | A dynamical model based on the electrocardiogram (ECG) signal
2012 | Tao et al. [22] | Amplitude and width modification of S1 and S2 sounds from real heart sounds, combined with noise
2013 | Jablouna et al. [23] | Coupled ordinary differential equations
2018 | Saederup [24] | Estimation of the second heart-sound split using windowed sinusoidal models
2018 | Joseph et al. [25] | Sum of almost periodically recurring deterministic "wavelets"; S1 and S2 modeled by two sinusoidal pulses of Gaussian modulation

On the other hand, in recent years there have been great advances in the synthesis of audio, mainly speech, using machine learning techniques, specifically with deep learning. In Table 2, several proposed methods to improve audio synthesis are presented.

Table 2. Previous methods for generation of synthetic audio.

Year | Author | Previous Method | Synthetic Signal
2015 | van den Oord et al. [32] | WaveNet: probabilistic and autoregressive model based on deep neural networks (DNNs) | Music; text-to-speech
2017 | Engel et al. [33] | WaveNet and autoencoders | Musical notes
2017 | Bollepalli et al. [34] | Generative adversarial network (GAN) | Glottal waveform
2018 | Biagetti et al. [35] | Hidden Markov model | Text-to-speech
2019 | Donahue et al. [36] | WaveGAN: unsupervised GANs | Intelligible words

According to the limitations presented in the currently proposed models for the generation of synthetic heart sounds, and taking into account the significant advances in voice synthesis using deep learning methods, in this work we propose a model based on generative adversarial networks (GANs) to generate only normal heart sounds that can be used to train machine learning models. This article is organized as follows: Section 2 presents a definition of the GAN architecture; in Section 3, the proposed method is described; Section 4 presents experimental results, using mel–cepstral distortion (MCD) and heart sound classification models; finally, the conclusions and discussions of the proposed work and experiments are presented in Section 5.

2. Generative Adversarial Networks (GANs)

Generative adversarial networks (GANs) are architectures of deep neural networks widely used in the generation of synthetic images [37]. This architecture is composed of two neural networks that face each other, called the generator and the discriminator [38]. In Figure 2, a general diagram of a GAN architecture is presented.

Figure 2. General diagram of a GAN.

The generator (counterfeiter) needs to learn to create data in such a way that the discriminator can no longer distinguish it as false. The competition between these two networks is what improves their knowledge, until the generator manages to create realistic data.

As a result, the discriminator must be able to correctly classify the data generated by the generator as real or false. This means that its weights are updated to minimize the probability that any false data will be classified as belonging to the actual data set. On the other hand, the generator is trained to trick the discriminator by generating data as realistically as possible, which means that the weights of the generator are optimized to maximize the probability that the false data it generates will be classified as real by the discriminator [38].

The generator is an inverse convolutional network: it takes a random noise vector and converts it into an image. The discriminator, in contrast, takes an image and downsamples it to produce a probability.
In other words, the discriminator (D) and generator (G) play the following two-player min/max game with value function L(G, D), as described in Equation (1):

\min_G \max_D L(D, G) = \mathbb{E}_{x \sim \rho_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim \rho_z(z)}[\log(1 - D(G(z)))]   (1)

where D(x) represents the probability, estimated by the discriminator, that x belongs to the real data; z represents the random input vector to the generator; and ρ_data(x) and ρ_z(z) denote the data distribution and the distribution of samples from the generator, respectively.

After several training steps, if the generator and the discriminator have sufficient capacity (if the networks can approach the objective functions), they will reach a point where both can no longer improve. At this point, the generator generates realistic synthetic data, and the discriminator cannot differentiate between the two input types.
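To make Equation (1) concrete, the following is a minimal NumPy sketch (not taken from the paper) of how the two expectations are usually turned into discriminator and generator losses; the function names and the example values are purely illustrative.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-7):
    # The discriminator maximizes E[log D(x)] + E[log(1 - D(G(z)))],
    # i.e., it minimizes the negative of the value function in Equation (1).
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-7):
    # The generator minimizes E[log(1 - D(G(z)))]; many implementations instead
    # minimize the non-saturating form -E[log D(G(z))].
    return np.mean(np.log(1.0 - d_fake + eps))

# d_real, d_fake: discriminator outputs (probabilities) for real and generated batches
d_real = np.array([0.90, 0.80, 0.95])
d_fake = np.array([0.20, 0.10, 0.30])
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```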
3. Proposed Method

The proposed method is made up of two main stages, as shown in Figure 3. The first stage consists of the implementation of a GAN architecture to generate a synthetic heart sound, and the second stage is in charge of reducing the noise of the synthetic signal using the empirical wavelet transform (EWT). This last stage consists of post-processing applied to the signal generated by the GAN, in order to attenuate the noise level. Therefore, it makes it possible to reduce the number of epochs (and consequently the computational cost) required to train the GAN until obtaining a low-noise output signal.

Figure 3. General diagram of proposed method.

Figure 4 shows the diagram of the implemented GAN architecture, and each of its components is described below.

• Noise: a Gaussian noise vector with a size of 2000 samples is used as input to the generator. The mean and standard deviation of the noise's distribution are 0 and 1, respectively;
• Generator model: Figure 5 shows a diagram of the generating network, and the hyperparameters of each layer are specified. It begins with a dense layer with a ReLu activation function, followed by three convolutional layers with filters of size 128, 64, and 1, respectively; each of these layers has a ReLu activation function, a kernel size of 3, and a stride of 1. Finally, there is a dense layer with a tanh activation function. The padding parameter is set to "same" to maintain the same data dimension in the input and output of the convolutional layer;
• Discriminator model: Figure 6 shows a diagram of the discriminator network. It begins with a dense layer with a ReLu activation function, followed by four convolutional layers with filters of size 256, 128, 64, and 32, respectively; each of these layers uses a Leaky ReLu activation function, a kernel size of 3, and a stride of 1; additionally, between each convolution layer there is a dropout of 0.25. Finally, there is a dense layer with a tanh activation function. The padding parameter is set to "same" to maintain the same data dimension in the input and output of the convolutional layer;
• Dataset of heart sounds: 100 normal heart sounds obtained from the Physionet database [30] were used, with a sampling frequency of 2 KHz and 1 s of duration. For this dataset, those signals with a similar heart rate were selected—that is, all signals have a similar systolic and diastolic interval duration;
• Optimization: the Adam optimizer was used, since it is one of the best performers in this type of architecture. A learning rate of 0.0002 and a beta of 0.3 were set;
• Loss function: a binary cross-entropy function was used in this work. This function computes the cross-entropy loss between true labels and predicted labels.

Subsequently, the difference between the generator and discriminator losses was analyzed. If this difference was greater than 0.5, the input data to the discriminator was switched to a Gaussian noise with a mean of 0 and standard deviation of 1, instead of the generator output as it was otherwise. With this method, a convergence in the loss functions of the generator and discriminator could be achieved.
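For concreteness, the following is a minimal Keras sketch consistent with the description above; it is not the authors' original code. The reshaping between dense and convolutional layers, the single-unit discriminator output, the batch size, and reading "a beta of 0.3" as Adam's beta_1 are assumptions not fully specified in the text; the loss-gap switch from the preceding paragraph is included in the training loop.

```python
import numpy as np
from tensorflow.keras import layers, models, optimizers

NOISE_DIM = 2000    # Gaussian noise input (mean 0, std 1)
SIGNAL_LEN = 2000   # 1 s of heart sound sampled at 2 kHz

def build_generator():
    # Dense (ReLU) -> 3 x Conv1D (128, 64, 1 filters; kernel 3, stride 1, "same") -> Dense (tanh)
    return models.Sequential([
        layers.Dense(SIGNAL_LEN, activation="relu", input_shape=(NOISE_DIM,)),
        layers.Reshape((SIGNAL_LEN, 1)),   # assumed reshape to (time, channels) for Conv1D
        layers.Conv1D(128, 3, strides=1, padding="same", activation="relu"),
        layers.Conv1D(64, 3, strides=1, padding="same", activation="relu"),
        layers.Conv1D(1, 3, strides=1, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(SIGNAL_LEN, activation="tanh"),
    ])

def build_discriminator():
    # Dense (ReLU) -> 4 x Conv1D (256, 128, 64, 32) with LeakyReLU and 0.25 dropout -> Dense (tanh)
    return models.Sequential([
        layers.Dense(SIGNAL_LEN, activation="relu", input_shape=(SIGNAL_LEN,)),
        layers.Reshape((SIGNAL_LEN, 1)),
        layers.Conv1D(256, 3, padding="same"), layers.LeakyReLU(), layers.Dropout(0.25),
        layers.Conv1D(128, 3, padding="same"), layers.LeakyReLU(), layers.Dropout(0.25),
        layers.Conv1D(64, 3, padding="same"), layers.LeakyReLU(), layers.Dropout(0.25),
        layers.Conv1D(32, 3, padding="same"), layers.LeakyReLU(),
        layers.Flatten(),
        # Single tanh output unit as described in the text; a sigmoid is the more
        # conventional pairing with binary cross-entropy.
        layers.Dense(1, activation="tanh"),
    ])

generator = build_generator()
discriminator = build_discriminator()
discriminator.compile(optimizer=optimizers.Adam(learning_rate=0.0002, beta_1=0.3),
                      loss="binary_crossentropy")

# Combined model used to update the generator (discriminator frozen for these updates)
discriminator.trainable = False
z_in = layers.Input(shape=(NOISE_DIM,))
gan = models.Model(z_in, discriminator(generator(z_in)))
gan.compile(optimizer=optimizers.Adam(learning_rate=0.0002, beta_1=0.3),
            loss="binary_crossentropy")

# Placeholder training data; the paper used 100 normal 1 s PCG cycles at 2 kHz
X_train = np.random.uniform(-1.0, 1.0, size=(100, SIGNAL_LEN)).astype("float32")

BATCH, EPOCHS = 16, 2000
g_loss, d_loss = 0.0, 0.0
for epoch in range(EPOCHS):
    real = X_train[np.random.randint(0, len(X_train), BATCH)]
    z = np.random.normal(0.0, 1.0, size=(BATCH, NOISE_DIM))
    fake = generator.predict(z, verbose=0)
    # Switch described above: if the generator/discriminator loss gap exceeds 0.5,
    # the discriminator's "fake" input is replaced by pure Gaussian noise.
    if abs(g_loss - d_loss) > 0.5:
        fake = np.random.normal(0.0, 1.0, size=fake.shape)
    d_real_loss = discriminator.train_on_batch(real, np.ones((BATCH, 1)))
    d_fake_loss = discriminator.train_on_batch(fake, np.zeros((BATCH, 1)))
    d_loss = 0.5 * (d_real_loss + d_fake_loss)
    g_loss = gan.train_on_batch(z, np.ones((BATCH, 1)))
```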


Figure 4. Proposed GAN diagram.

As mentioned before, the second stage of the proposed method aims to reduce the noise level of the synthetic signal generated by the GAN model. It is understood that as the number of epochs in the training of the generator and discriminator models increases, the noise of the synthetic signal is attenuated; however, it requires many epochs, and in turn, a long computation time [39]. Therefore, in order to reduce the number of GAN training epochs required to generate synthetic signals with acceptable noise levels, it was decided to introduce a post-processing stage using the algorithm proposed in [40], called empirical wavelet transform (EWT). The EWT allows the extraction of different components of a signal by designing an appropriate filter bank. The theory of the EWT algorithm is described in detail in [40]. This algorithm has been used in different signal processing applications [41–43]. In [9], a modified version of this algorithm was used as a pre-processing stage in the analysis of heart sounds. Its implementation is described in more detail in [9]. In this work, it was decided to use as a reference the method proposed in [9] to reduce the noise of the synthetic signal.

Figure 5. Architecture of the generator model.

Figure 6. Architecture of the discriminator model.


Taking into account that the frequency range of the S1 and S2 sounds is between 20–200 Hz [44], it was decided to modify the edge selection method that determines the number of components for the EWT algorithm. The signal is then broken down into two frequency bands: the first component corresponds to the frequency range between 0–200 Hz, while the second component corresponds to frequencies over 200 Hz. Therefore, in this work, the signal corresponding to the first component is used.
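To illustrate the effect of this fixed 200 Hz boundary, the following is a minimal NumPy sketch that splits a signal into the two bands using ideal FFT-domain masks. It only approximates the Meyer-type EWT filter bank of [40], and the toy test signal is illustrative rather than a real heart sound.

```python
import numpy as np

def split_two_bands(x, fs=2000, f_edge=200.0):
    """Split a signal into a 0-f_edge Hz component and a >f_edge Hz component.

    Simplified, fixed-boundary approximation of the modified EWT described
    above (ideal brick-wall masks in the FFT domain).
    """
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    low_mask = freqs <= f_edge
    low = np.fft.irfft(X * low_mask, n=len(x))        # 0-200 Hz component (kept)
    high = np.fft.irfft(X * (~low_mask), n=len(x))    # >200 Hz component (discarded)
    return low, high

# Example: de-noise a heart-sound-like burst mixed with Gaussian noise
fs = 2000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 60 * t) * np.exp(-((t - 0.2) ** 2) / 0.002)  # toy S1-like burst
noisy = clean + 0.05 * np.random.randn(len(t))
denoised, residual = split_two_bands(noisy, fs=fs, f_edge=200.0)
```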



Taking into account that the input of the GAN model is a Gaussian noise, and in turn, the output of the model during the first epochs of training is expected to be a signal mixed with Gaussian noise, it was decided to do a test using a real heart signal mixed with Gaussian noise to evaluate the performance of the proposed EWT filter on signals with the same expected characteristics of the generator's output. Figure 7A,B shows, respectively, a real heart sound and the same heart sound mixed with low-amplitude Gaussian noise, while Figure 7C,D shows their respective Fourier transforms (FFTs). In this last figure, the frequency components between 0 and 200 Hz can be seen in blue, and the frequency components above 200 Hz caused by the Gaussian noise are in green. The signal shown in Figure 7B was then used as an input example to the proposed EWT algorithm to illustrate its de-noising action. Figure 7D shows the two frequency bands extracted with EWT using the FFT, while Figure 7E,F shows the two components extracted from the noisy signal in the time domain. As can be seen, the signal obtained in Figure 7E presents a lower noise level, and is comparable with the original cardiac signal shown in Figure 7A.

Figure 7. (A) Real heart sound; (B) real heart sound with Gaussian noise; (C) Fourier transform (FFT) of real heart sound; (D) FFT of heart sound with Gaussian noise; (E) empirical wavelet transform (EWT) component of the noisy signal in the frequency range of 0–200 Hz; (F) EWT component of the noisy signal in the frequency range greater than 200 Hz.


4. Experiments and Results

The proposed GAN model was trained for a total of 2000 epochs. Figure 8 shows sample output signals generated at different epochs. As can be observed, as the epochs increase, the signals present a more realistic form. In this work, the EWT filter is a post-processing stage that is applied to the synthetic signal generated with 2000 training epochs, in order to reduce the noise level of the generator output. Therefore, the EWT filter is not part of the training loop for the GAN. This number of epochs was determined after observing the synthetic signals obtained at different training points (from 100 epochs to 12,000 epochs). It was observed that from 2000 epochs, the synthetic signal has a shape very similar to a natural signal, but with a relatively high noise level, as shown in Figure 8E. Therefore, it was decided to generate the synthetic signals up to 2000 epochs, and subsequently apply the proposed EWT algorithm. Figure 8F,G shows, respectively, a synthetic signal generated with 12,000 training epochs without applying the EWT algorithm, and a natural signal.


Figure 8. (A) Synthetic signal with 100 epochs, (B) synthetic signal with 500 epochs, (C) synthetic signal with 1000 epochs, (D) synthetic signal with 2000 epochs, (E) synthetic signal with 2000 epochs + EWT, (F) synthetic signal with 12,000 epochs, and (G) natural signal of heart sound.

In this work, a comparison is made between the proposed method and a mathematical model proposed in [21], in order to determine which method generates a cardiac signal realistic enough to be used by a classification model. The mathematical method [21] was inspired by a dynamic model that generates a synthetic electrocardiographic signal, as described in [45]. Therefore, this model is based on the morphology of the phonocardiographic signal, and has been used as a reference in other proposed methods [23]. The equation for the reference model [21] is described below:

\dot{z} = -\sum_{i \in \{S1^-,\, S1^+,\, S2^-,\, S2^+\}} \left[ \frac{\alpha_i}{\sigma_i^2} (\theta - \mu_i)\, e^{-\frac{(\theta - \mu_i)^2}{2\sigma_i^2}} \cos(2\pi f_i \theta - \varphi_i) + 2\pi \alpha_i f_i\, e^{-\frac{(\theta - \mu_i)^2}{2\sigma_i^2}} \sin(2\pi f_i \theta - \varphi_i) \right]   (2)

where α_i, µ_i, and σ_i are the parameters of amplitude, center, and width of the Gaussian terms, respectively; f_i and φ_i are the frequency and the phase shift of the sinusoidal terms, respectively; and θ is an independent parameter in radians that varies in the range [−π, π] for each beat. The parameters used by the authors in [21] are summarized in Table 3.

The ordinary differential equation ż was solved using the fourth-order Runge–Kutta numerical method, using the Matlab software. Using the values in Table 3, we obtain the graph in Figure 9A, which represents the S1 and S2 sounds of a cardiac cycle. Figure 9B shows a natural heart sound signal.

Table 3. Parameters used in [21] to generate normal heart sounds.

Index (i) | S1 (−) | S1 (+) | S2 (−) | S2 (+)
α_i | 0.4250 | 0.6875 | 0.5575 | 0.4775
µ_i (radians) | π/12 | 3π/19 | 3π/4 | 7π/9
σ_i | 0.1090 | 0.0816 | 0.0723 | 0.1060
f_i (Hz) | 10.484 | 11.874 | 11.316 | 10.882
φ_i (radians) | 3π/4 | 9π/11 | 7π/8 | 3π/4
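As an illustration, the following is a minimal Python sketch (the authors used Matlab) that integrates Equation (2), as reconstructed above, with the Table 3 parameters. SciPy's adaptive RK45 solver is used here instead of the fixed-step fourth-order Runge–Kutta of the original work, so the output is an approximation of the reference model rather than a reproduction of it.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Table 3 parameters of the model in [21]; order: S1(-), S1(+), S2(-), S2(+)
alpha = np.array([0.4250, 0.6875, 0.5575, 0.4775])
mu    = np.array([np.pi / 12, 3 * np.pi / 19, 3 * np.pi / 4, 7 * np.pi / 9])
sigma = np.array([0.1090, 0.0816, 0.0723, 0.1060])
freq  = np.array([10.484, 11.874, 11.316, 10.882])
phi   = np.array([3 * np.pi / 4, 9 * np.pi / 11, 7 * np.pi / 8, 3 * np.pi / 4])

def z_dot(theta, z):
    """Right-hand side of Equation (2) (as reconstructed above)."""
    gauss = np.exp(-((theta - mu) ** 2) / (2.0 * sigma ** 2))
    term = (alpha / sigma ** 2) * (theta - mu) * gauss * np.cos(2 * np.pi * freq * theta - phi) \
           + 2 * np.pi * alpha * freq * gauss * np.sin(2 * np.pi * freq * theta - phi)
    return [-np.sum(term)]

# One beat: theta varies in [-pi, pi]; 2000 samples correspond to 1 s at 2 kHz
theta = np.linspace(-np.pi, np.pi, 2000)
sol = solve_ivp(z_dot, (-np.pi, np.pi), y0=[0.0], t_eval=theta, method="RK45")
pcg = sol.y[0]   # synthetic S1/S2 waveform of one cardiac cycle
```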

Figure 9. (A) Synthetic heart sound generated by the model [21]; (B) natural signal of the heart sound.

4.1. Results Using Mel–Cepstral Distortion (MCD)

Mel–cepstral distortion (MCD) is a metric widely used to objectively evaluate audio quality [46], and its calculation is based on mel-frequency cepstral coefficients (MFCCs). This method has been widely used in the evaluation of voice signals, since many automatic voice recognition models use feature vectors based on MFCC coefficients [46]. The parameters used for MFCC extraction are the following: the length of the analysis window (frame) is 0.03 s, the step between successive windows is 0.015 s, the number of cepstra to return in each window (frame) is 14, the number of filters in the filterbank is 22, the FFT size is 4000, the lowest band edge of the mel filters is 0, the highest band edge of the mel filters is 0.5, and no window function is applied to the analysis window of each frame.

Basically, MCD is a measure of the difference between two MFCC sequences. In Vasilijevic and Petrinovic [47], different ways of calculating this distortion are presented. In Equation (3), the formula used in [46] is defined, where C_MFCC and Ĉ_MFCC are the MFCC vectors of a frame of the original and study signal, respectively, and L represents the number of coefficients in that frame. MCD_FRAME represents the MCD result obtained in a frame.

MCD_{FRAME} = \sum_{l=1}^{L} \left( C_{MFCC}[l] - \hat{C}_{MFCC}[l] \right)^2   (3)
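The following is a minimal sketch of Equation (3) applied frame by frame. The use of the python_speech_features package, and the mapping of the "0.5" highest band edge to fs/2 in Hz, are assumptions about how the MFCC parameters above would be configured; they are not confirmed by the paper.

```python
import numpy as np
from python_speech_features import mfcc

def mcd_frame(c_ref, c_test):
    # Equation (3): sum of squared differences between two MFCC vectors of one frame
    return np.sum((c_ref - c_test) ** 2)

def mcd(signal_ref, signal_test, fs=2000):
    """Average per-frame MCD between a reference signal and a study signal."""
    kwargs = dict(samplerate=fs, winlen=0.03, winstep=0.015, numcep=14,
                  nfilt=22, nfft=4000, lowfreq=0, highfreq=fs / 2,
                  winfunc=lambda n: np.ones((n,)))   # no analysis window applied
    c_ref = mfcc(signal_ref, **kwargs)    # shape: (num_frames, 14)
    c_test = mfcc(signal_test, **kwargs)
    n_frames = min(len(c_ref), len(c_test))
    return float(np.mean([mcd_frame(c_ref[i], c_test[i]) for i in range(n_frames)]))
```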
In this work, it was decided to use this objective measurement method to evaluate the similarity between natural and synthetic heart signals, taking into account that heart sounds are audio signals that are typically evaluated using human hearing, and that MFCC coefficients have already been used in the analysis of heart sound signals [9–12].

A set of 400 natural normal heart sounds taken from the Physionet [30] and Pascal [31] databases was used. Each signal was cut to a single cardiac cycle, with a normalized duration of 1 s, applying a resampling on the signal. Signals were also normalized in amplitude, and those signals with a similar heart rate were chosen. These natural signals are compared to a total of 50 synthetic heart sounds generated using the proposed method, and 50 synthetic signals generated using the model [21]. In the case of the model [21], the α_i parameters were obtained with random variables in the range of 0.3 to 0.7, in order to generate different wave signals. The other parameters were established as shown in Table 3. Additionally, the synthetic signal was mixed with a white Gaussian noise, as indicated in the article [21]. These synthetic signals have a sampling rate of 2 KHz, a duration of 1 s, and are amplitude-normalized. All signals (natural and synthetic) have a similar heart rate—that is, the size of the systolic and diastolic interval is similar in all the signals.
The first evaluation step was to calculate the MCD between the natural signals—that is, the MCD between each natural signal and the remaining natural samples. A total of 399 MCD values were computed and then averaged. This same procedure was applied with the synthetic signals, i.e., computing the MCD between each synthetic signal and each natural signal, obtaining 400 MCD values that were then averaged. Figure 10 shows a schematic of the procedure to compute the MCD.

Figure 11 shows the results of the average MCD between the natural signals (blue color), the average distortion of the synthetic signals generated with the proposed method (red color), the average distortion of the synthetic signals generated with the proposed method without applying the EWT algorithm (dark blue color), and the average distortion of the synthetic signals generated with the model in [21] (green color), using the MCD method.
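A short sketch of the averaging protocol just described is given below; the function names are illustrative, and mcd_fn stands for any per-pair MCD routine, such as the mcd() sketch above.

```python
import numpy as np

def average_mcd_natural(naturals, mcd_fn):
    # For each natural signal, MCD against every other natural signal
    # (399 comparisons per signal for 400 signals), averaged into one value.
    vals = [mcd_fn(naturals[i], naturals[j])
            for i in range(len(naturals))
            for j in range(len(naturals)) if j != i]
    return float(np.mean(vals))

def average_mcd_synthetic(synthetic, naturals, mcd_fn):
    # MCD of one synthetic signal against each of the 400 natural signals, averaged.
    return float(np.mean([mcd_fn(synthetic, nat) for nat in naturals]))
```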

Figure 10. General diagram of the procedure to calculate the signal distortion.

Figure 11. Result of mel–cepstral distortions (MCDs).

To verify that the generator in the GAN model was not just copying some of the training examples, we computed the MCD distortion of a synthetic signal against each of the signals used in the training dataset. Figure 12 shows the resulting MCD values, with and without the EWT postprocessing. It can be seen that in none of the cases is there an MCD value equal to or approaching zero; therefore, the generated signal is not a copy of any of the training inputs.

Figure 12. Result of MCD distortion using one synthetic signal with the training dataset.

It can be seen in Figure 11 that the distortions of the natural signals and the signals generated using the proposed method are in the same range, unlike the distortion obtained with the signals generated using the model [21].

4.2. Results Using Classification Models

In this section, different heart sound classification models published in the state of the art are tested [9–12]. These models focus on discrimination between normal and abnormal heart sounds. They were trained with a total of 805 heart sounds (415 normal and 390 abnormal), obtained from the following databases: the PhysioNet/Computing in Cardiology Challenge 2016 [30], the Pascal challenge database [31], the database of the University of Michigan [48], the database of the University of Washington [49], the Thinklabs database (digital stethoscope) [50], and the 3M database (digital stethoscope) [51]. Table 4 presents the different characteristics extracted in the proposed classification methods [9–12]. These characteristics belong to the domains of time, frequency, time–frequency, and perception.

Table 4. Features extracted in models [9–12].

Reference | Features
[9] | Six power values (three in systole and three in diastole)
[10] | Nineteen mel–frequency cepstral coefficients (MFCCs) and 24 discrete wavelet transform features
[11] | Statistical domain: mean value, median value, standard deviation, mean absolute deviation, quartile 25, quartile 75, inter-quartile range (IQR), skewness, kurtosis, and coefficient of variation. Frequency domain: entropy, dominant frequency value, dominant frequency magnitude, and dominant frequency ratio. Perceptual domain: 13 MFCCs.
[12] | Six linear prediction coefficients (LPCs) + 14 MFCCs per segment (S1, S2, systole, and diastole)

Each feature set was used as input to the following machine learning (ML) models: support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), and multilayer perceptron (MLP).

In Table 5, the accuracy results of each one of the combinations of characteristics with the ML models are presented, applying a 10-fold cross-validation. The analysis of these results is described in more detail in [9].
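For reference, a minimal scikit-learn sketch of this evaluation protocol is shown below. The feature matrix, labels, and classifier hyperparameters are placeholders; the actual features and settings are defined in [9–12].

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Placeholder feature matrix and labels (805 recordings: 415 normal, 390 abnormal);
# in practice X holds the features of Table 4 and y the normal/abnormal labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(805, 20))
y = np.array([0] * 415 + [1] * 390)

classifiers = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(),
    "MLP": MLPClassifier(max_iter=1000),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")  # 10-fold cross-validation
    print(f"{name}: mean accuracy = {scores.mean():.4f}")
```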

Table 5. Accuracy results of the methods proposed in [9–12], taken from article [9].

Feature Extraction | SVM | KNN | RF | MLP
EWT + power [9] | 92.42% | 99.25% | 99.00% | 98.63%
MFCC + DWT [10] | 90.68% | 91.18% | 91.42% | 91.55%
Statistical, frequency, and perceptual [11] | 84.47% | 93.66% | 93.66% | 92.54%
LPC + MFCC [12] | 96.27% | 95.52% | 95.27% | 97.26%

In this work, these classification models were used to test the synthetic signals generated with the proposed method (GAN). Therefore, 50 synthetic signals were used as the test dataset, and the accuracy results are presented in Table 6. The same procedure was done with the synthetic signals without applying the EWT algorithm, with the accuracy results presented in Table 7.

Table 6. Accuracy results of synthetic signals, using the trained models proposed in articles [9–12].

Feature Extraction | SVM | KNN | RF | MLP
EWT + power [9] | 100% | 100% | 100% | 100%
MFCC + DWT [10] | 80% | 90% | 78% | 82%
Statistical, frequency, and perceptual [11] | 98% | 78% | 78% | 60%
LPC + MFCC [12] | 98% | 96% | 96% | 82%

Table 7. Accuracy results of synthetic signals without applying an EWT algorithm, using the trained models proposed in articles [9–12].

Feature Extraction | SVM | KNN | RF | MLP
EWT + power [9] | 100% | 100% | 100% | 100%
MFCC + DWT [10] | 85% | 88% | 80% | 90%
Statistical, frequency, and perceptual [11] | 90% | 78% | 75% | 60%
LPC + MFCC [12] | 95% | 90% | 90% | 78%

The best results were obtained with the power characteristics proposed in [9]. However, in several combinations of characteristics and ML models, accuracy results greater than 90% were obtained, as was the case for the combination of LPC and MFCC proposed in [12]. From these results, it can be argued that the synthetic signals generated with the proposed method have similar characteristics to the natural signals, since the classification results on both types of signals are similar.

5. Conclusions and Future Work

A GAN-based architecture was implemented to generate synthetic heart sounds that can be used to train and test classification models. The proposed GAN model is accompanied by a denoising stage based on the empirical wavelet transform, which reduces the number of training epochs, and therefore the total computational cost, while yielding synthetic cardiac signals with a low noise level. The proposed method was compared with a mathematical model from the state of the art [21]. Two evaluation tests were carried out. The first measured the distortion between natural and synthetic cardiac signals in order to objectively evaluate their similarity; for this purpose, the mel–cepstral distortion (MCD), widely used in the evaluation of audio quality, was computed (an illustrative sketch of this metric is given at the end of this section). In this test, the signals generated with the proposed method were more similar to the natural signals than those produced by the mathematical model in [21]. The second test used different pre-trained machine learning classifiers with good accuracy on natural recordings, taking the synthetic signals as the test dataset to verify whether those models still performed well. Here, the power features proposed in [9], combined with the different machine learning models, gave the best results. In general, most combinations of features and classification models correctly labeled the synthetic heart sounds as normal, as shown in Tables 6 and 7. The results obtained with the MCD metric and with the different ML models strongly suggest that synthetic signals can be used to improve the performance of heart sound classification models by increasing the number of training samples.

As future work, we are implementing a GAN-based model that can generate abnormal heart sounds, focusing on heart murmurs, since it is very difficult to acquire many samples of a specific type of abnormality. By building a database of synthetic normal heart sounds and specific abnormalities, we expect to improve the performance of classification systems and advance the detection of heart abnormalities, providing meaningful support for healthcare professionals.
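For reference, a common formulation of the mel–cepstral distortion can be sketched as follows. The cepstral order, sampling rate, and the naive frame alignment used here are assumptions; the configuration actually used in this work may differ (see also [46,47]).

```python
# Sketch of a common mel-cepstral distortion (MCD) formulation, in dB.
# Cepstral order, sampling rate, and the naive frame alignment are assumptions.
import numpy as np
import librosa

def mcd(natural: np.ndarray, synthetic: np.ndarray, sr: int = 2000, n_mfcc: int = 13) -> float:
    """Average mel-cepstral distortion (dB) between a natural and a synthetic signal."""
    c_nat = librosa.feature.mfcc(y=natural, sr=sr, n_mfcc=n_mfcc)
    c_syn = librosa.feature.mfcc(y=synthetic, sr=sr, n_mfcc=n_mfcc)
    n = min(c_nat.shape[1], c_syn.shape[1])           # naive alignment: truncate to the shorter signal
    diff = c_nat[1:, :n] - c_syn[1:, :n]               # exclude the 0th (energy) coefficient
    per_frame = (10.0 / np.log(10)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=0))
    return float(np.mean(per_frame))                   # lower MCD = closer to the natural signal
```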

Author Contributions: Conceptualization, P.N. and W.S.P.; methodology, P.N. and W.S.P.; software, P.N.; validation, P.N. and W.S.P.; formal analysis, P.N. and W.S.P.; investigation, P.N.; resources, P.N.; data curation, P.N.; writing—original draft preparation, P.N.; writing—review and editing, P.N. and W.S.P.; visualization, P.N.; supervision, W.S.P.; project administration, P.N.; funding acquisition, P.N. and W.S.P. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. World Health Organization. A Global Brief on Hypertension. Available online: http://www.who.int/cardiovascular_diseases/publications/global_brief_hypertension/en/ (accessed on 15 September 2020).
2. Benjamin, E.J.; Blaha, M.J.; Chiuve, S.E.; Cushman, M.; Das, S.R.; Deo, R.; De Ferranti, S.D.; Floyd, J.; Fornage, M.; Gillespie, C.; et al. Heart Disease and Stroke Statistics—2017 Update: A Report From the American Heart Association. Circulation 2017, 135, 146–603. [CrossRef]
3. Camic, P.M.; Knight, S.J. Clinical Handbook of Health Psychology: A Practical Guide to Effective Interventions; Hogrefe & Huber Publishers: Cambridge, MA, USA, 2004; pp. 31–32.
4. Alvarez, C.; Patiño, A. State of emergency medicine in Colombia. Int. J. Emerg. Med. 2015, 8, 1–6.
5. Shank, J. Auscultation Skills: Breath & Heart Sounds, 5th ed.; Lippincott Williams & Wilkins: Philadelphia, PA, USA, 2013.
6. Alam, U.; Asghar, O.; Khan, S.; Hayat, S.; Malik, R. Cardiac auscultation: An essential clinical skill in decline. Br. J. Cardiol. 2010, 17, 8.
7. Roelandt, J.R.T.C. The decline of our physical examination skills: Is echocardiography to blame? Eur. Heart J. Cardiovasc. Imaging 2014, 15, 249–252.
8. Clark, D.; Ahmed, M.; Dell'Italia, L.; Fan, P.; McGiffin, D. An argument for reviving the disappearing skill of cardiac auscultation. Clevel. Clin. J. Med. 2012, 79, 536–537.
9. Narváez, P.; Gutierrez, S.; Percybrooks, W. Automatic Segmentation and Classification of Heart Sounds Using Modified Empirical Wavelet Transform and Power Features. Appl. Sci. 2020, 10, 4791. [CrossRef]
10. Yaseen; Gui-Young, S.; Kwon, S. Classification of Heart Sound Signal Using Multiple Features. Appl. Sci. 2018, 8, 2344. [CrossRef]
11. Arora, V.; Leekha, R.; Singh, R.; Chana, I. Heart sound classification using machine learning and phonocardiogram. Mod. Phys. Lett. B 2019, 33, 1950321. [CrossRef]
12. Narváez, P.; Vera, K.; Bedoya, N.; Percybrooks, W. Classification of heart sounds using linear prediction coefficients and mel-frequency cepstral coefficients as acoustic features. In Proceedings of the IEEE Colombian Conference on Communications and Computing, Cartagena, Colombia, 16 August 2017.

13. Noman, F.; Ting, C.; Salleh, S.; Ombao, H. Short-segment heart sound classification using an ensemble of deep convolutional neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12 May 2019.
14. Raza, A.; Mehmood, A.; Ullah, S.; Ahmad, M.; Sang, G.; Byung-Won, O. Heartbeat Sound Signal Classification Using Deep Learning. Sensors 2019, 19, 4819. [CrossRef]
15. Abdollahpur, M.; Ghaffari, A.; Ghiasi, S.; Mollakazemi, M.J. Detection of pathological heart sound. Physiol. Meas. 2017, 38, 1616–1630.
16. Tang, Y.; Danmin, C.H.; Durand, L.G. The synthesis of the aortic valve closure sound on the dog by the mean filter of forward and backward predictor. IEEE Trans. Biomed. Eng. 1992, 39, 1–8. [PubMed]
17. Tran, T.; Jones, N.B.; Fothergill, J.C. Heart sound simulator. Med. Biol. Eng. Comput. 1995, 33, 357–359. [PubMed]
18. Zhang, X.; Durand, L.G.; Senhadji, L.; Lee, H.C.; Coatrieux, J.L. Analysis—synthesis of the phonocardiogram based on the matching pursuit method. IEEE Trans. Biomed. Eng. 1998, 45, 962–971. [PubMed]
19. Xu, J.; Durand, L.; Pibarot, P. Nonlinear transient chirp signal modelling of the aortic and pulmonary components of the second heart sound. IEEE Trans. Biomed. Eng. 2000, 47, 1328–1335.
20. Toncharoen, C.; Srisuchinwong, B. A heart-sound-like chaotic attractor and its synchronization. In Proceedings of the 6th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Pattaya, Thailand, 6 May 2009.
21. Almasi, A.; Shamsollahi, M.B.; Senhadji, L. A dynamical model for generating synthetic phonocardiogram signals. In Proceedings of the 33rd Annual International Conference of the IEEE EMBS, Boston, MA, USA, 30 August–3 September 2011.
22. Tao, Y.W.; Cheng, X.F.; He, S.Y.; Ge, Y.P.; Huang, Y.H. Heart sound signal generator based on LabVIEW. Appl. Mech. Mater. 2012, 121, 872–876.
23. Jabloun, M.; Ravier, P.; Buttelli, O.; Ledee, R.; Harba, R.; Nguyen, L. A generating model of realistic synthetic heart sounds for performance assessment of phonocardiogram processing algorithms. Biomed. Signal Process. Control 2013, 8, 455–465.
24. Sæderup, R.G.; Hoang, P.; Winther, S.; Boettcher, M.; Struijk, J.J.; Schmidt, S.E.; Ostergaard, J. Estimation of the second heart sound split using windowed sinusoidal models. Biomed. Signal Process. Control 2018, 44, 229–236.
25. Joseph, A.; Martínek, R.; Kahankova, R.; Jaros, R.; Nedoma, J.; Fajkus, M. Simulator of Foetal Phonocardiographic Recordings and Foetal Heart Rate Calculator. J. Biomim. Biomater. Biomed. Eng. 2018, 39, 57–64.
26. McConnell, M.E.; Branigan, A. Pediatric Heart Sounds; Springer: London, UK, 2008. [CrossRef]
27. Brown, E.; Leung, T.; Collis, W.; Salmon, A. Heart Sounds Made Easy, 2nd ed.; Churchill Livingstone Elsevier: Philadelphia, PA, USA, 2008.
28. Etoom, Y.; Ratnapalan, S. Evaluation of Children With Heart Murmurs. Clin. Pediatr. 2013, 53, 111–117. [CrossRef]
29. Johnson, W.; Moller, J. Pediatric Cardiology: The Essential Pocket Guide; Wiley-Blackwell: Oxford, UK, 2008.
30. PhysioNet/Computing in Cardiology Challenge. Classification of Normal/Abnormal Heart Sound Recordings. Available online: https://www.physionet.org/challenge/2016/1.0.0/ (accessed on 15 September 2020).
31. Bentley, P.; Nordehn, G.; Coimbra, M.; Mannor, S.; Getz, R. Classifying Heart Sounds Challenge. Available online: http://www.peterjbentley.com/heartchallenge/#downloads (accessed on 15 September 2020).
32. van den Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K. WaveNet: A Generative Model for Raw Audio. Available online: https://arxiv.org/abs/1609.03499 (accessed on 15 September 2020).
33. Engel, J.; Resnick, C.; Roberts, A.; Dieleman, S.; Eck, D.; Simonyan, K.; Norouzi, M. Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
34. Bollepalli, B.; Juvela, L.; Alku, P. Generative Adversarial Network-Based Glottal Waveform Model for Statistical Parametric Speech Synthesis. In Proceedings of the Interspeech 2017, Stockholm, Sweden, 20–24 August 2017. [CrossRef]
35. Biagetti, G.; Crippa, P.; Falaschetti, L.; Turchetti, C. HMM speech synthesis based on MDCT representation. Int. J. Speech Technol. 2018, 21, 1045–1055. [CrossRef]

36. Donahue, C.; McAuley, J.; Puckette, M. Adversarial Audio Synthesis. Available online: https://arxiv.org/abs/1802.04208 (accessed on 15 September 2020).
37. Huang, H.; Yu, P.S.; Wang, C. An Introduction to Image Synthesis with Generative Adversarial Nets. arXiv 2018, arXiv:1803.04469.
38. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems 27, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
39. Hany, J.; Walters, G. Hands-On Generative Adversarial Networks with PyTorch 1.x; Packt Publishing Ltd.: Birmingham, UK, 2019.
40. Gilles, J. Empirical wavelet transform. IEEE Trans. Signal Process. 2013, 61, 3999–4010.
41. Oung, Q.; Muthusamy, H.; Basah, S.; Lee, H.; Vijean, V. Empirical Wavelet Transform Based Features for Classification of Parkinson's Disease Severity. J. Med. Syst. 2017, 42, 29.
42. Qin, C.; Wang, D.; Xu, Z.; Tang, G. Improved Empirical Wavelet Transform for Compound Weak Bearing Fault Diagnosis with Acoustic Signals. Appl. Sci. 2020, 10, 682.
43. Chavez, O.; Dominguez, A.; Valtierra-Rodriguez, M.; Amezquita-Sanchez, J.P.; Mungaray, A.; Rodriguez, L.M. Empirical Wavelet Transform-based Detection of Anomalies in ULF Geomagnetic Signals Associated to Seismic Events with a Fuzzy Logic-based System for Automatic Diagnosis. In Wavelet Transform and Some of Its Real-World Applications; InTech: Rijeka, Croatia, 2015.
44. Debbal, S.M.; Bereksi-Reguig, F. Computerized Heart Sounds Analysis. Comput. Biol. Med. 2008, 38, 263–280. [CrossRef]
45. McSharry, P.; Clifford, G.; Tarassenko, L.; Smith, L. A Dynamical Model for Generating Synthetic Electrocardiogram Signals. IEEE Trans. Biomed. Eng. 2003, 50, 289–294. [PubMed]
46. Di Persia, L.; Yanagida, M.; Rufiner, H.; Milone, D. Objective quality evaluation in blind source separation for speech recognition in a real room. Signal Process. 2007, 87, 1951–1965.
47. Vasilijevic, A.; Petrinovic, D. Perceptual significance of cepstral distortion measures in digital speech processing. Automatika 2011, 52, 132–146.
48. University of Michigan. Heart Sound and Murmur Library. Available online: https://open.umich.edu/find/open-educational-resources/medical/heart-sound-murmur-library (accessed on 15 September 2020).
49. University of Washington. Heart Sound and Murmur. Available online: https://depts.washington.edu/physdx/heart/demo.html (accessed on 15 September 2018).
50. Thinklabs. Heart Sounds Library. Available online: http://www.thinklabs.com/heart-sounds (accessed on 15 September 2020).
51. Littmann Stethoscope. Heart Sounds Library. Available online: www.3m.com/healthcare/littmann/mmm-library.html (accessed on 15 September 2020).

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).