
Equalization of Audio Channels: A Practical Approach for Speech Communication

Nils Westerlund

November 2000

Abstract

Many occupations of today require the use of personal protective equipment, such as a mask to protect the employee from dangerous substances or a pair of ear-muffs to damp high sound pressure levels. The goal of this Master thesis is to investigate the possibility of placing a microphone for communication purposes inside such a protective mask, as well as the possibility of placing the microphone inside a person's auditory meatus, and to perform digital channel equalization on the speech path in question in order to enhance the speech intelligibility. Subjective listening tests indicate that the speech quality and intelligibility can be considerably improved using some of the methods described in this thesis.

Acknowledgements

I would like to express my gratitude to Dr. Mattias Dahl for his support and extraordinary ability to explain complex systems and relationships in an understandable way. I would also like to express my appreciation to Svenska EMC-Lab in Karlskrona for letting me use their semi-damped room for measurement purposes.

Contents

1 Channel Equalization — An Introduction
  1.1 Non-Adaptive Methods
  1.2 Adaptive Channel Equalization

2 Equalization of Mask Channel
  2.1 Gathering of Measurement Data
  2.2 Coherence Function
  2.3 Channel Equalization using tfe
  2.4 Adaptive Channel Equalization
    2.4.1 The LMS Algorithm
    2.4.2 The NLMS Algorithm
    2.4.3 The LLMS Algorithm
    2.4.4 The RLS Algorithm
  2.5 Minimum-Phase Approach
  2.6 Results of Mask Channel Equalization

3 Equalization of Mouth-Ear Channel
  3.1 Gathering of Measurement Data
  3.2 Coherence Function of Mouth-Ear Channel
  3.3 Channel Equalization Using tfe
  3.4 Adaptive Channel Equalization
    3.4.1 The LMS Algorithm
  3.5 Results of Mouth-Ear Channel Equalization

4 Identification of “True” Mouth-Ear Channel
  4.1 Basic Approach

5 Conclusions
  5.1 Further Work

A MatLab functions
  A.1 LMS Algorithm
  A.2 NLMS Algorithm
  A.3 LLMS Algorithm
  A.4 RLS Algorithm
  A.5 Minimum-Phase Filter Design
  A.6 Coherence Function and Estimate of Transfer Function
  A.7 Estimate of “True” Channel

Chapter 1

Channel Equalization — An Introduction

Figure 1.1: System with input and output signals and the corresponding system in the z-domain.

A linear time-invariant system h(n) takes an input signal x(n) and produces an output signal y(n), which is the convolution of x(n) and the unit sample response h(n) of the system, see fig. 1.1. The input, the output and the system are assumed to be real, and only real signals will be considered in this thesis. The convolution described above can be written as

y(n) = x(n) ∗ h(n)   (1.1)

where the convolution operation is denoted by an asterisk ∗. In the z-domain, the convolution corresponds to a multiplication given by

Y(z) = X(z)H(z)   (1.2)

where Y(z) is the z-transform of the output y(n), X(z) is the z-transform of the input x(n) and H(z) is the z-transform of the unit sample response h(n) of the system. In many practical applications there is a need to correct the distortion caused by the channel and in this way recover the original signal x(n). In this thesis, this corrective operation will be called channel equalization.
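As a small illustration of (1.1) and (1.2), the following MatLab sketch (using a hypothetical three-tap channel, not one of the channels measured in this thesis) passes a signal through a channel by convolution and verifies that the transform of the output equals the product of the transforms:

% Sketch: convolution in the time domain, eq. (1.1), equals multiplication
% of the transforms, eq. (1.2). The channel h is hypothetical.
x = randn(1000,1);           % input signal
h = [1; 0.5; -0.2];          % hypothetical channel impulse response
y = conv(x,h);               % channel output
N = length(y);
Y1 = fft(y,N);               % transform of the output
Y2 = fft(x,N).*fft(h,N);     % product of the transforms
max(abs(Y1-Y2))              % equal up to numerical precision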

1.1 Non-Adaptive Methods

A cascade connection of a system h(n) and its inverse hI(n) is illustrated in fig. 1.2. Suppose the distorting system has an impulse response h(n) and let hI(n) denote the impulse response of the inverse system.

Figure 1.2: System h(n) cascaded with its inverse system hI(n) results in an identity system.

We can then write

d(n) = x(n) ∗ h(n) ∗ hI(n) = x(n)   (1.3)

where d(n) is the desired signal, i.e. the original input signal x(n). This implies that

h(n) ∗ hI(n) = δ(n)   (1.4)

where δ(n) is a unit impulse. In the z-domain, (1.4) becomes

H(z)HI(z) = 1   (1.5)

Thus, the transfer function of the inverse system is

HI(z) = 1/H(z)   (1.6)

Note that the zeros of H(z) become the poles of the inverse system and vice versa. If the characteristics of the system are unknown, it is often necessary to excite the system with a known input signal, observe the output, compare it with the input and then determine the characteristics of the system. This operation is called system identification [1]. If we obtain an output signal y(n) from a system h(n) excited with a known input signal x(n), we could of course use the z-transforms of y(n) and x(n) to form

H(z) = Y(z)/X(z)   (1.7)

However, this is a purely analytical expression, and the corresponding impulse response is most likely infinite in duration. A more practical approach is based on a correlation method. The crosscorrelation of the signals x(n) and y(n) is given by

rxy(l) = Σ_{n=−∞}^{∞} x(n) y(n − l) ,   l = 0, ±1, ±2, . . .   (1.8)

The index l is the lag parameter¹ and the subscripts xy on the crosscorrelation sequence rxy(l) indicate the sequences being correlated. If the roles of x(n) and y(n) are reversed, we obtain

ryx(l) = Σ_{n=−∞}^{∞} y(n) x(n − l) ,   l = 0, ±1, ±2, . . .   (1.9)

Thus,

rxy(l) = ryx(−l)   (1.10)

Note the similarities between the computation of the crosscorrelation of two sequences and the convolution of two sequences. Hence, if the sequence x(n) and the folded sequence y(−n) are provided as inputs to a convolution algorithm, the convolution yields the crosscorrelation rxy(l), i.e.

rxy(l) = x(l) ∗ y(−l) (1.11)

In the special case when x(n) = y(n) the operation results in the autocorrelation of x(n), rxx(l). Recall that y(n) = x(n) ∗ h(n). Inserting this expression for y(n) into (1.11) yields

rxy(l) = h(−l) ∗ rxx(l)   (1.12)

In the z-domain, (1.12) becomes

Pxy(z) = H∗(z)Pxx(z)   (1.13)

where H∗(z) is the complex conjugate of H(z) and Pxx(z) is the power spectral density of x(n). The transfer function for the identified system is then

H∗(z) = Pxy(z)/Pxx(z)   (1.14)

where Pxy(z) is the cross spectral density between x(n) and y(n). If rxy(l) is replaced by ryx(−l) in (1.12), the complex conjugate in (1.14) is eliminated and we obtain the following estimate of the transfer function:

H(z) = Pyx(z)/Pxx(z)   (1.15)

The MatLab² function tfe³ uses this method to estimate the transfer function of the system in question [4]. In later sections it will be clear that this method is both straightforward and powerful when identifying a given system.
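As a minimal sketch of the correlation method (assuming the channel input x and output y are available as column vectors of equal length; the author's own implementation is listed in appendix A.6), the estimate in (1.15) can be formed by averaging spectra over non-overlapping, windowed sections:

% Welch-style averaging over non-overlapping Hanning-windowed sections,
% giving the transfer function estimate of eq. (1.15). Assumes x and y
% are column vectors holding the channel input and output.
nfft = 2048;
win  = hanning(nfft);
k    = fix(length(x)/nfft);
Pxx  = zeros(nfft,1);
Pyx  = zeros(nfft,1);
for i = 0:k-1
  X = fft(win.*x(i*nfft+1:(i+1)*nfft));   % DFT of a windowed input section
  Y = fft(win.*y(i*nfft+1:(i+1)*nfft));   % DFT of a windowed output section
  Pxx = Pxx + abs(X).^2;                  % accumulate input power spectrum
  Pyx = Pyx + Y.*conj(X);                 % accumulate cross spectrum
end
H = Pyx./Pxx;                             % transfer function estimate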

1.2 Adaptive Channel Equalization

Another way to equalize a channel is to use adaptive algorithms. There is a vast number of application areas for adaptive algorithms, and the mathematical theory is quite complex and reaches beyond the scope of this thesis. Therefore,

¹ Also commonly referred to as (time) shift parameter.
² MatLab is a trademark of The MathWorks, Inc.
³ Transfer Function Estimate.

in this section, only a brief description of the basic principles of adaptive filtering will be given [2]. A block diagram of an adaptive filter is shown in fig. 1.3. It consists of a shift-varying filter and an adaptive algorithm for updating the filter coefficients. The goal of an adaptive FIR filter is to find the Wiener filter w(n) that minimizes the mean-square error

Figure 1.3: Basic structure for an adaptive filter.

ξ(n) = E{|d(n) − d̂(n)|²} = E{|e(n)|²}   (1.16)

where E{·} is the expected value and d̂(n) is the estimate of the desired signal d(n). We know that if x(n) and d(n) are jointly wide-sense stationary processes, the filter coefficients that minimize the mean-square error ξ(n) are found by solving the Wiener-Hopf equations [2]

Rxx w = rdx   (1.17)

where Rxx denotes the autocorrelation matrix of x(n), w denotes the vector containing the filter coefficients and rdx denotes the crosscorrelation vector between d(n) and x(n). Solving the Wiener-Hopf equations is a complex mathematical operation that includes an inversion of the autocorrelation matrix Rxx. If the input signal or the desired signal is nonstationary, this operation would have to be performed iteratively. Instead, the requirement that w(n) should minimize the mean-square error at each time n can be relaxed and a coefficient update equation of the form

w(n + 1) = w(n) + ∆w(n)   (1.18)

can be used. In this equation ∆w(n) is a correction that is applied to the filter coefficients w(n) at time n to form a new set of coefficients, w(n + 1), at time n + 1. Equation (1.18) is the heart of all adaptive algorithms used in this thesis.⁴ Since the error function ξ(n) is a quadratic function, its surface can be viewed as a “bowl” with the minimum error at the bottom of this bowl.

⁴ Except for the RLS algorithm described in section 2.4.4.

The idea of adaptive filters is to find the optimal coefficient vector w(n) by taking small steps towards the minimum error. The update equation for this vector is

w(n + 1) = w(n) − µ∇ξ(n)   (1.19)

where µ is the step size and ∇ξ(n) is the gradient vector of ξ(n). Note that the steps are taken in the negative direction of the gradient vector, since this vector points in the direction of steepest ascent. The gradient can be estimated directly by the product of e(n) and x(n). Introducing this estimate in (1.19) yields

w(n + 1) = w(n) + µe(n)x(n)   (1.20)

which is the well-known Least Mean Squares (LMS) algorithm. Further developments of this algorithm include the Normalized LMS (NLMS) and the Leaky LMS (LLMS). All of these algorithms will be evaluated in later sections of this thesis [2]. In fig. 1.4, a block scheme that can be used for adaptive channel equalization is shown. The original signal s(n) is passed through some sort of system (a channel) that distorts the input signal, and this distorted signal is then used as input to the adaptive algorithm. The output signal d̂(n) from the adaptive causal filter is subtracted from the desired signal d(n) and the result forms the error e(n). The error is the second input signal to the adaptive algorithm. If the system is a non-trivial system, it will not only affect the spectral characteristics of the input signal but also introduce a delay on the same⁵. This is the reason why the delay ∆ is so important. Another important property is that if the channel to be equalized is causal, the equalizing filter will be non-causal if no delay of the filtered signal x(n) is acceptable. However, only causal Finite Impulse Response (FIR) adaptive filters will be used in this thesis and these filters will indeed introduce a delay on the signal. Also note that an FIR filter can of course only approximate an Infinite Impulse Response (IIR) filter with a certain precision, if such a filter is needed for an optimal solution [3].

⁵ That is, if the impulse response of the system is more complex than a zero-centered unit impulse.

Figure 1.4: Basic structure for an adaptive channel equalizer.
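As a rough usage sketch of the structure in fig. 1.4 (assuming a clean column-vector signal s, the corresponding distorted channel output x, and the lms function of appendix A.1 on the MatLab path), the equalizer can be trained against a delayed copy of the clean signal:

% Adaptive channel equalization according to fig. 1.4. The signals s and x
% and the parameter choices below are assumptions made for this sketch.
L     = 200;                                 % adaptive filter length
delay = L/2;                                 % rule-of-thumb delay (see section 2.4.1)
d     = [zeros(delay,1); s(1:end-delay)];    % delayed desired signal
mu    = 0.05*2/(L*mean(x.^2));               % small fraction of the maximum step size
[yout,eout,w] = lms(x,d,mu,L);               % w approximates the equalizing filter
mse = mean(eout(floor(end/2):end).^2)        % MSE after the initial convergence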

Chapter 2

Equalization of Mask Channel

In this chapter, a protective mask is studied. The goal was to equalize the distortion of human speech caused by this mask. In order to collect the necessary data to perform this study, a measurement setup was assembled to record data on site.

2.1 Gathering of Measurement Data

The gathering of measurement data was made with the help of a test dummy head¹, two DAT-recorders² and a signal analyzer³. The test dummy used was constructed specially for audio measurements and was equipped with a loudspeaker placed in its mouth. A microphone was mounted on the inside of the mask and the mask was then attached to the test dummy head, see fig. 2.1. To damp disturbing environmental noise, the complete setup was placed behind particle boards covered with insulation wool. The signal analyzer was used to generate noise bandlimited to 12.8 kHz, and one of the DAT-recorders, the SV3800 model, was used to record noise and speech sequences while the other was used for playback of speech sequences. The sampling frequency was 48 kHz with a resolution of 16 bits, and the information on the DAT-tapes was stored as wav-files using the software CoolEdit 2000. The wav-files were finally read into MatLab for further processing. A block scheme of the complete setup is shown in fig. 2.2. The first action taken was to reduce the amount of data by sampling rate conversion. Using the MatLab function decimate, the sampling frequency was reduced in two steps: first from 48 kHz to 24 kHz and then from 24 kHz to 12 kHz. Hence, the amount of data was reduced to one fourth. For a detailed description of how decimate works, see [5].

¹ Head Acoustic.
² Sony TCD-D8 and Panasonic SV3800.
³ Hewlett-Packard 36570A.
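A minimal sketch of the two-step sampling rate conversion described above, assuming the 48 kHz recording has been loaded into a vector x48 (e.g. with wavread):

% Two-step sampling rate reduction, 48 kHz -> 24 kHz -> 12 kHz,
% assuming x48 holds the recorded signal sampled at 48 kHz.
x24 = decimate(x48,2);   % lowpass filter and keep every second sample
x12 = decimate(x24,2);   % second factor-of-two reduction, one fourth of the data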

Figure 2.1: (a) Test dummy head equipped with a loudspeaker in its mouth. The microphone is placed inside the mask. (b) Placement of the microphone in the mask.

Figure 2.2: Block scheme of the complete measuring setup.

2.2 Coherence Function

A powerful tool for investigating the properties of input and output signals is the coherence function. If Pxx and Pyy are the power spectral densities of the input signal x(n) and the output signal y(n), respectively, and Pxy is the cross spectrum of the input and output signals, the coherence function Cxy can be calculated as

Cxy = |Pxy|² / (Pxx Pyy)   (2.1)

A coherence function equal to one means that a perfectly linear and noise-free system is being measured. Thus, the coherence function gives a direct measure of the quality of the estimated transfer function. In appendix A.6 a MatLab function that calculates the coherence function is listed.

The coherence function Cxy of the mask is shown in fig. 2.3. The length of the FFTs (Fast Fourier Transform) used for calculating Cxy was 2048.

Figure 2.3: The coherence function Cxy of the mask. The input signal was a flat bandlimited noise sequence with variance σ² = 1 (FFT-length 2048).
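A coherence estimate such as the one in fig. 2.3 can be obtained with the sysest function of appendix A.6; the sketch below assumes that inputNoise and outputNoise hold the recorded reference noise and the mask microphone signal after decimation to 12 kHz:

% Coherence of the mask channel using sysest (appendix A.6).
nfft = 2048;
[H1,H2,Cxy] = sysest(inputNoise,outputNoise,nfft,1);   % 1 -> Hanning windowing
f = (0:nfft/2-1)*12000/nfft;                           % frequency axis, Fs = 12 kHz
plot(f,Cxy), xlabel('Frequency [Hz]'), ylabel('Coherence')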

2.3 Channel Equalization using tfe

First, the impulse response of the system was estimated using the MatLab function tfe. For a detailed description of how this function works, see [4]. A short résumé of the theory behind tfe is given in section 1.1. An alternative function, custom made by the author, is listed in appendix A.6. The data was divided into non-overlapping sections and then windowed by a Hanning window. The magnitude squared of the Discrete Fourier Transforms (DFT) of the input noise sections were averaged to form Pxx. The products of the DFTs of the input and output noise sections were averaged to form Pxy. A one-sided spectrum is returned by tfe, and in order to perform an Inverse FFT (IFFT), the spectrum has to be converted to a two-sided spectrum. This spectrum can then be used as input to the MatLab function ifft, and in this way the corresponding impulse response of the transfer function can be calculated. For a detailed description of the MatLab function ifft, see [5]. The channel transfer function and impulse response for different filter lengths are shown in fig. 2.4. Calculating a channel equalizing filter for the mask using tfe is easily done simply by switching the input parameters. That is, if the tfe function call to estimate a channel is Txy=tfe(inputNoise, outputNoise), the function call to estimate an equalizing filter for the same channel would be Txy_inv=tfe(outputNoise, inputNoise). The result of this operation is shown in fig. 2.5.
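The conversion from a one-sided spectrum to a two-sided spectrum followed by an inverse FFT can be sketched as follows, assuming Txy holds the nfft/2+1 one-sided points (DC up to the Nyquist frequency) returned by tfe for an even FFT length, and that L is the desired filter length:

% One-sided -> two-sided spectrum and inverse FFT, giving an impulse
% response of length L. Txy and L are assumptions made for this sketch.
nfft  = 2*(length(Txy)-1);
Hfull = [Txy; conj(flipud(Txy(2:end-1)))];   % append mirrored, conjugated bins
hresp = real(ifft(Hfull));                   % impulse response (real-valued system)
hresp = hresp(1:L);                          % truncate to the desired filter length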

2.4 Adaptive Channel Equalization

The MatLab function tfe calculates the transfer function using “brute force”. However, an alternative approach is the usage of adaptive methods. In this section an investigation based on LMS, NLMS, LLMS and RLS (Recursive Least Squares) adaptive FIR filters will take place.

Figure 2.4: The left column shows impulse responses for the mask. The filters were calculated using the correlation method. The filter lengths are L=50, L=110, L=256 and L=512. The right column shows the corresponding transfer functions. The transfer functions were calculated using the MatLab function freqz [5].

2.4.1 The LMS Algorithm

The first adaptive algorithm used for channel equalization was the LMS algorithm,

w(n + 1) = w(n) + µe(n)x(n)   (2.2)

An implementation of the LMS algorithm is listed in appendix A.1.

Step size

The correct choice of the step size µ is of great importance when using the LMS algorithm or other LMS-based algorithms. The maximum step size can be approximated by

0 < µ < 2 / (p E{|x(n)|²})   (2.3)

where p is the filter length and E{|x(n)|²} is estimated with

Ê{|x(n)|²} = (1/p) Σ_{m=n−p+1}^{n} |x(m)|²   (2.4)

Figure 2.5: The left column shows impulse responses for the mask channel equalizing filter. The filters were calculated using the correlation method. The filter lengths are L=50, L=110, L=256 and L=512. The right column shows the corresponding transfer functions. The transfer functions were calculated using the MatLab function freqz.

In reality, this step size approximation can seldom or never be used. Instead, as a rule of thumb, a step size at least an order of magnitude smaller than the maximum value allowed should be used [2]. Nevertheless, there are applications that may allow larger step sizes.
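As a small sketch of how (2.3) and (2.4) can be applied in practice (the filter length p and the input vector x are assumptions):

% Running estimate of E{|x(n)|^2} over the last p samples, eq. (2.4), and a
% step size chosen well below the bound in eq. (2.3).
p      = 110;                            % hypothetical filter length
Ex2    = filter(ones(p,1)/p,1,x.^2);     % sliding-window average of |x(m)|^2
mu_max = 2./(p*Ex2(p:end));              % time-varying upper bound on mu
mu     = 0.1*min(mu_max);                % an order of magnitude below the bound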

Delay

The choice of delay has a substantial effect on the quality of the channel equalizer. The Mean Square Error (MSE) measures the quality in this case. As a rule of thumb, the delay can be chosen equal to half the adaptive filter length [3]. In fig. 2.6 the MSE is plotted as a function of the delay. It is clear that a delay of about 100 samples gives the lowest MSE if the filter length is 200. Note that the introduction of a delay is crucial for the quality of a channel equalizer, but that the exact length of the delay is not critical. According to the figure, the delay could have been as short as 50 samples or as long as 150 samples while still maintaining a low MSE. However, leaving out the delay altogether results in an unacceptably high MSE.

Figure 2.6: MSE plotted as a function of the delay ∆. The length of the adaptive filter is L=200.

The physical delay that the limited speed of sound propagation c introduces to the system is best illustrated with a plot of the impulse response of the mask, i.e. the crosscorrelation between the loudspeaker and the microphone. This plot is shown in fig. 2.7 and is based on an estimate made by the Hewlett-Packard 36570A signal analyzer. Note that the amplitude of the impulse response is not correctly scaled. The crosscorrelation is approximately zero during the time 0–0.2 ms. This delay is due to the propagation time for the first sound wave that reaches the microphone. If we take c ≈ 330 m/s as the speed of sound and ∆ ≈ 2 · 10⁻⁴ s as the delay, the distance L between the loudspeaker and the microphone can be calculated from

L = ∆ · c   (2.5)

which yields a distance between the loudspeaker and the microphone of about 6.5 cm. This distance corresponds well to the real distance.

Filter length

The filter length is of course a key parameter in all sorts of filter design. Theoretically, the length can be chosen arbitrarily, but in a realization of a filter in, for example, a digital signal processor (DSP), the length of the filter is limited by memory size as well as by mathematical complexity. Hence, we have a classical trade-off between efficiency and quality.

Figure 2.7: The plots show the impulse response of the mask, i.e. the crosscorrelation between the loudspeaker and the microphone. The lower plot is a zoomed version of the upper plot.

To motivate the choice of filter length, fig. 2.8 shows the MSE plotted as a function of the filter length. The results from all LMS-based adaptive algorithms used in this thesis are plotted. Note that when the filter length increases beyond a certain point, the MSE actually increases. The reason is that as the number of filter coefficients is increased, the error due to stochastic “jumps” of these coefficients on the error surface also increases. This error is called the excess MSE.

Results

The LMS algorithm was used to perform both a channel identification and a channel equalization. The corresponding plots are shown in figs. 2.9–2.10.

2.4.2 The NLMS Algorithm

The Normalized LMS (NLMS) algorithm uses a time-varying step size,

µ(n) = β / (xT(n)x(n) + ε) = β / (||x(n)||² + ε)   (2.6)

In this thesis, only real signals are used, hence the transpose in the denominator of (2.6). If x(n) were a complex signal, the transpose would be a hermitian transpose.

Figure 2.8: The MSE plotted as a function of filter length for LMS, NLMS, LLMS and RLS. The delay is half the length of the filter plus eight samples due to the physical delay introduced by the system. Three different step sizes were used for each LMS-based algorithm: 0.025, 0.050 and 0.100 of the maximum step size for LMS and LLMS, and β = 0.05, 0.10 and 0.20 for NLMS; for RLS, λ = 1 was used. The input signal was 50 000 samples of flat bandlimited noise with variance σ² = 1, except for the RLS algorithm where 10 000 samples of noise were used.

Also note that, to avoid division by zero, a small constant ε is introduced in the denominator. If equation (2.6) is inserted into equation (2.2), we obtain

w(n + 1) = w(n) + β e(n) x(n) / ||x(n)||²   (2.7)

Under suitable statistical assumptions it can be shown that the NLMS algorithm will converge if 0 < β < 2 [2]. Therefore, the NLMS algorithm requires no knowledge about the statistics of the input signal in order to calculate the step size. Another advantage of the NLMS algorithm is its insensitivity to the amplification of the gradient noise that a high-amplitude input signal introduces. This insensitivity comes from the normalization in (2.7). The delay requirement is the same as when using the LMS algorithm.

Figure 2.9: The left column shows impulse responses for the mask. The filters were calculated using the LMS algorithm. The filter lengths are L=50, L=110, L=256 and L=512. The right column shows the corresponding transfer functions. The transfer functions were calculated using the MatLab function freqz.
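A rough comparison of the LMS and NLMS algorithms on the same data can be sketched with the functions of appendices A.1 and A.2 (the signals x and d and the parameter values are assumptions):

% LMS versus NLMS convergence on identical data. Assumes column vectors x
% (channel output) and d (delayed desired signal).
L    = 110;
mu   = 0.05*2/(L*mean(x.^2));     % LMS step size, a fraction of the maximum
beta = 0.1;                       % NLMS normalized step size, 0 < beta < 2
[y1,e_lms]  = lms(x,d,mu,L);
[y2,e_nlms] = nlms(x,d,beta,L);
plot([e_lms(:).^2 e_nlms(:).^2]), legend('LMS','NLMS')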

2.4.3 The LLMS Algorithm

If some of the eigenvalues of the autocorrelation matrix are zero, the LMS algorithm does not converge as expected. The LLMS algorithm (Leaky LMS) solves this problem by adding a “leakage coefficient” γ to the filter coefficients according to

w(n + 1) = (1 − µγ)w(n) + µe(n)x(n) (2.8)

This leakage coefficient forces the filter coefficients to zero if either the input signal or the error signal becomes zero. The obvious drawback of this method is that a bias is introduced into the solution. This bias becomes evident in fig. 2.8. In this case, the LLMS algorithm has approximately twice as large an MSE as the other algorithms in the plot. The delay requirement is the same as when using the LMS and NLMS algorithms.

2.4.4 The RLS Algorithm

If an increased computational complexity is acceptable, the time for convergence can be reduced considerably by using the RLS algorithm. For a more thorough description of this algorithm, see [2].

Figure 2.10: The left column shows impulse responses for the equalizing filter. The filters were calculated using the LMS algorithm. The filter lengths are L=50, L=110, L=256 and L=512. The right column shows the corresponding transfer functions. The transfer functions were calculated using the MatLab function freqz.

One important property of the RLS algorithm is that the step size depends on the size of the error: if the estimate d̂(n) is close to the desired signal d(n), small corrections of the filter coefficients will be made. Hence, the step size will be large at the beginning of the convergence and then, as d̂(n) approaches d(n), become smaller and smaller. A plot of the MSE as a function of the filter length is shown in fig. 2.8. Due to the complexity of the algorithm, the plot has been calculated from 10 000 samples of flat bandlimited noise (σ² = 1).

2.5 Minimum-Phase Approach

If we were to design a channel equalizer for hi-fi audio purposes, a linear-phase filter would be the only acceptable choice, since all frequencies are delayed equally when passed through such a filter. In the case of the mask, this constraint is substantially relaxed. This channel equalizer is supposed to operate in a telephone network (PSTN) using the frequency band 300–3400 Hz. Since the channel equalizer is designed to operate in such a large system, it is desirable to reduce the delay caused by the filtering and in this way minimize the total delay introduced by the whole system, i.e. the PSTN. One powerful method of minimizing the delay of a system is to design it as a minimum-phase filter. A minimum-phase filter has all of its zeros inside of, or possibly on, the unit circle. This type of filter can be obtained from a linear-phase filter by reflecting all of the zeros that are outside the unit circle to the inside of the unit circle. The resulting filter will have minimum phase and, except for a scaling factor, the same magnitude as the linear-phase filter [6].
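A direct sketch of the zero-reflection idea for an FIR filter with coefficient vector h follows below; note that root finding becomes numerically fragile for long filters, which is one reason the thesis instead uses the cepstrum-based routine listed in appendix A.5:

% Reflect zeros outside the unit circle to their conjugate-reciprocal
% positions; the magnitude response is preserved up to numerical precision.
r      = roots(h);                   % zeros of the FIR filter
out    = abs(r) > 1;                 % zeros outside the unit circle
g      = h(1)*prod(abs(r(out)));     % gain that keeps |H(e^jw)| unchanged
r(out) = 1./conj(r(out));            % reflect the offending zeros
hmin   = real(g*poly(r));            % minimum-phase impulse response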

Figure 2.11: The upper plot shows the impulse response for a minimum-phase filter and the lower plot shows the impulse response for a linear-phase filter. Note how the “centre of gravity” of the linear-phase filter has been shifted to form the minimum-phase filter.

The plots in fig. 2.11 show the impulse response of a mask channel equalizing minimum-phase filter and the corresponding linear-phase filter. The filter length of the linear-phase filter is 128, and thus the delay when using this filter

will be 64 samples due to its symmetry. In figure 2.12 the corresponding amplitude functions are plotted. It is clear that the minimum-phase filter indeed results in approximately the same amplitude as the linear-phase filter.

Figure 2.12: Amplitude for the minimum-phase filter (upper plot) and linear-phase filter (lower plot). Note that the overall performance is approximately the same for both filters.

Another interesting question is how the phase behaves over the frequency band. This is illustrated in fig. 2.13. The group delay τg is defined as

τg = −dθ(ω)/dω   (2.9)

where θ(ω) is the phase. For a linear-phase system the group delay is, by definition, constant. The group delay for the two filters is shown in fig. 2.14. It is clear that the usage of a linear-phase filter of length L will introduce a constant delay of L/2 samples. If a minimum-phase filter is used, the delay will be reduced substantially, but on the other hand it will not be constant over the frequency band.

Figure 2.13: The phase for the minimum-phase filter (upper plot) and linear-phase filter (lower plot).

Figure 2.14: The group delay for the minimum-phase filter and the linear-phase filter.
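The group delays in fig. 2.14 can be reproduced with the Signal Processing Toolbox function grpdelay; the sketch assumes hmin and hlin hold the coefficient vectors of the minimum-phase and linear-phase equalizers (for example the two outputs of minfas in appendix A.5):

% Group delay, eq. (2.9), of the minimum-phase and linear-phase filters.
nfft = 1024;  Fs = 12000;
[gdmin,f] = grpdelay(hmin,1,nfft,Fs);    % delay in samples versus frequency
[gdlin,f] = grpdelay(hlin,1,nfft,Fs);
plot(f,gdmin,f,gdlin), xlabel('Frequency [Hz]'), ylabel('Samples')
legend('Minimum Phase','Linear Phase')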

2.6 Results of Mask Channel Equalization

When talking about speech quality and speech intelligibility, it is hard to decide what is “high quality speech” and what is “low quality speech”. One needs some sort of measure to be able to draw conclusions on whether one speech sample is “better” than another. Nevertheless, there are a great deal of subjective impressions of speech quality and intelligibility. In the case of the mask channel equalization, both the correlation method and the adaptive methods proved to be powerful tools. Both managed to substantially improve the speech quality and intelligibility using reasonable filter lengths. A subjective listening test showed that at a sampling frequency of Fs = 12 kHz, a filter length of about L = 100 taps significantly improved the speech quality. There was also little or no difference between the results of the different adaptive algorithms, and this is the reason why only the LMS algorithm is used as the adaptive method in chapter 3, where an equalization of the mouth-ear channel is performed. When a minimum-phase filter was used to equalize the mask channel, a subjective listening test could not distinguish a speech sample filtered by such a filter from a speech sample filtered by the corresponding linear-phase filter. This suggests that minimum-phase filters can be used without loss of speech quality in speech communication systems.

Chapter 3

Equalization of Mouth-Ear Channel

In chapter 2 we saw that it is possible to equalize the channel that a mask represents, using both correlation methods and adaptive methods. We now move on to the next issue: placing the microphone inside a person's auditory meatus and identifying and equalizing the channel between the mouth and the ear. The first problem that arises is how to generate a noise signal. When using the test dummy head, a signal analyzer could be used to generate the reference input signals (see section 2.1). Now, when placing the microphone inside a human auditory meatus, the skull itself represents the channel to be equalized. Thus, the test subject himself must generate a broadband noise-like signal to excite the channel/skull. This may seem like an impossible task, but it is in fact quite possible to generate a broadband noise-like sound. The power spectral density of such a noise-like sound, made by a human speech organ, is shown in fig. 3.1.

Figure 3.1: Power spectrum of broadband noise-like sound generated by human speech organ.

3.1 Gathering of Measurement Data

The equipment used was a DAT-recorder¹, a custom made microphone amplifier, two microphones² and a pair of ear-muffs³. One of the microphones was placed in front of the test subject's mouth and the other was placed inside the test subject's auditory meatus. The ear-muffs were then placed on the test subject's head. This is advantageous since the signal path outside the skull is damped considerably. Also, a pair of ear-muffs damps disturbing or even harmful environmental noise. The test subject was placed in a semi-damped room and pronounced a number of sentences chosen a priori. He also tried to make noise-like sounds. The two-channel data was recorded at 44.1 kHz, and the sampling frequency was then reduced to 11.025 kHz in the same manner as the data from the mask measurements (see section 2.1). For a block scheme of the complete measurement setup, see figs. 3.2 and 3.3.

Figure 3.2: Block scheme of the complete measurement setup.

Figure 3.3: Microphone placement in auditory meatus.

¹ Sony TCD-D8.
² Sennheiser.
³ Hellberg.

3.2 Coherence Function of Mouth-Ear Channel

Using the noise signal generated by the human speech organ, the coherence was calculated as in (2.1). The result is shown in fig. 3.4.

Figure 3.4: Coherence function of mouth-ear channel (FFT-length 2048).

As described in section 2.2, the ideal coherence function is equal to one, which means that a perfectly linear and noise-free system is being measured. As fig. 3.4 illustrates, this is not the case with the mouth-ear channel. It has been shown that sound propagation through the skull is perfectly linear in the frequency band of interest [7]. Furthermore, the signal recorded in the auditory meatus is severely damped (see section 3.3), and this indicates that the problem is a poor signal-to-noise ratio (SNR) rather than a non-linearity. An additional problem is that the excitation signal is not used as the reference signal when identifying the system. However, one should not concentrate on the coherence function to the exclusion of other information about the signals. It will be clear in later sections that a satisfactory channel equalization can be performed even though the coherence function at some frequencies (or frequency bands) falls far below unity.

3.3 Channel Equalization Using tfe

The MatLab function tfe uses correlations to calculate a transfer function, as described in (1.14), section 1.1. The transfer functions and impulse responses for a number of different filter lengths are shown in figs. 3.5–3.6. The procedure for calculating the impulse response from the transfer function given by tfe was the same as in section 2.3. It is evident that the skull performs a relatively simple low-pass filtering with a cut-off frequency of about 500 Hz and a stopband attenuation of about 30–40 dB. The strange behaviour of the impulse response can probably be explained by the aggravating circumstances mentioned in section 3.2.

Figure 3.5: The left column shows impulse responses for the mouth-ear channel. The filters were calculated using the correlation method. The filter lengths are L=50, L=110, L=256 and L=512. The right column shows the corresponding transfer functions. The transfer functions were calculated using the MatLab function freqz.

3.4 Adaptive Channel Equalization

Due to the small differences between the results of the different adaptive algorithms used in section 2.4, only the standard LMS algorithm will be used as an example of adaptive channel equalization of the mouth-ear channel.

Figure 3.6: The left column shows impulse responses for the mouth-ear channel equalizing filter. The filters were calculated using the correlation method. The filter lengths are L=50, L=110, L=256 and L=512. The right column shows the corresponding transfer functions. The transfer functions were calculated using the MatLab function freqz.

3.4.1 The LMS Algorithm

As in the case with the mask, a number of parameters must be calculated to obtain an effective equalizing filter.

Step size and Filter Length

To find a proper step size, the MSE was plotted as a function of the filter length L. Different fractions of the maximum step size were used, and the result is shown in fig. 3.7. According to this figure, a step size of about one fifth of the maximum allowed step size seems to be a reasonable choice. For a start, the delay was chosen as half the filter length. Later, a more thorough investigation of the optimal delay is performed.

Delay

As in the case of the mask channel equalization, a proper delay must be chosen. However, a problem arises when the human speech organ is considered. The test dummy head used in the mask channel equalization had a loudspeaker placed in its “mouth”.

Figure 3.7: The MSE plotted as a function of filter length for the LMS algorithm. The delay was half the filter length. Five different step sizes were used (0.025, 0.05, 0.1, 0.2 and 0.4 of the maximum µ) and the input signal was noise generated by a human speech organ.

Hence, the source of the speech or noise was generated at a certain isolated point. When a person is talking, this is not the case. Instead, the vocal cords act together with the throat, mouth and nostril cavities to form sounds. This means that the speech or noise is no longer generated at one isolated point. Rather, the sound is a result of many cooperating systems. Since we are forced to use a human skull instead of a test dummy head to collect data, it is difficult to predict a certain optimal delay for a mouth-ear channel equalizer. According to [3], a delay of half the filter length is a rule of thumb. This rule can of course be used without further investigations, but a simple plot of the MSE as a function of the delay can offer interesting information about the optimal delay. Fig. 3.8 shows the MSE plotted as a function of delay for eight mouth-ear channel equalizing filters. The filter lengths are L = 10, L = 30, L = 50, L = 70, L = 90, L = 110, L = 256 and L = 512, and the delay was 0–2L. Using this information, the LMS algorithm was used to identify and equalize the mouth-ear channel. The results of these operations are shown in figs. 3.9–3.10.

Figure 3.8: The MSE plotted as a function of delay for eight mouth-ear channel equalizing filters, each of different length L and with a delay of 0–2L. The filters were calculated using the LMS algorithm.
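The delay sweep behind fig. 3.8 can be sketched as follows, assuming a clean reference s, the corresponding signal x recorded in the auditory meatus, and the lms function of appendix A.1 (the filter length and step size are example choices):

% MSE as a function of the delay for one filter length, cf. fig. 3.8.
L  = 110;
mu = 0.2*2/(L*mean(x.^2));                          % about one fifth of the maximum
delays = 0:10:2*L;
mse = zeros(size(delays));
for k = 1:length(delays)
  d = [zeros(delays(k),1); s(1:end-delays(k))];     % delayed desired signal
  [y,e] = lms(x,d,mu,L);
  mse(k) = mean(e(floor(end/2):end).^2);            % MSE after initial convergence
end
plot(delays,mse), xlabel('Delay [Samples]'), ylabel('Mean Square Error')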

3.5 Results of Mouth-Ear Channel Equalization

The mouth-ear channel represents a far more complex system and measurement environment than the mask channel does. The speech signal inside the auditory meatus is severely damped, and this means that great demands are made upon the microphones and amplifiers. Furthermore, the signal that is used as the reference signal, i.e. the speech signal at the mouth, is not the excitation signal of the system/skull. Most likely, these factors combine to give a poor coherence function and a poor estimate of the channel. Nevertheless, it is possible to significantly enhance the speech intelligibility by using some of the methods described in this chapter. The performance of the correlation method was particularly good, while some problems were encountered when trying to make the adaptive algorithms converge properly.

Figure 3.9: The left column shows impulse responses for the mouth-ear channel. The filters were calculated using the LMS algorithm. The filter lengths are L=50, L=110, L=256 and L=512. The right column shows the corresponding transfer functions. The transfer functions were calculated using the MatLab function freqz.

Figure 3.10: The left column shows impulse responses for the mouth-ear channel equalizing filter. The filters were calculated using the LMS algorithm. The filter lengths are L=50, L=110, L=256 and L=512. The right column shows the corresponding transfer functions. The transfer functions were calculated using the MatLab function freqz.

Chapter 4

Identification of “True” Mouth-Ear Channel

Most types of measurements in some way affect the item being measured. In the case of the mouth-ear channel identification and equalization, the cables, analog-to-digital converters (ADC) and the microphones form a system that distorts the signal in some way. However, it is possible to equalize this system as well and in this way find an approximation of the “true” channel.

4.1 Basic Approach

In fig. 4.1 a principal block scheme illustrates how the measurements are performed. SE is the signal recorded in the auditory meatus, i.e. the Ear, and SM is the signal recorded at the Mouth. HT is the true ear-mouth channel and H is the true ear-mouth channel distorted by the measurement equipment. The microphones, cables and ADCs can be viewed as a system. Suppose we perform a measurement and use equipment/system GM to record data at the mouth and equipment/system GE to record data in the auditory meatus. We then have the situation shown in fig. 4.2. H1 is the first estimate of the channel. This setup means that

SM = SE HT   (4.1)

The output from H1 will be SE GE H1 and the output from GM will be SM GM. This means that

H1 = (SM GM) / (SE GE)   (4.2)

Then the microphones are switched, so that the equipment that was used to record data in the auditory meatus in the first measurement is now placed in front of the mouth and vice versa. Fig. 4.3 shows this new setup. Note that GE and GM are switched. This means that

H2 = (SM GE) / (SE GM)   (4.3)

Substituting (4.1) into (4.2) and (4.3) gives H1 = HT GM/GE and H2 = HT GE/GM, so multiplying H1 and H2 yields H1 H2 = HT². The true channel is thus obtained as

HT = √(H1 H2)   (4.4)

Figure 4.1: Block scheme of how the measurements are performed.

The result of applying the operations described in this section to the mouth-ear channel equalization problem is shown in fig. 4.4. A MatLab function for estimating HT is listed in appendix A.7.

Figure 4.2: Block scheme where the measurement equipment is viewed as two systems, one in front of the mouth, GM, and one in the auditory meatus, GE.

Figure 4.3: Block scheme of the second measurement where GM and GE are switched, resulting in another identified channel, H2.

Figure 4.4: Estimated transfer functions (upper plot) and impulse response for the true channel (lower plot).

Chapter 5

Conclusions

The goal of this Master thesis has been to investigate the possibility of placing a microphone for communication purposes inside a protective mask, as well as the possibility of placing the microphone inside a person's auditory meatus, and digitally equalizing the speech path in question. A number of methods have been evaluated, both adaptive and non-adaptive. The work shows that the correlation method is a powerful and straightforward way of identifying a system. Subjective listening tests indicate that this method was able to identify and equalize the mask channel with a satisfactory result and with reasonable filter lengths. The mouth-ear channel presented more difficulties because of its “non-ideal” circumstances. The mask was attached to a test dummy head equipped with a loudspeaker in its mouth, and bandlimited noise was used as the reference signal. When the mouth-ear channel was to be identified, a real human skull had to be used and the test subject had to excite the skull himself. Partly because of this, a proper transfer function for this channel was difficult to find. The work also shows that the speech signal detected inside the auditory meatus is substantially damped, and this raises the requirements on the measurement equipment because of the low SNR. Another factor that affects the final result is that the excitation signal of the skull is not used as the reference/desired signal. Instead, the speech at the test subject's mouth is used as the desired signal when identifying an equalizing filter. This makes the identification process far more complex than in the case with the test dummy head and the protective mask. Nevertheless, subjective listening tests revealed that a substantial improvement in speech intelligibility was achieved when using the correlation method. The adaptive methods performed less well, mainly because of convergence problems.

5.1 Further Work

Further improvements may be achieved by using one or more of the suggestions below:

• To identify a channel between the mouth and the auditory meatus, low-noise microphones and amplifiers probably have to be used. This would most likely raise the SNR and improve the final results.

• It has been shown that the sound pressure level varies depending on where inside the auditory meatus the microphone is placed [8]. It is possible that a small change in the position of the microphone may increase the SNR to some extent.

• The excitation signal used when identifying the mouth-ear channel probably causes problems. This signal is far from ideal and furthermore it does not stem from the vocal cords. Another way of exciting the skull would be to simply talk for a few minutes and in this way excite all the frequencies needed for an identification of the channel.

Appendix A

MatLab functions

A.1 LMS Algorithm

function [yout,eout,f]=lms(x,d,mu,nord)

% [yout,eout,f]=lms(x,d,mu,nord)
%
% x    - Input Signal
% d    - Desired Signal
% mu   - Step size
% nord - Filter length
% yout - Filter output
% eout - Error during convergence
% f    - Filter taps
%
% (c) Nils Westerlund, 2000

L=length(x);
f=zeros(nord,1);
yout=zeros(1,L);
eout=zeros(1,L);
for K=nord:L,
  xn=x(K:-1:K-nord+1);
  y=xn'*f;
  yout(K)=y;
  e=d(K)-y;
  eout(K)=e;
  f=f+mu*e*xn;
end

A.2 NLMS Algorithm

function [yout,eout,f]=nlms(x,d,mu,nord)

% [yout,eout,f]=nlms(x,d,mu,nord)
%
% x    - Input Signal
% d    - Desired Signal
% mu   - Step size
% nord - Filter length
% yout - Filter output
% eout - Error during convergence
% f    - Filter taps
%
% (c) Nils Westerlund, 2000

L=length(x);
f=zeros(nord,1);
yout=zeros(1,L);
eout=zeros(1,L);
for K=nord:L,
  xn=x(K:-1:K-nord+1);
  y=xn'*f;
  yout(K)=y;
  e=d(K)-y;
  eout(K)=e;
  nrm=xn'*xn+eps;
  f=f+mu*e*(xn/nrm);
end

A.3 LLMS Algorithm

function [yout,eout,f]=llms(x,d,mu,gamma,nord)

% [yout,eout,f]=llms(x,d,mu,gamma,nord)
%
% x     - Input Signal
% d     - Desired Signal
% mu    - Step size
% gamma - Leakage factor
% nord  - Filter length
% yout  - Filter output
% eout  - Error during convergence
% f     - Filter taps
%
% (c) Nils Westerlund, 2000

L=length(x);
f=zeros(nord,1);
yout=zeros(1,L);
eout=zeros(1,L);
for K=nord:L,
  xn=x(K:-1:K-nord+1);
  y=xn'*f;
  yout(K)=y;
  e=d(K)-y;
  eout(K)=e;
  f=(1-mu*gamma)*f+mu*e*xn;
end

A.4 RLS Algorithm

function [W]=rls(x,d,nord,lambda)

% [W]=rls(x,d,nord,lambda)
%
% x      - Input Signal
% d      - Desired Signal
% nord   - Filter length
% lambda - Forgetting factor
% W      - Filter taps
%
% (c) Nils Westerlund, 2000

x=x(:)';
d=d(:)';
delta=0.001;
P=inv(delta)*eye(nord);
xflip=fliplr(x);
xflip=[zeros(1,nord-1) xflip zeros(1,nord-1)];
W=zeros(length(xflip)-2*nord+2,nord);
z=zeros(5,1);
g=zeros(50,1);
alpha=0;
for k=1:length(xflip)-2*nord+1,
  z=P*xflip(end-k-nord+1:end-k)';
  g=z/(lambda+xflip(end-k-nord+1:end-k)*z);
  alpha=d(k+1)-xflip(end-k-nord+1:end-k)*W(k,:).';
  W(k+1,:)=W(k,:)+alpha*g.';
  P=(P-g*z.')/lambda;
end
W=W(end,:);

A.5 Minimum-Phase Filter Design

function [x2,h]=minfas(Admag,Ndft)

% [x2,h]=minfas(Admag,Ndft)
%
% Admag - Desired frequency response
% Ndft  - Length of DFT
% x2    - Minimum-phase impulse response
% h     - Linear-phase impulse response
%

Admag=Admag(:);
Admag=Admag';
fs=12000;
f=100*(1.2589).^(3:length(Admag)+2);

Admagi=Admag;
Admagi=[Admagi 0];
Ad=10.^(Admagi/20);
Ad=Ad(:);
Mag(1:Ndft/2+1)=Ad;
Mag(Ndft/2+2:Ndft)=flipud(Ad(2:Ndft/2));
xehat=real(ifft(log(Mag)));
xhat=2*xehat;
xhat(1)=xehat(1);
N=Ndft/2;
x2=real(ifft(exp(fft(xhat(1:N),Ndft))));
x2=x2(1:N);
%------------------------------
% Linear phase - FFT method
%------------------------------
L=Ndft/2;
M=(L)/2;
Adh=Ad(1:length(Ad)-1).*exp(-j*2*pi*M*((0:length(Ad)-2))'/Ndft);
Magh(1:Ndft/2)=Adh;
Magh(Ndft/2+1)=Ad(Ndft/2+1);
Magh(Ndft/2+2:Ndft)=flipud(conj(Adh(2:Ndft/2)));
h=real(ifft(Magh));
h=h(1:L+1);

A.6 Coherence Function and Estimate of Transfer Function

function [Txy_H1,Txy_H2,Cxy]=sysest(x,y,nfft,winflag)

% [Txy_H1,Txy_H2,Cxy]=sysest(x,y,nfft,winflag)
%
% x       - Input Signal
% y       - Output Signal
% nfft    - FFT Length
% winflag - 1 -> windowing, 0 -> no windowing
% Txy_H1  - H1-estimate of transfer function
% Txy_H2  - H2-estimate of transfer function
% Cxy     - Coherence Function
%
% (c) Nils Westerlund, 2000

x=x(:);
y=y(:);
win=hanning(nfft);
k=fix(length(x)/nfft);
u=inv(nfft)*sum(abs(win).^2);
Pxx=zeros(nfft,1);
Pxy=zeros(nfft,1);
Pyy=zeros(nfft,1);
if(winflag)
  disp('Windowing...')
else
  disp('No windowing...')
end
for i=0:k-1
  if(winflag)
    xw=win.*x(i*nfft+1:(i+1)*nfft);
    yw=win.*y(i*nfft+1:(i+1)*nfft);
  else
    xw=x(i*nfft+1:(i+1)*nfft);
    yw=y(i*nfft+1:(i+1)*nfft);
  end
  X=fft(xw,nfft);
  X2=abs(X).^2;
  Y=fft(yw,nfft);
  Y2=abs(Y).^2;
  XY=Y.*conj(X);
  Pxx=Pxx+X2;
  Pyy=Pyy+Y2;
  Pxy=(Pxy+XY);
end
Txy_H1=Pxy./Pxx;

Txy_H2=Pyy./conj(Pxy);
Txy_H1=Txy_H1(1:nfft/2);
Txy_H2=Txy_H2(1:nfft/2);
Cxy=(abs(Pxy).^2)./(Pxx.*Pyy);
Cxy=Cxy(1:nfft/2);

A.7 Estimate of “True” Channel

function [Htrue,htrue,lchm_Hinv,rchm_Hinv]=...
  truechan(lchm_innoise,lchm_outnoise,rchm_innoise,rchm_outnoise)

% [Htrue,htrue,lchm_Hinv,rchm_Hinv]=...
%   truechan(lchm_innoise,lchm_outnoise,rchm_innoise,rchm_outnoise)
%
% lchm_innoise  - Left channel at mouth, input noise
% lchm_outnoise - Left channel at mouth, output noise
% rchm_innoise  - Right channel at mouth, input noise
% rchm_outnoise - Right channel at mouth, output noise
% Htrue         - "True" channel transfer function
% htrue         - Impulse response for "true" channel
% lchm_Hinv     - Est. of equ. transfer func., left ch. at mouth
% rchm_Hinv     - Est. of equ. transfer func., right ch. at mouth
%
% (c) Nils Westerlund, 2000

[lchm_Hinv,F]=tfe(lchm_outnoise,lchm_innoise,512);
[rchm_Hinv,F]=tfe(rchm_outnoise,rchm_innoise,512);
nfft=2*length(lchm_Hinv);
Htrue=sqrt(lchm_Hinv.*rchm_Hinv);
Htrue=[Htrue;flipud(conj(Htrue(2:end-1)))];
htrue=real(ifft(Htrue));
htrue=[htrue(nfft/2+1:end);htrue(1:nfft/2)];
Htrue=Htrue(1:nfft/2);

Bibliography

[1] Proakis J. G., Manolakis D. G. (1996). Digital Signal Processing, Principles, Algorithms and Applications (Prentice-Hall).

[2] Hayes M. H. (1996). Statistical Digital Signal Processing and Modeling (Wiley).

[3] Widrow B., Stearns S. D. (1985). Adaptive Signal Processing (Prentice-Hall).

[4] MatLab Reference Guide - System Identification Toolbox.

[5] MatLab Reference Guide.

[6] Parks T. W., Burrus C. S. (1987). Digital Filter Design (Wiley).

[7] Håkansson B., Carlsson P., Brandt A., Stenfelt S. (1995). “Linearity of sound transmission through the human skull in vivo,” J. Acoust. Soc. Am. 99, 2239-2243.

[8] Hellström P-A., Axelsson A. (1991). “Miniature microphone probe tube measurements in the external auditory canal,” J. Acoust. Soc. Am. 93(2), 907-919.
