The Application of Dither and Noise-Shaping To

The Application of Dither and Noise-Shaping to

Nyquist-Rate Digital Audio: an Intro duction.

Christopher Hicks,

Communications and Signal Pro cessing Group,

Cambridge University Engineering Department,

United Kingdom.

[email protected]

Septemb er 12, 1995

Abstract

This pap er discusses the theory of dithered and noise-shap ed quantisation, and in particular their ap-

plication to Nyquist rate digital audio. Dither is shown to b e essential if quantisation-related distortion

is to b e avoided. Careful application of noise-shaping is shown to b e of audible b ene t by increasing the

p erceived dynamic range of a signal. Various drawbacks of the technique are discussed. In particular the

`fragility' of signals pro cessed in this way is noted, and the ease with which the b ene ts may b e undone

is illustrated.

to copy and store it in magnetic or electronic form and to make a single hard copy, for private study, educational

use or other non-commercial purp ose, sub ject to the following conditions: the do cument shall remain complete

and unaltered, and no charge shall b e made for access to it. The author reserves all other rights.

highly correlated signals (suchasmusic) results in 1 Intro duction

tonal distortion comp onents b eing added to the sig-

nal.

The imp ortance of dithering in digital audio sys-

tems has b een recognised, at least on an academic

The distribution of d is critical; it must e ectively

level, for some time. Despite this, the sub ject re-

decorrelate the quantisation error from the input

mains p o orly understo o d in the audio profession at

signal x, while adding a minimum of noise p ower

large, and the unp opularityofmany early digital

to the output signal y .Itmust also linearise the

recordings can b e attributed to signal distortion at

quantiser transfer function, such that the exp ected

low levels, as a result of unsuitable or non-existent

output E [y (n)] = x(n), the input. If d is a white

dither.

noise source with a suitable distribution then the

error e(n)=y (n) x(n) is also white, regardless of

The technique of noise-shaping has b een applied to

the sp ectrum of x.

digital audio in a numb er of sp eci c areas. The b est

known of these are analogue to digital and digital

In practice a triangular distribution for d of p eak

to analogue conversion, where it is used to obtain

deviation q (the quantiser step size) is found to

high resolution, low bandwidth p erformance from

b e suitable for high-quality audio [1]; this is of-

alow resolution, high bandwidth converter [1].

ten referred to as TPDF dither. In this case

the exp ected error E [e(n)] is zero, the error p ower

A more recent application is CD mastering from

2 2

= q =4, and b oth are indep endent of the signal

e high-resolution recordings stored on digital tap e or

hard disc [2]. Toconvert a recording of, say,20

or 24 bit resolution to the CD format requires a re-

Thus the action of the dithered quantiser maybe

quantisation to 16 bits. This inevitably intro duces

mo delled as a simple addition of a stationary white

noise into the signal, and in a straightforward quan-

noise source e(n). The SNR is degraded bythe

tisation this noise is approximately white.

TPDF dither by 4.8 dB, compared with the un-

dithered case, but this is far preferable to the dis-

Noise-shaping can b e used create a non-white noise

tortion that can arise from undithered quantisa-

sp ectrum. For example it is p ossible to lower the

tion.

noise p ower sp ectral density in the frequency bands

where the ear is most sensitive, at the exp ense of

The dither signal itself need not b e white provided

higher noise p ower in other bands (where the ear

that it is suciently random that it is not p erceived

is less sensitive). This pro cess lowers the p erceived

as a tonal comp onent in the output signal. How-

quantisation noise o or and increases the sub jective

ever, the noise added by the quantiser itself is still

dynamic range of the signal.

approximately white (if the dither is of suitable

power) and the output noise is the sum of these

two noise sources. Therefore, using this scheme the

minimum noise p ower densityachievable (at a par-

2 Dithered Quantisation

ticular frequency) is approximately that due to the

quantiser alone, this b eing

Consider the system shown in gure 1, whose out-

put y (n) is the sum of a high-resolution input x(n)

Hz ;

and a random dither pro cess d(n), linearly quan-

tised by Q. Without the dither, quantisation of

where 1=T is the sample rate.

d(n)

3 Noise-shap ed Quantisation x(n) y(n)

This limit maybelowered by the addition of a feed-

back lo op around the dithered quantiser as shown

in gure 2.

Figure 1: Simple dithered quantiser

Triangular Probability DensityFunction 1

rom equation 5 it can b e seen that the lter H

d(n) F

changes the error sp ectrum by the function (1

j!T

(e )) but do es not a ect the signal itself. x(n) u(n) y(n) H

Integrating this transfer function over frequency

gives the noise p ower gain G and the total noise

power in the output signal as

G = 1 H (e ) d (6)

2 2

= G : (7)

n e

h(m)

Rewriting equation 6 as

j j

G = 1+ H (e )H (e )d (8)

Figure 2: Noise-shap ed Quantiser

it is clear that the total noise p ower can never b e

reduced as the integral term in equation 8 cannot

The (non-linear) di erence equations describing

b e negative.

this system are

In the case of a realisable FIR lter

u(n) = x(n)+(y (n) u(n)) ?h(m) (1)

y (n) = Q [u(n)+ d(n)] (2)

H (z ) = b z (9)

where h(m) is the lter impulse rep onse, ? repre-

p=1

sents the discrete convolution op erator and Q[:]is

equation 8 b ecomes

the quantisation function. It was shown ab ovethat

if d has suitable statistical prop erties then the sys-

P P

X X

tem is linearised and the combined e ect of adding

jq jp

b e d : (10) b e G = 1+

q p

d(n) and quantising may b e mo delled as the ad-

q =1 p=1

dition of an error e(n)which is indep endentofx.

Equation 2 may therefore b e rewritten as a linear

Note that all the pro duct terms for which p 6= q

equation

integrate to zero by orthogonality.Swapping the

order of integration and summation this expression

y (n) = u(n)+ e(n): (3)

b ecomes

Eliminating u(n) and taking z -transforms gives the

system transfer function

j(pq )

b b e d ; (11) G = 1+

p q

p;q =1

Y (z ) = X (z )+E (z )(1 H (z )) : (4)

b : (12) G = 1+

Note that for the system to b e realisable only neg-

p=1

ativepowers of z are p ermissible in the numerator

of H (z ); there must b e at least one sample delay

The noise p ower gain is seen to b e a simple func-

in the feedback lo op for the system to b e causal.

tion of the lter co ecients, and the form of this

It follows from this that setting H (z )tounityand

expression hints at a law of diminishing returns;

eliminating all the noise is imp ossible. As with any

as the length of the lter impulse resp onse is in-

causal recursive lter no p ositivepowers of z are

creased the noise p ower gain is also, generally,in-

p ermissible in the denominator.

creased. This, in turn, implies that as we try to

Substituting e (where = !T , ! =2f ) for z

exercise more control over the shap e of the quanti-

gives the output sp ectrum

sation noise o or, the actual signal to noise ratio is

j j j j

degraded. ) = X (e )+E (e ) 1 H (e ) : (5) Y (e 2 −40 −40

−60 −60

−80 −80

−100 −100

−120 −120 amplitude /dB amplitude /dB

−140 −140

−160 −160

−180 −180 0 5 10 15 20 0 5 10 15 20 frequency /kHz frequency /kHz 8 8

6 6

4 4

2 2

0 0 Sample Value Sample Value −2 −2

−4 −4

−6 −6

−8 −8 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10

time /ms time /ms

Figure 4: Dithered Quantiser Figure 3: Undithered Quantiser

tortion than noise.

4 Simulations

Compare this with the same signals to whichhave

b een added white TPDF dither of p eak amplitude Anumber of simulations are presented here to ex-

q , prior to quantisation ( gure 4). It is clear that press graphically the e ects of various typ es of

the noise o or is less structured in this latter case. dithered and noise-shap ed quantisation. The test

The SNR is approximately 33 dB for the dithered signal in for the sp ectra is a 3 kHz sinusoid,60dB

signal, which implies a b est-case SNR of approx- b elow the maximum amplitude available in a 16

imately 93 dB for a full-scale signal. This agrees bit system. All sp ectra have b een estimated bya

well with the exp ected 4.8 dB reduction of SNR 16,384 p ointwindowed DFT. The test signal for the

noted ab ove. time-domain plots is a 300 Hz sinusoid at -90 dB.

Figures 5 and 6 show the actions of quantisers in- Figure 3 shows the quantiser output if the signal is

corp orating two di erent noise-shaping lters. Fig- undithered. The sp ectrum shows a clear line struc-

ure 5 uses H (z )=z and the calculated SNR of ture, the spacing of these lines b eing a function of

30 dB agrees with the noise p ower gain of 3 dB b oth the test signal frequency and the sample rate.

calculated from equation 12. The dynamic range Their frequencies are therefore mo dulated bythe

at frequencies b elow 6 kHz has b een improved at signal, and the audible result is more akin to dis- 3

−40

the exp ense of dynamic range at higher frequencies.

This is a particularly imp ortant lter as its imple-

−60

mentation requires just one memory element, and

wo additions p er sample p erio d. t −80

−40 −100

−60 −120 amplitude /dB

−80 −140

−100 −160

−120 amplitude /dB −180 0 5 10 15 20 frequency /kHz −140 8

−160 6

−180 4 0 5 10 15 20 frequency /kHz 2

Figure 5: Quantised and noise shap ed, H (z )= z . Sample Value

−2

Even greater p erceptual b ene t can b e had bytai-

−4

loring the noise-shap er resp onse to the resp onse

h is most sensitive to broadband

of the ear, whic −6 noise at around 3 kHz. There is a second, smaller

−8

ve which its sensitivityde- p eak around 12 kHz, ab o 0 1 2 3 4 5 6 7 8 9 10

time /ms

creases rapidly[3].

Equation 12 shows that the more terms are in-

cluded in the lter, the greater the overall noise

Figure 6: Perceptually noise-shap ed.

gain. Exp erimental evidence suggests that p ercep-

tually optimal p erformance is achieved with lter

lengths b etween 5 and 15 for the FIR case [2]. The

level for the noise-shap er is 7 dB lower than for

higher order lters allow more precise shaping of

white quantisation using the same wordlength. Al-

the noise transfer function at the cost of increased

ternatively, an extra bit may b e dropp ed from the

computation and SNR degradation.

wordlength by using the noise-shap er, without af-

fecting the p erceived noise level.

Figure 6 shows the output of a quantiser incorp o-

rating a seventh-order FIR noise-shaping function,

designed with these p erceptual criteria in mind.

The dynamic range at 3 kHz has b een increased by

5 Discussion

ab out 15 dB, and again the measured SNR matches

closely that predicted by equation 12, b eing ab out

14 dB worse than the non-noise-shap ed signal. The

Wehave seen that noise-shaping can b e used to

high level of high-frequency noise is obvious in the

increase the p erceived dynamic range of a digital

time-domain plot, but the form of the original si-

audio signal. This is accomplished at the exp ense

nusoid is also clearly evident.

of overall SNR and high frequency headro om. Its

use has a numb er of further implications, the most The sub jective dynamic range increase in this case

imp ortant of which are discussed here. is around 7 dB. This is to say that the audible noise 4

compression) sp ell instant death! Subsequent Digital Pro cessing

One ma jor limitation of noise-shap ed signals is that

Sample Rate Conversion

they are extremely `fragile'|that is to sayitis

very easy to completely undo all the b ene ts of

There is a numb er of di erent sample rates in com-

the noise-shaping. For example, an op eration as

mon use for digital audio, those most often encoun-

simple as applying a gain to the signal in the dig-

tered b eing 48 kHz, 44.1 kHz and 32 kHz. This

ital domain may contain an implicit quantisation

range of common sample rates has a number of im-

that adds noise to the signal. Such noise sources

plications for the design and use of noise-shap ers.

can easily dominate the original noise-shap ed o or

Firstly, it is imp ortant to note that the frequency

at those frequencies at which the noise shap er has

resp onse of a given digital lter changes with sam-

most attenuation.

ple rate, but the resp onse of the ear is xed abso-

Figure 6 shows the sp ectrum of a sixteen-bit sig-

lutely in frequency.Thus di erent sets of lter co-

nal (as may b e stored on a CD); gure 7 shows the

ecients are required in the noise-shap er for each

sp ectrum of the same signal after three small ampli-

sample rate that is encountered.

tude changes (totalling 0 dB), each with 16-bit ac-

Secondly, since a sample rate converter (SRC) is

curacy. The dynamic range increase resulting from

essentially a digital lter, care must b e taken to

the noise-shaping has b een completely undone, but

ensure that quantisations implicit in the lter do

the excess high-frequency noise remains.

not adversely a ect the signal in a manner simi-

−40

lar to the gain change mentioned previously. This

includes any quantisation implicit at the output

−60

of the converter, for example when rate-converting one 16-bit signal direct to a second 16-bit recording

−80

medium. hronous

−100 Finally it has b een suggested [4] that async

sample rate conversion is an e ective technique to

−120

combat the jitter inherent in digital audio intercon-

amplitude /dB

nections. In particular the use of a single-chip SRC

−140

immediately b efore a DAC enables the DAC itself

to run o a clean, lo cal clo ck, rather than running

−160

from a p otentially noisy clo ckrecovered (for exam-

channel. It is imp ortant, −180 ple) from an S/P-DIF

0 5 10 15 20

twordlength at the

frequency /kHz once again, to retain sucien

SRC output, and to use a DAC with suciently

high resolution and lowinternal noise to preserve

Figure 7: Sp ectrum after 16-bit gain change.

the dynamic range.

Notice also the quantisation-related spuriae (the

Numerical Headro om

most prominent at 9 kHz) whichcouldhavebeen

avoided by dithering the (implicit) quantiser. Of

Up to this p ointwehave ignored the e ect of sat-

course, this dither and quantisation could also

uration of the output word; in short, if the input

incorp orate noise-shaping, but this must b e ap-

signal p eaks at (or close to) full-scale then addition

proached with caution as successive applications

of dither (and particularly noise-shap ed dither) can

of severe noise-shaping can cause excessivehigh-

cause the output word to saturate.

frequency noise to b e added to the signal.

Of course this maybeavoided by reducing the am-

If simply changing the amplitude of a signal is

plitude of the input signal, but the e ect of this

enough to upset its delicate balance of bits, then

more complex op erations (such as equalisation or Sony/Philips Digital Interface Format 5

power makes it a tempting prop osition to do, for varies greatly with output wordlength. The reduc-

example, digital equalisation or sample-rate bu er- tion required for the p erceptual noise shap er ab ove

ing in consumer replay equipment. This to o must is approximately 10 LSB's which corresp onds (in

b e done to high accuracy if the noise-shap ed signal a 16-bit system) to around 0.003dB, which is, of

sp ectrum is to b e accurately preserved, implying course, insigni cant.

long wordlengths and higher costs.

However, in (for example) a 6-bit system (p erhaps

Furthermore, the op eration of the DAC itself must an audio co ding application) this same headro om

b e considered. The vast ma jority of converters used requirement corresp onds to a 1.5 dB loss of dy-

for digital audio use digital oversampling lters to namic range, whichmust b e traded against the p er-

relax the requirements of the analogue anti-image ceived dynamic range increase that results from the

lter. For noise-shaping to b e of b ene t, these digi- noise shaping.

tal lters also must preserve the full dynamic range

of the source tap e in the audio band. Manyofthe

Implications for the Studio

standard chipsets do not ful l this requirement, but

rather contain signi cant arithmetic noise sources

It is imp ortant that noise-shaping is the very last

that can signi cantly degrade the p erceived dy-

op eration in the mastering pro cess, and that no

namic range of noise-shap ed signals.

further manipulation of the samples o ccurs with

wordlengths shorter than that of the original source

tap e.

Unfortunately this is not the universal industry

practice. Many stand-alone analogue to digital con-

6 Conclusions

verters of 18 or 20 bit resolution incorp orate noise-

shaping algorithms so that critical mid-band dy-

namic range is preserved when recording to a six-

The careful application of dither is essential

teen bit medium suchasDAT or CD-R .Ifthe

when requantising digital audio samples. Ad-

signal is replayed directly then (almost) all is well;

dition of suitable dither is e ective at eliminat-

if, however, the recording is loaded to an editing

ing quantisation-related distortion, with minimal

system then problems b egin to emerge.

degradation of the signal to noise ratio.

Since it is a sixteen bit signal that has b een

Noise-shaping is a p owerful technique that can b e

recorded, the temptation when loading to a hard-

used to enhance the p erceived dynamic range of a

disc-based workstation for editing is to store the

digital recording. It relies on the use of a digital

data as sixteen-bit words, since this is ecientin

lter to alter the sp ectrum of the dither and quan-

terms of space, sp eed and therefore also of cost. As

tisation noise applied to a digital audio signal at

shown ab ove it only requires the level of the signal

the nal stage of mastering. However, to deliver

to b e changed during editing to undo the b ene t

this extended dynamic range to the domestic lis-

of the noise-shaping, but leave b ehind undesirable

tener requires great care.

high-frequency noise.

In particular, arithmetical op erations suchasdig-

If the application of noise-shaping b ecomes

ital ltering, can easily undo all of the b ene ts of

widespread then the use of 24-bit signal paths

noise-shaping. To this end it is not desirable to

throughout the studio is to b e encouraged.

apply noise-shaping techniques to the Nyquist rate

audio until all digital editing op erations are com-

pleted; it should b e the very nal stage of master-

Consumer Equipment

ing.

If nowwe assume that a correctly noise-shap ed

Additionally, digital pro cessing in domestic equip-

signal has b een enco ded on a CD, the problems

ment (including oversampling in a DAC) has to

have still not gone away.Availabilityofcheap DSP

b e approached with care if the b ene ts of a noise-

Recordable Compact Disc. shap ed signal are to b e realised. 6

References App endix: Sony SBM

[1] John Watkinson. The Art of Digital Audio.Fo-

Sup er Bit Mapping is a noise-shaping algorithm

cal Press, 1994.

develop ed by Sony for mastering of Compact Discs

from 20-bit source tap es. It uses an FIR lter of or-

[2] Sony Corp oration. Sup er Bit Mapping; techni-

der 12 for H (z ) and a little reverse engineering from

cal overview.

[2] enables calculation of a set of lter co ecients

(for a 44.1 kHz sample rate) which duplicates the

[3] B.C.J. Mo ore. AnIntroduction to the Psychol-

SBM noise-shaping curve. The magnitude of the

ogy of Hearing. Academic Press, 1989.

noise transfer function is plotted in gure 8.

[4] Analog Devices. Technical data for the AD1890

asynchronous sample rate converter, 1993.

Co ecients for Transversal FIR

b 147933 b 159032

1 2

b 164436 b 136613

3 4

1 1

b 926704 10 b 557931 10

5 6

1 1

b 267859 10 b 106726 10

7 8

2 3

b 285161 10 b 123066 10

9 10

3 3

b 616555 10 b 306700 10

11 12

Table 1: Filter to mimic SBM.

0 amplitude /dB −10

−20

−30

−40 0 5 10 15 20

frequency /kHz

Figure 8: Quantisation noise gain.

Super Bit Mapping and SBM are Sony trademarks. 7