
Sequences

{xn}, n ∈ Z, is a discrete sequence. We can derive a sequence from a function x(t) by sampling: xn = x(ts·n) = x(n/fs).
Absolute summability: Σ_{n=−∞}^{∞} |xn| < ∞
Square summability: Σ_{n=−∞}^{∞} |xn|² < ∞ (aka energy signal)

Dirac's Delta Function

δ(x) = ∞ for x = 0 and 0 for x ≠ 0, with ∫_{−∞}^{∞} δ(x) dx = 1
Sampling property: ∫_{−∞}^{∞} f(x)·δ(x − a) dx = f(a)
Dirac comb: s(t) = ts·Σ_{n=−∞}^{∞} δ(t − ts·n), giving the sampled signal x̂(t) = x(t)·s(t)
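The sampling relation xn = x(ts·n) = x(n/fs) can be sketched in numpy; the signal and sampling rate below are arbitrary choices for illustration:

```python
import numpy as np

# Derive a discrete sequence x_n = x(n/fs) from a continuous signal
# x(t) = sin(2*pi*5*t), sampled at an assumed fs = 40 Hz.
fs = 40.0                       # sampling frequency (arbitrary)
n = np.arange(8)                # sample indices
x = lambda t: np.sin(2 * np.pi * 5 * t)
xn = x(n / fs)                  # x_n = x(t_s * n) with t_s = 1/fs

# A 5 Hz tone sampled at 40 Hz is periodic with k = fs/5 = 8 samples.
print(np.allclose(x((n + 8) / fs), xn))
```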

Periodic: ∃k > 0 : ∀n ∈ Z : xn = xn+k
un = 0 for n < 0 and 1 for n ≥ 0 is the unit-step sequence
δn = 1 for n = 0 and 0 for n ≠ 0 is the impulse sequence

Discrete sequences of the form xn = e^{jωn} are eigensequences with respect to a system T, because for any ω there is some H(ω) such that T{xn} = H(ω)·{xn}.

F{g(t)}(ω) = G(ω) = ∫_{−∞}^{∞} g(t)·e^{∓jωt} dt
F⁻¹{G(ω)}(t) = g(t) = ∫_{−∞}^{∞} G(ω)·e^{±jωt} dω

Linearity: ax(t) + by(t) ↔ aX(f) + bY(f)
Time scaling: x(at) ↔ (1/|a|)·X(f/a)
Frequency scaling: (1/|a|)·x(t/a) ↔ X(af)
Time shifting: x(t − ∆t) ↔ X(f)·e^{−2πjf∆t}
Frequency shifting: x(t)·e^{2πj∆ft} ↔ X(f − ∆f)
Parseval's theorem: ∫_{−∞}^{∞} |x(t)|² dt = ∫_{−∞}^{∞} |X(f)|² df
Continuous convolution: F{(f ∗ g)(t)} = F{f(t)}·F{g(t)}
Discrete convolution: {xn} ∗ {yn} = {zn} ⇐⇒ X(e^{jω})·Y(e^{jω}) = Z(e^{jω})

Systems

A discrete system T transforms a sequence: {yn} = T{xn}.
Causal systems cannot look into the future: yn = f(xn, xn−1, xn−2, ...)
Memoryless systems depend only on the current input: yn = f(xn)
Delay systems shift sequences in time: yn = xn−d
A system T is time invariant if for all d: {yn} = T{xn} ⇐⇒ {yn−d} = T{xn−d}
A system T is linear if for any sequences: T{axn + bx′n} = aT{xn} + bT{x′n}
Causal linear time-invariant systems are of the form Σ_{k=0}^{N} ak·yn−k = Σ_{m=0}^{M} bm·xn−m, but all of them can be represented with only one set of coefficients, i.e. bi = δi. In this case the sequence {an} is called the impulse response of T, since {an} = T{δn}.

Sample Transforms

The Fourier transform preserves a function's even/oddness, but swaps real/imaginaryness iff x(t) is odd.
F{cos(2πf0t)}(f) = (1/2)·δ(f − f0) + (1/2)·δ(f + f0)
F{sin(2πf0t)}(f) = −(j/2)·δ(f − f0) + (j/2)·δ(f + f0)
F{ts·Σ_{n=−∞}^{∞} δ(t − ts·n)}(f) = Σ_{n=−∞}^{∞} δ(f − n/ts)
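The difference-equation form of a causal LTI system above can be sketched directly; the coefficients here are made up (a = [1, −0.5], b = [1]), chosen so the impulse response has the known closed form 0.5^n:

```python
import numpy as np

# Evaluate sum_k a_k y_{n-k} = sum_m b_m x_{n-m} causally for y.
def lti(a, b, x):
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[m] * x[n - m] for m in range(len(b)) if n - m >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y[n] = acc / a[0]
    return y

delta = np.zeros(8); delta[0] = 1.0          # impulse sequence {delta_n}
h = lti([1.0, -0.5], [1.0], delta)           # impulse response {h_n} = T{delta_n}
print(np.allclose(h, 0.5 ** np.arange(8)))
```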

Convolution

{pn} ∗ {qn} = {rn} ⇐⇒ ∀n ∈ Z : rn = Σ_{k=−∞}^{∞} pk·qn−k
Associativity: ({pn} ∗ {qn}) ∗ {rn} = {pn} ∗ ({qn} ∗ {rn})
Commutativity: {pn} ∗ {qn} = {qn} ∗ {pn}
Linearity: {pn} ∗ {aqn + brn} = a({pn} ∗ {qn}) + b({pn} ∗ {rn})
Identity: {pn} ∗ {δn} = {pn}
Shifting: {pn−d} = {pn} ∗ {δn−d}
Sine-wave and exponential sequences form a family of discrete sequences that is closed under convolution with arbitrary sequences.

Aliasing

Due to the overlapping of the periodically repeated spectrum, the Nyquist limit stipulates that frequencies in the signal should be limited so that |f| ≤ fs/2. Bandpass sampled signals can also be reconstructed if their spectral components lie within the interval n·(fs/2) < |f| < (n + 1)·(fs/2).
An ideal filter for enforcing the Nyquist criterion is rect(ts·f) = 1 for |f| < fs/2 and 0 for |f| > fs/2, with F⁻¹{rect(ts·f)}(t) = fs·sin(πtfs)/(πtfs).
Typically the sampling frequency is chosen so that there is a transition band between the Nyquist limit and the highest frequency expected to be sampled.
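Aliasing is easy to demonstrate numerically; the frequencies below are arbitrary choices, with the tone deliberately placed above the Nyquist limit:

```python
import numpy as np

# Sampled at fs = 100 Hz, a 130 Hz tone violates |f| <= fs/2 = 50 Hz
# and produces exactly the same samples as its 30 Hz alias.
fs = 100.0
n = np.arange(32)
x_high = np.cos(2 * np.pi * 130.0 * n / fs)   # above the Nyquist limit
x_low = np.cos(2 * np.pi * 30.0 * n / fs)     # its alias below fs/2
print(np.allclose(x_high, x_low))              # the samples agree
```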

Discrete Fourier Transform

Xk = Σ_{i=0}^{n−1} xi·e^{−2πj·ik/n}
xk = (1/n)·Σ_{i=0}^{n−1} Xi·e^{2πj·ik/n}
The elements X0 to X_{n/2} contain the frequency components 0, fs/n, 2fs/n, ..., fs/2.
We can make use of the fast Fourier transform: Fn{xi}(k) = Σ_{i=0}^{n−1} xi·e^{−2πj·ik/n} = Σ_{i=0}^{n/2−1} x_{2i}·e^{−2πj·2ik/n} + e^{−2πj·k/n}·Σ_{i=0}^{n/2−1} x_{2i+1}·e^{−2πj·2ik/n}. This repeated subdivision has log2(n) rounds and n·log2(n) additions and multiplications overall, compared to n² for the equivalent matrix multiplication.
By the symmetry properties of the FFT, ∀i : xi = ℜ(xi) ⇐⇒ ∀i : X_{n−i} = X*_i, and ∀i : xi = jℑ(xi) ⇐⇒ ∀i : X_{n−i} = −X*_i. We can therefore compute the FFT of two real-valued sequences by computing that of x′i + j·x″i and saying X′i = (1/2)·(Xi + X*_{n−i}) and X″i = (1/(2j))·(Xi − X*_{n−i}).
We can also compute a complex multiplication with 3 multiplications and 5 additions, rather than the 4 multiplications and 2 additions required naively, if we have the numbers in Cartesian format.
A further window function is the Hamming window: wi = 0.54 − 0.46·cos(2πi/(n−1)).
Since a windowed signal has been forced to 0 outside the sampled region, it may be zero-padded at either end without further distortion. However, that does increase the frequency resolution of the DFT (albeit without adding any new information!).

Filter Design

We can turn a spectrum X(f) into an inverted spectrum X′(f) = X(fs/2 − f) if we shift the spectrum by fs/2. This can be done by multiplying with the sequence yi = cos(πi), and can turn a low-pass filter into a high-pass filter.
The ideal low-pass filter has two problems: it is not causal, and it has an infinitely long impulse response. Both are solved by applying a windowing function and a delay to make it causal, such as:
hi = (2fc/fs)·[sin(2π·(i − (n−1)/2)·fc/fs) / (2π·(i − (n−1)/2)·fc/fs)]·wi
for an nth-order low-pass filter with cutoff frequency fc.

Z-Transform

X(z) = Σ_{n=−∞}^{∞} xn·z^{−n}, which defines a complex-valued surface over C.
For finite sequences, this surface is defined everywhere; otherwise it converges in the region lim_{n→∞} |x_{n+1}/x_n| < |z| < lim_{n→−∞} |x_{n+1}/x_n|.
For an LTI system defined by Σ_{l=0}^{k} al·yn−l = Σ_{l=0}^{m} bl·xn−l,
H(z) = (b0 + b1·z^{−1} + b2·z^{−2} + ... + bm·z^{−m}) / (a0 + a1·z^{−1} + a2·z^{−2} + ... + ak·z^{−k})
so that {yn} = {hn} ∗ {xn}. This has m zeroes and k poles at non-zero locations in the z-plane, plus k − m zeroes (if k > m) or m − k poles (if m > k) at z = 0.
The order of a filter is the number of zeroes it has. Possible filter types are Butterworth, Chebyshev Type I/Type II and Elliptic.

Application to Convolution

We can compute convolution using the FFT with O(m·log m + n·log n) operations rather than m·n multiplications if we multiply in the Fourier domain instead. Note however that the sequences being convolved must be periodic with equal period lengths. If this condition is not fulfilled, the sequences must be zero-padded to a length of at least m + n − 1 to ensure the start and end of the resulting sequence do not overlap.
We can use this result to perform deconvolution by dividing in the Fourier representation: u = F⁻¹{F{s}/F{h}}.
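The zero-padding requirement above can be sketched with numpy's FFT (sequence values are arbitrary): padding both sequences to m + n − 1 makes the FFT's circular convolution coincide with linear convolution.

```python
import numpy as np

# Fast convolution: pad to >= m + n - 1 so the periodic convolution
# computed via the FFT equals the linear convolution.
x = np.array([1.0, 2.0, 3.0])             # length m = 3
y = np.array([0.5, -1.0, 2.0, 1.0])       # length n = 4
size = len(x) + len(y) - 1                # m + n - 1 = 6
z = np.fft.irfft(np.fft.rfft(x, size) * np.fft.rfft(y, size), size)
print(np.allclose(z, np.convolve(x, y)))  # matches direct convolution
```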

Spectral Estimation

If the input to the DFT is sampled but not periodic, the DFT may still be used to calculate an approximation. However, we see leakage of energy into frequency bins that are not strictly present in the input signal but adjacent to expressed frequencies. The peak amplitude also changes as the frequency of a tone moves from one bin to the next, reaching its lowest amplitude of 62% halfway between the two. This is known as scalloping. It occurs because the effective window introduced by the DFT is rectangular, which corresponds to convolution with a sinc in the frequency domain.
A non-periodic signal can have a windowing function applied to try to reduce scalloping and leakage. Possibilities include:
Triangular: wi = 1 − |1 − 2i/n|
Hanning: wi = 0.5 − 0.5·cos(2πi/(n−1))

Random Sequences

In such a sequence every value is the outcome of one random variable from a corresponding sequence of random variables. This collection of random variables is called a random process. Such a process is stationary if P_{x_{n1+l},...,x_{nk+l}}(a1, ..., ak) = P_{x_{n1},...,x_{nk}}(a1, ..., ak), where P_{xn}(a) = Prob(xn ≤ a).
Expected value: mx = E(xn) = ∫ a·p_{xn}(a) da
Variance: Var(xn) = E(|xn − E(xn)|²) = E(|xn|²) − |E(xn)|²
Correlation: Cor(xn, xm) = E(xn·x*m), so that Cor(xn, xn) = φxx(0)
Cross-correlation: φxy(k) = E(x_{n+k}·y*n); Autocorrelation: φxx(k)
Cross-covariance: γxy(k) = E[(x_{n+k} − mx)(yn − my)*] = φxy(k) − mx·m*y; Autocovariance: γxx(k)
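Leakage and the effect of windowing can be illustrated with a tone halfway between two bins (the worst case for scalloping); lengths and bin choices below are arbitrary:

```python
import numpy as np

# A 10.5-cycle tone in a 64-point DFT leaks into distant bins with the
# implicit rectangular window; the Hanning window concentrates the
# leakage near the tone.
n = 64
i = np.arange(n)
tone = np.cos(2 * np.pi * 10.5 * i / n)          # halfway between bins
w = 0.5 - 0.5 * np.cos(2 * np.pi * i / (n - 1))  # Hanning window

rect_spec = np.abs(np.fft.rfft(tone))
hann_spec = np.abs(np.fft.rfft(tone * w))
# Relative leakage far from the tone (e.g. bin 25) drops after windowing.
print(hann_spec[25] / hann_spec.max() < rect_spec[25] / rect_spec.max())
```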

Deterministic cross-correlation: cxy(k) = Σ_{i=−∞}^{∞} x_{i+k}·yi, so that {cxy(k)} = {xk} ∗ {y−k}. This implies that the Fourier transform Cxx(f) is identical to the power spectrum.
If {yn} = {hn} ∗ {xn}, then my = mx·Σ_{k=−∞}^{∞} hk, {φyy(n)} = {chh(n)} ∗ {φxx(n)} and {φyx(n)} = {hn} ∗ {φxx(n)}.
White noise is characterized by mx = 0 and φxx(k) = σx²·δk.
A DFT can be averaged by cutting a signal into a number of windows, taking the DFTs of these and plotting the average of their absolute values: this is known as incoherent averaging. Coherent averaging is done by taking the absolute value after averaging the complex numbers, which suppresses anything that is not a periodic waveform with a period that divides the number of elements per window.

Recall that ~x is an eigenvector of A if A~x = λ~x for some λ ∈ R. Any symmetric matrix A can be represented as UΛU^T, where Λ = diag(λ1, ..., λn) with increasing λ and the columns of U are the corresponding orthonormal eigenvectors.
The Karhunen-Loève transform finds a matrix A so that Cov(AX) = A·Cov(X)·A^T is diagonalized. It can be shown that, since UU^T = I, the U^T from above is just such an A. This can be applied straightforwardly to decorrelate color planes, but for spatial decorrelation we work on a matrix whose columns are entire monochrome images!
It turns out that in practice (for most sample images used to calculate the decorrelation) the eigenvector matrix is almost indistinguishable from that of the Discrete Cosine Transform:
S(u) = (C(u)/√(N/2))·Σ_{x=0}^{N−1} s(x)·cos((2x+1)uπ/2N)
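The Karhunen-Loève construction above can be sketched on made-up two-dimensional data: eigendecompose the covariance matrix and check that A = U^T diagonalizes it.

```python
import numpy as np

# Synthetic correlated data (values are illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 1000))
X[1] += 0.8 * X[0]                        # introduce correlation

C = np.cov(X)                             # sample covariance, 2x2 symmetric
lams, U = np.linalg.eigh(C)               # C = U diag(lams) U^T
A = U.T                                   # the decorrelating transform
D = A @ C @ A.T                           # Cov(AX) in the new basis
print(np.allclose(D, np.diag(lams)))      # off-diagonals vanish
```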

s(x) = Σ_{u=0}^{N−1} (C(u)/√(N/2))·S(u)·cos((2x+1)uπ/2N)
where C(u) = 1/√2 for u = 0 and 1 for u > 0, ensuring orthogonality.
The Discrete Wavelet Transform ensures that the bandwidth of each output signal is proportional to the highest input frequency that it contains. It is defined by the combination of a low-pass and a high-pass filter. For an n-point DWT the vector is convolved separately with a low-pass and a high-pass filter. The results each have n numbers, but as the resolution of each has been halved, half those numbers can be discarded. The output values of the high-pass filter are retained, and the low-pass information is recursively treated the same way until only a single value remains (the average).

Compression

A compression system will typically involve several stages:

1. Transducer: converts input into voltage
2. A-to-D converter: samples and quantizes the signal
3. Transformation into a perceptual domain
4. Quantization based on perceptual model
5. Decorrelation transform to reduce entropy
6. Entropy coding for transmission

Coding

Coding techniques which can be used include Huffman (iterated low-probability tree building), arithmetic (successive weighted numeric interval refinement) and run-length coding. There is also a class of predictive codings, where only the difference between the prediction and the reality is transmitted. Simple cases are delta coding (P(x) = x) and linear predictive coding (P(x1, ..., xn) = Σ_{i=1}^{n} ai·xi).

Decorrelation

If P(X = x ∧ Y = y) ≠ P(X = x)·P(Y = y) for any x and y, then H(X|Y) < H(X) ∧ H(Y|X) < H(Y). This cannot be exploited directly in practice since there are too many conditional probabilities. However, we can approximate it with correlation, which captures most dependence relationships.
The Pearson correlation coefficient: ρ_{X,Y} = Cov(X, Y)/√(Var(X)·Var(Y))
The covariance matrix: Cov(~X) = (Cov(Xi, Xj))_{i,j} for ~X ∈ R^n. Note that Cov(~X) = Cov^T(~X).
If ~X ∈ R^n and ~Y ∈ R^n so that ~Y = A~X + b with A ∈ R^{n×n} and b ∈ R^n, then E(~Y) = A·E(~X) + b and Cov(~Y) = A·Cov(~X)·A^T.

Psychophysics

Weber's law gives the difference limit (smallest stimulus change perceivable at an intensity level): ∆φ = c(φ + a)
Fechner's scale defines a perception-intensity scale with the sensation limit φ0 as the origin and the respective difference limit ∆φ as a unit step: ψ = (1/c)·log(φ/φ0)
Stevens' law is like Fechner's scale, but rational so that it reflects subjective relations: ψ = k(φ − φ0)^a
The human eye processes color and luminosity at different resolutions. To exploit this, some TV signals are transmitted as luminance and chrominance channels (YUV) rather than RGB. Typically Y = 0.3R + 0.6G + 0.1B, V = R − Y and U = B − Y. There also exist normalized channels Cb = U/2.0 + 0.5 and Cr = V/1.6 + 0.5.
A pair of pure tones cannot be distinguished as two frequencies if both are in the same critical band. The human ear has 24 critical bands, which are expressed on the Bark scale. Frequency can be mapped to Bark by the approximation b ≈ 26.81/(1 + 1960 Hz/f) − 0.53.
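The colour-space change quoted above (Y = 0.3R + 0.6G + 0.1B with the normalized Cb/Cr channels) can be written out directly; inputs are assumed to lie in [0, 1]:

```python
import numpy as np

# Luminance/chrominance conversion with the coefficients given above.
def rgb_to_ycbcr(r, g, b):
    y = 0.3 * r + 0.6 * g + 0.1 * b
    u, v = b - y, r - y                  # U = B - Y, V = R - Y
    return y, u / 2.0 + 0.5, v / 1.6 + 0.5

# White maps to full luminance and neutral (0.5) chrominance.
y, cb, cr = rgb_to_ycbcr(1.0, 1.0, 1.0)
print(np.isclose(y, 1.0), np.isclose(cb, 0.5), np.isclose(cr, 0.5))
```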

Louder tones increase the sensation limit for nearby critical bands asymmetrically. This masking extends both backwards and forwards from the time the masking signal is introduced, peaking at its edges. Due to these sensation-limit effects, non-uniform quantization can reduce perceived quantization noise. Typically this is done with a logarithmic scale, of which µ-law and A-law are the competing flavors.
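The logarithmic scale mentioned above can be sketched as µ-law companding (µ = 255, the value used in North American telephony): quiet samples get proportionally more quantization levels than loud ones.

```python
import numpy as np

mu = 255.0

def compress(x):                 # x in [-1, 1] -> y in [-1, 1]
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def expand(y):                   # inverse mapping
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

x = np.linspace(-1, 1, 101)
print(np.allclose(expand(compress(x)), x))   # round trip recovers x
```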

JPEG

The JPEG encoder proceeds as follows:

1. 8-bit RGB input image transformed into 8-bit YCrCb
2. Chrominance channel resolution reduced by a factor of 2
3. For each channel:

(a) Split channel into 8 × 8 blocks
(b) Perform forward DCT on each block
(c) Quantize DCT coefficients by perceptual matrix
(d) Apply delta coding to coefficients
(e) Read remaining values from DCT in a zigzag
(f) Apply run-length coding to zeroes
(g) Apply Huffman coding
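Step (e)'s zigzag read can be sketched as follows (the traversal is the standard JPEG order; the 8×8 block contents here are dummy values):

```python
# Read an 8x8 block anti-diagonal by anti-diagonal, alternating
# direction, so high-frequency zeroes cluster at the end of the
# sequence ready for run-length coding.
def zigzag_indices(n=8):
    s = lambda p: p[0] + p[1]                     # anti-diagonal number
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (s(p), p[0] if s(p) % 2 else p[1]))

block = [[8 * i + j for j in range(8)] for i in range(8)]  # dummy values
seq = [block[i][j] for i, j in zigzag_indices()]
print(seq[:6])   # [0, 1, 8, 16, 9, 2]
```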

MPEG

This is a video coding scheme very similar to JPEG. However, it adds:

• Spatially scalable coding (for progressive rendering of a kind)
• Predictive coding with motion compensation
• Interframe coding with I (independent), P (differences relative to previous frame) and B (interpolation between neighboring) frames
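Motion-compensated prediction can be sketched as a toy block-matching search (frame contents and search parameters below are synthetic): find the best match for one block in the previous frame, then only the motion vector and residual need be coded.

```python
import numpy as np

rng = np.random.default_rng(1)
prev = rng.integers(0, 256, size=(32, 32)).astype(float)
curr = np.roll(prev, (2, 3), axis=(0, 1))        # frame shifted by (2, 3)

def best_motion_vector(prev, curr, y, x, size=8, search=4):
    """Exhaustive search for the offset minimising the sum of
    absolute differences (SAD) between one block and the previous frame."""
    best, vec = np.inf, (0, 0)
    block = curr[y:y+size, x:x+size]
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = prev[y+dy:y+dy+size, x+dx:x+dx+size]
            err = np.abs(cand - block).sum()
            if err < best:
                best, vec = err, (dy, dx)
    return vec

print(best_motion_vector(prev, curr, 12, 12))    # recovers the shift
```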
