Camera Post-Processing Pipeline
Kari Pulli, Senior Director, NVIDIA Research

Topics

Filtering: blurring, sharpening, bilateral filter
Sensor imperfections (PNU, dark current, vignetting, …)
ISO (analog-to-digital conversion with amplification)
Demosaicking and denoising
Gamma correction
Color spaces
Response curve
White balance
JPEG compression

What is image filtering?

Modify the pixels in an image based on some function of a local neighborhood of the pixels

Local image data [10 5 3; 4 5 1; 1 1 7] → some function → modified image data (here: 7)

Linear functions

Simplest: linear filtering, which replaces each pixel by a linear combination of its neighbors. The prescription for the linear combination is called the “convolution kernel”.

Local image data [10 5 3; 4 5 1; 1 1 7] with kernel [0 0 0; 0 0.5 0; 0 1 0.5] → modified pixel value 0.5·5 + 1·1 + 0.5·7 = 7

Convolution

$f[m,n] = (I \otimes g)[m,n] = \sum_{k,l} I[m-k,\, n-l]\; g[k,l]$
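As a sketch of the formula, here is a direct, unoptimized implementation; the function name and the zero padding at the borders are our own choices. Note that the worked example above applies the kernel without flipping it (correlation); true convolution additionally flips g, and the two coincide for symmetric kernels.

```python
import numpy as np

def filter2d(I, g):
    """Sliding-window filtering: each output pixel is a weighted sum of
    its neighborhood, with weights given by the kernel g. Zero padding
    at the borders; a brute-force sketch, not optimized."""
    r, s = g.shape[0] // 2, g.shape[1] // 2
    Ip = np.pad(I.astype(float), ((r, r), (s, s)))
    f = np.zeros(I.shape)
    for m in range(I.shape[0]):
        for n in range(I.shape[1]):
            f[m, n] = np.sum(Ip[m:m + g.shape[0], n:n + g.shape[1]] * g)
    return f

# The worked example above: the center pixel becomes 0.5*5 + 1*1 + 0.5*7 = 7
I = np.array([[10, 5, 3], [4, 5, 1], [1, 1, 7]])
g = np.array([[0, 0, 0], [0, 0.5, 0], [0, 1, 0.5]])
print(filter2d(I, g)[1, 1])   # -> 7.0
```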


Linear filtering (warm-up slide)

[Plot: filter coefficient vs. pixel offset. A single coefficient of 1.0 at offset 0: the filtered signal is identical to the original (no change).]

Shift

[Plot: a single coefficient of 1.0 at a nonzero pixel offset shifts the original signal.]

Blurring

[Plot: a box of coefficients of height 0.3 centered at offset 0 blurs the original (filter applied in both dimensions).]

Blur examples

[Plots: an impulse of height 8, filtered with the 0.3 box, becomes a bump of height 2.4; an edge from 0 to 8 becomes a smooth ramp through 4.]

Linear filtering (warm-up slide)

[Plot: a 2.0 impulse minus a 1.0 impulse at offset 0 is a net unit impulse: the filtered signal equals the original (no change).]

Linear filtering

[Plot: a 2.0 impulse minus a 0.33 box. What does it do? Remember that the 0.3 box alone blurs (filter applied in both dimensions).]

Sharpening

[Plot: the 2.0 impulse minus the 0.33 box sharpens the original.]

Sharpening example

[Plot: sharpening accentuates differences: an impulse of height 8 overshoots to 11.2, with small negative lobes (about -0.3); constant areas are left untouched.]
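The sharpening filter can be written as 2·impulse minus a blur. A minimal sketch in code, reusing the filter2d() helper defined earlier; the 3x3 box is our choice (the plots above illustrate the 1-D analogue with coefficients 2.0 and 0.33):

```python
import numpy as np

# Sharpening kernel: 2*impulse - box blur. The weights sum to 1, so
# constant areas pass through unchanged, while differences from the
# local average are accentuated.
impulse = np.zeros((3, 3)); impulse[1, 1] = 1.0
box = np.ones((3, 3)) / 9.0
sharpen = 2 * impulse - box

# sharpened = filter2d(image, sharpen)   # reuses the earlier sketch
```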

Sharpening

[Images: before and after]

Bilateral filter

Tomasi and Manduchi 1998: http://www.cse.ucsc.edu/~manduchi/Papers/ICCV98.pdf
Related to the SUSAN filter [Smith and Brady 1995]: http://citeseer.ist.psu.edu/smith95susan.html
Digital-TV [Chan, Osher, and Shen 2001]: http://citeseer.ist.psu.edu/chan01digital.html
Sigma filter: http://www.geogr.ku.dk/CHIPS/Manual/f187.htm

Start with Gaussian filtering

Here, the input is a step function + noise; filtering with a spatial Gaussian f gives a blurred output:

$J = f \otimes I$

[Plots: noisy step input, Gaussian kernel f, blurred output]

The problem of edges

The weight f(x, ξ) depends only on the distance from ξ to x. Here, I(ξ) “pollutes” our estimate J(x): it is too different.

$J(x) = \sum_{\xi} f(x,\xi)\; I(\xi)$

Principle of Bilateral filtering

[Tomasi and Manduchi 1998]
Add a penalty g on the intensity difference:

$J(x) = \frac{1}{k(x)} \sum_{\xi} f(x,\xi)\; g(I(\xi) - I(x))\; I(\xi)$

Bilateral filtering

[Tomasi and Manduchi 1998]
A spatial Gaussian f, and a Gaussian g on the intensity difference:

$J(x) = \frac{1}{k(x)} \sum_{\xi} f(x,\xi)\; g(I(\xi) - I(x))\; I(\xi)$

Normalization factor

[Tomasi and Manduchi 1998]

$k(x) = \sum_{\xi} f(x,\xi)\; g(I(\xi) - I(x))$

Bilateral filtering is non-linear

[Tomasi and Manduchi 1998]
The weights are different for each output pixel:

$J(x) = \frac{1}{k(x)} \sum_{\xi} f(x,\xi)\; g(I(\xi) - I(x))\; I(\xi)$
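A brute-force sketch of the formula for a grayscale float image; the function and parameter names (sigma_s, sigma_r, radius) are ours, and the loops favor clarity over speed:

```python
import numpy as np

def bilateral(I, sigma_s=2.0, sigma_r=0.1, radius=4):
    """Brute-force bilateral filter:
    J(x) = (1/k(x)) * sum_xi f(x, xi) g(I(xi) - I(x)) I(xi),
    with f a spatial Gaussian and g a Gaussian on intensity difference."""
    H, W = I.shape
    J = np.zeros_like(I)
    ax = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(ax, ax, indexing="ij")
    f = np.exp(-(dx**2 + dy**2) / (2 * sigma_s**2))     # spatial weights
    for y in range(H):
        for x in range(W):
            y0, y1 = max(0, y - radius), min(H, y + radius + 1)
            x0, x1 = max(0, x - radius), min(W, x + radius + 1)
            patch = I[y0:y1, x0:x1]
            fw = f[y0 - y + radius:y1 - y + radius,
                   x0 - x + radius:x1 - x + radius]
            g = np.exp(-(patch - I[y, x])**2 / (2 * sigma_r**2))  # range weights
            w = fw * g
            J[y, x] = np.sum(w * patch) / np.sum(w)     # k(x) normalization
    return J
```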

Other view

The bilateral filter uses the 3D distance: spatial position and intensity are treated together as one 3D space.

From “raw-raw” to RAW

Pixel Non-Uniformity
  each pixel in a CCD has a slightly different sensitivity to light, typically within 1% to 2% of the average signal
  can be reduced by calibrating an image with a flat-field image
  flat-field images are also used to eliminate the effects of vignetting and other optical variations
Stuck pixels
  some pixels are always on or off
  identify them, replace with filtered values
Dark floor
  temperature adds noise
  sensors usually have a ring of covered pixels around the exposed sensor; subtract their signal

ISO = amplification in AD conversion

Sensor converts the continuous light signal to a continuous electrical signal

The analog signal is converted to a digital signal: at least 10 bits (even on cell phones), often 12 or more; the sensor response is (roughly) linear.

Before conversion the signal can be amplified: ISO 100 means no amplification, ISO 1600 means 16x amplification.
+: details in dark areas become easier to see
-: noise is amplified as well; the signal is more likely to saturate

ISO

But a too-dark signal needs amplification

With too little light and a given desired image brightness, there are two alternatives:
  use low ISO and brighten digitally: this also amplifies post-gain noise (e.g., read noise)
  use high ISO to get the brightness directly: a bit less noise

Both amplification choices amplify noise; ideally, you make sure the signal itself is strong by using a slower shutter speed or a larger aperture.

Demosaicking or Demosaicing?

Demosaicking

Your eyes do it too…

Demosaicking

First choice: bilinear interpolation

Easy to implement

But fails at sharp edges
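A minimal sketch of bilinear interpolation of the green channel on a Bayer mosaic, reusing the filter2d() helper from earlier; the function name and the mask-based formulation are our own, and red/blue would proceed analogously:

```python
import numpy as np

def bilinear_green(mosaic, green_mask):
    """Fill in missing greens as the average of the measured green
    neighbors. `mosaic` holds the raw Bayer samples; `green_mask` is
    True where green was actually measured."""
    neighbors = np.array([[0., 1., 0.],
                          [1., 0., 1.],
                          [0., 1., 0.]])
    g = np.where(green_mask, mosaic, 0.0)
    num = filter2d(g, neighbors)                         # sum of green neighbors
    den = filter2d(green_mask.astype(float), neighbors)  # how many there were
    return np.where(green_mask, mosaic, num / np.maximum(den, 1.0))
```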

Two-color sampling of BW edge

[Figure: luminance profile of the true full-color image; the sampled data; the linear interpolation]

Typical color Moiré patterns

Blow-up of an electronic camera image. Notice the spurious colors in the regions of fine detail in the plants.

Brewster’s colors: evidence of interpolation from spatially offset color samples

Scale relative to human photoreceptor size: each line covers about 7 photoreceptors.

Color sampling artifacts

R-G, after linear interpolation

Median filter

Replace each pixel by the median over N pixels (5 pixels, for these examples). Generalizes to “rank order” filters.

[Figure: over a 5-pixel neighborhood, spike noise is removed (in/out traces), while monotonic edges remain unchanged]
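A 1-D sketch matching the 5-pixel examples; the function name is ours, and the window is simply clipped at the signal ends:

```python
import numpy as np

def median_filter(signal, radius=2):
    """1-D median over a (2*radius + 1)-sample window (5 pixels for
    radius 2, as in the examples above)."""
    out = np.empty_like(signal)
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        out[i] = np.median(signal[lo:hi])
    return out

print(median_filter(np.array([3, 3, 3, 9, 3, 3, 3])))  # spike removed: all 3s
print(median_filter(np.array([1, 1, 1, 5, 5, 5])))     # monotonic edge unchanged
```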

Degraded image

Radius 1 median filter

Radius 2 median filter

R – G

R – G, median filtered (5x5)

Two-color sampling of BW edge

[Figure: sampled data → linear interpolation → color difference signal → median filtered color difference signal → reconstructed pixel values]

Recombining the median filtered colors

[Images: linear interpolation vs. median filter interpolation]

Take edges into account

ADAPTIVE DEMOSAICKING [Ramanath, Snyder, JEI 2003]: use bilateral filtering to avoid interpolating across edges.

Take edges into account

HIGH-QUALITY LINEAR INTERPOLATION FOR DEMOSAICING OF BAYER-PATTERNED COLOR IMAGES [Malvar, He, Cutler, ICASSP 2004]
Predict edges and adjust assumptions: an edge in any one channel correlates with RGB edges, i.e., a luminance change. When estimating G at an R pixel: if the measured R differs from the bilinearly estimated R, the luminance changes there, so correct the bilinear G estimate by the difference between the estimate and the real value (see the sketch below).
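The G-at-R case can be written as a single 5x5 kernel; the coefficients below (correction weight 1/2) are the ones commonly cited for this method, shown here as a sketch rather than the authors' reference code:

```python
import numpy as np

# Gradient-corrected bilinear interpolation (G at an R site): the bilinear
# average of the 4 green neighbors (weight 2/8 each), plus a correction
# proportional to how much the center R differs from the bilinear estimate
# of R from its 4 red neighbors at distance 2 (center 4/8, lobes -1/8).
G_AT_R = np.array([[ 0,  0, -1,  0,  0],
                   [ 0,  0,  2,  0,  0],
                   [-1,  2,  4,  2, -1],
                   [ 0,  0,  2,  0,  0],
                   [ 0,  0, -1,  0,  0]], dtype=float) / 8.0

# g_estimate = filter2d(mosaic, G_AT_R), evaluated at red pixel locations
```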

Joint Demosaicing and Denoising: denoising in the ISP? [Hirakawa, Parks, IEEE TIP 2006]

The experimental results confirm that the proposed method suppresses noise (CMOS/CCD noise model) while effectively interpolating the missing pixel components, demonstrating a significant improvement when compared to treating the demosaicking and denoising problems independently.

Denoising using non-local means

Most image details occur repeatedly. Each color indicates a group of squares in the image which are almost indistinguishable. Image self-similarity can be used to eliminate noise: it suffices to average the squares which resemble each other.

Image and movie denoising by nonlocal means Buades, Coll, Morel, IJCV 2006
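A brute-force grayscale sketch of the method; the patch size, search window, and smoothing parameter h are illustrative values of ours, not the paper's settings:

```python
import numpy as np

def nl_means(I, patch=5, search=11, h=0.1):
    """Non-local means: replace each pixel by a weighted average of
    pixels whose surrounding patches look similar, with weights
    exp(-mean squared patch difference / h^2). Grayscale float image
    in [0, 1]; brute force, so small images only."""
    r, s = patch // 2, search // 2
    Ip = np.pad(I, r, mode="reflect")
    H, W = I.shape
    out = np.zeros_like(I)
    for y in range(H):
        for x in range(W):
            P = Ip[y:y + patch, x:x + patch]        # patch around (y, x)
            wsum = acc = 0.0
            for j in range(max(0, y - s), min(H, y + s + 1)):
                for i in range(max(0, x - s), min(W, x + s + 1)):
                    Q = Ip[j:j + patch, i:i + patch]
                    w = np.exp(-np.mean((P - Q) ** 2) / h ** 2)
                    wsum += w
                    acc += w * I[j, i]
            out[y, x] = acc / wsum
    return out
```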

Restoring a highly compressed image

NL-means is able to remove block artifacts due to compression, but at the cost of removing some details, as the difference between the compressed and restored images shows.

BM3D (Block Matching 3D)

How many bits are needed for smooth shading?

With a given adaptation, human vision has a contrast sensitivity of ~1%
  call black 1, white 100
  you can see the differences 1, 1.01, 1.02, …: needed step size ~0.01
  at 98, 99, 100: needed step size ~1
With linear encoding
  delta 0.01: 100 steps between 99 & 100 → wasteful
  delta 1: only 1 step between 1 & 2 → lose detail in shadows
Instead, apply a non-linear power function, gamma: it provides an adaptive step size

Gamma encoding

With 6 bits available for encoding (for illustration below), linear encoding loses detail in the dark end.

Raise intensity X to a power: X^γ with γ = 1/2.2, then encode. The display applies γ = 2.5 to get back to linear light; the difference boosts colors to compensate for a dark viewing environment.
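A small numeric illustration of why gamma encoding wins at a fixed bit budget; the 6-bit depth matches the slide, while the 5% "dark end" cutoff and the variable names are ours:

```python
import numpy as np

# Quantize a linear-light ramp to 6 bits, first linearly, then through
# gamma 1/2.2, and compare worst-case relative errors in the dark end.
x = np.linspace(0.01, 1.0, 1000)                    # linear light
lin = np.round(x * 63) / 63                         # 6-bit linear
gam = (np.round(x ** (1 / 2.2) * 63) / 63) ** 2.2   # 6-bit gamma-encoded
dark = x < 0.05
print(np.max(np.abs(lin[dark] - x[dark]) / x[dark]))  # roughly 0.6: banding
print(np.max(np.abs(gam[dark] - x[dark]) / x[dark]))  # roughly 0.14: much smaller
```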

Gamma on displays

CRT: an inherent gamma of ~2.35 – 2.55, due to the electrostatics of the cathode and electron gun; it is just a coincidence that this is an almost perfect inverse match to the human visual system's non-linearity!

LCD: no inherent gamma; one is applied as a look-up table to match conventions.

If gamma is not right, both colors and intensities shift. Example: if (0, 255, 127) is not gamma corrected, the red channel remains 0, green remains 255, and blue is decreased by the display.

Display brightness and contrast

$I = \alpha\, \delta^{\gamma}$

The brightness and contrast knobs control α and γ. Which one controls which?

Brightness controls γ, contrast controls α

Gamma encoding

With the “delta” ratio of 1.01, you need about 480 steps to reach 100; that takes almost 9 bits.
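Checking the arithmetic:

$1.01^{n} = 100 \;\Rightarrow\; n = \frac{\ln 100}{\ln 1.01} \approx 463, \qquad \log_2 463 \approx 8.9 \text{ bits}$

consistent with the roughly 480 steps and almost 9 bits quoted above.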

8 bits, nonlinearly encoded, is sufficient for broadcast-quality digital TV with a contrast ratio of ~50:1.

With poor viewing conditions or display quality, fewer bits are needed.

Gamma summary

At the camera or encoding level, apply a gamma of around 1/2.2. The CRT applies a gamma of 2.5. The residual end-to-end exponent (2.5/2.2 ≈ 1.14) boosts the colors to compensate for the dark viewing environment.

See:
http://www.poynton.com/GammaFAQ.html
http://www.poynton.com/notes/color/GammaFAQ.html
http://www.poynton.com/PDFs/Rehabilitation_of_gamma.pdf

The CIE XYZ System

A standard created in 1931 by the CIE (Commission Internationale de l'Eclairage). Defined in terms of three color matching functions. Given an emission spectrum, we can use the CIE matching functions to obtain the x, y and z coordinates; y corresponds to luminance perception.

The CIE Chromaticity Diagram

Intensity is measured as the distance from the origin; black = (0, 0, 0).

Chromaticity coordinates give a notion of color independent of brightness

A projection of the plane x + y + z = 1 yields a chromaticity value that depends on the dominant wavelength (= hue) and the excitation purity (= saturation), the distance from white at (1/3, 1/3, 1/3).

More About Chromaticity

Dominant wavelengths go around the perimeter of the chromaticity blob
  a color's dominant wavelength is where a line from white through that color intersects the perimeter
  some colors, called nonspectral colors, don't have a dominant wavelength (which ones? the purples: colors that mix red and blue)
Excitation purity is measured in terms of a color's position on the line to its dominant wavelength
Complementary colors lie on opposite sides of white, and can be mixed to get white
  the complement of blue is yellow
  the complement of red is cyan

Perceptual (non-)uniformity

The XYZ color space is not perceptually uniform!

[Figure: enlarged ellipses of constant perceived color in XYZ space]

CIE L*a*b*: uniform color space

Lab is designed to approximate human vision: it aspires to perceptual uniformity, and the L component closely matches human perception of lightness.

A good color space for image processing

Gamuts

Not every device can reproduce every color. A device's range of reproducible colors is called its gamut.

YUV, YCbCr, …

Family of color spaces for encoding, including in FCam; video and viewfinder are usually YUV.
Channels: Y = luminance [linear]; Y′ = luma [gamma corrected]; CbCr / UV = chrominance [always linear].
Y′CbCr is not an absolute color space: it is a way of encoding RGB information, and the actual color depends on the RGB primaries used.
The color channels are often filtered down 2:1, 4:1.
Many formulas!

Break RGB to Lab channels

Blur “a” channel (red-green)

Blur “b” channel (blue-yellow)

Blur “L” channel

Luminance from RGB

If three sources of the same radiance appear R, G, B: green will appear the brightest (it has the highest luminous efficiency), red will appear less bright, and blue will be the darkest.

Luminance by NTSC: 0.2990 R + 0.5870 G + 0.1140 B, based on the phosphors in use in 1953.

Luminance by CIE: 0.2126 R + 0.7152 G + 0.0722 B, based on contemporary phosphors.

Luminance by ITU: 0.2125 R + 0.7154 G + 0.0721 B

1/4 R + 5/8 G + 1/8 B works fine and is quick to compute: R>>2 + G>>1 + G>>3 + B>>3; the range is [0, 252].
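The shift formula as runnable code (the function name is ours):

```python
def quick_luma(r, g, b):
    """1/4 R + 5/8 G + 1/8 B via shifts: 5/8 = 1/2 + 1/8.
    Truncation in the shifts keeps 8-bit inputs within [0, 252]."""
    return (r >> 2) + (g >> 1) + (g >> 3) + (b >> 3)

print(quick_luma(255, 255, 255))   # -> 252, not 255, due to truncation
```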

Cameras use sRGB

sRGB is a standard RGB color space (since 1996). It uses the same primaries as studio monitors and HDTV, and a gamma curve typical of CRTs, which allows direct display.

The sRGB gamma cannot be expressed as a single numerical value: the overall gamma is approximately 2.2, consisting of a linear (gamma 1.0) section near black and a non-linear section elsewhere involving a 2.4 exponent.

First we need to map from sensor RGB to the standard; this needs calibration.

sRGB from XYZ

The pipeline: XYZ → [3x3 matrix, linear transformation] → linear RGBsRGB → [nonlinear distortion] → RGB′sRGB → [quantization] → 8-bit RGB

$R'_{sRGB} = 12.92\, R_{sRGB}$ for $R_{sRGB} < 0.0031308$

$R'_{sRGB} = 1.055\, R_{sRGB}^{1/2.4} - 0.055$ for $R_{sRGB} > 0.0031308$

$R_{8bit} = \mathrm{round}[255\, R'_{sRGB}]$

Linear relation between XYZ and sRGB (primaries according to ITU-R BT.709.3):

$\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} 0.4124 & 0.3576 & 0.1805 \\ 0.2126 & 0.7152 & 0.0722 \\ 0.0193 & 0.1192 & 0.9505 \end{pmatrix} \begin{pmatrix} R_{sRGB} \\ G_{sRGB} \\ B_{sRGB} \end{pmatrix}$

Image processing in linear or non-linear space?

Simulating the physical world: use linear light. A weighted average of gamma-corrected pixel values is not a linear convolution! Bad for antialiasing. Want to numerically simulate a lens? Undo gamma first.

Dealing with human perception: non-linear coding minimizes perceptual errors due to quantization.

Linearizing from sRGB

$C_{srgb} = \{R, G, B\}\, /\, 255$

$C_{linear} = \begin{cases} C_{srgb}\, /\, 12.92, & C_{srgb} < 0.04045 \\ \left(\frac{C_{srgb} + 0.055}{1.055}\right)^{2.4}, & C_{srgb} > 0.04045 \end{cases}$
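Both directions as code, combining this linearization with the sRGB encoding from the earlier slide; vectorized over 8-bit arrays, with function names of our own:

```python
import numpy as np

def srgb_to_linear(c8):
    """8-bit sRGB -> linear light, per the formula above."""
    c = c8 / 255.0
    return np.where(c < 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(c):
    """Linear light -> 8-bit sRGB, per the earlier encoding slide."""
    c = np.clip(c, 0.0, 1.0)
    v = np.where(c < 0.0031308, 12.92 * c, 1.055 * c ** (1 / 2.4) - 0.055)
    return np.round(255 * v).astype(np.uint8)

codes = np.arange(256)
assert np.all(linear_to_srgb(srgb_to_linear(codes)) == codes)  # round-trip
```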

Film response curve

The middle follows a power function: if a given amount of light turned half of the grain crystals to silver, the same amount again turns half of the rest. Toe region: the chemical process is just starting. Shoulder region: close to saturation. Film has more dynamic range than print, ~12 bits.

Digital camera response curve

Digital cameras modify the response curve

The toe and shoulder preserve more dynamic range in dark and bright areas, at the cost of reduced contrast. Cameras may use different response curves at different exposures, which makes the response impossible to calibrate and invert!

Auto White Balance

The dominant light source (illuminant) produces a color cast that affects the appearance of the scene objects. The color of the illuminant determines the color normally associated with white by the human visual system. Auto White Balance: identify the illuminant color, then neutralize it.

Identify the color of the illuminant (source: www.cambridgeincolour.com)

Prior knowledge about the ambient light: candle flame (1850 K), sunset light (2000 K), summer sunlight at noon (5400 K), …
A known reference object in the picture: best is to find something that is white or gray.

Assumptions about the scene: the gray world assumption (gray in sRGB space!)

Best way to do white balance

Gray card: take a picture of a neutral object (white or gray) and deduce the weight of each channel.

If the object is recorded as (rw, gw, bw),

use the weights k/rw, k/gw, k/bw, where k controls the exposure.
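A minimal sketch of the gray-card computation above; normalizing k to the green channel is our choice of convention, while the slide leaves k free to control exposure:

```python
def gray_card_gains(rw, gw, bw, k=None):
    """Per-channel white-balance weights k/rw, k/gw, k/bw from a neutral
    reference patch recorded as (rw, gw, bw)."""
    k = gw if k is None else k      # default: keep green unchanged
    return k / rw, k / gw, k / bw

# Example: a gray card recorded as (180, 200, 140) under warm light;
# applying (R*kr, G*kg, B*kb) maps the card to neutral (200, 200, 200).
kr, kg, kb = gray_card_gains(180.0, 200.0, 140.0)
```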

Brightest pixel assumption

Highlights usually have the color of the light source, at least for dielectric materials.

White balance by using the brightest pixels, plus potentially a bunch of heuristics; in particular, use a pixel that is not saturated / clipped.

x, y chromaticity diagram

The colors of a black body heated to different temperatures fall on a curve (the Planckian locus). The colors change non-linearly with temperature, but almost linearly with the reciprocal temperature 1/T.

Mapping the colors

For a given sensor, pre-compute the transformation matrices between the sensor color space and sRGB at different temperatures. FCam provides two precomputed transformations, for 3200 K and 7000 K.

Estimate a new transformation by interpolating between the pre-computed matrices; the ISP can then apply the linear transformation.

Estimating the color temperature

Use the scene mode, or the gray world assumption (R = G = B) in sRGB space; really, just R = B, ignore G.
To estimate the color temperature of a given image: apply the pre-computed matrices to get sRGB for T1 and T2, calculate the average values R and B, and solve for α, which is then used to interpolate the matrices (or 1/T):

$\frac{1}{T} = (1-\alpha)\,\frac{1}{T_1} + \alpha\,\frac{1}{T_2}$

$R = (1-\alpha)\,R_1 + \alpha\,R_2, \qquad B = (1-\alpha)\,B_1 + \alpha\,B_2$
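A sketch of the matrix interpolation, assuming two precomputed 3x3 color matrices like FCam's 3200 K / 7000 K pair; the function and argument names are ours:

```python
import numpy as np

def interp_color_matrix(M1, T1, M2, T2, T):
    """Blend two precomputed sensor->sRGB matrices linearly in
    reciprocal temperature, per the equations above."""
    alpha = (1.0 / T - 1.0 / T1) / (1.0 / T2 - 1.0 / T1)
    alpha = float(np.clip(alpha, 0.0, 1.0))   # stay between the two anchors
    return (1.0 - alpha) * M1 + alpha * M2
```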

JPEG Encoding

1. Transform RGB to YUV or YIQ and subsample color
2. DCT on 8x8 image blocks
3. Quantization
4. Zig-zag ordering and run-length encoding
5. Entropy coding

Converting RGB to YUV

YUV is not required for JPEG compression, but it gives a better compression rate.

Y = 0.299 * R + 0.587 * G + 0.114 * B
U = -0.1687 * R – 0.3313 * G + 0.5 * B + 128
V = 0.5 * R – 0.4187 * G – 0.0813 * B + 128
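As code, with the B coefficient of V written as 0.0813 so that each chroma row sums to zero (the original text's 0.813 appears to be a typo); a minimal sketch, function name ours:

```python
import numpy as np

def rgb_to_yuv(r, g, b):
    """The JPEG/JFIF RGB -> YCbCr conversion above (loosely called YUV);
    8-bit inputs, 8-bit range outputs."""
    y =  0.299  * r + 0.587  * g + 0.114  * b
    u = -0.1687 * r - 0.3313 * g + 0.5    * b + 128
    v =  0.5    * r - 0.4187 * g - 0.0813 * b + 128
    return np.round(y), np.round(u), np.round(v)

print(rgb_to_yuv(255, 255, 255))   # white -> (255.0, 128.0, 128.0): no chroma
```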

DCT

The frequency domain is a better representation: it makes it possible to separate out information that isn't very important to human perception; the human eye is not very sensitive to high-frequency changes.

DCT on Image Blocks


The image is divided up into 8x8 blocks, and a 2D DCT is performed independently on each block. This is why, when a high degree of compression is requested, JPEG gives a “blocky” result.

Quantization

Quantization in JPEG aims at reducing the total number of bits in the compressed image: divide each entry in the frequency-space block by an integer, then round, using a quantization matrix Q(u, v).
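The whole step fits in two small functions; a sketch, with Q supplied from the codec's tables:

```python
import numpy as np

def quantize(F, Q):
    """Divide each DCT coefficient by its entry in Q and round;
    this is where the loss happens."""
    return np.round(F / Q).astype(int)

def dequantize(Fq, Q):
    """Decoder side: multiply back. The rounding error is what was lost."""
    return Fq * Q
```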

Quantization

Use larger entries in Q for the higher spatial frequencies: the lower-right part of the matrix.

Based on psychophysical studies: maximize compression while minimizing perceptual distortion.

After division the entries are smaller, so we can use fewer bits to encode them.

Quantization

Different quantization matrices allow the user to choose how much compression to use, trading off quality vs. compression ratio; more compression means larger entries in Q.

Original and DCT coded block

Quantized and Reconstructed Blocks

After IDCT and Difference from Original

Same steps on a less homogeneous block

Steps 2 and 3

IDCT and Difference

Run-Length Coding

The AC and DC components are treated differently. After quantization we have many zero AC components, and most of them are towards the lower-right corner (high spatial frequencies). To take advantage of this, use zigzag scanning to create a 64-vector.

Run-Length Coding

Replace the values in the 64-vector (previously an 8x8 block) by pairs (RUNLENGTH, VALUE), where RUNLENGTH is the number of zeroes in the run and VALUE is the next non-zero value.

From the first example we have (32, 6, -1, -1, 0, -1, 0, 0, 0, -1, 0, 0, 1, 0, 0, …, 0)

This becomes (0,6) (0,-1) (0,-1) (1,-1) (3,-1) (2,1) (0,0); the DC value 32 is coded separately, and (0,0) marks the end of the block.
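A sketch of both steps; zigzag conventions vary slightly between references, and the helper names are ours. The run-length output reproduces the example above:

```python
def zigzag_order(n=8):
    """Scan order grouping antidiagonals (u+v constant), alternating
    direction so low frequencies come first.
    Usage: vec = [block[u][v] for (u, v) in zigzag_order()]"""
    return sorted(((u, v) for u in range(n) for v in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_length(ac):
    """Encode the 63 AC values as (RUNLENGTH, VALUE) pairs, with the
    (0, 0) end-of-block marker."""
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))   # EOB: only zeros remain
    return pairs

vec = [32, 6, -1, -1, 0, -1, 0, 0, 0, -1, 0, 0, 1] + [0] * 51
print(run_length(vec[1:]))   # DC = 32 is coded separately
# -> [(0, 6), (0, -1), (0, -1), (1, -1), (3, -1), (2, 1), (0, 0)]
```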

Entropy Coding

The coefficients are then entropy coded, mostly using Huffman coding. For DC coefficients, which are assumed to vary slowly, Differential Pulse Code Modulation (DPCM) is additionally used: if the first five DC coefficients are 150, 155, 149, 152, 144, we come up with the DPCM code 150, 5, -6, 3, -8.
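The DPCM step in code, reproducing the numbers above (the function name is ours):

```python
def dpcm(dc_coeffs):
    """First DC value verbatim, then successive differences."""
    return [dc_coeffs[0]] + [b - a for a, b in zip(dc_coeffs, dc_coeffs[1:])]

print(dpcm([150, 155, 149, 152, 144]))   # -> [150, 5, -6, 3, -8]
```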

These additional compression steps are lossless; most of the lossiness is in the quantization step.

Alternatives?

JPEG 2000 (ISO, 2000): better compression, inherently hierarchical, random access, …, but much more complex than JPEG.

JPEG XR (Microsoft, 2006; ISO / ITU-T, 2010): good compression; supports tiling (random access without having to decode the whole image), better color accuracy (incl. HDR), transparency, and compressed-domain editing.

But JPEG stays: its install base is too large.
