Camera Post-Processing Pipeline
Kari Pulli, Senior Director, NVIDIA Research

Topics
Filtering: blurring, sharpening, the bilateral filter
Sensor imperfections (PNU, dark current, vignetting, …)
ISO (analog-to-digital conversion with amplification)
Demosaicking and denoising
Gamma correction
Color spaces
Response curve
White balance
JPEG compression
What is image filtering?
Modify the pixels in an image based on some function of a local neighborhood of the pixels.
[Figure: a 3x3 patch of local image data (10 5 3; 4 5 1; 7 1 1) passes through "some function" to produce the modified image data]
Linear functions
Simplest: linear filtering. Replace each pixel by a linear combination of its neighbors. The prescription for the linear combination is called the "convolution kernel".
[Figure: local image data (10 5 3; 4 5 1; 7 1 1) combined with the kernel (0 0 0; 0 0.5 0; 0 1 0.5) produces the modified pixel value]
Convolution
f[m, n] = (I ⊗ g)[m, n] = Σ_{k,l} I[m − k, n − l] g[k, l]
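A minimal NumPy sketch of this sum (not from the slides; zero padding and "same" output size are my assumptions):

```python
import numpy as np

def convolve2d(I, g):
    """f[m,n] = sum_{k,l} I[m-k, n-l] g[k,l], zero padding, 'same' output size."""
    kh, kw = g.shape
    ph, pw = kh // 2, kw // 2
    Ip = np.pad(I.astype(float), ((ph, ph), (pw, pw)))
    gf = g[::-1, ::-1]  # flipping the kernel turns correlation into convolution
    out = np.zeros(I.shape, dtype=float)
    for m in range(I.shape[0]):
        for n in range(I.shape[1]):
            out[m, n] = np.sum(Ip[m:m + kh, n:n + kw] * gf)
    return out
```

Convolving with a single centered impulse reproduces the input, which is a quick sanity check for the index convention.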
Linear filtering (warm-up slide)
[Figure: which kernel leaves the image unchanged? A single coefficient of 1.0 at zero pixel offset: the filtered output equals the original (no change)]
Shift
[Figure: a single coefficient of 1.0 at a nonzero pixel offset shifts the original image]
Blurring
[Figure: a box kernel with coefficients of 0.3 spread over several pixel offsets blurs the original; the filter is applied in both dimensions]
Blur examples
[Figure: blurring an impulse of height 8 with the 0.3 box kernel yields a lower, wider pulse of height 2.4; blurring an edge of height 8 yields a gradual ramp through 4]
Linear filtering (no change)
[Figure: an amplifying impulse of 2.0 minus an impulse of 1.0 is equivalent to a unit impulse; the filtered output equals the original]
Sharpening
[Figure: an impulse of 2.0 minus a 0.33 box blur kernel (remember blurring; applied in both dimensions) sharpens the original]
Sharpening example
[Figure: a step edge of height 8 is sharpened to overshoot to 11.2, with a matching undershoot below 0; differences are accentuated, constant areas are left untouched]
Sharpening
[Figure: before and after photographs]
Bilateral filter
Tomasi and Manduchi 1998: http://www.cse.ucsc.edu/~manduchi/Papers/ICCV98.pdf
Related to the SUSAN filter [Smith and Brady 1995]: http://citeseer.ist.psu.edu/smith95susan.html
Digital-TV [Chan, Osher and Shen 2001]: http://citeseer.ist.psu.edu/chan01digital.html
Sigma filter: http://www.geogr.ku.dk/CHIPS/Manual/f187.htm
Start with Gaussian filtering
Here, the input I is a step function plus noise. Convolving with a spatial Gaussian f gives J = f ⊗ I, and the output is blurred.
[Figure: noisy step input, spatial Gaussian kernel, blurred output]
The problem of edges
The weight f(x, ξ) depends only on the distance from ξ to x. Here, I(ξ) "pollutes" our estimate J(x): it is too different.
J(x) = Σ_ξ f(x, ξ) I(ξ)
[Figure: near the edge, samples from the other side contaminate the estimate of I(x)]
Principle of bilateral filtering [Tomasi and Manduchi 1998]
Add a penalty g on the intensity difference:
J(x) = (1 / k(x)) Σ_ξ f(x, ξ) g(I(ξ) − I(x)) I(ξ)
Bilateral filtering [Tomasi and Manduchi 1998]
f is a spatial Gaussian; g is a Gaussian on the intensity difference:
J(x) = (1 / k(x)) Σ_ξ f(x, ξ) g(I(ξ) − I(x)) I(ξ)
Normalization factor [Tomasi and Manduchi 1998]
k(x) = Σ_ξ f(x, ξ) g(I(ξ) − I(x))
J(x) = (1 / k(x)) Σ_ξ f(x, ξ) g(I(ξ) − I(x)) I(ξ)
Bilateral filtering is non-linear [Tomasi and Manduchi 1998]
The weights are different for each output pixel.
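A direct 1-D transcription of the formula above; the Gaussian widths and window radius are illustrative choices, not values from the slides:

```python
import numpy as np

def bilateral1d(I, sigma_s=2.0, sigma_r=0.1, radius=4):
    """J(x) = (1/k(x)) * sum_xi f(x, xi) g(I(xi) - I(x)) I(xi):
    spatial Gaussian f, range Gaussian g, normalization k(x)."""
    I = np.asarray(I, dtype=float)
    out = np.empty_like(I)
    for x in range(len(I)):
        lo, hi = max(0, x - radius), min(len(I), x + radius + 1)
        xi = np.arange(lo, hi)
        f = np.exp(-((xi - x) ** 2) / (2.0 * sigma_s ** 2))
        g = np.exp(-((I[xi] - I[x]) ** 2) / (2.0 * sigma_r ** 2))
        w = f * g
        out[x] = np.sum(w * I[xi]) / np.sum(w)  # k(x) = sum of the weights
    return out
```

On a clean step, samples across the edge get a near-zero range weight g, so the edge survives where a plain Gaussian would blur it.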
Other view
The bilateral filter uses the 3D distance, combining spatial position and intensity.
From "raw-raw" to RAW
Pixel Non-Uniformity (PNU)
each pixel in a CCD has a slightly different sensitivity to light, typically within 1% to 2% of the average signal
can be reduced by calibrating an image with a flat-field image
flat-field images are also used to eliminate the effects of vignetting and other optical variations
Stuck pixels
some pixels are always on or off
identify them, replace with filtered values
Dark floor
temperature adds noise
sensors usually have a ring of covered pixels around the exposed sensor; subtract their signal
ISO = amplification in A/D conversion
The sensor converts the continuous light signal to a continuous electrical signal.
The analog signal is converted to a digital signal
at least 10 bits (even on cell phones), often 12 or more
(roughly) linear sensor response
Before conversion the signal can be amplified
ISO 100 means no amplification
ISO 1600 means 16x amplification
+: details in dark areas become visible
−: noise is amplified as well; the sensor is more likely to saturate
ISO
But a too-dark signal needs amplification
With too little light and a given desired image brightness, there are two alternatives:
use low ISO and brighten digitally: this also amplifies post-gain noise (e.g., read noise)
use high ISO to get the brightness directly: a bit less noise
Both amplification choices amplify noise; ideally, make sure the signal is high by using a slower exposure or a larger aperture.
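A toy noise model (my own simplification, not from the slides) makes the comparison concrete: pre-gain (shot) noise passes through the full gain either way, but post-gain (read) noise is only multiplied by the digital brightening:

```python
def output_noise_std(shot_std, read_std, analog_gain, digital_gain):
    """Std of the output noise when total gain = analog_gain * digital_gain.
    Shot noise is scaled by both gains; read noise, added after the analog
    stage, is scaled only by the digital gain. Independent noises add in
    quadrature."""
    return ((analog_gain * digital_gain * shot_std) ** 2
            + (digital_gain * read_std) ** 2) ** 0.5
```

With the same total gain of 16x, putting the gain in the analog stage (high ISO) yields less output noise than brightening digitally, matching the slide's "a bit less noise".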
Demosaicking or Demosaicing?
Demosaicking
Your eyes do it too…
First choice: bilinear interpolation
Easy to implement
But fails at sharp edges
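A sketch of bilinear interpolation for the green channel only (the mask convention is my assumption; real Bayer demosaicking also fills in red and blue):

```python
import numpy as np

def bilinear_demosaic_green(raw, green_mask):
    """Fill in the missing green samples by averaging the (up to 4) nearest
    green neighbors; a minimal sketch of bilinear demosaicking for one channel.
    raw: 2-D mosaic; green_mask: True where green was actually sampled."""
    out = raw.astype(float).copy()
    h, w = raw.shape
    for i in range(h):
        for j in range(w):
            if not green_mask[i, j]:
                nbrs = [raw[a, b]
                        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < h and 0 <= b < w and green_mask[a, b]]
                out[i, j] = sum(nbrs) / len(nbrs)
    return out
```

On a flat green field the reconstruction is exact; the failures the slide mentions appear exactly where neighboring greens straddle a sharp edge.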
Two-color sampling of BW edge
[Figure: luminance profile of the true full-color image, the sampled data, and the linear interpolation]
Typical color Moiré patterns
Blow-up of an electronic camera image. Notice the spurious colors in the regions of fine detail in the plants.
Brewster's colors: evidence of interpolation from spatially offset color samples
Scale relative to human photoreceptor size: each line covers about 7 photoreceptors.
Color sampling artifacts
R − G, after linear interpolation
Median filter
Replace each pixel by the median over N pixels (5 pixels for these examples). Generalizes to "rank order" filters.
[Figure: with a 5-pixel neighborhood, spike noise is removed (in/out), while monotonic edges remain unchanged]
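A 1-D rank-order sketch with a 5-sample window, as in the examples (edge padding is my assumption):

```python
import numpy as np

def median_filter1d(x, radius=2):
    """Rank-order filter: replace each sample by the median of a
    (2*radius + 1)-sample window; radius=2 gives the 5-pixel neighborhood."""
    x = np.asarray(x, dtype=float)
    pad = np.pad(x, radius, mode='edge')
    return np.array([np.median(pad[i:i + 2 * radius + 1]) for i in range(len(x))])
```

The two properties shown on the slide are easy to verify: an isolated spike vanishes, while a monotonic edge passes through unchanged.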
Degraded image
Radius 1 median filter
Radius 2 median filter
R − G
R − G, median filtered (5x5)
Two-color sampling of BW edge
[Figure: sampled data; linear interpolation; color difference signal; median filtered color difference signal; reconstructed pixel values]
Recombining the median filtered colors
[Figure: linear interpolation vs. median filter interpolation]
Take edges into account
Adaptive demosaicking (Ramanath, Snyder, JEI 2003)
use bilateral filtering to avoid interpolating across edges
Take edges into account
High-Quality Linear Interpolation for Demosaicing of Bayer-Patterned Color Images (Malvar, He, Cutler, ICASSP 2004)
Predict edges and adjust assumptions
luminance correlates with RGB; edges = luminance change
when estimating G at R: if the R differs from the bilinearly estimated R, the luminance changes
correct the bilinear estimate by the difference between the estimate and the real value
Joint demosaicking and denoising: denoising in the ISP? (Hirakawa, Parks, IEEE TIP 2006)
The experimental results confirm that the proposed method suppresses noise (CMOS/CCD image sensor noise model) while effectively interpolating the missing pixel components, demonstrating a significant improvement in image quality when compared to treating demosaicking and denoising problems independently.
Denoising using non-local means
Most image details occur repeatedly. Each color indicates a group of squares in the image which are almost indistinguishable. Image self-similarity can be used to eliminate noise: it suffices to average the squares which resemble each other.
Image and movie denoising by nonlocal means Buades, Coll, Morel, IJCV 2006
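A 1-D sketch of the idea (patch size and filtering strength h are illustrative choices; the real algorithm compares 2-D image patches):

```python
import numpy as np

def nl_means_1d(x, patch=3, h=0.1):
    """Non-local means on a 1-D signal: each sample becomes a weighted average
    of all samples, weighted by how similar their surrounding patches are."""
    x = np.asarray(x, dtype=float)
    n, r = len(x), patch // 2
    pad = np.pad(x, r, mode='edge')
    patches = np.array([pad[i:i + patch] for i in range(n)])
    out = np.empty(n)
    for i in range(n):
        d2 = np.sum((patches - patches[i]) ** 2, axis=1)  # patch distances
        w = np.exp(-d2 / (h * h))
        out[i] = np.sum(w * x) / np.sum(w)
    return out
```

Unlike a local blur, the averaging partners are chosen by patch similarity anywhere in the signal, which is what lets self-similar detail survive.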
Restoring a highly compressed image
NL-means is able to remove block artifacts due to compression, but at the cost of removing some details, as the difference between the compressed and restored images shows.
BM3D (Block Matching 3D)
How many bits are needed for smooth shading?
With a given adaptation, human vision has a contrast sensitivity of about 1%
call black 1 and white 100: you can just see the differences 1, 1.01, 1.02, …
near 1 the needed step size is ~0.01; near 100 (98, 99, 100) it is ~1
With linear encoding
a delta of 0.01 gives 100 steps between 99 and 100: wasteful
a delta of 1 gives only 1 step between 1 and 2: loses detail in shadows
Instead, apply a non-linear power function (gamma), which provides an adaptive step size
Gamma encoding
With 6 bits available (for illustration below), linear encoding loses detail at the dark end.
Raise intensity X to the power γ = 1/2.2 (X^γ), then encode. The display applies γ = 2.5 to get back to linear light; the difference boosts colors to compensate for a dark viewing environment.
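The encode/decode pair can be sketched as follows (intensities normalized to [0, 1]; decoding with 2.2 here gives an exact round trip, while an actual CRT would apply 2.5 as described above):

```python
def gamma_encode(x, gamma=1 / 2.2):
    """Encode linear intensity x in [0, 1]: the power law spends more code
    values near black, matching the eye's roughly 1% contrast sensitivity."""
    return x ** gamma

def gamma_decode(v, gamma=2.2):
    """Invert the encoding to recover linear light."""
    return v ** gamma
```

Note how a dark linear value is pushed upward before quantization, so a fixed number of code levels covers the shadows more finely.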
Gamma on displays
CRT: inherent gamma of ~2.35 to 2.55
due to the electrostatics of the cathode and electron gun
just a coincidence that it is an almost perfect inverse match to the human visual system's non-linearity!
LCD: no inherent gamma; one is applied as a look-up table to match the conventions
If the gamma is not right, both colors and intensities shift
example: if (0, 255, 127) is not gamma corrected, the red channel remains 0, green remains 255, and blue is decreased by the display
Display brightness and contrast
I = α δ^γ
The brightness and contrast knobs control α and γ. Which one controls which?
Brightness controls γ, contrast controls α.
Gamma encoding
With the "delta" ratio of 1.01, about 480 steps are needed to reach 100: almost 9 bits.
8 bits, non-linearly encoded, is sufficient for broadcast-quality digital TV with a contrast ratio of about 50:1.
With poor viewing conditions or display quality, fewer bits are needed.
Gamma summary
At the camera or encoding level, apply a gamma of around 1/2.2. The CRT applies a gamma of 2.5. The residual end-to-end exponent of 2.5/2.2 boosts the colors to compensate for the dark viewing environment.
See
http://www.poynton.com/GammaFAQ.html
http://www.poynton.com/notes/color/GammaFAQ.html
http://www.poynton.com/PDFs/Rehabilitation_of_gamma.pdf
The CIE XYZ System
A standard created in 1931 by the CIE (Commission Internationale de l'Éclairage)
Defined in terms of three color matching functions x, y, and z
Given an emission spectrum, we can use the CIE matching functions to obtain the X, Y, and Z coordinates
Y corresponds to luminance perception
The CIE Chromaticity Diagram
Intensity is measured as the distance from the origin (black = (0, 0, 0)).
Chromaticity coordinates give a notion of color independent of brightness.
A projection onto the plane x + y + z = 1 yields a chromaticity value that depends on
dominant wavelength (= hue)
excitation purity (= saturation): the distance from white at (1/3, 1/3, 1/3)
More About Chromaticity
Dominant wavelengths go around the perimeter of the chromaticity blob
a color's dominant wavelength is where a line from white through that color intersects the perimeter
some colors, called nonspectral colors, don't have a dominant wavelength (which ones? colors that mix red and blue)
Excitation purity is measured in terms of a color's position on the line to its dominant wavelength
Complementary colors lie on opposite sides of white, and can be mixed to get white
the complement of blue is yellow
the complement of red is cyan
Perceptual (non-)uniformity
The XYZ color space is not perceptually uniform!
[Figure: enlarged ellipses of constant perceived color in XYZ space]
CIE L*a*b*: a uniform color space
Lab is designed to approximate human vision
it aspires to perceptual uniformity
the L component closely matches human perception of lightness
A good color space for image processing
Gamuts
Not every device can reproduce every color. A device's range of reproducible colors is called its gamut.
YUV, YCbCr, …
A family of color spaces for video encoding
in FCam, video and the viewfinder are usually YUV
Channels: Y = luminance [linear]; Y′ = luma [gamma corrected]; Cb/Cr and U/V = chrominance [always linear]
Y′CbCr is not an absolute color space: it is a way of encoding RGB information, and the actual color depends on the RGB primaries used
The chrominance channels are often subsampled 2:1 or 4:1
Many formulas!
Break RGB into Lab channels
Blur the "a" channel (red-green)
Blur the "b" channel (blue-yellow)
Blur the "L" channel
Luminance from RGB
If three sources of the same radiance appear R, G, B:
green will appear the brightest; it has the highest luminous efficiency
red will appear less bright
blue will be the darkest
Luminance by NTSC: 0.2990 R + 0.5870 G + 0.1140 B (based on the phosphors in use in 1953)
Luminance by CIE: 0.2126 R + 0.7152 G + 0.0722 B (based on contemporary phosphors)
Luminance by ITU: 0.2125 R + 0.7154 G + 0.0721 B
1/4 R + 5/8 G + 1/8 B works fine
quick to compute: (R>>2) + (G>>1) + (G>>3) + (B>>3)
range is [0, 252]
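The shift-only approximation is a one-liner; the shifts implement 1/4, 1/2 + 1/8 = 5/8, and 1/8 with integer truncation:

```python
def fast_luma(r, g, b):
    """Shift-only approximation of luminance: 1/4 R + 5/8 G + 1/8 B.
    For 8-bit inputs the result stays in [0, 252]."""
    return (r >> 2) + (g >> 1) + (g >> 3) + (b >> 3)
```

Truncation in each shift is why full white maps to 252 rather than 255, as noted above.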
Cameras use sRGB
sRGB is a standard RGB color space (since 1996)
uses the same primaries as studio monitors and HDTV
and a gamma curve typical of CRTs
allows direct display
The sRGB gamma cannot be expressed as a single numerical value: the overall gamma is approximately 2.2, consisting of a linear (gamma 1.0) section near black and a non-linear section elsewhere involving a 2.4 exponent.
First we need to map from sensor RGB to the standard; this needs calibration.
sRGB from XYZ
XYZ → [3x3 matrix, linear transformation] → linear RGB_sRGB → [nonlinear distortion] → R′G′B′_sRGB → [quantization] → 8-bit RGB
R′_sRGB = 12.92 R_sRGB, for R_sRGB ≤ 0.0031308
R′_sRGB = 1.055 R_sRGB^(1/2.4) − 0.055, for R_sRGB > 0.0031308
R_8bit = round[255 R′_sRGB]
Linear relation between XYZ and sRGB (red, green, blue primaries according to ITU-R BT.709.3):
X = 0.4124 R_sRGB + 0.3576 G_sRGB + 0.1805 B_sRGB
Y = 0.2126 R_sRGB + 0.7152 G_sRGB + 0.0722 B_sRGB
Z = 0.0193 R_sRGB + 0.1192 G_sRGB + 0.9505 B_sRGB

Image processing in linear or non-linear space?
Simulating the physical world: use linear light
a weighted average of gamma-corrected pixel values is not a linear convolution! Bad for antialiasing
want to numerically simulate a lens? Undo gamma first
Dealing with human perception: non-linear coding allows minimizing perceptual errors due to quantization
Linearizing from sRGB
C_srgb = {R, G, B} / 255
C_linear = C_srgb / 12.92, for C_srgb ≤ 0.04045
C_linear = ((C_srgb + 0.055) / 1.055)^2.4, for C_srgb > 0.04045
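The piecewise linearization is straightforward to transcribe per channel:

```python
def srgb_to_linear(c8):
    """Linearize one 8-bit sRGB channel value: linear segment near black,
    a 2.4-exponent power curve elsewhere."""
    c = c8 / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4
```

The two branches meet near 0.04045, so small code values use the cheap linear segment while the rest follow the power curve.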
Film response curve
The middle follows a power function: if a given amount of light turned half of the grain crystals to silver, the same amount again turns half of the rest.
Toe region: the chemical process is just starting.
Shoulder region: close to saturation.
Film has more dynamic range than print, ~12 bits.
Digital camera response curve
Digital cameras modify the response curve
the toe and shoulder preserve more dynamic range in the dark and bright areas, at the cost of reduced contrast
they may use different response curves at different exposures: impossible to calibrate and invert!
Auto White Balance
The dominant light source (illuminant) produces a color cast that affects the appearance of the scene objects. The color of the illuminant determines the color normally associated with white by the human visual system.
Auto white balance:
identify the illuminant color
neutralize the color of the illuminant
Identify the color of the illuminant (source: www.cambridgeincolour.com)
Prior knowledge about the ambient light
candle flame (1850 K)
sunset light (2000 K)
summer sunlight at noon (5400 K)
…
A known reference object in the picture
best: find something that is white or gray
Assumptions about the scene
gray world assumption (gray in sRGB space!)
Best way to do white balance
Grey card
take a picture of a neutral object (white or gray)
deduce the weight for each channel
If the object is recorded as (rw, gw, bw), use weights k/rw, k/gw, k/bw, where k controls the exposure.
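The per-channel gains can be sketched directly (linear-light float images are my assumption):

```python
import numpy as np

def gray_card_white_balance(image, card_rgb, k=1.0):
    """Apply the gains k/rw, k/gw, k/bw so the gray card comes out neutral.
    image: float HxWx3 array in linear light; card_rgb: (rw, gw, bw)."""
    rw, gw, bw = card_rgb
    return image * np.array([k / rw, k / gw, k / bw])
```

By construction, the gray-card pixel itself maps to (k, k, k), i.e. neutral at the chosen exposure.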
Brightest pixel assumption
Highlights usually have the color of the light source, at least for dielectric materials.
White balance by using the brightest pixels, plus potentially a bunch of heuristics; in particular, use a pixel that is not saturated / clipped.
Color temperature
The colors of a black body heated to different temperatures fall on a curve in the x, y chromaticity diagram (the Planckian locus).
Colors change non-linearly with temperature, but almost linearly with the reciprocal temperature 1/T.
Mapping the colors
For a given sensor, pre-compute the transformation matrices between the sensor color space and sRGB at different temperatures
FCam provides two precomputed transformations, for 3200 K and 7000 K
Estimate a new transformation by interpolating between the pre-computed matrices; the ISP can apply the linear transformation
Estimating the color temperature
Use the scene mode
Use the gray world assumption (R = G = B) in sRGB space
really, just R = B; ignore G
Estimate the color temperature in a given image
apply the pre-computed matrices to get sRGB for T1 and T2
calculate the average values R and B
solve for α, use it to interpolate the matrices (or 1/T)
1/T = (1 − α)/T1 + α/T2
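The interpolation in reciprocal temperature can be sketched as a one-line helper (the function name is mine):

```python
def interp_color_temperature(T1, T2, alpha):
    """Interpolate between two calibration temperatures linearly in
    reciprocal temperature: 1/T = (1 - alpha)/T1 + alpha/T2."""
    return 1.0 / ((1.0 - alpha) / T1 + alpha / T2)
```

With the two FCam calibration points, alpha = 0 returns 3200 K, alpha = 1 returns 7000 K, and intermediate alphas land in between (non-linearly in T, linearly in 1/T).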
R = (1 − α) R1 + α R2, B = (1 − α) B1 + α B2

JPEG Encoding
1. Transform RGB to YUV or YIQ and subsample color
2. DCT on 8x8 image blocks
3. Quantization
4. Zig-zag ordering and run-length encoding
5. Entropy coding
Converting RGB to YUV
YUV is not required for JPEG compression, but it gives a better compression rate.
Y = 0.299 R + 0.587 G + 0.114 B
U = −0.1687 R − 0.3313 G + 0.5 B + 128
V = 0.5 R − 0.4187 G − 0.0813 B + 128
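Transcribing the formulas directly (note the B coefficient in V is −0.0813, so that each chroma row sums to zero and gray inputs map to the neutral chroma value 128):

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range JPEG-style conversion using the coefficients above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128.0
    cr = 0.5 * r - 0.4187 * g - 0.0813 * b + 128.0
    return y, cb, cr
```

A quick check: any neutral gray keeps its value in Y and lands exactly on 128 in both chroma channels.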
DCT
The frequency domain is a better representation
it makes it possible to separate out information that isn't very important to human perception
the human eye is not very sensitive to high-frequency changes
DCT on Image Blocks
The image is divided into 8x8 blocks, and a 2D DCT is performed independently on each block. This is why, when a high degree of compression is requested, JPEG gives a "blocky" image result.
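As a sketch, here is an orthonormal type-II 2-D DCT of one block via a matrix product (JPEG implementations use a scaled, optimized variant; the orthonormal form is chosen here for clarity):

```python
import numpy as np

def dct2(block):
    """Orthonormal type-II 2-D DCT of an n x n block: F = C @ B @ C.T,
    where C[u, x] = s(u) * cos(pi * (2x + 1) * u / (2n))."""
    n = block.shape[0]
    x = np.arange(n)
    C = np.cos(np.pi * (2 * x[None, :] + 1) * x[:, None] / (2 * n))
    C *= np.sqrt(2.0 / n)
    C[0, :] = np.sqrt(1.0 / n)  # DC row gets the smaller scale factor
    return C @ block @ C.T
```

For a constant block all the energy lands in the DC coefficient and every AC coefficient is zero, which is exactly why smooth blocks compress so well after quantization.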
Quantization
Quantization in JPEG aims at reducing the total number of bits in the compressed image: divide each entry in the frequency-space block by an integer and round, using a quantization matrix Q(u, v).
Quantization
Use larger entries in Q for the higher spatial frequencies (the lower-right part of the matrix)
based on psychophysical studies: maximize compression while minimizing perceptual distortion
After division the entries are smaller, so we can use fewer bits to encode them
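The divide-and-round step and its approximate inverse are each one line (a toy 2x2 Q is used in the check below; real JPEG uses 8x8 tables):

```python
import numpy as np

def quantize(F, Q):
    """Divide each DCT coefficient by Q(u, v) and round: the lossy step."""
    return np.round(F / Q).astype(int)

def dequantize(Fq, Q):
    """Approximate reconstruction: multiply back by Q(u, v)."""
    return Fq * Q
```

Small coefficients divided by large Q entries round to zero, producing the long zero runs the next stage exploits.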
Quantization
Different quantization matrices allow the user to choose how much compression to use
trades off quality vs. compression ratio
more compression means larger entries in Q
Original and DCT coded block
Quantized and reconstructed blocks
After IDCT, and difference from the original
Same steps on a less homogeneous block
Steps 2 and 3
IDCT and difference
Run-Length Coding
The AC and DC components are treated differently. After quantization we have many zero AC components, and most of them are towards the lower-right corner (high spatial frequencies). To take advantage of this, zigzag scanning is used to create a 64-vector.
Run-Length Coding
Replace the values in the 64-vector (previously an 8x8 block) by pairs (RUNLENGTH, VALUE), where RUNLENGTH is the number of zeros in the run and VALUE is the next non-zero value.
From the first example we have (32, 6, −1, −1, 0, −1, 0, 0, 0, −1, 0, 0, 1, 0, 0, …, 0)
This becomes (0,6) (0,−1) (0,−1) (1,−1) (3,−1) (2,1) (0,0)
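Both steps are small enough to sketch; the zigzag is built by sorting positions by anti-diagonal, with even diagonals running bottom-left to top-right and odd ones the other way:

```python
import numpy as np

def zigzag(block):
    """Scan an n x n block in zig-zag order into a flat list."""
    n = block.shape[0]
    order = sorted(((u, v) for u in range(n) for v in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[u, v] for u, v in order]

def run_length(ac):
    """(RUNLENGTH, VALUE) pairs for the AC coefficients, with (0, 0) as the
    end-of-block marker once only zeros remain."""
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))
    return pairs
```

Feeding the AC part of the slide's example vector through `run_length` reproduces the pair sequence shown above.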
Entropy Coding
The coefficients are then entropy coded, mostly using Huffman coding. For the DC coefficients, which are assumed to vary slowly, Differential Pulse Code Modulation (DPCM) is additionally used: if the first five DC coefficients are 150, 155, 149, 152, 144, we get the DPCM code 150, 5, −6, 3, −8.
These additional data compression steps are lossless; most of the lossiness is in the quantization step.
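The DPCM step for the DC coefficients is just "first value, then successive differences":

```python
def dpcm(dc):
    """DC coefficients vary slowly between blocks, so send the first value
    followed by the differences; small differences code cheaply."""
    return [dc[0]] + [b - a for a, b in zip(dc, dc[1:])]
```

Running it on the slide's example DC sequence reproduces the code shown above, and since the differences are exact, the step is lossless.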
Alternatives?
JPEG 2000 (ISO, 2000)
better compression, inherently hierarchical, random access, …
but much more complex than JPEG
JPEG XR (Microsoft, 2006; ISO / ITU-T, 2010)
good compression; supports tiling (random access without having to decode the whole image), better color accuracy (incl. HDR), transparency, and compressed-domain editing
But JPEG stays: too large an install base.