XCELLENCE IN VIDEO

Image Sensor Color Calibration Using the Zynq-7000 All Programmable SoC

by Gabor Szedo Staff Video Design Engineer Xilinx Inc. [email protected]

Steve Elzinga Video IP Design Engineer Xilinx Inc. [email protected]

Greg Jewett Video Marketing Manager Xilinx Inc. [email protected]

Xcell Journal, Fourth Quarter 2012

Xilinx image- and video-processing cores and kits provide the perfect prototyping platform for camera developers.

Image sensors are used in a wide range of applications, from cell phones and video surveillance products to automobiles and missile systems. Almost all of these applications require white-balance correction (also referred to as color correction) in order to produce images with colors that appear correct to the human eye regardless of the type of illumination: daylight, incandescent, fluorescent and so on.

Implementing automatic white-balance correction in a programmable logic device such as a Xilinx® FPGA or Zynq™-7000 All Programmable SoC is likely to be a new challenge for many developers who have used ASIC or ASSP devices previously. Let's look at how software running on an embedded processor, such as the ARM processing system on the Zynq-7000 All Programmable SoC, can control custom image- and video-processing logic to perform real-time, pixel-level color/white-balance correction. To set the stage, it's helpful to first examine some basic concepts of color perception and camera calibration.

CAMERA CALIBRATION
The measured color and intensity of reflections from a small, uniform surface element with no inherent light emission or opacity depend on three functions: the spectral power distribution of the illuminant, I(λ); the spectral reflective properties of the surface material, R(λ); and the spectral sensitivities of the imager, S(λ). The signal power measured by a detector can be expressed as:

P = ∫₀^∞ I(λ) R(λ) S(λ) dλ

In order to get a color image, the human eye, as well as photographic and video equipment, uses multiple adjacent sensors with different spectral responses. Human vision relies on three types of light-sensitive cone cells to formulate color perception. In developing a colorimetric model based on human perception, the International Commission on Illumination (CIE) has defined a set of three color-matching functions, x̄(λ), ȳ(λ) and z̄(λ). These can be thought of as the spectral sensitivity curves of three linear light detectors that yield the CIE XYZ tristimulus values Px, Py and Pz, known collectively as the "CIE standard observer."

Figure 1 – Spectral responses of the "standard observer"

Digital image sensors predominantly use two methods to measure tristimulus values: a color filter array overlaid above inherently monochromatic photodiodes, or stacked photodiodes that measure the absorption depth of photons, which is proportional to wavelength λ. However, neither of these methods creates spectral responses similar to those of the human eye. As a result, color measurements will differ between different photo-detection and reproduction equipment, and between image sensors and human observers photographing the same scene (the same I(λ) and R(λ)). Thus, the purpose of camera calibration is to transform and correct the tristimulus values that a camera or image sensor measures, such that its spectral responses match those of the CIE standard observer.

WHITE BALANCE
You may view any object under various lighting conditions: illuminated by natural sunlight, the light of a fire, or fluorescent or incandescent bulbs. In all of these situations, human vision perceives the object as having the same color, a phenomenon called "chromatic adaptation" or "color constancy." A camera with no adjustment or automatic compensation for illuminants, however, may register the color as varying. When a camera corrects for this situation, it is referred to as white-balance correction.

As the detector equation above shows, the spectrum of the illuminant, the reflective properties of objects in the scene and the spectral sensitivity of the detector all contribute to the resulting color measurement. Therefore, even with the same detectors, measurement results will mix information from innate object colors and the spectrum of the illuminant. White balancing, or the separation of the innate reflective properties R(λ) from the spectrum of the illuminant I(λ), is possible only if:

• Some heuristics are known a priori, e.g. spatial frequency limits on the illuminant, or object colors. For example, when photographing a scene in natural sunlight, it is expected that the spectral properties of the illuminant will remain constant over the entire image. Conversely, when an image is projected onto a white screen, the spectral properties of the illuminant change dramatically from pixel to pixel, while the reflective properties of the scene (the canvas) remain constant. When both illuminant and reflective properties change abruptly, it is very difficult to isolate the scene's objects and illuminants.

• Detector sensitivity S(λ) and the illuminant spectrum I(λ) have no zeros in the observed range of the spectrum. You cannot gain any information about the reflective properties of objects outside the illuminant spectrum. For example, when a scene is illuminated by a monochromatic red source, a blue object will look just as black as a green one.

PRIOR METHODS
In digital imaging systems, the problem of camera calibration for a known illuminant can be represented as a discrete, three-dimensional vector function:

x̄' = F(x̄)

where F is the mapping vector function and x̄ is the discrete (typically 8-, 10- or 12-bit) vector of R,G,B principal color components. Based on whether the mapping is linear and whether the color components are corrected independently, the mapping function can be categorized as shown in Table 1.

              Linear                    Nonlinear
Independent   von Kries                 Component correction
Dependent     Color-correction matrix   Full lookup table

Table 1 – Camera calibration methods

THE VON KRIES HYPOTHESIS
The simplest and most widely used method for camera calibration is based on the von Kries Hypothesis [1], which aims to transform colors into the LMS color space (long-, medium- and short-wavelength-sensitive cone responses), then perform the correction using only three multipliers, one per channel. The hypothesis rests on the assumption that color constancy in the human visual system can be achieved by individually adapting the gains of the three cone responses; the gains depend on the sensory context, that is, the color history and surround. Cone responses from two radiant spectra, f1 and f2, can be matched by an appropriate choice of diagonal adaptation matrices D1 and D2 such that D1·S·f1 = D2·S·f2, where S is the cone sensitivity matrix. In the LMS space, the combined adaptation is the diagonal matrix

D = D1⁻¹·D2 = diag(L2/L1, M2/M1, S2/S1)

The advantage of this method is its relative simplicity and its easy implementation with three parallel multipliers, as part of either a digital image sensor or the image sensor pipeline (ISP):

L' = kL·L,  M' = kM·M,  S' = kS·S

In a practical implementation, the RGB color space is used instead of the LMS space, and the channel gains are adjusted so that one color, typically white, is represented by equal R,G,B values. However, adjusting the perceived cone responses or R,G,B values for one color does not guarantee that other colors are represented faithfully.
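As a concrete illustration, the von Kries-style per-channel correction might be sketched in C as follows. The Q8.8 fixed-point format, the helper names and the green-normalized gain derivation are illustrative assumptions, not details of any particular sensor pipeline:

```c
#include <stdint.h>

/* Hypothetical helper: clamp a gain-scaled value to the 8-bit range. */
static uint8_t clamp8(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Von Kries-style correction: one fixed-point gain per channel
 * (Q8.8 here, so 256 represents a gain of 1.0), applied independently
 * to each pixel of an interleaved RGB buffer. */
void von_kries_apply(uint8_t *rgb, int npix, int kr, int kg, int kb)
{
    for (int i = 0; i < npix; i++) {
        rgb[3*i + 0] = clamp8((rgb[3*i + 0] * kr) >> 8);
        rgb[3*i + 1] = clamp8((rgb[3*i + 1] * kg) >> 8);
        rgb[3*i + 2] = clamp8((rgb[3*i + 2] * kb) >> 8);
    }
}

/* Derive gains that map a measured reference white (rw, gw, bw) to
 * equal R,G,B values, normalized to the green channel (assumes
 * rw and bw are nonzero). */
void von_kries_gains(int rw, int gw, int bw, int *kr, int *kg, int *kb)
{
    *kr = (256 * gw) / rw;
    *kg = 256;
    *kb = (256 * gw) / bw;
}
```

This sketch makes the limitation discussed above easy to see: the gains exactly neutralize the one reference color, while every other color moves only approximately.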


COMPONENT CORRECTION
For any particular color component, the von Kries Hypothesis can only represent a linear relationship between input and output. Assuming similar input and output data representation (e.g. 8, 10 or 12 bits per component), unless k is 1.0, either some of the output dynamic range goes unused or some input values map to results that must be clipped or clamped. Instead of multipliers, you can represent any input/output mapping function using small, component-based lookup tables. This way you can also address sensor/display nonlinearity and gamma correction in one block. In an FPGA image-processing pipeline implementation, you can use the Xilinx Gamma Correction IP block to perform this operation.

FULL LOOKUP TABLE
Camera calibration assigns an expected value to every possible camera input tristimulus value. A brute-force approach is to use a large lookup table containing expected values for all possible input RGB values. This solution has two drawbacks. The first is memory size: for 10-bit components, the table is 2^30 words (4 Gbytes) deep and 30 bits wide. The second problem is initialization. Typically only a few dozen to a few hundred camera input/expected-value pairs are established via calibration measurements; the rest of the sparse lookup-table values have to be interpolated. This interpolation task is not trivial, as the heterogeneous component input-to-output functions are neither monotone nor smooth. Figure 2a presents the measured vs. expected-value pairs for R,G,B input (rows) and output (columns) values. A visual evaluation of the interpolated empirical results (Figure 2b) did not show significant quality improvement over a gamma-corrected, color-correction matrix-based solution.

Most image- or video-processing systems are constrained by the accessible bandwidth to external memory. The large size of the lookup table, which mandates external memory use; the significant bandwidth demand of the per-pixel accesses; and the static nature of the lookup-table contents (difficult to reprogram on a frame-by-frame basis) limit the practical use of a full LUT-based solution in embedded video- and image-processing applications.

Figure 2a – R,G,B measured vs. expected mapping values
Figure 2b – R component output as a function of R,G,B inputs

COLOR-CORRECTION MATRIX
The calibration method we describe in this article demonstrates how you can use a 3x3-matrix multiplier to perform a coordinate transformation that aims to orthogonalize the measured red, green and blue components. The advantage of this method over the von Kries approach is that all three color channels are involved in the calibration process. For example, you can incorporate information from the red and blue channels when adjusting green-channel gains. This solution also lends itself well to performing camera calibration and white-balance correction simultaneously in the same module, updating the matrix coefficients smoothly on a frame-by-frame basis to match changing illuminants.

The two simplest algorithms for white-balance correction, the Gray World and the White Point algorithms, use the RGB color space. The Gray World algorithm [2] is based on the heuristic that although different objects in a scene have different, distinct colors, the average of the scene colors (the average of red, green and blue values) should result in a neutral, gray color. Consequently, the differences in R,G,B color values averaged over a frame provide information about the illuminant color, and correction should transform colors such that the resulting color averages are identical. The Gray World algorithm is relatively easy to implement. However, it introduces large errors: in the presence of large, vivid objects, inherent scene colors may be removed or altered.

The White Point algorithm [2] is based on the assumption that the lightest pixels in an image must be white or light gray. The differences in the red, green and blue channel maxima provide information about the illuminant color, and correction should transform colors such that the resulting color maxima are identical. However, to find the white point, it's necessary to rank pixels by luminance. In addition, you may also have to perform spatiotemporal filtering of the ordered list to suppress noise artifacts and aggregate the ranked results into a single white color triplet. The advantage of the White Point algorithm is easy implementation. The downside is that it, too, can introduce large errors and may remove inherent scene colors. The method is also easily compromised by saturated pixels.

More refined methods take advantage of color-space conversions, in which hue can easily be isolated from color saturation and luminance, reducing the three-dimensional color-correction problem to a one-dimensional one. For example, color gamut mapping builds a two-dimensional histogram in the YCC, YUV, L*a*b* or Luv color space and fits a convex hull around the base of the histogram. The UV or (Cr,Cb) averages are calculated and used to correct colors such that the resulting UV or CbCr histograms are centered on the neutral, or gray, point of the color space. The advantage of these methods is better color performance. The disadvantage is that the implementation may require floating-point arithmetic.

All of the methods described above may suffer from artifacts due to incorrect exposure settings or extreme dynamic range in the scene illumination. For example, an image illuminated by a bright light source with an inherent hue, such as a candlelit picture with the flame in focus, may contain fully saturated, white pixels.

OTHER WAYS TO IMPROVE WHITE-BALANCE RESULTS
Separating foreground and background is another approach to color correction. The autofocus logic in digital cameras, coupled with multizone metering, allows spatial distinction between pixels in focus around the center and the background around the edges. The assumption is that the objects photographed, with only a few dominant colors, are in focus at the center of the image, while distant objects are closer to the edges, where the Gray World hypothesis prevails.

Another technique centers on shape detection. Face or skin-color detection helps cameras identify image content with expected hues. In this case, white-balance correction can be limited to pixels with known, expected hues, and color correction moves the colors of these pixels closer to the expected colors. The disadvantage of this method is the costly segmentation and recognition logic.

Most commercial applications combine multiple methods, using a strategy that adapts to the image content and the photographic environment [2].

ISPs FOR CAMERA CALIBRATION AND COLOR CORRECTION
Our implementation uses a typical image sensor pipeline (ISP), illustrated in Figure 3. We built the hardware components of the ISP (the blue blocks) with Xilinx image-processing cores using configurable logic. Meanwhile, we designed the camera-calibration and white-balancing algorithms as C code (pink blocks) running on one of the embedded ARM processors. This same ARM processor runs embedded Linux to provide a user interface to a host PC.

Figure 3 – Typical image sensor pipeline

The portion of the ISP relevant to white balancing and camera calibration is the feedback loop, including:

• The image statistics module, which gathers zone-based statistical data on a frame-by-frame basis;

• The embedded drivers and application software, which analyze the statistical information and program the color-correction module on a frame-by-frame basis;

• The color-correction module, which performs color transformations on a pixel-by-pixel basis.

We implemented the ISP as part of the Zynq Video and Imaging Kit (ZVIK) 1080P60 Camera Image Processing Reference Design.

DETAILED ALGORITHM DESCRIPTION
In order to calibrate the colors of our sensor, we used an off-the-shelf color-viewing booth, or light box (an X-Rite Macbeth Judge II), which has four standard illuminants with known spectra: simulated daylight, cool-white fluorescent, warm fluorescent and incandescent. We also used an off-the-shelf color target (an X-Rite ColorChecker 24 Patch Classic) with color patches of known reflective properties and expected RGB and sRGB values.

To begin implementing the camera-calibration algorithm, we first placed the color target in the light booth, flat against the gray background of the light box, positioned so that illumination from all light sources was as even as possible. Next, we captured images from the sensor to be calibrated under all illuminants, with no color correction (using "bypass" settings: an identity matrix loaded into the color-correction matrix).

Figure 4 – Sensor images with different illuminants before lens correction

We then used MATLAB® scripts available from Xilinx to compensate for barrel (geometric) lens distortion and lens shading (light intensity dropping off toward the corners). The MATLAB script allows us to identify control points on the recorded images, then warps the images to compensate for barrel distortion. The rest of the script estimates the horizontal and vertical light drop-off using the background around the registered ColorChecker target.

To attenuate measurement noise, we identified rectangular zones within the color patches and averaged the (R,G,B) pixel data within these zones, representing each color patch with a single RGB triplet. A MATLAB script with a GUI helps identify the patch centers and calculates the averaged RGB triplets corresponding to the expected RGB values of each color patch (Re, Ge, Be).

We implemented the simulated annealing optimization method to identify the color-correction coefficients and offsets. The measured, uncalibrated (R,G,B) color triplets are transformed into corrected (R',G',B') triplets by the color-correction module of Figure 3:

[R']   [k11 k12 k13] [R]   [Roffs]
[G'] = [k21 k22 k23]·[G] + [Goffs]
[B']   [k31 k32 k33] [B]   [Boffs]
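The matrix-plus-offset transformation, together with one example error metric for the optimizer to minimize, might look as follows in C. The Q4.12 coefficient format and the structure and function names are illustrative assumptions, not the actual formats of the Xilinx color-correction core:

```c
#include <stdint.h>

/* 3x3 color-correction matrix in Q4.12 fixed point (4096 == 1.0),
 * plus per-channel offsets in pixel units; illustrative only. */
typedef struct {
    int k[3][3];   /* Q4.12 coefficients */
    int offs[3];   /* Roffs, Goffs, Boffs */
} ccm_t;

static int clamp_pix(int v, int maxval)
{
    if (v < 0)      return 0;
    if (v > maxval) return maxval;
    return v;
}

/* Apply the CCM to one RGB triplet: out = K*in + offs, clamped. */
void ccm_apply(const ccm_t *m, const int in[3], int out[3], int maxval)
{
    for (int r = 0; r < 3; r++) {
        long acc = (long)m->k[r][0] * in[0]
                 + (long)m->k[r][1] * in[1]
                 + (long)m->k[r][2] * in[2];
        out[r] = clamp_pix((int)(acc >> 12) + m->offs[r], maxval);
    }
}

/* Scalar error for the optimizer: sum of squared RGB differences
 * between corrected measured patches and expected patch triplets. */
long ccm_error(const ccm_t *m, int meas[][3], int expect[][3],
               int n, int maxval)
{
    long e = 0;
    for (int i = 0; i < n; i++) {
        int out[3];
        ccm_apply(m, meas[i], out, maxval);
        for (int c = 0; c < 3; c++) {
            long d = out[c] - expect[i][c];
            e += d * d;
        }
    }
    return e;
}
```

An annealing (or any other) optimizer would repeatedly perturb the 12 entries of `ccm_t` and keep changes that reduce `ccm_error` over the selected patches.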


The simulated annealing algorithm minimizes an error function that returns a scalar. In the following discussion, (Rk, Gk, Bk) reference a subset or superset of the measured color-patch pixel values: the user is free to limit the number of patches included in the optimization (a subset), or to include a particular patch multiple times, thereby increasing its relative weight during the optimization process. The integer n represents the number of color patches selected for inclusion. If all patches of the X-Rite ColorChecker 24 Patch Classic are included exactly once, n = 24.

As the optimization algorithm has only 12 free variables (the CCM coefficients and offsets), typically no exact solution exists that maps all measured values precisely to the expected color-patch values. Instead, the algorithm minimizes an error function to distribute the error optimally over the range of patches used. We set up error functions that calculate one of the following:

• The sum of squared differences between expected and transformed triplets in the RGB color space:

E = Σ_{k=1}^{n} (R'k − Rek)² + (G'k − Gek)² + (B'k − Bek)²

• The sum of absolute differences between expected and transformed triplets in the RGB color space:

E = Σ_{k=1}^{n} |R'k − Rek| + |G'k − Gek| + |B'k − Bek|

• The sum of squared differences between expected and transformed triplets in the YUV color space:

E = Σ_{k=1}^{n} (U'k − Uek)² + (V'k − Vek)²

• Or the sum of absolute differences in the YUV color space:

E = Σ_{k=1}^{n} |U'k − Uek| + |V'k − Vek|

where U'k and V'k correspond to the R'G'B' values transformed into the YUV color space. Similar error functions can be set up in the L*u*v* or L*a*b* color spaces. You can use any of the above error functions in the simulated annealing minimization.

Figure 5 – Color-calibrated, lens-corrected images with different illuminants

WHITE BALANCING
Using the camera-calibration method above, we established four sets of color-correction coefficients and offsets, CCMk, k = {1,2,3,4}, that result in optimal color representation provided the illuminant is correctly identified. The white-balancing algorithm, implemented in software running on the embedded processor, then performs the following operations on a frame-by-frame basis. Using statistical information, it estimates the illuminant weights (wk). The weights are low-pass filtered to compensate for sudden scene changes, resulting in illuminant probabilities (pk). Finally, the color-correction matrix module is programmed with the CCMk values combined according to the weights pk.

The advantage of this method is that a linear combination of the calibrated CCMk values limits color artifacts when scene colors and illuminant colors are not properly separated. In underwater photography, for example, where a strong blue tinge is present, a simple white-balancing algorithm such as Gray World would compensate by removing all blue, severely distorting the innate colors of the scene.

For all illuminants k = {1,2,3,4}, with different scene setups in the light booth, we also recorded two-dimensional YUV histograms of the scenes by binning pixel values by chrominance and weighing each pixel by its luminance value (a luminance-weighed chrominance histogram). This method de-prioritizes dark pixels, for which a small difference in R,G,B values results in large noise in the chrominance domain. Using a mask, we eliminated histogram bins that pertain to vivid colors that cannot possibly originate from a neutral (gray or white) object illuminated by a typical illuminant (Figure 6). A typical mask contains nonzero values only around the neutral (white) point, where most illuminants are located. We hard-coded the masked two-dimensional histogram values Hk(x,y), as well as the CCMk values, into the white-balancing application running on the embedded processor.

Figure 6 – Illuminants with different temperatures in CIE color space

During real-time operation, the white-balancing application collects similar two-dimensional, luminance-weighed chrominance histograms. The measured histograms are also masked, and the sum of absolute or squared differences is calculated between each of the four stored histograms and the measured one:

Dk = Σ_{x=0}^{15} Σ_{y=0}^{15} (Hk(x,y) − H(x,y))²

where Hk(x,y) are the precalculated reference two-dimensional histograms pertaining to the known illuminants {k = 1,2,3,4}, and H(x,y) is the real-time histogram measurement. Based on the measured histogram differences Dk, normalized similarity values are calculated using:

wi = (1/Di) / Σ_{k=1}^{4} (1/Dk)

To avoid abrupt frame-by-frame tone changes, we smoothed the normalized similarity values over time using a simple low-pass IIR filter:

pi = c·wi + (1 − c)·pi−1

where 0 < c < 1 controls the impulse response of the filter. The smaller the value of c, the smoother the transitions; the larger the value, the quicker the filter responds to changes in lighting conditions.

Finally, we programmed the color-correction module of the ISP (Figure 3) with a linear combination of the precalculated color-correction coefficients and offsets (CCMk):

CCM = Σ_{k=1}^{4} pk·CCMk

Real-time white-balance results (Figure 7), from a scene illuminated by both natural daylight and fluorescent light, show significant improvement in perceived image quality and color representation. The Zynq Video and Imaging Kit, along with the MATLAB scripts available from Xilinx, complements the algorithms we have presented and provides an implementation example.

Real-time color balancing is becoming increasingly challenging as the resolutions and frame rates of industrial, consumer and automotive video applications grow. The algorithm we have described illustrates how software running on an embedded processor, such as the ARM cores of the Zynq processing platform, can control custom image- and video-processing logic that performs pixel-level color correction.

References
1. H.Y. Chong, S.J. Gortler and T. Zickler, "The von Kries Hypothesis and a Basis for Color Constancy," Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2007.
2. S. Bianco, F. Gasparini and R. Schettini, "Combining Strategies for White Balance," Proceedings of SPIE, Vol. 6502 (2007), pages 65020D-1 to 65020D-9.

Figure 7 – Scene captured with no correction (left) and with white-balance correction (right)
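As a closing sketch, the frame-by-frame control loop described above (histogram matching, weight normalization, IIR smoothing and CCM blending) might be put together as follows. The bin count, floating-point arithmetic and function names are illustrative assumptions; the actual application uses hard-coded, fixed-point data on the embedded processor:

```c
#define NUM_ILLUM 4   /* four calibrated illuminants */
#define HIST_BINS 16  /* illustrative chrominance histogram size */

/* Sum-of-squared-differences distance between the measured masked
 * chrominance histogram and one stored reference histogram. */
double hist_dist(double ref[HIST_BINS][HIST_BINS],
                 double meas[HIST_BINS][HIST_BINS])
{
    double d = 0.0;
    for (int x = 0; x < HIST_BINS; x++)
        for (int y = 0; y < HIST_BINS; y++) {
            double e = ref[x][y] - meas[x][y];
            d += e * e;
        }
    return d;
}

/* Turn distances into normalized similarity values
 * w_i = (1/D_i) / sum_k (1/D_k), then smooth them with the one-pole
 * IIR filter p_i = c*w_i + (1-c)*p_i (assumes every dist[k] > 0). */
void update_weights(const double dist[NUM_ILLUM], double p[NUM_ILLUM],
                    double c)
{
    double w[NUM_ILLUM], sum = 0.0;
    for (int k = 0; k < NUM_ILLUM; k++) {
        w[k] = 1.0 / dist[k];
        sum += w[k];
    }
    for (int k = 0; k < NUM_ILLUM; k++)
        p[k] = c * (w[k] / sum) + (1.0 - c) * p[k];
}

/* Blend the four calibrated CCMs (12 terms each: 9 coefficients and
 * 3 offsets) with the smoothed probabilities: CCM = sum_k p_k*CCM_k. */
void blend_ccm(const double ccm[NUM_ILLUM][12], const double p[NUM_ILLUM],
               double out[12])
{
    for (int t = 0; t < 12; t++) {
        out[t] = 0.0;
        for (int k = 0; k < NUM_ILLUM; k++)
            out[t] += p[k] * ccm[k][t];
    }
}
```

Each frame, the application would call `hist_dist` against the four stored references, pass the distances to `update_weights`, blend with `blend_ccm` and write the result to the color-correction module's registers.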
