© 2000 Wiley-Liss, Inc. Cytometry 39:300–305 (2000)

Mean and Variance of Ratio Estimators Used in Fluorescence Ratio Imaging

G.M.P. van Kempen¹* and L.J. van Vliet²
¹Central Analytical Sciences, Unilever Research Vlaardingen, Vlaardingen, The Netherlands
²Pattern Recognition Group, Faculty of Applied Sciences, Delft University of Technology, Delft, The Netherlands
Received 7 April 1999; Revision Received 14 October 1999; Accepted 26 October 1999

Background: The ratio of two measured fluorescence signals (called x and y) is used in different applications in fluorescence microscopy. Multiple instances of both signals can be combined in different ways to construct different ratio estimators.

Methods: The mean and variance of three estimators for the ratio between two random variables, x and y, are discussed. Given n samples of x and y, we can intuitively construct two different estimators: the mean of the ratio of each x and y and the ratio between the mean of x and the mean of y. The former is biased and the latter is only asymptotically unbiased. Using the statistical characteristics of this estimator, a third, unbiased estimator can be constructed.

Results: We tested the three estimators on simulated data, real-world fluorescence test images, and comparative genome hybridization (CGH) data. The results on the simulated and real-world test images confirm the presented theory. The CGH experiments show that our new estimator performs better than the existing estimators.

Conclusions: We have derived an unbiased ratio estimator that outperforms intuitive ratio estimators. Cytometry 39:300–305, 2000. © 2000 Wiley-Liss, Inc.

Key terms: ratio imaging; CGH; ratio estimation; mean; variance

This research was presented at ASCI’95, First Annual Conference of the Advanced School for Computing and Imaging, Heijen, The Netherlands, 16–18 May 1995.
The research was performed while G.M.P. van Kempen was a member of the Pattern Recognition Group.
*Correspondence to: Dr. Ir. Geert M.P. van Kempen, Unilever Research Vlaardingen, Olivier van Noortlaan 120, 3133 AT Vlaardingen, P.O. Box 114, 3130 AC Vlaardingen, The Netherlands. E-mail: [email protected].

In fluorescence microscopy, ratio imaging is applied to a number of applications. In ratio labeling, the ratio between the intensities of different fluorophores is used to expand the number of labels for an in situ hybridization procedure (1). The number of fluorophores that can be spectrally separated by fluorescence microscopy normally restricts the total number of labels.

Fluorescence ratio imaging is also used to measure spatial and temporal differences in ion concentrations within a single cell. This is achieved by using fluorophores whose excitation or emission spectrum changes as a function of the Ca²⁺ concentration or pH (2).

In a third application of ratio imaging, known as comparative genome hybridization (CGH) (3,4), one tries to estimate the DNA sequence copy number as a function of the chromosomal location. This is achieved by measuring the hybridization between “tumor” DNA and “normal” DNA to detect gene amplifications and deletions (5).

In fluorescence ratio imaging, one is interested in the ratio $R$ between two random variables $X$ and $Y$,

$$R = \frac{X}{Y}. \tag{1}$$

In practice, one cannot measure $X$ and $Y$, but only “noisy” realizations of $X$ and $Y$, termed $x$ and $y$ in the present article. We will assume $x$ and $y$ to be stochastic variables with $E\{x\} = \mu_x = X$ and $E\{y\} = \mu_y = Y$. In the second section we study the statistical properties of two estimators of the ratio $R$. Based on these results we show in the third section that, under certain conditions (known variance and covariance), a third, unbiased estimator can be constructed. The simulations and experiments in the fourth section support the presented theory. In the fifth section, we discuss the application of the presented theory to CGH ratio imaging. We conclude in the sixth section.

MEAN AND VARIANCE OF TWO RATIO ESTIMATORS

Given $n$ samples of $x$ and $y$, we can obtain two obvious estimators for the ratio $R = X/Y$:

$$r_1 = \overline{\left(\frac{x}{y}\right)}, \qquad r_2 = \frac{\bar{x}}{\bar{y}}. \tag{2}$$
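As a minimal illustration of equation (2), the two estimators can be written in a few lines of NumPy. The function names `ratio_r1` and `ratio_r2` and the sample values are illustrative assumptions, not part of the original work.

```python
import numpy as np

def ratio_r1(x, y):
    """Mean of the per-sample ratios: r1 = mean(x / y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(x / y))

def ratio_r2(x, y):
    """Ratio of the sample means: r2 = mean(x) / mean(y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(x) / np.mean(y))

# Hypothetical paired measurements (arbitrary intensity units).
x = [38.0, 44.0, 41.0, 37.0]
y = [20.0, 22.0, 18.0, 20.0]

r1 = ratio_r1(x, y)   # averages the four individual ratios
r2 = ratio_r2(x, y)   # 160 / 80 = 2.0
```

Even on this small sample the two numbers differ, because averaging ratios and taking the ratio of averages are not the same operation.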

Here $\bar{x}$ denotes the average of $x$. To find an approximate expression for the expectation of $r_1$ and $r_2$, we have used a Taylor series expansion of $x/y$ around $(\mu_x, \mu_y)$:

$$
\begin{aligned}
\frac{x}{y} \approx{} & \left.\frac{x}{y}\right|_{\mu_x,\mu_y}
+ (x-\mu_x)\left.\frac{\partial}{\partial x}\left(\frac{x}{y}\right)\right|_{\mu_x,\mu_y}
+ (y-\mu_y)\left.\frac{\partial}{\partial y}\left(\frac{x}{y}\right)\right|_{\mu_x,\mu_y} \\
& + \frac{1}{2}(x-\mu_x)^{2}\left.\frac{\partial^{2}}{\partial x^{2}}\left(\frac{x}{y}\right)\right|_{\mu_x,\mu_y}
+ \frac{1}{2}(y-\mu_y)^{2}\left.\frac{\partial^{2}}{\partial y^{2}}\left(\frac{x}{y}\right)\right|_{\mu_x,\mu_y} \\
& + (x-\mu_x)(y-\mu_y)\left.\frac{\partial^{2}}{\partial x\,\partial y}\left(\frac{x}{y}\right)\right|_{\mu_x,\mu_y} \\
& + O\!\left(\left((x-\mu_x)\frac{\partial}{\partial x} + (y-\mu_y)\frac{\partial}{\partial y}\right)^{3}\left(\frac{x}{y}\right)\right).
\end{aligned}
\tag{3}
$$

A Taylor series expansion of $\bar{x}/\bar{y}$ around $(\mu_x, \mu_y)$ is similar to equation (3). The mean of $r_1$ and $r_2$ can be found by applying the expectation operator to the individual terms (ignoring all terms of order higher than two),

$$
E\{r_1\} = E\left\{\overline{\left(\frac{x}{y}\right)}\right\} = E\left\{\frac{x}{y}\right\}
\approx \frac{\mu_x}{\mu_y} + \mathrm{var}(y)\frac{\mu_x}{\mu_y^{3}} - \frac{\mathrm{cov}(x,y)}{\mu_y^{2}}
\tag{4}
$$

and

$$
\begin{aligned}
E\{r_2\} = E\left\{\frac{\bar{x}}{\bar{y}}\right\}
&\approx \frac{\mu_x}{\mu_y} + \mathrm{var}(\bar{y})\frac{\mu_x}{\mu_y^{3}} - \frac{\mathrm{cov}(\bar{x},\bar{y})}{\mu_y^{2}} \\
&\approx \frac{\mu_x}{\mu_y} + \frac{1}{n}\left(\mathrm{var}(y)\frac{\mu_x}{\mu_y^{3}} - \frac{\mathrm{cov}(x,y)}{\mu_y^{2}}\right)
\end{aligned}
\tag{5}
$$

with $E\{r\}$ the expectation of $r$, $\mathrm{var}(y)$ the variance of $y$, and $\mathrm{cov}(x,y)$ the covariance of $x$ and $y$. It is clear from these two expressions that $r_2$ is asymptotically unbiased ($\lim_{n\to\infty} E\{r_2\} = \mu_x/\mu_y$) and that $r_1$ is a biased estimator of $R$. An approximation of the variance of $r_1$ and $r_2$ is obtained by using the first-order terms of the Taylor series expansion:

$$
\begin{aligned}
\mathrm{var}(r_1) &= \mathrm{var}\!\left(\overline{\left(\frac{x}{y}\right)}\right)
= E\left\{\left(\overline{\left(\frac{x}{y}\right)} - E\left\{\overline{\left(\frac{x}{y}\right)}\right\}\right)^{2}\right\} \\
&\approx E\left\{\left(\overline{\left(\frac{x}{y}\right)} - \frac{\mu_x}{\mu_y}\right)^{2}\right\} \\
&\approx \frac{1}{n}\left(\frac{\mathrm{var}(x)}{\mu_y^{2}} + \frac{\mu_x^{2}\,\mathrm{var}(y)}{\mu_y^{4}} - \frac{2\mu_x\,\mathrm{cov}(x,y)}{\mu_y^{3}}\right)
\end{aligned}
\tag{6}
$$

and

$$
\begin{aligned}
\mathrm{var}(r_2) &= \mathrm{var}\!\left(\frac{\bar{x}}{\bar{y}}\right)
= E\left\{\left(\frac{\bar{x}}{\bar{y}} - E\left\{\frac{\bar{x}}{\bar{y}}\right\}\right)^{2}\right\}
\approx E\left\{\left(\frac{\bar{x}}{\bar{y}} - \frac{\mu_x}{\mu_y}\right)^{2}\right\} \\
&\approx \frac{\mathrm{var}(\bar{x})}{\mu_y^{2}} + \frac{\mu_x^{2}\,\mathrm{var}(\bar{y})}{\mu_y^{4}} - \frac{2\mu_x\,\mathrm{cov}(\bar{x},\bar{y})}{\mu_y^{3}} \\
&\approx \frac{1}{n}\left(\frac{\mathrm{var}(x)}{\mu_y^{2}} + \frac{\mu_x^{2}\,\mathrm{var}(y)}{\mu_y^{4}} - \frac{2\mu_x\,\mathrm{cov}(x,y)}{\mu_y^{3}}\right).
\end{aligned}
\tag{7}
$$

Estimators $r_1$ and $r_2$ have, only in first-order Taylor series expansion, an equal variance, which vanishes for an infinite number of samples ($n \to \infty$). Therefore both estimators $r_1$ and $r_2$ are consistent. An expression similar to equations (6) and (7) for the variance of $x/y$ was found by Kendall and Stuart (6).

AN UNBIASED RATIO ESTIMATOR

In the previous section, we found that the estimator $r_2$ is only asymptotically unbiased. However, having found an analytical expression for the bias, we can derive an unbiased estimator for $R$ based on $r_2$:

$$
r_3 = \frac{\bar{x}}{\bar{y}} - \frac{1}{n}\left(\mathrm{var}(y)\frac{\mu_x}{\mu_y^{3}} - \frac{\mathrm{cov}(x,y)}{\mu_y^{2}}\right).
\tag{8}
$$

The expectation for this estimator is, of course,

$$
E\{r_3\} = E\left\{\frac{\bar{x}}{\bar{y}} - \frac{1}{n}\left(\mathrm{var}(y)\frac{\mu_x}{\mu_y^{3}} - \frac{\mathrm{cov}(x,y)}{\mu_y^{2}}\right)\right\} \approx \frac{\mu_x}{\mu_y}.
\tag{9}
$$

The variance of this estimator yields

$$
\mathrm{var}(r_3) = E\left\{\left(\frac{\bar{x}}{\bar{y}} - E\left\{\frac{\bar{x}}{\bar{y}}\right\}\right)^{2}\right\} = \mathrm{var}(r_2).
\tag{10}
$$

EXPERIMENTS

Some simulations were performed to support the expressions for the mean and variance of $r_1$, $r_2$, and $r_3$. In the first experiment we used computer-generated noise; in the second experiment we used noise images generated in fluorescence image acquisition.

Simulations

An image of size $n \times m$ was filled with $m$ realizations of $x$, $N(\mu_x, \sigma_x)$, each containing $n$ samples generated with Numerical Recipes routines (7), and another image was filled with realizations of $y$, $N(\mu_y, \sigma_y)$. Figure 1 shows the mean and variance of $r_1$, $r_2$, and $r_3$ as a function of the number of samples $n$. Note that $r_2$ asymptotically converges to $\mu_x/\mu_y$ and $r_1$ remains biased even for large values of $n$, whereas $r_3$ is unbiased even for a small number of samples. The means of $r_2$ and $r_3$ are in agreement with the predictions, whereas the mean of $r_1$ converges to a slightly higher ratio than predicted when using a second-order Taylor approximation, as in equation (4). The estimated variances of $r_2$ and $r_3$ are in agreement with the theoretical variances as derived in the second section. The estimated variance of $r_1$ is larger than we expected from our calculations when using only a first-order Taylor approximation (indicated by the continuous line in the right graph).

FIG. 1. The mean and variance of $r_1$, $r_2$, and $r_3$ measured as a function of $n$, with $\mu_x = \mu_y = 40.0$ and $\sigma_x = \sigma_y = 8.0$. Each point is an average of 10,000 realizations.

We performed a second simulation experiment with Poisson noise to test ratios unequal to one. Figure 2 shows that for ratios of 2.0 and 0.5 the results are in good agreement with the expected ratio values.

FIG. 2. The mean of $r_1$, $r_2$, and $r_3$ measured as a function of $n$ using Poisson noise, with $\mu_x = 200$ and $\mu_y = 100.0$ on the left and vice versa on the right. In all cases, $\sigma_x = \sqrt{\mu_x}$ and $\sigma_y = \sqrt{\mu_y}$. Each point is an average of 10,000 realizations.

We tested the three ratio estimators on signals having a covariance. We simulated the covariance by generating a Poisson noise image with a constant mean and used this image as the mean for the Poisson processes that generated the $X$ and $Y$ images. Setting the mean to 100.0, we obtained images with a variance of 200.0 and a covariance of 100.0. The results of this experiment are shown in Figure 3. The results show a good correspondence with the expected ratio values.

FIG. 3. The mean of $r_1$, $r_2$, and $r_3$ measured as a function of $n$ using Poisson noise, with $\mu_x = \mu_y = 100.0$. The variance is equal to 200.0, and the covariance is 100.0. Each point is an average of 10,000 realizations.
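The Gaussian and the correlated Poisson simulations described above can be sketched in NumPy as follows. This is a minimal sketch and not the authors' code: the helper name `estimator_stats`, the random seed, the choice $n = 4$, and the use of NumPy's generator in place of the Numerical Recipes routines are all assumptions. The known population moments are passed in explicitly, as equation (8) requires.

```python
import numpy as np

def estimator_stats(x, y, mu_x, mu_y, var_y, cov_xy):
    """Mean and variance over realizations (rows) of r1, r2 and r3.

    x, y: arrays of shape (realizations, n). The moments are the known
    population values needed by equation (8).
    """
    n = x.shape[1]
    r1 = (x / y).mean(axis=1)                 # mean of ratios
    r2 = x.mean(axis=1) / y.mean(axis=1)      # ratio of means
    # Second-order bias of r2, equation (5).
    bias_r2 = (var_y * mu_x / mu_y**3 - cov_xy / mu_y**2) / n
    r3 = r2 - bias_r2                         # equation (8)
    return {name: (float(r.mean()), float(r.var()))
            for name, r in (("r1", r1), ("r2", r2), ("r3", r3))}

rng = np.random.default_rng(0)                # assumed seed
realizations, n = 10_000, 4                   # n chosen for illustration

# Independent Gaussian noise, mu = 40 and sigma = 8 (the setting of Fig. 1).
xg = rng.normal(40.0, 8.0, size=(realizations, n))
yg = rng.normal(40.0, 8.0, size=(realizations, n))
gauss = estimator_stats(xg, yg, 40.0, 40.0, 8.0**2, 0.0)

# Correlated Poisson noise: a shared Poisson image with mean 100 acts as the
# local mean of both x and y, which gives var = 200 and cov = 100.
lam = rng.poisson(100.0, size=(realizations, n))
xp = rng.poisson(lam).astype(float)
yp = rng.poisson(lam).astype(float)
poisson = estimator_stats(xp, yp, 100.0, 100.0, 200.0, 100.0)

print("gaussian:", gauss)
print("poisson: ", poisson)
```

Passing the moments as arguments keeps the estimator separate from the noise model, so the same function serves both experiments. With real data the moments would have to be estimated, which the sketch does not attempt.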

Experiments on Real Data

To test the presented theory in practice, we performed the following experiment. We used a Nikon inverted microscope equipped with a Photometrics camera (with a KAF 1400 CCD chip; Photometrics, Tucson, AZ) to acquire a fluorescence image of a homogeneous fluorescence sample. The acquired data are distorted by several noise sources (8):

● photon shot noise (Poisson noise due to the counting of electrons induced by single photons)
● read-out noise (Gaussian noise caused by the preamplifier)
● quantization noise (uniform noise caused by analog-to-digital conversion)
● dark current (very small, Poisson distributed signal)
● bias (space-dependent offset signal).

FIG. 4. The mean and variance of $r_1$, $r_2$, and $r_3$ measured as a function of $n$. The mean value of the two noise images was set to 100.0. The variances of the two images are 328.84 and 329.37.

Such a noise image will not have a constant mean number of photons over the image due to nonhomogeneity of the illumination. If we acquire a second image of the same scene and subtract the two, we are left with a noise image of zero mean and twice the variance. This can be repeated to produce a second independent noise image.
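The subtraction step can be checked with a small simulation. The sketch below models only photon shot noise and read-out noise. The illumination ramp, the read-out standard deviation, and the helper name `acquire` are assumptions made for illustration, not measured properties of the camera described above.

```python
import numpy as np

def acquire(scene, read_sigma, rng):
    """One simulated acquisition: Poisson shot noise plus Gaussian read-out noise."""
    shot = rng.poisson(scene).astype(float)
    return shot + rng.normal(0.0, read_sigma, size=scene.shape)

rng = np.random.default_rng(1)                 # assumed seed
# Hypothetical non-uniform illumination: a ramp from 80 to 120 photons.
scene = np.tile(np.linspace(80.0, 120.0, 256), (256, 1))
read_sigma = 3.0                               # assumed read-out noise

first = acquire(scene, read_sigma, rng)
second = acquire(scene, read_sigma, rng)
noise = first - second                         # the scene cancels

# The per-pixel variance of one acquisition is scene + read_sigma**2, so the
# difference image should show about twice the average of that value.
single_var = float(scene.mean() + read_sigma**2)
print(noise.mean(), noise.var(), 2.0 * single_var)
```

Any fixed, space-dependent offset would cancel in the same way as the illumination profile, since it is identical in both acquisitions.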

Both noise images can be added to a constant valued image, after which the ratio estimators $r_1$, $r_2$, and $r_3$ can be applied. Noise images acquired this way will contain photon shot noise, read-out noise, and quantization noise. Figure 4 shows the mean and variance of the ratio of the two noise images, as estimated by the three estimators $r_1$, $r_2$, and $r_3$ as a function of the number of samples $n$.

APPLICATION TO CGH

CGH (3–5) estimates the DNA sequence copy number as a function of the chromosomal location. Ratio imaging of “tumor” DNA and “normal” DNA allows detection of gene amplifications and deletions.

To improve the precision and accuracy of such a ratio profile, it is common practice in CGH analysis to average over a number of ratio profiles of the same chromosome (i.e., $r_2$ is used to calculate one profile, and $r_1$ to average these profiles). If we call this ratio estimator $r_{mn}$, we can write it as

$$
r_{mn} = \frac{1}{m}\sum_{j=1}^{m}\left\{\frac{1}{n}\sum_{i=1}^{n} x_i \bigg/ \frac{1}{n}\sum_{i=1}^{n} y_i\right\}
= \frac{1}{m}\sum_{j=1}^{m}\left\{\frac{\bar{x}}{\bar{y}}\right\}
\tag{11}
$$

where $m$ is the number of profiles and $n$ is the number of $x$, $y$ values at a certain position on each profile. Equation (11) shows that estimator $r_{mn}$ is identical to applying estimator $r_2$ $m$ times on $n$ samples followed by averaging over the obtained $m$ ratios. An approximation of the expectation and variance of $r_{mn}$ can be found with a Taylor series expansion:

$$
E\{r_{mn}\} = E\{r_2\}, \qquad \mathrm{var}(r_{mn}) = \frac{1}{m}\,\mathrm{var}(r_2),
\tag{12}
$$

with $r_2$ evaluated on the $n$ samples of a single profile. The expressions for the mean and variance of $r_{mn}$ show that averaging over ratio profiles only improves the precision of the estimation of $R$. It does not improve the accuracy.

FIG. 5. The mean of $r_{mn}$, $r_1$, and $r_2$ measured as a function of $n$, with a constant number of samples divided over $m$ and $n$ in such a way that $m \times n = 32$, $\mu_x = \mu_y = 36.0$, and $\sigma_x = \sigma_y = 6.0$.

A simulation experiment was performed to compare the performance of estimator $r_{mn}$ with $r_1$ and $r_2$ for a constant number of $m \times n$ samples.
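A comparison of this kind can be sketched as follows, using independent Gaussian noise with $\mu_x = \mu_y = 36$ and $\sigma_x = \sigma_y = 6$ and a fixed budget of $m \times n = 32$ samples. The function name `compare_fixed_budget`, the seed, and the particular splits of the budget are assumptions chosen for illustration.

```python
import numpy as np

def compare_fixed_budget(m, n, realizations=10_000, mu=36.0, sigma=6.0, seed=0):
    """Average value of r_mn, r1 and r2 when m * n samples are available."""
    rng = np.random.default_rng(seed)
    # Axes: realization, profile (m), sample within a profile (n).
    x = rng.normal(mu, sigma, size=(realizations, m, n))
    y = rng.normal(mu, sigma, size=(realizations, m, n))

    # Equation (11): r2 on each profile of n samples, then the mean over m profiles.
    r_mn = (x.mean(axis=2) / y.mean(axis=2)).mean(axis=1)
    # r1 and r2 applied to all m * n samples pooled together.
    r1 = (x / y).mean(axis=(1, 2))
    r2 = x.mean(axis=(1, 2)) / y.mean(axis=(1, 2))
    return {"r_mn": float(r_mn.mean()),
            "r1": float(r1.mean()),
            "r2": float(r2.mean())}

for m, n in [(32, 1), (8, 4), (2, 16), (1, 32)]:
    print(m, n, compare_fixed_budget(m, n))
```

At the two extremes the sketch reduces to the simpler estimators: with $n = 1$ every profile holds a single sample, so $r_{mn}$ coincides with $r_1$, and with $m = 1$ it coincides with $r_2$.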

Two images were filled with Gaussian distributed noise, and the ratios between both images were estimated using $r_{mn}$, $r_1$, and $r_2$. The mean and variance of $r_{mn}$ were calculated as a function of the ratio between $n$ and $m$; the mean and variance of $r_1$ and $r_2$ were calculated from $m \times n$ samples.

Figure 5 shows the results of an experiment with $r_{mn}$, $r_1$, and $r_2$ where $m \times n$ equals 32. Note that estimator $r_1$ has already converged for this number of samples. It is clear that estimator $r_{mn}$ yields a suboptimal estimation of the ratio for a given number of $m \times n$ samples. This shows that averaging before taking the ratio yields a better ratio estimate.

In a second CGH experiment we used CGH control images to verify whether the differences between estimators $r_1$, $r_2$, and $r_3$ could be found on real data. The CGH control images are made with normal DNA for both test and reference DNA; thus, after calibration a ratio of 1.0 is to be expected. From a set of CGH images, we selected a straight chromosome (Fig. 6) to avoid errors due to the straightening of the chromosome pixel data. We used a normalized Gaussian convolution to estimate the background intensity, which was subtracted from the chromosome intensity values (9). To correct for the different chromatic efficiencies of the color filters and camera used, the intensities of the two CGH signals of the chromosome were scaled such that the average intensities were equal. We calculated the mean and variance of $r_1$, $r_2$, and $r_3$ from a region of interest (mask in Fig. 6), with size $n$, in the middle of both chromosomes that excluded the centromeric and telomeric regions of the chromosomes. Figure 7 shows the mean and variance values of the three estimators as a function of the number of samples $n$, taken in the horizontal direction of the image. The mean and variance were estimated over 95 realizations (the vertical dimension of the mask is 95 pixels).

The experiment shows that the estimator $r_3$ outperforms the other two estimators. All three estimators, however, do decrease their bias as a function of the number of samples, which we attribute to the fact that, for this application, our model for the $X$ and $Y$ signals (constant signals with noise) is quite a crude model for the chromosomes. Nevertheless, the results show that the estimator based on this simple approximation (estimator $r_3$) does improve the estimation of the ratio.

FIG. 6. Image of the metaphase (left) from which a single chromosome has been selected. The right three images show the “normal” or “reference” chromosome, the “tumor” or “unknown” chromosome, and the mask used to calculate the ratio.

FIG. 7. The mean (left) and variance (right) of $r_1$, $r_2$, and $r_3$ (with and without estimating the covariance) measured as a function of $n$.

CONCLUSIONS

In this article we have derived expressions for the mean and variance of two intuitive ways of estimating the ratio between two random variables. We have shown that one of these estimators ($r_1$) is biased and that the other ($r_2$) is only asymptotically unbiased. However, from the derived expression for the bias of $r_2$, we propose an unbiased ratio estimator $r_3$. These results were supported by experiments. The noise data for these experiments were either computer generated or measured camera noise.

For a particular application of ratio imaging, CGH, we have shown that the usual averaging over multiple ratio profiles is not optimal. Measurements on CGH images confirm the differences between the ratio estimators, as found in the other presented experiments.

It should be noted that, when the population of realizations of $X$ and $Y$ is heterogeneous, the estimators $r_2$ and $r_3$ will mask such heterogeneity, and a histogram of primary $x/y$ ratios will be necessary to reveal the heterogeneity. The example shown in Figure 6, therefore, only makes sense because the ratio is known a priori to have an expectation value of 1.0 for all positions. For a true biological sample, averaging along the chromosome would not be a sensible procedure.

ACKNOWLEDGMENTS

We thank Jim Piper, Joe Gray, and Damir Sudar for all fruitful discussions and for providing us with a CGH dataset. This work was supported in part by the Royal Netherlands Academy of Arts and Sciences (KNAW) and by the Rolling Grants program of the Foundation for Fundamental Research in Matter (FOM). We thank the reviewers and the associate editor for their constructive comments to improve the technical note.

LITERATURE CITED

1. Nederlof PM. Methods for quantitative and multiple in situ hybridization. Leiden: Leiden University; 1991.
2. Shotton D. Electronic light microscopy. New York: Wiley-Liss; 1993.
3. Kallioniemi A, Kallioniemi O-P, Sudar D, Rutovitz D, Gray JW, Waldman F, Pinkel D. Comparative genome hybridization for molecular cytogenetic analysis of solid tumors. Science 1992;258:818–821.
4. Piper J, Rutovitz D, Sudar D, Kallioniemi A, Kallioniemi OP, Waldman FM, Gray JW, Pinkel D. Computer image analysis of comparative genomic hybridization. Cytometry 1995;19:10–26.
5. Moore DH, Pallavicini M, Cher ML, Gray JW. A t-statistic for objective interpretation of comparative genomic hybridization (CGH) profiles. Cytometry 1997;28:183–190.
6. Kendall M, Stuart A. Advanced theory of statistics. London: Charles Griffin & Company; 1977.
7. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical recipes in C, 2nd ed. Cambridge: Cambridge University Press; 1992.
8. van Vliet LJ, Sudar D, Young IT. Digital fluorescence imaging using cooled charge-coupled device array cameras. In: Celis JE, editor. Cell biology: a laboratory handbook, 2nd ed, volume 3. London: Academic Press; 1998. p 109–120.
9. Ligtenberg KAW. Analysis of CGH images [master’s thesis]. Delft (The Netherlands): Pattern Recognition Group, Delft University of Technology; 1996.