
EURASIP Journal on Applied Signal Processing

Applications of Signal Processing in Astrophysics and Cosmology

Guest Editors: Ercan E. Kuruoglu and Carlo Baccigalupi


Copyright © 2005 Hindawi Publishing Corporation. All rights reserved.

This is a special issue published in volume 2005 of “EURASIP Journal on Applied Signal Processing.” All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Editor-in-Chief Marc Moonen, Belgium

Senior Advisory Editor
K. J. Ray Liu, College Park, USA

Associate Editors
Gonzalo Arce, USA; Jaakko Astola, Finland; Kenneth Barner, USA; Mauro Barni, Italy; Jacob Benesty, Canada; Kostas Berberidis, Greece; Helmut Bölcskei, Switzerland; Joe Chen, USA; Chong-Yung Chi, Taiwan; Satya Dharanipragada, USA; Petar M. Djurić, USA; Jean-Luc Dugelay, France; Frank Ehlers, Germany; Moncef Gabbouj, Finland; Sharon Gannot, Israel; Fulvio Gini, Italy; A. Gorokhov, The Netherlands; Peter Handel, Sweden; Ulrich Heute, Germany; John Homer, Australia; Arden Huang, USA; Jiri Jan, Czech Republic; Søren Holdt Jensen, Denmark; Mark Kahrs, USA; Thomas Kaiser, Germany; Moon Gi Kang, Korea; Aggelos Katsaggelos, USA; Walter Kellermann, Germany; Lisimachos P. Kondi, USA; Alex Kot, Singapore; C.-C. Jay Kuo, USA; Geert Leus, The Netherlands; Bernard C. Levy, USA; Mark Liao, Taiwan; Yuan-Pei Lin, Taiwan; Shoji Makino, Japan; Stephen Marshall, UK; C. Mecklenbräuker, Austria; Gloria Menegaz, Italy; Bernie Mulgrew, UK; King N. Ngan, Hong Kong; Douglas O'Shaughnessy, Canada; Antonio Ortega, USA; Montse Pardas, Spain; Wilfried Philips, Belgium; Vincent Poor, USA; Phillip Regalia, France; Markus Rupp, Austria; Hideaki Sakai, Japan; Bill Sandham, UK; Dirk Slock, France; Piet Sommen, The Netherlands; Dimitrios Tzovaras, Greece; Hugo Van hamme, Belgium; Jacques Verly, Belgium; Xiaodong Wang, USA; Douglas Williams, USA; Roger Woods, UK; Jar-Ferr Yang, Taiwan

Contents

Editorial, Ercan E. Kuruoglu and Carlo Baccigalupi Volume 2005 (2005), Issue 15, Pages 2397-2399

Separation of Correlated Astrophysical Sources Using Multiple-Lag Data Covariance Matrices, L. Bedini, D. Herranz, E. Salerno, C. Baccigalupi, E. E. Kuruoğlu, and A. Tonazzini Volume 2005 (2005), Issue 15, Pages 2400-2412

Adapted Method for Separating Kinetic SZ Signal from Primary CMB Fluctuations, Olivier Forni and Nabila Aghanim Volume 2005 (2005), Issue 15, Pages 2413-2425

Detection of Point Sources on Two-Dimensional Images Based on Peaks, M. López-Caniego, D. Herranz, J. L. Sanz, and R. B. Barreiro Volume 2005 (2005), Issue 15, Pages 2426-2436

Blind Component Separation in Wavelet Space: Application to CMB Analysis, Y. Moudden, J.-F. Cardoso, J.-L. Starck, and J. Delabrouille Volume 2005 (2005), Issue 15, Pages 2437-2454

Analysis of the Spatial Distribution of Galaxies by Multiscale Methods, J.-L. Starck, V. J. Martínez, D. L. Donoho, O. Levi, P. Querre, and E. Saar Volume 2005 (2005), Issue 15, Pages 2455-2469

Cosmological Non-Gaussian Signature Detection: Comparing Performance of Different Statistical Tests, J. Jin, J.-L. Starck, D. L. Donoho, N. Aghanim, and O. Forni Volume 2005 (2005), Issue 15, Pages 2470-2485

Time-Scale and Time-Frequency Analyses of Irregularly Sampled Astronomical Time Series, C. Thiebaut and S. Roques Volume 2005 (2005), Issue 15, Pages 2486-2499

Restoration of Astrophysical Images—The Case of Poisson Data with Additive Gaussian Noise, H. Lantéri and C. Theys Volume 2005 (2005), Issue 15, Pages 2500-2513

A Data-Driven Multidimensional Indexing Method for Data Mining in Astrophysical Databases, Marco Frailis, Alessandro De Angelis, and Vito Roberto Volume 2005 (2005), Issue 15, Pages 2514-2520

Virtually Lossless Compression of Astrophysical Images, Cinzia Lastri, Bruno Aiazzi, Luciano Alparone, and Stefano Baronti Volume 2005 (2005), Issue 15, Pages 2521-2535

Astrophysical Information from Objective Prism Digitized Images: Classification with an Artificial Neural Network, Emmanuel Bratsolis Volume 2005 (2005), Issue 15, Pages 2536-2545

Multiband Segmentation of a Spectroscopic Line Data Cube: Application to the HI Data Cube of the Spiral NGC 4254, Farid Flitti, Christophe Collet, Bernd Vollmer, and François Bonnarel Volume 2005 (2005), Issue 15, Pages 2546-2558

Adaptive DFT-Based Interferometer Fringe Tracking, Edward Wilson, Ettore Pedretti, Jesse Bregman, Robert W. Mah, and Wesley A. Traub Volume 2005 (2005), Issue 15, Pages 2559-2572

Technique for Automated Recognition of Sunspots on Full-Disk Solar Images, S. Zharkov, V. Zharkova, S. Ipson, and A. Benkhalil Volume 2005 (2005), Issue 15, Pages 2573-2584

On-board Data Processing to Lower Bandwidth Requirements on an Infrared Astronomy Satellite: Case of Herschel-PACS Camera, Ahmed Nabil Belbachir, Horst Bischof, Roland Ottensamer, Franz Kerschbaum, and Christian Reimers Volume 2005 (2005), Issue 15, Pages 2585-2594

EURASIP Journal on Applied Signal Processing 2005:15, 2397–2399
© 2005 Hindawi Publishing Corporation

Editorial

Ercan E. Kuruoglu
Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo" (ISTI), Area della Ricerca CNR di Pisa, via G. Moruzzi 1, 56124 Pisa, Italy
Email: [email protected]

Carlo Baccigalupi
Scuola Internazionale Superiore di Studi Avanzati (SISSA/ISAS), via Beirut 4, 34014 Trieste, Italy
Email: [email protected]

We live in an era where the frontiers of our investigation and comprehension of fundamental physics depend largely on the light coming from the sky, that is, on the study of galactic and extra-galactic radiation. Watching the sky, in principle, we have access to the highest energies conceivable, generated by the laws of nature in extreme conditions, such as nearby black holes or even close to the origin of the universe itself. For example, in the microwave band, the extra-galactic radiation is dominated by a markedly isotropic component, obeying a black body spectrum characterized by a temperature of about 2.726 K. That is the relic of the Big Bang, originated just 300 000 years after the initial starting point of the universe. This radiation, namely the cosmic microwave background (CMB) radiation, today is the most important observable we have to access the mysterious physics of the Big Bang itself. The latter is telling us about the unknown fundamental interactions and particles, the physics of spacetime, and the nature of quantum gravity, and represents the only way to address those issues in physics today. Electronics hardware technology has reached in these very recent years the capability to study the tiniest details of the CMB, carrying the image of the primordial stage of cosmic geometry, structure, and composition. Such a fantastic challenge is ongoing in this very moment, while several CMB detectors are operating and advanced probes are being designed for the forthcoming decades.

Many breakthroughs in physics are made possible by the use of the most advanced data analysis techniques. The present datasets obtained in astrophysical and cosmological observations are huge, and cover the entire electromagnetic spectrum, dealing with very different processes, from the gamma and X-rays of the high-energy astrophysics of compact stars or black holes, to the microwave and infrared emission from the whole large-scale universe. This variety of observational techniques and signals to deal with represents a formidable challenge for signal processing. We need state-of-the-art techniques that can analyse, summarise, and extract the necessary information from this ocean of data.

To continue with the example above, the microwave sky is dominated by the CMB radiation, but several processes contribute to the total emission, coming for instance from all the processes occurring along the line of sight, such as the emission from other galaxies or clusters of those, as well as from the diffuse gas in our own Galaxy. Each of these processes is most relevant in different contexts in astrophysics and cosmology. Recently, the astrophysics field has benefited a great deal from the rich research work going on in source separation in the signal processing field. Source separation aims at the recovery of the various different components from multiband observations by exploiting the differences between them, induced by their independent physical origins.

Despite the mutual interest, the two disciplines suffer from the lack of a common publication ground, implying that the results produced in one of them are not immediately visible in the other. The aim of the present issue is to provide a unified platform that would strengthen the bridge between signal processing and astrophysics and cosmology and enable the sharing of information. We would like to provide astrophysicists and cosmologists with a spectrum of the most advanced signal processing techniques, and the signal processing community an exposure to various vital real problems in analysing astrophysics data that await solution. Finally, our aim is to provide a reference for present and future literature, in the widest possible context, accounting for the various applications and algorithms proposed.

Indeed, as the reader may see, the topics we collected range from solar physics, thus on the scale of our own star, to the reconstruction of the most ambitious signal from the Big Bang, the CMB pattern on the whole sky. The methods presented in the issue range from transform domain analysis, such as wavelets, to data mining techniques.

We start the special issue with four papers addressing the problem of separating components in astrophysical radiation maps, which is a very hot problem due to the recent availability of WMAP satellite radiation maps and the future Planck satellite mission. The first paper on the problem is by Bedini et al., who consider the separation of components which are mutually dependent, therefore differing from classical ICA approaches; the paper thus also houses novelties from a signal processing point of view. The second paper in the sequel, by Forni and Aghanim, considers a more specific problem, namely the separation of the kinetic SZ signal from primary CMB fluctuations. The next paper, by López-Caniego et al., provides instead a technique based on Bayesian detection theory for the separation of point sources from the rest of the astrophysical radiation map. Moudden et al., in contrast to these works, consider separation in the transform domain. In particular, they propose a new separation method in the wavelet space.

A related paper, by Starck et al., again in the wavelet methodological frame, follows: it analyses the spatial distribution of the galaxies by multiscale methods. Jin et al. also concentrate their efforts on the analysis of the statistical distribution of the CMB utilising the wavelet transform, with the particular aim of detecting non-Gaussianity. Continuing with transform domain analysis techniques, closely related to wavelet ideas, Thiebaut and Roques present time-frequency analysis techniques applied to irregularly sampled astronomical time series.

Astrophysical images are obtained by mechanisms far from perfect: the images are corrupted with noise, are blurred by the cameras' limited resolution, and may be distorted by nonlinearities in the cameras. Lantéri and Theys provide a technique for the restoration of the images in the case of Poisson data and additive Gaussian noise.

Recent satellite missions have provided us with vast amounts of data. Therefore, it is of paramount importance to build efficient indexing methods and equally to design data mining methods to recover information from big databases. To this end, Frailis et al. propose a multidimensional indexing method. Again due to the vast amount of data available, it is important to be able to store them in an economic way. Lastri et al. suggest a new compression technique which is virtually lossless.

Bratsolis tackles the problem of classification of astrophysical images, and his paper gives a flavour of the application of neural-network-based techniques to the problem. Flitti et al. deal with the problem of 3D segmentation and present a new technique which they apply on real data. Wilson et al. present the adaptive DFT-based interferometer fringe tracking algorithm they have designed.

While most of the works presented here are on problems of cosmological dimensions, surely solar science, in which important developments take place, is also in the interest area of the issue. In particular, Zharkov et al. present a detailed system for the automated recognition of sunspots. They use a combination of elaborate morphological operators which makes the paper interesting also from an image processing point of view.

Lastly, to add a flavour of the implementation issues of digital signal processing systems for astrophysical tasks, we included a work by Belbachir et al., who describe a DSP system for infrared astronomy that implements a combination of lossy and lossless compression.

We would like to thank all of the authors who contributed to this issue for their very interesting work. We are very grateful to the referees who gave their time and energy in the review process and contributed immensely to the value of this issue. Finally, we would like to thank the Editor-in-Chief Dr. Moonen and the EURASIP JASP staff for helping us at every stage of the creation of this issue. We do hope that the reader will find the issue useful, interesting, and inspirational.

Ercan E. Kuruoglu
Carlo Baccigalupi

Ercan E. Kuruoglu was born in Ankara, Turkey, in 1969. He obtained his B.S. and M.S. degrees, both in electrical and electronics engineering, from Bilkent University in 1991 and 1993, respectively. He completed his graduate studies with M.Phil. and Ph.D. degrees in information engineering at the University of Cambridge, in the Signal Processing Laboratory, in 1995 and 1998, respectively. Upon graduation from Cambridge, he joined the Xerox Research Centre in Cambridge as a permanent member of the Collaborative Multimedia Systems Group. After two years in Xerox, he won an ERCIM fellowship, which he spent in INRIA-Sophia Antipolis, France, and IEI CNR, Pisa, Italy. In January 2002, he joined ISTI-CNR, Pisa, as a permanent member. His research interests are in statistical signal processing and information and coding theory, with applications in image processing, astronomy, telecommunications, intelligent user interfaces, and bioinformatics. He is currently on the Editorial Board of Digital Signal Processing and an Associate Editor for the IEEE Transactions on Signal Processing. He was the Guest Editor for a special issue of the Signal Processing journal on "Signal Processing with Heavy-Tailed Distributions," December 2002. He is the Special Sessions Chair for the EURASIP European Signal Processing Conference, EUSIPCO 2005, and is the Technical Chair for EUSIPCO 2006. He is also a Member of the IEEE Technical Committee on Signal Processing Theory and Methods. He has more than 50 publications and holds 5 US, European, and Japanese patents.

Carlo Baccigalupi is currently an Assistant Professor at SISSA/ISAS. He is a member of the Planck and EBEx cosmic microwave background (CMB) polarization experiments. In Planck, he is leading the working group on component separation, and in EBEx he is responsible for the control of the foreground polarized contamination to the CMB radiation. He is the author of about 40 papers in refereed international scientific reviews, on topics ranging from the theory of gravity to CMB data analysis. He teaches linear cosmological perturbations and CMB anisotropies courses for the Astroparticle Ph.D. course at SISSA.

He is involved in long-term international projects. The most important ones are the Long-Term Space Astrophysics program funded by NASA for a duration of five years, on component separation in COBE, WMAP, and future CMB experiments, and a one-year Mercator professorship to be carried out at the University of Heidelberg in the academic year 2005/2006.

EURASIP Journal on Applied Signal Processing 2005:15, 2400–2412
© 2005 Hindawi Publishing Corporation

Separation of Correlated Astrophysical Sources Using Multiple-Lag Data Covariance Matrices

L. Bedini Istituto di Scienza e Tecnologie dell’Informazione, CNR, Area della Ricerca di Pisa, via G. Moruzzi 1, 56124 Pisa, Italy Email: [email protected]

D. Herranz Istituto di Scienza e Tecnologie dell’Informazione, CNR, Area della Ricerca di Pisa, via G. Moruzzi 1, 56124 Pisa, Italy Email: [email protected]

E. Salerno Istituto di Scienza e Tecnologie dell’Informazione, CNR, Area della Ricerca di Pisa, via G. Moruzzi 1, 56124 Pisa, Italy Email: [email protected]

C. Baccigalupi International School for Advanced Studies, via Beirut 4, 34014 Trieste, Italy Email: [email protected]

E. E. Kuruoğlu Istituto di Scienza e Tecnologie dell'Informazione, CNR, Area della Ricerca di Pisa, via G. Moruzzi 1, 56124 Pisa, Italy Email: [email protected]

A. Tonazzini Istituto di Scienza e Tecnologie dell’Informazione, CNR, Area della Ricerca di Pisa, via G. Moruzzi 1, 56124 Pisa, Italy Email: [email protected]

Received 8 June 2004; Revised 18 October 2004

This paper proposes a new strategy to separate astrophysical sources that are mutually correlated. This strategy is based on second-order statistics and exploits prior information about the possible structure of the mixing matrix. Unlike ICA blind separation approaches, where the sources are assumed mutually independent and no prior knowledge is assumed about the mixing matrix, our strategy allows the independence assumption to be relaxed and performs the separation of even significantly correlated sources. Besides the mixing matrix, our strategy is also capable of evaluating the source covariance functions at several lags. Moreover, once the mixing parameters have been identified, a simple deconvolution can be used to estimate the probability density functions of the source processes. To benchmark our algorithm, we used a database that simulates the one expected from the instruments that will operate onboard ESA's Planck Surveyor Satellite to measure the CMB anisotropies all over the celestial sphere.

Keywords and phrases: statistical, image processing, cosmic microwave background.

1. INTRODUCTION

Separating the individual radiations from the measured signals is a common problem in astrophysical data analysis [1]. As an example, in cosmic microwave background anisotropy surveys, the cosmological signal is normally combined with foreground radiations from both extragalactic and galactic sources, such as the Sunyaev-Zeldovich effects from clusters of galaxies, the effect of the individual galaxies, the emission from galactic dust, and the galactic synchrotron and free-free emissions. If one is only interested in estimating the CMB anisotropies, the interfering signals can just be treated as noise, and reduced by suitable cancellation procedures. However, the foregrounds have an interest of their own, and it could be useful to extract all of them from multichannel data, by exploiting their different emission spectra.

Some authors [2, 3] have tried to extract a number of individual radiation data from measurements on different frequency channels, assuming that the physical mixture model is perfectly known. Unfortunately, such an assumption is rather unrealistic and could overconstrain the problem, thus leading to unphysical solutions. Attempts have been made to avoid this shortcoming by introducing criteria to evaluate a posteriori the closeness to reality of the mixture model, and by allowing individual sources to be split into separate templates to take spatial parameter variability into account [4, 5].

A class of techniques capable of estimating the source signals as well as identifying the mixture model has recently been proposed in astrophysics [6, 7, 8, 9]. In digital signal processing, these techniques are referred to as blind source separation (BSS) and rely on statistical assumptions on the source signals. In particular, mutual independence and non-Gaussianity of the source processes are often required [10]. This totally blind approach, denoted independent component analysis (ICA), has already given promising results, proving to be a valid alternative to assuming a known data model. On the other hand, most ICA algorithms do not permit the introduction of prior information. Since all available information should always be used, semiblind techniques are being studied to make astrophysical source separation more flexible with respect to the specific knowledge often available in this type of problem [11]. Moreover, the independence assumption is not always justified; if there is evidence of correlation between pairs of sources, it should be made possible to take this information into account, thus abandoning the strict ICA approach.

The first blind technique proposed to solve the separation problem in astrophysics [6] was based on ICA, and allowed simultaneous model identification and signal estimation to be performed. The independence requirement was fulfilled by taking the statistics of all orders into account, as in all ICA methods presented in the literature (see, e.g., [10, 12, 13]). The problem of estimating all the model parameters and source signals cannot be solved by just using second-order statistics, since these are only able to enforce uncorrelation. However, this has been done in special cases, where additional hypotheses on the spatial correlations or, equivalently, on the spectra of the individual signals are assumed [9, 14, 15]. As will be clear in the following, within the framework of any noisy linear mixture model, the data covariance matrix at a particular lag is related to the source covariance matrix at the same lag, the mixing matrix, and the noise covariance matrix. If there is a sufficient number of lags for which the source covariance matrices are not null, then it is possible to identify the model parameters by estimating the data covariance matrices from the observed data. Indeed, if we know the noise covariance matrix, we are able to write a number of relationships from which the unknown parameters can be estimated. This is what is done by the second-order blind identification (SOBI) algorithm presented in [15]. SOBI, however, relies on joint diagonalization of covariance matrices at different lags, which is only applicable in the case of uncorrelated source signals. In our approach, we assumed that the mixing matrix can be parametrised. This allows us to relax the independence assumption, and to pursue identification by optimisation of a suitable function. A further advantage of this strategy is that the relevant correlation coefficients between pairs of sources can also be estimated. In our particular case, moreover, being able to parametrise the mixing matrix allows us to substantially reduce the number of unknowns, which improves the performance of our technique. We will show that a very fast model learning algorithm can be devised by matching the theoretical and the observed covariance matrices, even if all the cross-covariances are nonnegligible.

The paper is organised as follows. In Section 2, we formalise the problem and introduce the relevant notation. In Section 3, we describe how the mixing matrix can be parametrised in our case. In Sections 4 and 5, we describe the methods we used to learn the mixing model and to estimate the original sources, respectively. In Section 6, we present some experimental results, with both stationary and nonstationary noises. In the final section, we give some remarks and future directions.

2. PROBLEM STATEMENT

As usual [2, 6], we assume that each radiation process \tilde{s}_c(\xi, \eta, \nu) from the microwave sky has a spatial pattern s_c(\xi, \eta) that is independent of its frequency spectrum F_c(\nu):

\tilde{s}_c(\xi, \eta, \nu) = s_c(\xi, \eta) F_c(\nu). \quad (1)

Here, \xi and \eta are angular coordinates on the celestial sphere, and \nu is frequency. The total radiation observed in a certain direction at a certain frequency is given by the sum of a number N of signals (processes, or components) of the type (1), where subscript c has the meaning of a process index. Assuming that the effects of the telescope beam on the angular resolution at the different measurement channels have been equalised (see [16]), the observed signal at M different frequencies can be modelled as

x(\xi, \eta) = A s(\xi, \eta) + n(\xi, \eta), \quad (2)

where x = \{x_d, d = 1, \dots, M\} is the M-vector of the observations, d being a channel index, A is an M \times N mixing matrix, s = \{s_c, c = 1, \dots, N\} is the N-vector of the individual source processes, and n = \{n_d, d = 1, \dots, M\} is the M-vector of instrumental noise. The elements of A are related to the source spectra and to the frequency responses through the following formula:

a_{dc} = \int F_c(\nu) b_d(\nu) \, d\nu, \quad (3)

where b_d(\nu) is the instrumental frequency response in the dth measurement channel, which is normally known very well. If we assume that the source spectra are constant within the passbands of the different channels, (3) can be rewritten as

a_{dc} = F_c(\nu_d) \int b_d(\nu) \, d\nu. \quad (4)

The element a_{dc} is thus proportional to the spectrum of the cth source at the center frequency \nu_d of the dth channel.

The separation problem consists in estimating the source vector s from the observed vector x. Several estimation algorithms have been derived assuming a perfect knowledge of the mixing matrix. As already said, however, this matrix is related to both the instrumental frequency responses, which are known, and the emission spectra F_c(\nu), which are normally unknown. For this reason, relying on an assumed mutual independence of the source processes s_c(\xi, \eta), some blind separation algorithms have been proposed [6, 7, 17], which are able to estimate both the mixing matrix and the source vector. Assuming that the source signals are mutually independent, the MN mixing coefficients can be estimated by finding a linear mixture that, when applied to the data vector, nullifies the cross-cumulants of all orders. If, however, some prior information allows us to reduce the number of unknowns, the identification problem can be solved by only using second-order statistics. This is the case with our approach, which is based on a parametrisation of matrix A. This approach, described in Section 4, does not need a strict mutual independence assumption. Logically, any blind separation algorithm is divided into two phases: using the notation introduced here, the estimation of A will be referred to as system identification (or model learning), and the estimation of s will be referred to as source separation. In this paper, we first address aspects related to learning, and then give some details on source separation strategies derived from standard reconstruction procedures. Before describing our algorithm in detail, we recall here some applicability issues.

Source and noise processes

To estimate the covariance matrices from the available data, the source and the noise processes must necessarily be assumed stationary. While the CMB satisfies this assumption, the foregrounds are not stationary all over the celestial sphere; the assumption can, however, be made for small sky patches. Moreover, depending on the particular sky scanning strategy, noise is normally nonstationary, even within small patches, and can also be autocorrelated. The noise covariance function should be known for any shift and for any angular coordinate on the celestial sphere. Provided that the noise nonstationarity and the cross-correlation between sources can be neglected, various methods are available, both in the space and frequency domains, to estimate samples of the noise covariance function or, equivalently, of the noise spectrum [9]. Tackling the space-variant nature of the noise process is difficult, and no simple method has been proposed so far to this purpose. In [11] the noise variance at each pixel is assumed to be known, and a method is proposed to estimate the mixing matrix and the probability density function of each component. In the present approach, we found experimentally that, if a noise covariance map is known, even nonstationary noise can be treated.

Frequency-dependent telescope beams

The model assumed in (2) is valid if the telescope radiation patterns are the same in all the frequency channels. As the beams are frequency-dependent, a way to tackle the problem is to preprocess the observed data in order to equalise the resolution on all the measurement channels, as in [16]. This also changes the autocorrelation function of each noise process, but in a way that can be exactly evaluated. A different way to tackle the problem has been to approach it in the frequency domain [2, 9]. Also in these cases, the validity of the solution relies on a number of simplifying assumptions, such as the perfect circular symmetry of the telescope beams. Moreover, the actual capability of extrapolating the spectrum at spatial frequencies where reduced information is available has still to be assessed, especially in the cases where the signal-to-noise ratio is particularly low.

Structure of the source covariance matrices

In the Planck experiment, the sources of interest are the CMB signal and the foregrounds. While no correlation is expected between the CMB signal and the foregrounds, some statistical dependence between pairs of foregrounds has to be taken into account. The off-diagonal entries of the source covariance matrices related to pairs of correlated sources will thus be nonzero, whereas all the remaining off-diagonal elements will be zero. When it is known that some of the cross-covariances are close to zero, these can be kept fixed at zero, thus further reducing the total number of unknowns. For instance, in a 3 \times 3 case, if we assume the following structure for the source covariance matrix at zero shift:

C_s(0, 0) =
\begin{pmatrix}
\sigma_{11} & 0 & 0 \\
0 & \sigma_{22} & \sigma_{23} \\
0 & \sigma_{32} & \sigma_{33}
\end{pmatrix}, \quad (5)

this means that we assume zero or negligible correlations between sources 1 and 2 and between sources 1 and 3, while the remaining cross-covariance \sigma_{23} = \sigma_{32} between sources 2 and 3 is an unknown of the problem, along with the autocovariances \sigma_{ii}. Note that, due to the typical scaling ambiguity of the blind identification problem, the absolute values of both the diagonal and off-diagonal elements of the matrices C_s(\tau, \psi) have no physical significance, while, by calculating ratios of the type

\frac{\sigma_{ij}^2}{\sigma_{ii}\sigma_{jj}}, \quad (6)

we can actually estimate the correlation coefficients between different sources, whatever the values of the individual covariances.
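As a quick numerical illustration of the ratio (6) (not part of the paper; the source statistics below are made up), the following Python sketch builds three toy sources with the covariance structure assumed in (5) — source 1 independent, sources 2 and 3 correlated through a hypothetical shared component — and recovers their squared correlation coefficient from the empirical zero-shift covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix = 200_000

# Hypothetical toy sources: source 1 independent; sources 2 and 3
# correlated through a shared latent component (mimicking correlated
# foregrounds). All amplitudes here are made up for illustration.
s1 = rng.standard_normal(n_pix)
shared = rng.standard_normal(n_pix)
s2 = shared + 0.5 * rng.standard_normal(n_pix)
s3 = 0.8 * shared + 0.7 * rng.standard_normal(n_pix)
s = np.vstack([s1, s2, s3])

# Empirical zero-shift source covariance matrix, cf. the structure in (5):
# the off-diagonal terms involving source 1 are ~0, sigma_23 is nonzero.
Cs0 = np.cov(s)

# Ratio (6): squared cross-covariance over the product of autocovariances,
# i.e. the squared correlation coefficient between sources 2 and 3.
rho2_23 = Cs0[1, 2] ** 2 / (Cs0[1, 1] * Cs0[2, 2])
print(np.round(Cs0, 3))
print("squared correlation coefficient of sources 2 and 3:", round(rho2_23, 3))
```

Note that rescaling any source leaves the ratio unchanged, which is the point made in the text: the ratio survives the scaling ambiguity while the individual covariances do not.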

3. PARAMETRISATION OF THE MIXING MATRIX

While in a general source separation problem the elements a_dc are totally unknown, in our case we have some knowledge about them. In fact, the integral in (4) is related to known instrumental features and to the emission spectra of the single source processes, on which we do have some knowledge. As an example, if the observations are made in the microwave and millimetre-wave range, the dominant radiations are the cosmic microwave background, the galactic dust, the free-free emission, and the synchrotron (see [18]). Another significant signal comes from the extragalactic point sources. It is not possible to treat the point sources as a single signal to be separated from the others on the basis of its emission spectrum, since each source has its own spectrum. Since the brightest point sources are the ones that most strongly affect the study of the CMB [19], the usual approach is to remove them from the data before separating the other foregrounds. Bright resolved point sources can be removed by using some of the specific techniques proposed in the literature [19, 20, 21]. Faint unresolved point sources are usually considered as an additional noise term in (2) (referred to as "confusion noise" in the radio astronomy literature). For simplicity, we will not consider extragalactic point sources in our test examples. Moreover, although other sources (such as SZ and free-free) could be taken into account, in our experiments we only considered the synchrotron and dust foregrounds, which are the most significant in the Planck frequency range.

The emission spectrum of the cosmic microwave background is perfectly known, being a blackbody radiation. In terms of antenna temperature, it is

\[ F_{\mathrm{cmb}}(\nu) = \frac{\tilde{\nu}^{2} \exp(\tilde{\nu})}{\left[\exp(\tilde{\nu}) - 1\right]^{2}}, \tag{7} \]

where ν̃ is the frequency in GHz divided by 56.8. From (4) and (7), the column of A related to the CMB radiation is thus known up to an unessential scale factor. For the synchrotron radiation, we have

\[ F_{\mathrm{syn}}(\nu) \propto \nu^{-n_{s}}. \tag{8} \]

Thus, the column of A related to synchrotron only depends on a scale factor and the spectral index n_s. For the thermal galactic dust, we have

\[ F_{\mathrm{dust}}(\nu) \propto \frac{\bar{\nu}^{\,m+1}}{\exp(\bar{\nu}) - 1}, \tag{9} \]

where ν̄ = hν/kT_dust, h is the Planck constant, k is the Boltzmann constant, and T_dust is the physical dust temperature. If we assume a uniform temperature value, the frequency law (9), that is, the column of A related to dust emission, only depends on a scale factor and the parameter m.

The above properties enable us to describe the mixing matrix by means of just a few parameters. As an example, if we assume we have one perfectly known source spectrum (such as that of the CMB) and N − 1 sources with one-parameter spectra, the number of unknowns in the identification problem is N − 1 instead of NM.

4. A SECOND-ORDER IDENTIFICATION ALGORITHM

Let us consider the source and noise signals in (2) as realisations of two stationary vector random processes. The covariance matrices of these processes are, respectively,

\[ \mathbf{C}_{s}(\tau, \psi) = \left\langle \left[\mathbf{s}(\xi, \eta) - \boldsymbol{\mu}_{s}\right]\left[\mathbf{s}(\xi + \tau, \eta + \psi) - \boldsymbol{\mu}_{s}\right]^{T} \right\rangle, \quad \mathbf{C}_{n}(\tau, \psi) = \left\langle \left[\mathbf{n}(\xi, \eta) - \boldsymbol{\mu}_{n}\right]\left[\mathbf{n}(\xi + \tau, \eta + \psi) - \boldsymbol{\mu}_{n}\right]^{T} \right\rangle, \tag{10} \]

where ⟨·⟩ denotes expectation under the appropriate joint probability, µ_s and µ_n are the mean vectors of processes s and n, respectively, and the superscript T means transposition. As usual, the noise process is assumed signal-independent, white, and zero-mean, with known variances. Thus, for both τ and ψ equal to zero, C_n is a known diagonal matrix whose elements are the noise variances in all the measurement channels, whereas for any τ or ψ different from zero, C_n is the null M × M matrix.

As already proved [15, 22], covariance matrices, that is, second-order statistics, permit blind separation to be achieved when the sources show a spatial structure, namely, when they are spatially correlated. Thus, the mutual independence requirement of ICA can be replaced by an equivalent requirement on the spatial structure of the signal, and the identifiability of the system is assured. In other words, finding matrices A and C_s is generally not possible from covariances at zero shift alone; to identify the mixing operator, either higher-order statistics or the covariance matrices at several nonzero shift pairs (τ, ψ) must be taken into account. Of course, this is also a requirement on the sources, since if the covariance matrices are null for any pair (τ, ψ), identification is not possible. This aspect will become clearer below.

Let us now see our approach to system identification. By exploiting (2), the covariance of the observed data can be written as

\[ \mathbf{C}_{x}(\tau, \psi) = \left\langle \left[\mathbf{x}(\xi, \eta) - \boldsymbol{\mu}_{x}\right]\left[\mathbf{x}(\xi + \tau, \eta + \psi) - \boldsymbol{\mu}_{x}\right]^{T} \right\rangle = \mathbf{A}\,\mathbf{C}_{s}(\tau, \psi)\,\mathbf{A}^{T} + \mathbf{C}_{n}(\tau, \psi), \tag{11} \]

where C_x(τ, ψ) can be estimated from

\[ \widehat{\mathbf{C}}_{x}(\tau, \psi) = \frac{1}{N_{p}} \sum_{\xi, \eta} \left[\mathbf{x}(\xi, \eta) - \boldsymbol{\mu}_{x}\right]\left[\mathbf{x}(\xi + \tau, \eta + \psi) - \boldsymbol{\mu}_{x}\right]^{T}, \tag{12} \]

where N_p is the number of pixels. Equation (11) provides a number of independent nonlinear relationships that can be used to estimate both A and C_s. Obviously, this possibility does not rely on mutual independence between the source signals, as required by the ICA approach: the only requirement is having a sufficient number of nonzero covariance matrices. In other words, spatial structure can be used in place of mutual independence as a basis for model learning and signal separation. As assumed in the previous section, in this particular application the number of unknowns is reduced by parametrising the mixing matrix. This allows us to solve the identification problem from the relationships made available by (11) by only using the zero-shift covariance matrix, even if some of the sources are cross-correlated. We investigated this possibility in [23].

2404 EURASIP Journal on Applied Signal Processing

In a general case, the matrices A and C_s(τ, ψ) can be estimated from

\[ \left[\widehat{\boldsymbol{\Gamma}}, \widehat{\boldsymbol{\Sigma}}(:,:)\right] = \arg\min_{\boldsymbol{\Gamma},\, \boldsymbol{\Sigma}(:,:)} \sum_{\tau, \psi} \left\| \mathbf{A}(\boldsymbol{\Gamma})\, \mathbf{C}_{s}\!\left[\boldsymbol{\Sigma}(\tau, \psi)\right] \mathbf{A}^{T}(\boldsymbol{\Gamma}) - \left[\mathbf{C}_{x}(\tau, \psi) - \mathbf{C}_{n}(\tau, \psi)\right] \right\|. \tag{13} \]

The minimisation is performed over the vectors Γ and Σ(:, :), where Γ is the vector of all the parameters defining A (possibly consisting of all the matrix elements), and Σ(:, :) is the vector containing all the unknown elements of the matrices C_s. The matrix norm adopted is the Frobenius norm. Our present strategy to find the minimiser in (13) is to perform a stochastic minimisation in Γ, considering that C_s[Σ(τ, ψ)], for each (τ, ψ), can be calculated exactly once A(Γ) is fixed. A more accurate minimisation strategy is now being studied.

From the above scheme, it is clear that each independent element of the matrices C_x(τ, ψ) gives an independent equation for the estimation of the vector Γ and of all the vectors Σ(τ, ψ). Since for (τ, ψ) = (0, 0) the matrix C_x is symmetric, for zero shift we have M(M + 1)/2 independent equations. For any other shift pair, C_x is a general matrix and thus, provided that it is not zero, we have M² additional independent equations. If N_s is the total number of nonzero shift pairs generating nonzero data covariance matrices, we thus have a total number of M(M + 1)/2 + N_s·M² = M[(2N_s + 1)M + 1]/2 independent equations. The number of unknowns is at most NM + N(N + 1)/2 + N_s·N², in the case where all the elements of A are unknown and all the source covariance matrices are full, that is, all the sources at any shift are correlated with each other. Note that, in this worst-case situation, if M = N, we always have more unknowns than equations, independently of N_s. As soon as we have M > N, there is always a number of nonzero shift pairs for which we have more independent equations than unknowns to be estimated. This observation gives an idea of the amount of information we have available for our estimation problem. The number of independent equations affects the behaviour of the nonlinear optimisation landscape in (13). Qualitatively, we can affirm that the more independent equations we have, the more well-posed the optimisation problem will be. In particular, it is likely that, in the absence of any prior information about the structure of A and C_s(τ, ψ), having a number of observed channels equal to the number of sources always leads to insufficient information, independently of the number of shift pairs chosen. If, instead, the number of available observations is larger than the number of sources, the possibility of estimating the unknowns relies on the number of shift pairs for which the data covariance matrices are nonzero. The availability of prior information, as in the application considered here, can of course alleviate these requirements. For example, if we have a 4 × 4 mixing matrix depending on only four parameters and only two sources significantly correlated, the unknowns to be determined are 4 + 5 + N_s·6, against a maximum of M(M + 1)/2 + N_s·M² equations. This means that in this case, as soon as M = 4, the number of independent equations is larger than the number of unknowns even for N_s = 0.

5. SIGNAL SEPARATION STRATEGY

Model learning is only the first step in solving the problem of source separation. Although, in principle, one could simply use multichannel inverse filtering to recover the source maps, this approach is not feasible in practice because of the presence of noise. In our treatment, the data are assumed to be an ergodic process, in order to be able to evaluate their statistics for every shift pair from the available sample. This entails a space-invariant noise process. The estimation of the individual source maps should be made on the basis of all the products of the learning stage. In our case, we have estimates of the mixing matrix and of the source covariance matrices at several shift pairs. Under the hypothesis of stationary noise, we could exploit this information to implement a multichannel Wiener filter for source reconstruction. If the noise is not stationary, a generalised Kalman filter should be used. Our point here is model learning, and thus we do not address the separation issues in detail. We only observe that a possible Bayesian separation scheme would make use of the source probability densities, and these can be estimated from our mixing matrix. Indeed, let us assume that our learning procedure has given a good estimate of A. Let B be its Moore-Penrose generalised inverse. In our case, we have M ≥ N; thus, as is known,

\[ \mathbf{B} = \left(\mathbf{A}^{T}\mathbf{A}\right)^{-1}\mathbf{A}^{T}. \tag{14} \]

From (2), we have

\[ \mathbf{B}\mathbf{x} = \mathbf{s} + \mathbf{B}\mathbf{n}. \tag{15} \]

Let us denote each of the N rows of B as an M-vector b_i, i = 1, ..., N, and consider the generic element y_i of the N-vector Bx,

\[ y_{i} := \mathbf{b}_{i}^{T}\,\mathbf{x} = s_{i} + \mathbf{b}_{i}^{T}\,\mathbf{n} =: s_{i} + n_{t_{i}}. \tag{16} \]

The probability density function of y_i, p(y_i), can be estimated from b_i and the data record x(ξ, η), while the probability density function of n_{t_i}, p(n_{t_i}), is a Gaussian whose parameters can be easily derived from C_n and b_i. The pdf of y_i is the convolution between p(s_i) and p(n_{t_i}):

\[ p\left(y_{i}\right) = p\left(s_{i}\right) \ast p\left(n_{t_{i}}\right). \tag{17} \]

From this relationship, p(s_i) can be estimated by deconvolution. As is well known, deconvolution is normally an ill-posed problem and, as such, lacks a stable solution. In our case, we can regularise it by enforcing smoothness, positivity, and the normalisation condition for pdfs.
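Looking back at the identification stage, the sample shift covariances of (12) and the Frobenius-norm matching cost of (13) can be sketched as follows (a minimal sketch: the function names, the cropping convention at the patch border, and the toy data are ours; the stochastic minimisation over Γ itself is not shown):

```python
import numpy as np

def shift_covariance(x, tau, psi):
    """Sample covariance C_x(tau, psi) of a multichannel image x
    (shape: M channels x rows x cols), as in (12); pixel pairs whose
    shifted partner falls outside the patch are simply dropped."""
    M, R, C = x.shape
    xm = x - x.mean(axis=(1, 2), keepdims=True)
    a = xm[:, :R - tau, :C - psi].reshape(M, -1)
    b = xm[:, tau:, psi:].reshape(M, -1)
    return a @ b.T / a.shape[1]

def matching_cost(A, Cs_list, Cx_list, Cn0):
    """Sum of Frobenius norms as in (13) over a list of shift pairs,
    where index 0 is the zero-shift pair; C_n vanishes at any
    nonzero shift, as stated in Section 4."""
    cost = 0.0
    for k, (Cs, Cx) in enumerate(zip(Cs_list, Cx_list)):
        Cn = Cn0 if k == 0 else np.zeros_like(Cn0)
        cost += np.linalg.norm(A @ Cs @ A.T - (Cx - Cn), ord="fro")
    return cost

# Consistency check: data covariances built exactly from the model
# (11) give zero cost at the true A and Cs.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 2))
Cs0 = np.array([[1.0, 0.3], [0.3, 2.0]])
Cn0 = 0.1 * np.eye(4)
Cx0 = A @ Cs0 @ A.T + Cn0
assert matching_cost(A, [Cs0], [Cx0], Cn0) < 1e-10
```

In the paper's setting, the minimiser would sweep Γ (the spectral indices) stochastically and solve for the Σ(τ, ψ) elements at each fixed A(Γ).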

Any Bayesian estimation approach should exploit knowledge of the source densities to regularise the solution, but these are normally unknown. In the case examined here, the source distributions can be efficiently estimated as summarised above, and the computational cost of otherwise expensive Bayesian algorithms can be reduced. As an example, in [11], the source densities are modelled as mixtures of Gaussians, and the related parameters are estimated by an independent factor analysis approach (see [24, 25]). The method we propose here could well be used to fix the source densities, thus reducing the overall cost of the identification-separation task.

From (15), it can be seen that the generalised inverse solution is already an estimate of the sources, since it is composed of the original source vectors corrupted by amplified noise. Thus, a simple source estimation strategy could be first to apply (15) and then to reduce the influence of noise by filtering the result. In the next section, we show some experimental results obtained by pseudoinversion of the estimated mixing matrix, followed by Wiener filtering of each individual source. This strategy would be strictly valid only with stationary noise and a high signal-to-noise ratio; however, interesting results have been found even with strong nonstationary noise. Multichannel Wiener filtering for stationary noise and an extended Kalman filter for the nonstationary case are now being developed.

6. EXPERIMENTAL RESULTS

In this section, we present some results from our extensive experimentation with the method described above. Our data were drawn from a data set that simulates the one expected from Planck (see the Planck homepage, http://astro.estec.esa.nl/SA-general/Projects/Planck/). The source maps we considered were the CMB anisotropy and the galactic synchrotron and thermal dust emissions over the four measurement channels centred at 30 GHz, 44 GHz, 70 GHz, and 100 GHz. The test data maps have been generated by extracting several sky patches at different galactic coordinates from the simulated database, scaling them exactly according to formulas (7), (8), and (9), generating the mixtures for the channels chosen, and adding realisations of Gaussian, signal-independent, white noise. Several noise levels have been used, from ten percent to more than one hundred percent of the CMB standard deviation. The range chosen contains noise levels within the Planck specifications. Although our method would only be suited to uniform noise, we also tried to apply it to data corrupted by nonuniform noise, and obtained promising results.

Within this section, we will separate the results obtained in model learning from the results in separation, and the cases with stationary noise from those with nonstationary noise. In the latter cases, knowledge of a noise variance map is assumed, and the additional problem arises of choosing the appropriate noise covariance matrix.

The results from learning are the mixing matrix and the source covariance matrices at the shift pairs chosen. From the estimate of the mixing matrix, it is also possible to derive the marginal source densities, by using relationships (16) and (17). As already mentioned, the estimates of the mixing matrix and of the source covariance matrices are very robust against noise. Conversely, the estimates of the source distributions by means of (16) and (17) are more sensitive to noise. To obtain satisfactory results, it is necessary to rely on regularisation methods; the choice of the regularisation parameters, however, is known to be critical. In our case, we selected them empirically, by checking the smoothness of the solutions.

Our separation results are all derived from the application of the Moore-Penrose pseudoinverse of the estimated mixing matrix, followed by classical Wiener filtering of each output image. From this processing, estimates of the source maps are obtained. Also, estimated source power spectra can be obtained from either the maps or the source autocorrelation matrices. In particular, the results we show here are derived from the unfiltered pseudoinverse solutions, showing that, although the reconstructed images are heavily affected by noise, the derived power spectra can be corrected for the theoretical noise spectrum and thus estimated quite accurately.

The results presented here will all be related to a single data record, derived from a simulated 15° × 15° sky patch centred at 40° galactic longitude and 0° galactic latitude. It is to be noted that in such a patch, located on the galactic plane, the measured data are affected by strong foreground interference, making the problem very difficult to solve. Indeed, many separation approaches experimented with so far simply fail in proximity to the galactic plane, and they are normally applied after masking the all-sky data in the high-interference regions. Here, the dust emission is stronger than the CMB, and separation is strictly necessary if the CMB is to be distinguished from the foregrounds. Our method performed very well with these data, and all the relevant parameters were satisfactorily estimated even with the strongest noise components. The noise standard deviation we adopted in the case shown here is 30% of the standard deviation of the CMB at 100 GHz. The noise level in the other channels has been obtained simply by scaling the level at 100 GHz in accordance with the expected Planck sensitivity at those frequencies. For each patch considered, we tried different noise levels, up to more than 100% of the CMB level at 100 GHz, and for each noise level, we performed a Monte Carlo simulation with hundreds of different noise realisations. This analysis is not reported in detail here, but we can say that no significant bias has been found in the results.

It is worth remarking that, at high galactic latitudes, the CMB radiation is dominant at our frequencies, and the foregrounds are well below the noise level assumed in our experiments. Thus, the CMB is almost the only measured radiation, and it is estimated very well at all the assigned signal-to-noise ratios. Conversely, as expected with these noise levels, the foregrounds cannot be estimated correctly. Assuming much

2406 EURASIP Journal on Applied Signal Processing


Figure 1: Source maps from a 15° × 15° patch centered at 0° galactic latitude and 40° galactic longitude, at 100 GHz: (a) CMB; (b) synchrotron; (c) thermal dust.

lower noise levels, our method, as other techniques such as ICA (see [6]), allows the foregrounds to be estimated satisfactorily.

In Figure 1, we show the three source maps we used in the situation described above. In this figure and in all the others shown here, the grayscale is linear, with black corresponding to the maximum image value. We assigned the sources s1 to CMB, s2 to synchrotron, and s3 to dust, and the signals x1, x2, x3, and x4 to the measurement channels at 100, 70, 44, and 30 GHz, respectively. Therefore, the first, second, and third columns of the mixing matrix will be related to CMB, synchrotron, and dust, respectively, and the first, second, third, and fourth rows of the mixing matrix will be related to the 100 GHz, 70 GHz, 44 GHz, and 30 GHz channels, respectively. The mixing matrix, A_o, has been derived from (7), (8), and (9) with spectral indices n_s = 2.9 and m = 1.8 (see, e.g., [26, 27]):

\[ \mathbf{A}_{o} = \begin{bmatrix} 1 & 1 & 1 \\ 1.1353 & 2.8133 & 0.5485 \\ 1.2241 & 10.8140 & 0.2464 \\ 1.2570 & 32.8359 & 0.1260 \end{bmatrix}. \tag{18} \]

In Figure 2, we show the data maps for stationary noise. Also, note that the case examined does not fit the ICA assumptions. For example, the normalised source covariance matrix at zero shift is

\[ \mathbf{C}_{s}(0, 0) = \begin{bmatrix} 1.0000 & 0.1961 & 0.0985 \\ 0.1961 & 1.0000 & 0.6495 \\ 0.0985 & 0.6495 & 1.0000 \end{bmatrix}, \tag{19} \]

where a significant correlation, of the order of 65%, can be observed between the dust and synchrotron maps.

For the data described above, we ran our learning algorithm for 500 different noise realisations; for each run, 10 000 iterations of the minimisation procedure described in the previous section were performed. The unknown parameters were the spectral indices n_s and m, and all the elements of the matrices C_s(τ, ψ). The cost defined in (13), as a function of the iteration number in a particular run, is shown in Figure 3. The typical elapsed times per run were a few minutes on a computer with a 2 GHz CPU, with Matlab interpreted code. In the case described here, we estimated n_s = 2.8985 and m = 1.7957, corresponding to the mixing matrix

\[ \mathbf{A} = \begin{bmatrix} 1 & 1 & 1 \\ 1.1353 & 2.8118 & 0.5494 \\ 1.2241 & 10.8009 & 0.2473 \\ 1.2570 & 32.7775 & 0.1267 \end{bmatrix}. \tag{20} \]

As a quality index for our estimation, we adopted the matrix Q = (AᵀC_n⁻¹A)⁻¹(AᵀC_n⁻¹A_o), which, in the ideal case, should be the N × N identity matrix I. In the present case, we have

\[ \mathbf{Q} = \begin{bmatrix} 1.0000 & -0.0074 & -0.0013 \\ 0.0000 & 1.0020 & 0.0000 \\ 0.0000 & 0.0054 & 1.0013 \end{bmatrix}. \tag{21} \]

The Frobenius norm of the matrix Q − I should be zero in the case of perfect model learning; in this case, it is 0.0096. These results have been found by considering 25 uniformly distributed shift pairs, with 0 ≤ τ ≤ 20 and 0 ≤ ψ ≤ 20.

As a synthetic index for the quality of the reconstructed source covariance matrices, we adopted a matrix E, where each element is the relative error in the corresponding covariance element, averaged over all the pairs (τ, ψ):

\[ E_{i,j} = \frac{1}{N_{s} + 1} \sum_{\tau, \psi} \frac{\left| \widehat{C}_{s\,i,j}(\tau, \psi) - C_{s\,i,j}(\tau, \psi) \right|}{\left| C_{s\,i,j}(\tau, \psi) \right|}, \tag{22} \]

where Ĉ_s are the estimated source covariance matrices. Of course, the matrix (22) is only defined when all the denominators are nonzero. A more accurate analysis of the results can be made from the element-by-element comparison of the estimated and the original matrices, but we do not report these results here. For the case shown above, we have

\[ \mathbf{E} = \begin{bmatrix} 0.0274 & 0.0392 & 0.0496 \\ 0.0472 & 0.0170 & 0.0120 \\ 0.0917 & 0.0125 & 0.0050 \end{bmatrix}. \tag{23} \]



Figure 2: Noisy data maps at (a) 100 GHz; (b) 70 GHz; (c) 44 GHz; (d) 30 GHz.
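The parametrised columns of the mixing matrix follow directly from (7), (8), and (9), and generating test mixtures like those in Figure 2 amounts to applying this matrix and adding white noise. The sketch below (function names ours) reproduces the columns of (18); the dust temperature T_dust = 18 K is our assumption, chosen because it reproduces the tabulated dust column:

```python
import numpy as np

H_OVER_K = 0.047992  # h/k in kelvin per GHz (approximate)

def cmb_column(freqs_ghz):
    """CMB antenna-temperature law (7), normalised to the first channel."""
    nu = np.asarray(freqs_ghz) / 56.8
    f = nu**2 * np.exp(nu) / (np.exp(nu) - 1.0)**2
    return f / f[0]

def synchrotron_column(freqs_ghz, ns):
    """Synchrotron power law (8), normalised to the first channel."""
    nu = np.asarray(freqs_ghz, dtype=float)
    return (nu / nu[0])**(-ns)

def dust_column(freqs_ghz, m, t_dust=18.0):
    """Thermal dust law (9); t_dust = 18 K is an assumption of this
    sketch, not a value quoted in the paper."""
    nu = H_OVER_K * np.asarray(freqs_ghz) / t_dust
    f = nu**(m + 1.0) / (np.exp(nu) - 1.0)
    return f / f[0]

freqs = [100.0, 70.0, 44.0, 30.0]
A = np.column_stack([cmb_column(freqs),
                     synchrotron_column(freqs, ns=2.9),
                     dust_column(freqs, m=1.8)])
```

Stacking the three columns for the four Planck channels reproduces the matrix A_o of (18) to the quoted precision.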

The reconstructed probability density functions of the source processes, estimated from (16) and (17), are shown in Figure 4.

We separated the sources by multiplying the data matrix by the Moore-Penrose generalised inverse, as in (15), and then applying a Wiener filter to the results thus obtained. As already said, this is not the best choice of reconstruction algorithm at all, especially when the data are particularly noisy and the noise is not stationary. However, the results we obtained are visually very good, as shown in Figure 5. To evaluate the results of the whole learning-separation procedure more quantitatively, we compared the power spectrum of the CMB map with that of the reconstructed map. This comparison is shown in Figure 6, where we also show the possibility of correcting the reconstructed spectrum for the known theoretical spectrum of the noise component n_{t1}, obtained as in (16). As can be seen, the reconstructed spectrum is very similar to the original up to a multipole l = 2000.

Figure 3: Norm of the residual in (13) as a function of the iteration number.

Strictly speaking, our algorithm cannot be applied to nonstationary processes. However, let us assume that the original sources are stationary and that the noise is nonstationary but still spatially white and uncorrelated. This means that its pixel-dependent covariance matrices, R_n(τ, ψ; ξ, η), are zero for any nonzero shift pair (τ, ψ). We tried our method on nonstationary data, by assuming R_n(0, 0; ξ, η) to be known, and using a constant covariance matrix given by

\[ \mathbf{C}_{n}(0, 0) = \frac{1}{N_{p}} \sum_{\xi, \eta} \mathbf{R}_{n}(0, 0; \xi, \eta). \tag{24} \]

The nonstationary data were obtained from a spatial template of the noise standard deviations expected for typical Planck observations, shown in Figure 7. The actual standard deviations were adjusted so as to obtain the average signal-to-noise ratios desired for the different channels. The separation results for a case where these SNRs were the same as in the above stationary case are shown in Figure 8, where the degradation in the reconstruction is apparent in the regions where the noise is stronger. The results, in terms of reconstructed power spectra, are perfectly comparable to the ones

0.07 0.035

0.06 0.03

0.05 0.025

0.04 0.02

0.03 0.015

0.02 0.01

0.01 0.005

0 0 −1 −0.500.51 −0.100.10.20.3

Real source density functions Real source density functions Estimated source density functions Estimated source density functions

(a) (b)

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0 −2 −10 1 2

Real source density functions Estimated source density functions

(c)

Figure 4: Real (dotted) and estimated (solid) source density functions for (a) CMB, (b) synchrotron, and (c) dust.
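The forward relationship (17) behind Figure 4 can be illustrated numerically: the pdf of the pseudoinverse output y_i is the source pdf blurred by the Gaussian noise pdf. In the sketch below, the grid, the toy uniform source density, and the noise width are our own choices; recovering p(s_i) would require the regularised deconvolution discussed in the text:

```python
import numpy as np

def gaussian_pdf(t, sigma):
    """Zero-mean Gaussian density, standing in for p(n_ti) in (17)."""
    return np.exp(-0.5 * (t / sigma)**2) / (sigma * np.sqrt(2.0 * np.pi))

# Discretised version of (17): p(y_i) = p(s_i) * p(n_ti).
dt = 0.01
t = np.arange(-5.0, 5.0, dt)
p_s = np.where(np.abs(t) <= 1.0, 0.5, 0.0)     # toy uniform source pdf
p_n = gaussian_pdf(t, sigma=0.3)               # noise pdf (width is ours)
p_y = np.convolve(p_s, p_n, mode="same") * dt  # pdf of the observed y_i

# The convolution of two densities is again a density.
assert abs(p_y.sum() * dt - 1.0) < 1e-2
```

Estimating p(s_i) from p(y_i) would then invert this convolution, which is the ill-posed step the paper regularises with smoothness, positivity, and normalisation constraints.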


Figure 5: Wiener-filtered estimated maps: (a) CMB; (b) synchrotron; (c) dust.
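The separation strategy behind Figure 5 — pseudoinversion as in (15) followed by a classical per-map Wiener filter — can be sketched as follows (a minimal sketch under stated assumptions: the function names are ours, and the signal and noise power spectra fed to the Wiener filter are assumed known from the learning stage):

```python
import numpy as np

def pseudoinverse_separation(A, x):
    """Generalised-inverse solution (15): B x = s + B n, with B the
    Moore-Penrose pseudoinverse of the estimated mixing matrix."""
    B = np.linalg.pinv(A)  # equals (A^T A)^-1 A^T when M >= N, full rank
    return np.tensordot(B, x, axes=([1], [0]))

def wiener_filter_map(m, signal_ps, noise_ps):
    """Classical Wiener filter applied to one map in the Fourier
    domain; signal_ps and noise_ps are 2-D power spectra on the
    same frequency grid as fft2(m)."""
    H = signal_ps / (signal_ps + noise_ps)
    return np.real(np.fft.ifft2(np.fft.fft2(m) * H))

# Noiseless sanity check: pseudoinversion recovers the sources exactly.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))            # M = 4 channels, N = 3 sources
s = rng.standard_normal((3, 8, 8))         # toy source maps
x = np.tensordot(A, s, axes=([1], [0]))    # noiseless mixtures
assert np.allclose(pseudoinverse_separation(A, x), s)
```

With noise added, the pseudoinverse output contains the amplified noise term B n of (15), which the Wiener step then attenuates mode by mode.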

Figure 6: (a) Real (dotted) and estimated (solid) CMB power spectra. The dashed line represents the theoretical power spectrum of the noise component n_{t1} in (16), evaluated from the noise covariance and the Moore-Penrose pseudoinverse of the estimated mixing matrix. (b) Real (dotted) and estimated (solid) CMB power spectra, corrected for the theoretical noise.
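The noise correction in Figure 6(b) can be sketched numerically: for white pixel noise, the power spectrum of the term n_{t1} = b1 · n is flat at the level of its pixel variance b1ᵀ C_n b1, which can be subtracted from the raw spectrum of the pseudoinverse map. The FFT normalisation and the function name below are our assumptions, not the paper's implementation:

```python
import numpy as np

def noise_corrected_power(map2d, b1, Cn):
    """Subtract the flat (white) spectrum of the noise term
    n_t1 = b1 . n from the raw power spectrum of a pseudoinverse
    output map; under this normalisation the per-mode noise power
    equals the pixel variance b1^T Cn b1."""
    raw = np.abs(np.fft.fft2(map2d))**2 / map2d.size
    return raw - b1 @ Cn @ b1

# Pure-noise check: after correction, the mean residual power is small.
rng = np.random.default_rng(3)
Cn = np.diag([1.0, 2.0, 0.5, 1.5])               # toy channel variances
b1 = np.array([0.5, -0.2, 0.3, 0.1])             # toy first row of B
n = np.sqrt(np.diag(Cn))[:, None, None] * rng.standard_normal((4, 64, 64))
y = np.tensordot(b1, n, axes=([0], [0]))         # noise term b1 . n
corrected = noise_corrected_power(y, b1, Cn)
assert abs(corrected.mean()) < 0.1 * (b1 @ Cn @ b1)
```

On a real reconstructed map, the same subtraction leaves the (binned) signal power, which is how the corrected spectrum of Figure 6(b) is obtained.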

exemplified in Figure 6. The estimated spectral indices were n_s = 2.8885 and m = 1.7881, corresponding to the mixing matrix

\[ \mathbf{A} = \begin{bmatrix} 1 & 1 & 1 \\ 1.1353 & 2.8018 & 0.5509 \\ 1.2241 & 10.7128 & 0.2488 \\ 1.2570 & 32.3861 & 0.1279 \end{bmatrix}. \tag{25} \]

The average error on the covariance matrices is in this case

\[ \mathbf{E} = \begin{bmatrix} 0.0158 & 0.1165 & 0.1930 \\ 0.1163 & 0.0331 & 0.0254 \\ 0.2440 & 0.0261 & 0.0144 \end{bmatrix}. \tag{26} \]

The Frobenius norm of the matrix Q − I is now 0.0736, that is, slightly worse than in the stationary case above.

Figure 7: Map of the noise standard deviations used to generate the nonstationary data.

7. CONCLUDING REMARKS

By exploiting the spatial structure of the sources, we developed an identification and separation algorithm that is able to exploit any available information on the possible structure of the mixing matrix and the source covariance matrices. This can include the fully blind approach as well as the case exemplified here, where the mixing matrix is known to depend on only two parameters. The identification task is performed by a simple optimisation strategy, while the separation proper can be faced by different approaches. We experimented with the simplest one, but we are also developing more accurate techniques, especially suited to treating nonstationary noise in the data. Our method is suitable for working directly with all-sky maps, but it could be necessary to apply it to small patches, as shown in the experimental section above, to cope with the expected variability of the spectral indices and the noise variances in different sky regions.

It has been observed that it does not make sense to attempt source separation in those regions where the foreground emissions are much smaller than the CMB and well below the noise level. In any case, the CMB angular power spectrum has always been estimated fairly well up to a multipole l = 2000, irrespective of the galactic latitude. The estimation of the source densities has also given good results. Source separation by our method has been particularly interesting with data from low galactic latitudes, where the foreground variance is often higher than that of the CMB signal.

Figure 8: Wiener-filtered estimated maps from nonstationary data: (a) CMB; (b) synchrotron; (c) dust.
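The constant noise covariance used for the nonstationary experiments of Figure 8 is the patch average (24) of the pixel-dependent covariances. A minimal sketch (function name and toy template ours), assuming white, channel-uncorrelated noise so that each pixel's covariance is diagonal:

```python
import numpy as np

def average_noise_covariance(var_map):
    """Constant zero-shift noise covariance as in (24): the
    pixel-dependent covariances R_n(0,0; xi, eta) -- diagonal for
    white, channel-uncorrelated noise -- averaged over the patch.
    var_map has shape (M channels, rows, cols)."""
    mean_var = var_map.reshape(var_map.shape[0], -1).mean(axis=1)
    return np.diag(mean_var)

# Toy template: noise variance increasing along one direction,
# identical pattern in all channels.
M, R, C = 4, 8, 8
template = np.linspace(0.5, 1.5, C)       # spatial modulation
var_map = np.ones((M, R, 1)) * template   # broadcast to (M, R, C)
Cn0 = average_noise_covariance(var_map)
assert np.allclose(np.diag(Cn0), 1.0)     # mean of linspace(0.5, 1.5)
```

In the paper's experiments, the template of Figure 7 plays the role of `var_map` (after squaring the standard deviations), channel by channel.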

Note that many separation strategies, both blind and non- Franc¸ois Bouchet (IAP) for setting up and distributing the blind, have failed their goal in this region of the celestial database. Extensive use of the HEALPix scheme (Hierarchi- sphere. As an example, WMAP data analysis (see [28]) was cal, Equal Area and iso-Latitude Pixelisation of the sphere, often performed by using pixel intensity masks that exclude http://www.eso.org/science/healpix), by Krysztof M. Gorski´ the brightest sky portion from being considered. Another et al., has been made throughout this work. interesting feature of our method is that significant cross- correlations between pairs of foregrounds can be straigh- REFERENCES forwardly taken into account. Recently, some methods for a completely blind separation of correlated sources have been [1] M. Tegmark, D. J. Eisenstein, W. Hu, and A. de Oliveira-Costa, proposed in the literature (see, e.g., [29]). Their effective- “Foregrounds and forecasts for the cosmic microwave back- ground,” Astrophysical Journal, vol. 530, no. 1, pp. 133–165, ness in astrophysical map separation has not been proved yet. 2000. Moreover, they have a high computational complexity. [2] M. P. Hobson, A. W. Jones, A. N. Lasenby, and F. R. Bouchet, Recently [9], a frequency-domain implementation of the “Foreground separation methods for satellite observations of method in [15] has been proposed. This method allows to the cosmic microwave background,” Monthly Notices of the take antenna beam effects into account straightforwardly by Royal Astronomical Society, vol. 300, no. 1, pp. 1–29, 1998. including the effect of the antenna transfer functions in the [3] F. R. Bouchet, S. Prunet, and S. K. Sethi, “Multifrequency model. It also permits to introduce prior information about Wiener filtering of cosmic microwave background data with polarization,” Monthly Notices of the Royal Astronomical Soci- the entries of the mixing matrix and the spatial power spec- ety, vol. 302, no. 
4, pp. 663–676, 1999. tra of the components. An open problem is the extension of [4]A.W.Jones,M.P.Hobson,P.Mukherjee,andA.N.Lasenby, these methods to the case of correlated sources. A possible ex- “The effect of a spatially varying Galactic spectral index on the tended method might be implemented in the space or in the maximum entropy reconstruction of Planck Surveyor satellite frequency domain according to convenience. Another prob- data,” Astrophysical Letters & Communications,vol.37,no.3- lem that is still open with the expected Planck data is the dif- 6, pp. 369–375, 2000. ferent resolution of the data maps in some of the measure- [5]R.B.Barreiro,M.P.Hobson,A.J.Banday,etal.,“Foreground separation using a flexible maximum-entropy algorithm: an ment channels. The identification part of our method can application to COBE data,” Monthly Notices of the Royal As- work with maps whose resolution has been degraded in or- tronomical Society, vol. 351, no. 2, pp. 515–540, 2004. der to be the same in all the channels. The result would be [6] C. Baccigalupi, L. Bedini, C. Burigana, et al., “Neural net- an estimate of the mixing matrix, which can be used in any works and the separation of cosmic microwave background nonblind separation approach with channel-dependent reso- and astrophysical signals in sky maps,” Monthly Notices of the lution, such as maximum entropy [2]. However, the possible Royal Astronomical Society, vol. 318, no. 3, pp. 769–780, 2000. [7] D. Maino, A. Farusi, C. Baccigalupi, et al., “All-sky astrophysi- asymmetry of the telescope beam patterns should be taken cal component separation with Fast Independent Component into account in verifying this possibility. Analysis (fastica),” Monthly Notices of the Royal Astronomical Society, vol. 334, no. 1, pp. 53–68, 2002. ACKNOWLEDGMENTS [8] C. Baccigalupi, F. Perrotta, G. 
De Zotti, et al., “Extracting cos- mic microwave background polarization from satellite astro- This work has been partially supported by the Italian Space physical maps,” Monthly Notices of the Royal Astronomical So- Agency, under Contract ASI/CNR 1R/073/01. D. Herranz ciety, vol. 354, no. 1, pp. 55–70, 2004. is supported by the European Community’s Human Po- [9] J. Delabrouille, J.-F. Cardoso, and G. Patanchon, “Multidetec- tential Programme under Contract HPRN-CT-2000-00124 tor multicomponent spectral matching and applications for CMBNET. The authors adopted the simulated sky tem- cosmic microwave background data analysis,” Monthly Notices of the Royal Astronomical Society, vol. 346, no. 4, pp. 1089– plates provided by the Planck Technical Working Group ff 1102, 2002. 2.1 (di use component separation). In particular, the au- [10] A. Hyvarinen¨ and E. Oja, “Independent component analysis: thors are grateful to Martin Reinecke (MPA), Vlad Stol- algorithms and applications,” Neural Networks, vol. 13, no. 4- yarov (Cambridge), Andrea Moneti, Simon Prunet, and 5, pp. 411–430, 2000. Separation of Correlated Astrophysical Sources 2411

L. Bedini graduated cum laude in electronic engineering from the University of Pisa, Italy, in 1968. Since 1970, he has been a researcher of the Italian National Research Council, Istituto di Scienza e Tecnologie dell'Informazione, Pisa, Italy. His interests have been in modelling, identification, and parameter estimation of biological systems applied to noninvasive diagnostic techniques. At present, his research interest is in the field of digital signal processing, image reconstruction, and neural networks applied to image processing. He is a coauthor of more than 80 scientific papers. From 1971 to 1989, he was an Associate Professor of system theory at the Computer Science Department, University of Pisa, Italy.

D. Herranz received the B.S. degree in 1995 and the M.S. degree in 1995 from the Universidad Complutense de Madrid, Madrid, Spain, and the Ph.D. degree in astrophysics from Universidad de Cantabria, Santander, Spain, in 2002. He was a CMBNET Postdoctoral Fellow at the Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo" (CNR), Pisa, Italy, from 2002 to 2004. He is currently at the Instituto de Fisica de Cantabria, Santander, Spain, under an MEC Juan de la Cierva contract. His research interests are in the areas of cosmic microwave background astronomy and extragalactic point source statistics, as well as the application of statistical signal processing to astronomical data, including blind source separation, linear and nonlinear data filtering, and statistical modeling of heavy-tailed processes.

E. Salerno graduated in electronic engineering from the University of Pisa, Italy, in 1985. In September 1987, he joined the Italian National Research Council (CNR) as a permanent researcher. He is now with the Institute of Information Science and Technologies (ISTI), Signals and Images Laboratory, Pisa. His scientific interests are in applied inverse problems, image reconstruction and restoration, nondestructive evaluation, and blind signal separation. He has assumed various responsibilities in research programs in nondestructive testing, robotics, numerical models for image reconstruction and computer vision, and neural network techniques in astrophysical imagery. Dr. Salerno is an Associate Investigator with the Planck-LFI Consortium, and a Member of the Italian Society for Information and Communication Technology (AICT-AEIT).

C. Baccigalupi is currently an Assistant Professor at SISSA/ISAS. He is a member of the Planck and EBEx cosmic microwave background (CMB) polarization experiments. In Planck, he is leading the working group on component separation, and in EBEx, he is responsible for the control of the foreground polarized contamination to the CMB radiation. He is the author of about 40 papers in refereed international scientific reviews, on topics ranging from the theory of gravity to CMB data analysis. He teaches linear cosmological perturbations and CMB anisotropies courses for the Astroparticle Ph.D. course at SISSA. He is involved in long-term international projects, the most important being the Long Term Space Astrophysics programme funded by NASA for a duration of five years, on component separation on COBE, WMAP, and future CMB experiments, and a one-year Mercator Professorship to be carried out at the University of Heidelberg in the academic year 2005/2006.

E. E. Kuruoğlu was born in Ankara, Turkey, in 1969. He obtained his B.S. and M.S. degrees, both in electrical and electronics engineering, from Bilkent University in 1991 and 1993, respectively. He completed his graduate studies with M.Phil. and Ph.D. degrees in information engineering from the University of Cambridge, in the Signal Processing Laboratory, in 1995 and 1998, respectively. During this period, he received the British Council Scholarship, the Cambridge Overseas Trust Scholarship, and the Lundgren Award. Upon graduation from Cambridge, he joined the Xerox Research Center in Cambridge as a permanent member of the Collaborative Multimedia Systems Group. After two years at Xerox, he won an ERCIM Fellowship, which he spent at INRIA-Sophia Antipolis, France, and IEI CNR, Pisa, Italy. In January 2002, he joined ISTI-CNR, Pisa, as a permanent member. His research interests are in statistical signal processing, human-computer interaction, and information and coding theory, with applications in image processing, astronomy, telecommunications, intelligent user interfaces, and bioinformatics. He is currently on the Editorial Board of Digital Signal Processing and an Associate Editor for the IEEE Transactions on Signal Processing. He was the Guest Editor for a special issue on signal processing with heavy-tailed distributions published in Signal Processing, December 2002. He is the Special Sessions Chair for the EURASIP European Signal Processing Conference, EUSIPCO 2005, and the Tutorials Chair for EUSIPCO 2006. In 2005, he was elected a Member of the IEEE Technical Committee on Signal Processing Theory and Methods. He has more than 50 publications and holds 5 US, European, and Japanese patents.

A. Tonazzini graduated cum laude in mathematics from the University of Pisa, Italy, in 1981. In 1984, she joined the Istituto di Scienza e Tecnologie dell'Informazione of the Italian National Research Council (CNR) in Pisa, where she is currently a researcher at the Signals and Images Laboratory. She cooperated in special programs for basic and applied research on image processing and computer vision, and is a coauthor of over 60 scientific papers. Her present interest is in inverse problems theory, image restoration and reconstruction, document analysis and recognition, independent component analysis, and neural networks and learning.

EURASIP Journal on Applied Signal Processing 2005:15, 2413–2425
© 2005 Hindawi Publishing Corporation

Adapted Method for Separating Kinetic SZ Signal from Primary CMB Fluctuations

Olivier Forni
IAS-CNRS, Université Paris Sud, Bâtiment 121, 91405 Orsay Cedex, France
Email: [email protected]

Nabila Aghanim
IAS-CNRS, Université Paris Sud, Bâtiment 121, 91405 Orsay Cedex, France
Email: [email protected]
Division of Theoretical Astronomy, National Astronomical Observatory of Japan, Osawa 2-21-1, Mitaka, Tokyo 181-8588, Japan

Received 30 May 2004; Revised 11 December 2004

In this first attempt to extract a map of the kinetic Sunyaev-Zel'dovich (KSZ) temperature fluctuations from the cosmic microwave background (CMB) anisotropies, we use a method which is based on simple and minimal assumptions. We first focus on the intrinsic limitations of the method due to the cosmological signal itself. We demonstrate using simulated maps that the reconstructed KSZ maps are in quite good agreement with the original input signal, with a correlation coefficient between original and reconstructed maps of 0.78 on average, and an error on the standard deviation of the reconstructed KSZ map of only 5% on average. To achieve these results, our method relies on the fact that some first-step component separation provides us with (i) a map of Compton parameters for the thermal Sunyaev-Zel'dovich (TSZ) effect of galaxy clusters, and (ii) a map of temperature fluctuations which is the sum of primary CMB and KSZ signals. Our method benefits from the spatial correlation between the KSZ and TSZ effects, which are both due to the same galaxy clusters. This correlation allows us to use the TSZ map as a spatial template in order to mask, in the CMB + KSZ map, the pixels where the clusters must have imprinted an SZ fluctuation. In practice, a series of TSZ thresholds is defined and, for each threshold, we estimate the corresponding KSZ signal by interpolating the CMB fluctuations on the masked pixels. The series of estimated KSZ maps is finally used to reconstruct the KSZ map through the minimisation of a criterion taking into account two statistical properties of the KSZ signal (KSZ dominates over primary anisotropies at small scales; KSZ fluctuations are non-Gaussian). We show that the results are quite sensitive to the effect of beam convolution, especially for large beams, and to the corruption by instrumental noise.

Keywords and phrases: cosmic microwave background, data analysis.
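The two-step strategy summarised above can be sketched in a few lines of numpy. This is an illustrative toy, not the code used in the paper: the thresholds follow the 15%–95% quantile rule described in Section 3.1.2, but the membrane-spline/multigrid interpolation is replaced by a plain Jacobi relaxation (with periodic boundaries) that only mimics its behaviour — unmasked pixels keep their values, while masked pixels relax towards a zero discrete Laplacian. All function names are ours.

```python
import numpy as np

def tsz_thresholds(y_map, lo=0.15, hi=0.95, step=0.05):
    # Thresholds from the cumulative distribution of the Compton
    # parameter map: 15%-95% of the pixels in steps of 5% (17 values).
    fracs = np.arange(lo, hi + step / 2, step)
    return np.quantile(y_map, fracs)

def interpolate_masked(image, mask, n_iter=500):
    # Toy stand-in for the membrane-spline interpolation: masked
    # pixels are relaxed towards the mean of their 4 neighbours
    # (zero discrete Laplacian); unmasked pixels are left untouched.
    u = image.copy()
    for _ in range(n_iter):
        nb = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0)
                     + np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u[mask] = nb[mask]
    return u

def ksz_estimates(delta_t, y_map):
    # One KSZ estimate per TSZ threshold: mask the pixels with y above
    # the threshold, interpolate the CMB there, subtract from delta_t.
    out = []
    for t in tsz_thresholds(y_map):
        mask = y_map > t
        out.append(delta_t - interpolate_masked(delta_t, mask))
    return out
```

By construction, each estimate is exactly zero outside its mask, so the recovered KSZ signal lives only where the TSZ template indicates a cluster.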

1. INTRODUCTION

The cosmic microwave background (CMB) temperature anisotropy field encloses so-called primary anisotropies, directly related to the initial density fluctuations at early stages of the universe, and so-called secondary anisotropies, generated after matter and radiation decoupled. The secondary anisotropies arise from the interaction of the CMB photons with gravitational potential wells or with ionised gas along their way towards us. More "local" contributions to the CMB signal are due to foreground emissions from our galaxy and from distant galaxies. One of the major goals of observational cosmology is to use the CMB anisotropies to probe the cosmological model, mainly through cosmological parameter estimation. This is already performed by a number of groups using ground-based and balloon-borne experiments such as TOCO [1], BOOMERanG [2], MAXIMA [3], DASI [4], and Archeops [5], which achieved a firm detection of the so-called "first peak" in the CMB anisotropy angular power spectrum at the degree scale. This detection was recently confirmed by the WMAP satellite [6]. A series of small-scale CMB experiments (e.g., VSA [7], CBI [8], ACBAR [9]) showed rather conclusive evidence for a second and a third peak. The positions, heights, and widths of these features in the angular power spectrum already give us a good idea of the cosmological model.

It is clear, however, that such constraints necessitate the "cleanest" possible cosmological signal; in other words, they need the best possible monitoring of contaminating signals such as secondary anisotropies or foreground emissions. This is the objective of component separation for CMB observations. In most cases, the different signals contributing to the CMB anisotropies exhibit both different frequency (ν) and spectral (in Fourier or spherical harmonic domains) dependences. As a consequence, CMB experiments often observe at several frequencies to be able to separate the different astrophysical signals. Numerous methods were adapted and developed for the CMB problem, like Wiener filtering, maximum entropy, independent component analysis, and so forth [10, 11, 12, 13, 14]. All gave very satisfactory results and showed clearly that one can extract the CMB primary signal from the observed mixture. Obviously, the success of the separation methods greatly depends on how different from each other the signals in the observed mixture are. Signals sharing with the primary anisotropies the same frequency dependence and/or the same power spectrum will be badly detected, or even undetected, by the separation methods mentioned above.

We present here a new method optimised to extract, from the primary signal, temperature fluctuations which have the same frequency dependence and are associated with a major secondary effect, namely, the kinetic Sunyaev-Zel'dovich (KSZ) effect (for details see [15]). Our method is based on a two-step strategy in which we derive the best estimate of the CMB signal on masked pixels by interpolation, and then deduce the best estimate of the KSZ map by minimisation. We show that we retrieve the KSZ signal, in the best possible way, in terms of its amplitude, its distribution, and its power spectrum, provided we use (i) a well-chosen spatial template for the masked pixels, and (ii) adapted signal processing techniques for both interpolating and minimising. For the first point, we use the TSZ maps as a template since both TSZ and KSZ are associated with the same objects (clusters of galaxies). For the second point, namely, the signal processing techniques, there are several issues to address in order to optimise the extraction of the KSZ signal from the primary CMB. We thus have to make sure at each step that we use adapted techniques. For the interpolation on the masked pixels defined by the template, we can use several methods, like for example constrained 2D realisations of the underlying CMB (sensitive to our knowledge of the CMB through the confidence intervals on the cosmological parameters), textures [16] (sensitive to the morphological description of the processes), or simply cubic B-spline methods. The latter, which we use in the present study, is a classical and robust method giving reliable results, especially when we set proper continuity and boundary conditions. In order to separate the primary CMB signal from KSZ, several methods can be used, such as principal component analysis or least-square minimisation. Here, we choose to minimise over statistical criteria. At the minimisation step, it is thus important to use tools which emphasise the different statistical characteristics of the signal (power, non-Gaussian signatures). Among the numerous possible tools used to exhibit non-Gaussian signatures (higher-order moments in real space, bi- and tri-spectrum (e.g., [17]), Minkowski functionals (e.g., [18]), higher criticism (e.g., [19])), the multiscale transforms seem to be the most satisfactory and will thus be used in the following. In the same spirit, we use the most sensitive non-Gaussian estimator among the coefficients in a biorthogonal wavelet transform, namely, the diagonal coefficients.

We test and develop our method on a set of numerical simulations that will be described in Section 2. In Section 3, we detail the methodology adopted to separate KSZ from the primary CMB signal. We focus on the way our method is intrinsically limited by the pure cosmological signals, primary CMB and SZ effect. We perform some sensitivity tests in Section 4 and explore the effects of beam convolution and instrumental white noise on our results. Finally, we discuss our results in Section 5.

Figure 1: Frequency dependences of the intensity variations due to the TSZ effect (solid line) and KSZ effect (dashed line) as a function of the dimensionless frequency x = hν/kT_CMB (h is the Planck constant, k is the Boltzmann constant, and T_CMB = 2.728 K is the mean temperature of the CMB).

2. THE ASTROPHYSICAL SIGNALS

Among all secondary anisotropies, the dominant contribution to the CMB signal comes from the Sunyaev-Zel'dovich (SZ) effect [20, 21], which represents the inverse Compton scattering of CMB photons by free electrons in the ionised and hot intracluster gas. This so-called thermal SZ (TSZ) effect, whose amplitude is given by the Compton parameter y, is the integral of the pressure along the line of sight. The inverse Compton effect moves the CMB photons from the lower to the higher frequencies of the spectrum. This results in a peculiar spectral signature, with a brightness decrement at long wavelengths and an increment at short wavelengths (Figure 1, solid line). When the cluster moves with respect to the CMB rest frame with a peculiar velocity v_r, a Doppler shift called the kinetic SZ (KSZ) effect generates a temperature anisotropy with the same spectral signature as the primary CMB fluctuations, at first order (Figure 1, dashed line).

The importance of the SZ effect for cosmology has been recognised very early (see reviews in [22, 23]). It is a powerful tool to detect high-redshift galaxy clusters since it is redshift independent. In combination with X-ray observations, it can be used to determine the Hubble constant and probe the intracluster gas distribution. Moreover, the KSZ effect may be one of the best ways of measuring the cluster peculiar velocities by combining the thermal and kinetic effects [21]. Formally, the KSZ can be distinguished from the TSZ effect due to their different frequency dependences: the KSZ intensity reaches its maximum at ∼218 GHz, just where the TSZ intensity is zero (Figure 1). In practice, very few measurements of the peculiar velocities have been attempted [24, 25]. With usual component separation techniques, it has been shown that the TSZ signal can be extracted from the CMB rather easily (its frequency dependence is quite different from a black-body spectrum), while the KSZ component remains indistinguishable from the primary CMB anisotropies due to their identical frequency distribution. In early works, [26] used an optimal (Wiener) filtering, with a spatial filter derived from X-ray observations of galaxy clusters, that minimises confusion with the CMB, and [27] used a matched filter optimised on simulated data and independent of the underlying CMB model.

We simulate 512 × 512 pixel maps of the primary CMB ((∆T/T)_CMB RMS = σ_CMB = 1.9 × 10^-5), of the TSZ (mean y = 1.17 × 10^-5, i.e., mean (∆T/T)_TSZ RMS = −2.34 × 10^-5 at 2 mm), and of the KSZ (mean (∆T/T)_KSZ RMS = σ_KSZ = 1.85 × 10^-6), with a pixel size of 1.5 arcmin. A precise description of the SZ simulations is given in [28]. The CMB signal is a Gaussian field whose power spectrum is computed from an inflationary, flat, low-matter-density model. The KSZ effect induces temperature fluctuations given by δT^KSZ = (∆T/T)_KSZ = −(v_r/c)τ, with c the velocity of light and τ the cluster Thomson optical depth. The primary CMB and the KSZ anisotropies having the same spectral shape (at first order), we construct maps of thermodynamic temperature fluctuations, δT, by adding the two signals: δT = (∆T/T)_KSZ + (∆T/T)_CMB. We are thus left with two simulated datasets of pure cosmological signals, one consisting of the temperature fluctuation maps (CMB + KSZ) and the other consisting of the Compton parameter maps, y, for the TSZ effect. For our study, we restrict the analysis to 15 simulated maps which span a representative range of amplitudes for all signals.

Note that in "real life," it is the multifrequency CMB experiments that can provide us, after classical component separation, with these two sets of maps. One contains both CMB and KSZ temperature fluctuations, as they are indistinguishable, and the second consists of the Compton parameter maps associated with the TSZ effect.

3. METHOD FOR SEPARATING KSZ FROM CMB SIGNAL

From the two available types of maps (y maps for TSZ and δT maps for CMB + KSZ), our goal is to obtain the best possible estimate of the KSZ buried in the dominant CMB signal. We benefit for this from the fact that TSZ and KSZ features are spatially correlated (e.g., [29, 30]). The spatial correlation simply means that both effects are due to galaxy clusters. Therefore, where TSZ signal is present, so are KSZ fluctuations, regardless of their signs or amplitudes. Conversely, where the TSZ fluctuations are absent, so are the KSZ fluctuations, and the signal at that position in the δT map is associated with the CMB primary anisotropies only. Note, however, that clusters at rest with respect to the CMB (v_r = 0) will have no KSZ signal.

From this simple statement we build a two-step component separation strategy in which (i) we first derive the best estimate of the CMB map by interpolation, and (ii) we consequently deduce the best estimate of the KSZ map by minimisation.

3.1. Estimating primary CMB anisotropies

The basic idea in order to estimate the primary CMB anisotropies is to use the TSZ map as a mask to select, in the δT map, the pixels where only primary anisotropies are present. These pixels contain no TSZ fluctuations in the y map. The rest of the pixels in the δT map are masked pixels. We then interpolate the δT signal on these masked pixels, with the constraint that pixels where the signal is associated with only primary CMB keep their values after the interpolation. We therefore obtain an estimated primary CMB map. It is worth noting that the y map is an observable quantity that is rather easy to obtain from multifrequency observations due to its spectral signature. This is what makes it useful for the mask definition.

Formally, the KSZ map can then be estimated simply by computing the difference between the original unmasked δT map and the primary CMB map obtained from the interpolation.

3.1.1. Interpolation of the masked pixels

The reconstruction of the KSZ map depends on the performance of the interpolation. We use the method described in [31] and consider the problem of the minimisation of a general criterion written as

E(u) = \sum_{(k,l)\in\mathbb{Z}^2} w(k,l)\,[f(k,l) - u(k,l)]^2 + \lambda \sum_{(k,l)\in\mathbb{Z}^2} \big( [(d_x * u)(k,l)]^2 + [(d_y * u)(k,l)]^2 \big),    (1)

where f is the input image, u is the desired solution, w ≥ 0 is a map of space-varying weights, and d_x and d_y are the horizontal and vertical gradient operators, respectively. The second, space-invariant term in (1) is a membrane spline regulariser; the amount of smoothness is controlled by the parameter λ. Taking the partial derivative of (1) with respect to u, we find that u is the solution of the differential equation

f_w = Wu + \lambda Lu = Au,    (2)

where W is the diagonal weight matrix, f_w = Wf is the weighted data vector, L is the discrete Laplacian operator, and A = W + λL is a symmetric definite matrix. The inversion of (2) is achieved using a multigrid technique [32]. Typically, we need two V-cycles, with two iterations in the smoothing Gauss-Seidel part of the algorithm, to reach a residual of the order of 10^-6.

The interpolation of the primary CMB map can be achieved by setting the weights to zero where the data are missing, that is, in the masked pixels, and to one elsewhere, and by solving (2). The value of λ then determines the tightness of the fit at the known data points (unmasked pixels), while the surface u is interpolated such that the value of the Laplacian of u is zero elsewhere. In the present work, we impose a low value for λ so that the recovered values at the known data points are equal to the original values. This criterion can be relaxed to take into account the corruption of the data by additive white noise [31]. In this case, the optimum regularisation parameter λ can be defined as

\lambda = \frac{\sigma^2}{E(f \cdot Lf)/4 - \sigma^2},    (3)

where σ² is the variance of the noise and E(f · Lf) denotes an estimate of the correlation between the noisy image f and its Laplacian Lf. In the case of nonwhite noise, the optimal regularisation parameter λ may be determined from the data using cross-validation methods [33], or from a given measurement model of the signal + noise [34].

The performance of the interpolation is improved if the values of the Laplacian of u at the missing data points are nonzero. Moreover, the values are set such that the first and second derivatives of the interpolated signal are continuous throughout the interval. These continuity conditions characterise the cubic B-spline functions, which are known for their simplicity and their performance in terms of signal reconstruction [35, 36]. In practice, these conditions imply that the source term f_w in (2) is modified to impose nonzero values at the points where the weights are set to zero (i.e., the masked pixels). An equivalent way to solve (2) with the above-mentioned conditions is to replace the Laplacian operator L by the quadratic operator L².

Obviously, other interpolation methods can be proposed and used to estimate the CMB data in the masked pixels. We could, for example, improve the interpolation by using textures [16]. The latter account for the morphological properties of the signal; such a method is limited by our knowledge of these characteristics. We could also think of using constrained 2D realisations of the CMB to obtain the values in the mask. This method is simple; however, it suffers from the precision to which the CMB power spectrum is estimated, or in other words the precision on the cosmological parameters used for the realisation.

3.1.2. Defining the masked pixels

We now define the mask, that is, how we select the missing data points. Besides the pixels that actually contain no galaxy clusters, that is, no SZ contributions, we fix a threshold value for the TSZ amplitude below which the TSZ signal is considered too small to be detected. The corresponding pixels in the δT maps are then associated only with the primary CMB signal. On the contrary, above this threshold, pixels in the δT map are considered to be the missing data points (masked pixels) that we want to interpolate. The number and location of the missing data depend on the threshold. The choice of this threshold thus has important consequences on the quality of the interpolation. When the threshold is high, the number of missing data points is small and the interpolated surface is good, but the selection retains only the clusters with the highest TSZ and misses the majority of clusters. In this case, we expect a low correlation coefficient between the retrieved KSZ map (obtained by subtracting the interpolated CMB map from the δT map) and the original KSZ map. When the threshold is low, we take into account a large fraction of clusters, but the interpolated surfaces are large and the quality of the interpolation suffers from that. Moreover, the characteristic scale of the interpolated surfaces becomes of the order of the CMB fluctuations, leading to "confusion effects."

Since it is difficult to choose one single optimal TSZ threshold, we retrieve a set of interpolated CMB maps corresponding to a set of TSZ threshold values. The latter are defined as follows: we compute the cumulative distribution function of the TSZ values in the y map and we search for the values corresponding to 15%–95% of the total number of pixels (with a step of 5%). This gives us a set of 17 threshold values. All pixels in the TSZ map that have y parameters above the thresholds are identified as missing data points in the simulated δT map, that is, the mask.

3.1.3. Results

For each of the 15 simulated maps of our datasets, we obtain 17 TSZ thresholds, and thus 17 masked δT maps. We interpolate the missing data points to recover the primary CMB signal in the masked regions. We evaluate 17 associated KSZ maps by subtracting the interpolated primary CMB maps from the total δT map.

For each of the 17 estimated KSZ maps, we compute the correlation coefficient with the original input KSZ map. The correlation coefficients are plotted as a function of the standard deviation of the estimated KSZ map for each of the 17 threshold values (Figure 2). The diamonds and the dashed line represent the case where the interpolation is such that the Laplacian values are set to zero, and the triangles and the solid line are for the case in which the Laplacian values are nonzero. Figure 2a shows our best recovery case in terms of the correlation coefficient; Figure 2b is for our worst case.

From Figure 2, we see that the correlation coefficient between the original and the estimated KSZ maps is higher when the Laplacian values are nonzero than when they are set to zero. This is especially true for the maps with low standard deviations. The improvement due to the biharmonic operator is of the order of 20% in our worst case (Figure 2b). We will therefore use, in the following, the L² operator, as it gives a better interpolation. In addition, we see that the correlation coefficient increases when the standard deviation of the estimated KSZ map increases (i.e., when the TSZ threshold decreases). The correlation coefficient reaches a maximum value and then decreases for the highest KSZ standard deviations (i.e., the lowest TSZ thresholds).

3.2. Reconstructing the KSZ map

From the previous step, we obtained a set of 17 estimated KSZ maps. Now, we search for a method that gives us either the reconstructed KSZ map which is the closest to the original KSZ signal, or the combination of the 17 KSZ maps giving the best estimate of the original KSZ map. We compare the reconstructed maps with the original KSZ maps. This allows us to calibrate our method and thus provides us with the intrinsic limitations of the reconstruction methods.
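The comparison between reconstructed and original KSZ maps used in this section rests on two figures of merit: the correlation coefficient of the two maps and the relative error on the standard deviation. A minimal numpy sketch (function names are ours, not the paper's):

```python
import numpy as np

def map_correlation(a, b):
    # Pearson correlation coefficient between two maps,
    # computed on the flattened pixel values.
    return np.corrcoef(np.ravel(a), np.ravel(b))[0, 1]

def relative_std_error(original, reconstructed):
    # Relative error on the standard deviation of the
    # reconstructed map with respect to the original one.
    s0 = np.std(original)
    return abs(np.std(reconstructed) - s0) / s0
```

With these definitions, the figures quoted in the abstract (correlation of 0.78, standard-deviation error of 5%) correspond to `map_correlation` close to 1 and `relative_std_error` close to 0.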

Figure 2: The correlation coefficient between the original KSZ map and the series of 17 estimated KSZ maps as a function of the standard deviations of the estimated KSZ maps. (a) The best case and (b) the worst case. Triangles and solid line stand for the interpolation with the biharmonic operator; diamonds and dashed line are for the interpolation with the Laplacian. The interpolation with the biharmonic operator gives better results, especially for the KSZ maps with low standard deviation. The vertical lines mark the standard deviations of the original KSZ maps (2.6 × 10^-6 and 1.2 × 10^-6). The standard deviation of the primary CMB is 1.9 × 10^-5.

3.2.1. Method

We test the decorrelation by principal component analysis (PCA). The PCA gives us a reconstructed KSZ signal which is rather close to the original: the correlation coefficient between the reconstructed and the original KSZ, averaged over the 15 maps, reaches 0.73. However, the standard deviation is on average smaller by almost 50% than the original. This is not satisfactory. We can also search for a linear combination of the 17 estimated KSZ maps that is the closest to the original KSZ in the sense of least squares. This minimisation is done using a standard singular value decomposition (SVD). The average correlation coefficient (over the 15 simulated input maps) between the original KSZ map and the reconstructed map is 0.8, slightly higher than the PCA result. However, in this case, the standard deviations of the reconstructed maps are lower than those of the original KSZ signal by almost 25% on average. Furthermore, the results of the SVD least-square minimisation depend on the set of estimated maps that are used, which is clearly undesirable.

The two previous attempts being not quite satisfactory, we need results that are as map independent as possible. We must identify a criterion, to minimise on, which should ideally give the largest possible correlation coefficient and reconstructed KSZ maps with standard deviations as close as possible to those of the original KSZ signal. Moreover, a good minimisation criterion would characterise the KSZ signal only, excluding the primary CMB signatures. We have identified two properties of the KSZ fluctuations that fulfill this definition:

(i) the KSZ signal dominates the primary CMB at high wave numbers (small angular scales);
(ii) the KSZ effect is a highly non-Gaussian process, contrary to the primary CMB which is a Gaussian process.

The KSZ effect is due to galaxy clusters, whose typical sizes are a few to a few tens of arcmin. As a result, SZ anisotropies, produced either by KSZ or TSZ, intervene at small angular scales where they show a maximum amplitude (Figure 3, dashed line). At those scales, primary CMB anisotropies are severely damped and the angular power spectrum decreases sharply (Figure 3, solid line). Therefore, at small scales, both the power and the statistical properties of the total δT signal should be those of the dominant signal, that is, the KSZ effect. In order to focus on the KSZ signal and also to enhance the signal-to-noise ratio, we perform a multiscale wavelet decomposition of the δT map. The above-mentioned properties remain true in the wavelet domain, as was first recognised by [37] and applied by [38]. Thus, the statistical properties of the wavelet coefficients at the lowest decomposition scale (3 arcmin) reflect the properties of the SZ effect only.

We use the decimated biorthogonal wavelet transform, which decomposes a signal s as follows:

s(l) = \sum_k c_{J,k}\,\phi_{J,l}(k) + \sum_k \sum_{j=1}^{J} \psi_{j,l}(k)\,w_{j,k},    (4)

with \phi_{j,l}(x) = 2^{-j}\phi(2^{-j}x - l) and \psi_{j,l}(x) = 2^{-j}\psi(2^{-j}x - l), where φ and ψ are, respectively, the scaling and the wavelet functions. J is the number of resolutions used in the decomposition, w_j are the wavelet coefficients (or details) at scale j, and c_J is a smooth version of s (j = 1 corresponds to the finest scale, i.e., the highest frequencies). The two-dimensional algorithm gives three wavelet subimages at each decomposition scale. Within this choice, the wavelet analysis provides us with the wavelet coefficients associated with the diagonal, vertical, and horizontal details of the analysed map. Using this tool, we have demonstrated [39, 40] that the excess kurtosis of the wavelet coefficients in a biorthogonal decomposition allows us to discriminate between a Gaussian primary CMB signal and a non-Gaussian process like the SZ effect better than with an orthogonal wavelet decomposition. Moreover, we have shown that diagonal details are

Figure 3: Angular power spectrum of the primary CMB anisotropies (solid line) and of the KSZ fluctuations from galaxy clusters (dashed line). The plots are for one statistical realisation of both processes. (The multipole is equivalent to a wave number in the spherical harmonic decomposition of the sky.)

Figure 4: Standard deviations of our set of 15 KSZ original simulated maps (triangles) as compared with standard deviations of the 15 reconstructed KSZ maps (squares). The reconstruction is based on the minimisation of the statistical criterion.
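The per-scale statistics used here — one level of a decimated two-dimensional wavelet transform, followed by the moments of the diagonal details — can be sketched in a few lines. The sketch below is our own illustration, not the authors' code: a plain Haar filter stands in for the 9/7 biorthogonal bank used in the paper, and the helper names (`haar2d_level`, `excess_kurtosis`) are hypothetical.

```python
def haar2d_level(img):
    """One level of a decimated 2D Haar transform.

    Returns (approximation, horizontal, vertical, diagonal) subimages,
    each half the size of `img` in both directions.
    """
    n, m = len(img), len(img[0])
    a, h, v, d = [], [], [], []
    for i in range(0, n - 1, 2):
        ra, rh, rv, rd = [], [], [], []
        for j in range(0, m - 1, 2):
            p, q = img[i][j], img[i][j + 1]
            r, s = img[i + 1][j], img[i + 1][j + 1]
            ra.append((p + q + r + s) / 4.0)  # smooth part c_J
            rh.append((p + q - r - s) / 4.0)  # horizontal details
            rv.append((p - q + r - s) / 4.0)  # vertical details
            rd.append((p - q - r + s) / 4.0)  # diagonal details w_1
        a.append(ra); h.append(rh); v.append(rv); d.append(rd)
    return a, h, v, d

def excess_kurtosis(coeffs):
    """M4 / M2^2 - 3 of a 2D array of wavelet coefficients (0 for a Gaussian)."""
    flat = [c for row in coeffs for c in row]
    mean = sum(flat) / len(flat)
    m2 = sum((c - mean) ** 2 for c in flat) / len(flat)
    m4 = sum((c - mean) ** 4 for c in flat) / len(flat)
    return m4 / m2 ** 2 - 3.0
```

Applied to the first-scale diagonal subband of a δT map, a strongly positive excess kurtosis flags the non-Gaussian (SZ-dominated) regime, while values near zero indicate Gaussian (CMB-like) coefficients.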

the most sensitive to non-Gaussian signatures (recently confirmed and explained in [41]). We therefore choose to use the diagonal details in a biorthogonal wavelet decomposition at the smallest decomposition scale to obtain the best results.

In Table 1, we compare, using the 9/7 biorthogonal filter bank [42] for the worst and best cases, the statistical properties of the diagonal details of KSZ maps and CMB + KSZ maps at the first decomposition scale (3 arcmin). We also give the values for the primary CMB maps. As expected, we note that the wavelet coefficients for KSZ and KSZ + CMB share the same statistical properties and are quite different from those of the primary CMB alone. This confirms that the KSZ signal dominates over primary CMB in the wavelet domain (same standard deviation means same power, cf. Figure 3 in real space), and that non-Gaussian signatures in the KSZ + CMB maps are associated with the KSZ effect alone (same skewness and excess kurtosis) at the smallest decomposition scale (3 arcmin).

Table 1: The statistical properties of the first-scale (3 arcmin) diagonal wavelet coefficients distribution for the δT map (KSZ + CMB), the KSZ map, and the primary CMB alone. The two cases stand for our best case (first pair) and the worst case (second pair). We note that the three moments are almost identical and characterise well the KSZ fluctuations; they are very different from the CMB fluctuations properties.

                 Standard deviation   Skewness   Excess kurtosis
    KSZ + CMB    6.45 × 10⁻⁷           0.10       8.71
    KSZ          6.45 × 10⁻⁷           0.10       8.72
    KSZ + CMB    2.05 × 10⁻⁷           0.22       8.97
    KSZ          2.09 × 10⁻⁷           0.23       9.15
    CMB          1.60 × 10⁻⁸          −0.02       0.45

Consequently, we can confidently minimise on the statistical properties of power and non-Gaussianity at the smallest decomposition scale. In practice, we choose the following criterion, minimised over the 17 estimated KSZ maps (for each of the 15 maps of our dataset):

    \zeta = \operatorname{Min}\left[ \frac{\left(M_2(w_0) - M_2(w)\right)^2}{M_2^2(w_0)} + \frac{\left(M_4(w_0) - M_4(w)\right)^2}{M_4^2(w_0)} \right],    (5)

where w_0 is the distribution of diagonal wavelet coefficients for the known δT map (KSZ + CMB) and w is the distribution of diagonal wavelet coefficients for the desired solution map (the reconstructed map). M_2 and M_4 are, respectively, the second and the fourth moments of the wavelet coefficients. This criterion takes into account both the energy content of the coefficients, through the second moment, and the non-Gaussian character, through the fourth moment. We have chosen the fourth moment because it is the one for which the KSZ signal is the most sensitive to non-Gaussianity. Clearly, we might also include the third moment of the wavelet coefficients in the criterion. This would be needed in particular if we were dealing with a "skewed" signal (e.g., a weak lensing signal). Taking the fourth moment in the minimisation criterion allows us in turn to focus on the reconstruction of KSZ maps, excluding any skewed signal that might contribute at small scales.

In addition to the conditions of power and non-Gaussian character, we make use in the minimisation process of a nice property of the wavelet transform, namely that it preserves the spatial information. Thus, instead of minimising over all wavelet coefficients of the data map (w_0 in (5)), we can minimise only over those corresponding to clusters. This enhances the non-Gaussian character and reduces the influence of other possible non-Gaussian processes that could affect the anisotropy map δT.

Separating Kinetic SZ from CMB 2419
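A compact way to read criterion (5) is as code. The sketch below is our own Python illustration with hypothetical helper names: it evaluates ζ for a candidate set of diagonal wavelet coefficients (given as flat lists) and picks the candidate map minimising it.

```python
def moment(coeffs, k):
    """k-th raw moment of a list of wavelet coefficients."""
    return sum(c ** k for c in coeffs) / len(coeffs)

def zeta(w0, w):
    """Criterion (5): relative squared mismatch of the 2nd and 4th moments
    between the reference coefficients w0 (observed delta-T map) and a
    candidate reconstruction w."""
    m2_0, m4_0 = moment(w0, 2), moment(w0, 4)
    return ((m2_0 - moment(w, 2)) / m2_0) ** 2 \
         + ((m4_0 - moment(w, 4)) / m4_0) ** 2

def best_candidate(w0, candidates):
    """Among the estimated KSZ maps, return the one minimising zeta."""
    return min(candidates, key=lambda w: zeta(w0, w))
```

Restricting `w0` and the candidates to coefficients inside the cluster mask reproduces the spatial-information refinement mentioned above.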

[Figure 5 panels: σ_real = 2.684 × 10⁻⁶, σ_est = 2.594 × 10⁻⁶; P_real = 4.384 × 10⁻¹⁶, P_est = 4.107 × 10⁻¹⁶.]
[Figure 6 panels: σ_real = 1.177 × 10⁻⁶, σ_est = 1.118 × 10⁻⁶; P_real = 2.116 × 10⁻¹⁶, P_est = 1.274 × 10⁻¹⁶.]

Figure 5: (a)-(b) Histogram and power spectrum of the original KSZ map (solid line) and of the reconstructed KSZ map (dashed line). The reconstruction was obtained by minimising a statistical criterion. (c) The ratio of original to reconstructed power spectrum. Note the correlation coefficient between original and reconstructed KSZ maps of ∼0.9 and the agreement between total power P_real and P_est.

Figure 6: The same as Figure 5. This is our worst reconstruction case, which corresponds to the original KSZ map with the lowest standard deviation. Note the low correlation coefficient, 0.62.

3.2.2. Results

In Figure 4, we present the standard deviations of the 15 original simulated KSZ maps (triangles) and of the 15 reconstructed KSZ maps (squares) obtained by the above-mentioned minimisation technique. The agreement is good even for the lowest standard deviations, with an error only of the order of ∼5%. This is much smaller than what was obtained from the PCA method (∼50%) or from the least-square minimisation (∼25%). Furthermore, the mean value (over the 15 original maps) of the correlation coefficient between the original and the reconstructed KSZ maps is 0.78.

The quality of the KSZ map reconstruction can be observed in Figures 5 and 6, which display, for our best and worst cases respectively, the histograms of the temperature fluctuations and the power spectra of both original (solid line) and reconstructed (dashed line) KSZ maps, as well as the ratio of these two power spectra. Note that the ratio is close to one over a large range of multipoles (wave number in the spherical harmonic decomposition) even in the domain where the primary CMB dominates the KSZ signal (see Figure 3). We also notice the correlation coefficient between original and reconstructed KSZ maps, which reaches ∼0.9 in our best case and 0.62 in our worst case. The comparison between the standard deviations of original and reconstructed maps, σ_real and σ_est, also gives a global indication of how well the method works.
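The correlation coefficients quoted throughout are standard Pearson coefficients between the pixel values of the original and reconstructed maps; a minimal Python helper (our own illustration, not the authors' code) is:

```python
def correlation(x, y):
    """Pearson correlation coefficient between two equal-length pixel lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)
```

A value near 1 means the reconstruction traces the original map pixel by pixel; the ∼0.9 (best case) and 0.62 (worst case) figures above are such coefficients computed over the flattened maps.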

[Figures 7 and 8 panels: cuts along lines 64, 192, 320, and 448 of the maps, in units of ΔT/T.]

Figure 7: Cuts across the best reconstructed KSZ map (dashed line) and its original counterpart (solid line). The cuts have the same position in both maps.

Figure 8: Same as Figure 7 for the worst reconstructed KSZ map and its original counterpart.

The method allows us to obtain such results because we are able to estimate correctly the amplitude of the KSZ signal for most clusters together with their angular separation, as well as the amplitude of the background (primary CMB). This is nicely exhibited by the superposition of the cuts across reconstructed (dashed line) and original (solid line) KSZ maps, for the best and worst cases (Figures 7 and 8, resp.). The method partially fails to find broad KSZ features due to their important level of confusion with primary CMB fluctuations. Moreover, since the minimisation process is an overall procedure, relatively large features (i.e., of the order of 10⁻⁵ in absolute ΔT/T) are occasionally poorly recovered.

4. SENSITIVITY TESTS

We have shown in the previous section that statistical minimisation with a well-chosen criterion gives very good reconstructions of the KSZ original maps, in terms of correlation coefficient, power spectrum, and pixel distribution. We now investigate some of the effects that can affect our results.

4.1. Amplitude of the input KSZ signal

The previous results were obtained in a specific model which predicts the amplitude of the KSZ signal and thus its ratio to primary CMB anisotropies. Obviously the KSZ amplitude can vary for many physical reasons (number of clusters, distribution of velocities, etc.). It is thus important to test what the performances of our separation method are in response to different mixing ratios. For illustration, we take one KSZ map and add it to the same primary CMB map. The standard deviation of the KSZ signal is reduced while the CMB standard deviation is kept the same (i.e., we reduce the KSZ contribution to the δT map). We reduce the standard deviation following a geometrical progression σ_i = σ_0 2^{i/2} with

Table 2: Standard deviations of the KSZ maps and correlation coefficients between original and reconstructed KSZ maps for the same KSZ map with standard deviations ranging from 2.5 × 10⁻⁷ to 2.0 × 10⁻⁶. Two wavelet bases are tested.

                         9/7 filter                     6/10 filter
    Original σ      Estimated σ   Corr. coeff.     Estimated σ   Corr. coeff.
    2.5 × 10⁻⁷      2.21 × 10⁻⁷      0.48          2.68 × 10⁻⁷      0.45
    3.53 × 10⁻⁷     3.36 × 10⁻⁷      0.54          4.02 × 10⁻⁷      0.52
    5.0 × 10⁻⁷      5.52 × 10⁻⁷      0.56          5.46 × 10⁻⁷      0.58
    7.07 × 10⁻⁷     8.02 × 10⁻⁷      0.59          6.60 × 10⁻⁷      0.67
    1.0 × 10⁻⁶      9.74 × 10⁻⁷      0.68          9.16 × 10⁻⁷      0.71
    1.41 × 10⁻⁶     1.35 × 10⁻⁶      0.74          1.20 × 10⁻⁶      0.77
    2.0 × 10⁻⁶      1.94 × 10⁻⁶      0.78          1.77 × 10⁻⁶      0.81

i = 0, ..., 6 and σ_0 = 2.5 × 10⁻⁷. The highest standard deviation is then σ_max = 2.0 × 10⁻⁶, which is a typical value for our dataset. At the same time, we test the sensitivity of our method to the wavelet basis by comparing results obtained using two different biorthogonal wavelet bases, the 9/7-tap filter and the 6/10-tap filter [43].

The results for this test are displayed in Table 2. We first notice that the results do not depend much on the wavelet basis. As expected, the quality of the reconstruction (in terms of correlation coefficient) increases with the standard deviation of the original KSZ map, from 0.5 to ∼0.8. The smallest coefficients are obtained for very low standard deviations (< 10⁻⁶). It is worth noting that decreasing the KSZ amplitude by a factor of 2 (i.e., a factor of 4 in power) still gives a reasonably good correlation coefficient. At the same time, the standard deviation of the reconstructed KSZ map is very close to the original even when the input KSZ signal is decreased by one order of magnitude in terms of standard deviation. This is illustrated in Figure 9 by the reconstructed power spectrum of the KSZ map.

4.2. Beam convolution

Our separation method is based on two steps; the first is the interpolation and the second is the minimisation. Obviously, when the sky is observed by an instrument, the δT map suffers from beam dilution, which means that the signal is damped at the typical scale of the beam size. The same is true for the y map, for which the damping can be even more severe since the signal is mainly at small scales. As a consequence, the definition of the mask based on the TSZ template and used in the interpolation is also affected by beam dilution. We expect that this reduces the quality of the interpolation and in turn that of the 17 estimated KSZ maps. Moreover, the minimisation criterion is based on two properties of the KSZ signal (non-Gaussian character and excess of power) as compared to the primary CMB, which are mostly true at small scales. When the δT map is convolved by the instrument beam, the contribution from the KSZ signal is reduced, affecting also the statistical minimisation criterion.

All these effects depend on the size of the beam: the smaller the beam, the less affected the recovered signal. For a beam size of 1.5 arcmin (like that of some planned SZ experiments), there should be no effect on our results since our minimum resolution is 1.5 arcmin.

To illustrate the effect of a larger beam, we have convolved our observed maps (y and δT maps) with a Gaussian-shaped beam (for simplicity) with a size of 3 arcmin. We find that the reconstructed KSZ map is not satisfactory, neither in terms of the correlation coefficients (mean coefficient of 0.59), nor in terms of the average amplitudes (standard deviations of the 15 reconstructed maps are typically 40% smaller than the original), nor in terms of the power spectrum. We show in Figure 10 a cut across a reconstructed KSZ map (which is not our best case) and its original counterpart. We note that only the largest-amplitude features are reconstructed, but with amplitudes which are lower than the original. As expected, we find that the results get worse for larger beam sizes.

One way to improve our results in the case of large-beam experiments might be to use a minimisation criterion in (5) based on other wavelet decomposition scales, which should be less affected by the beam dilution. For example, the second smallest decomposition scale could be used in the case of a 3-arcmin beam. At that scale, the non-Gaussian character of the KSZ signal is indeed still preserved (see [39]); however, the power is no longer dominated by KSZ but rather by the primary CMB signal. More adapted criteria should then be investigated, but they will likely require more a priori knowledge of both the KSZ and primary CMB signals.

4.3. Noise

We illustrate possible effects of noise on our separation method by adding to the observed δT and y maps a white noise at the pixel size whose RMS amplitude in terms of temperature fluctuation is 2 × 10⁻⁶. This corresponds to a noise level of about 6 µK, which is the typical noise of most future SZ experiments. We note that the RMS noise level is of the order of the mean standard deviation of the original input KSZ signal. It is twice as large as the standard deviation of some KSZ maps. This not only modifies the amplitude of the fluctuations in the δT map at a given position, but also significantly modifies the position of the maxima and the shape of the fluctuations associated with the KSZ signal. As a consequence, the spatial correlation between TSZ and KSZ is decreased and the reconstructed KSZ signal is different from the input map (see Figure 11). The correlation coefficients between the original and reconstructed KSZ maps are obviously very low in this case, with values ranging between 0.24 and 0.54.
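The geometric progression of input amplitudes used in the Section 4.1 sensitivity test is easy to reproduce; in the sketch below (a hypothetical helper of our own), the last term recovers the quoted σ_max = 2.0 × 10⁻⁶:

```python
def sigma_series(sigma0=2.5e-7, steps=7):
    """KSZ standard deviations sigma_i = sigma0 * 2**(i/2), i = 0..steps-1,
    as used for the mixing-ratio sensitivity test (i = 0, ..., 6)."""
    return [sigma0 * 2 ** (i / 2) for i in range(steps)]
```

Each step halves the KSZ power (a factor √2 in standard deviation), so the seven steps span a factor of 8 in amplitude, i.e. a factor of 64 in power.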

[Figure 9 panels: σ_real = 2.500 × 10⁻⁷, σ_est = 2.208 × 10⁻⁷; P_real = 9.459 × 10⁻¹⁸, P_est = 3.156 × 10⁻¹⁸.]

Figure 9: Same as Figure 5. The standard deviation of the original KSZ map is very low (σ_real = 2.5 × 10⁻⁷). Note the excess of near-zero values in the histogram of the estimated map (logarithmic scale). Note also the very low correlation coefficient, 0.48. This is for the worst case.

Figure 10: Cuts across a typical reconstructed KSZ map (dashed line) and its original counterpart (solid line). The cuts have the same position in both maps. (No noise, convolution beam = 3 arcmin.)

As noted in Section 3.1.1, the white noise can in principle be accounted for at the interpolation stage in the regularisation parameter. Such possibilities will have to be addressed.

5. DISCUSSION

We presented a method for separating the KSZ signal from primary CMB anisotropies based on two steps: (1) interpolation and (2) reconstruction. In our case, this corresponds to the interpolation of a correlated noise (the CMB). The KSZ reconstruction is based on a set of KSZ estimated maps obtained with a choice of TSZ thresholds (from the cumulative distribution of the pixels in the TSZ template map); more sophisticated methods optimising the series of TSZ thresholds can be proposed. Using the set of KSZ estimated maps, we can investigate several methods to reconstruct the final KSZ maps. We tested a decorrelation-based approach using the PCA. The decorrelation is a blind method whose advantage is that no a priori criteria are needed to obtain the KSZ map. However, the resulting maps are of low quality in terms of standard deviation. More sophisticated methods such as independent component analysis [12, 44, 45] can be used, but the results obtained need to be rescaled using external constraints.

In our study, we choose to use a reconstruction method based on a minimisation technique. We propose a minimisation criterion taking into account statistical properties of the KSZ signal: (i) KSZ dominates over primary anisotropies at small angular scales, and (ii) KSZ fluctuations follow a non-Gaussian distribution. We use the excess kurtosis of the diagonal wavelet coefficients to characterise the non-Gaussian signatures of the KSZ effect. The minimisation method gives reconstructed KSZ maps that are in quite good agreement with the original signal, with an average correlation coefficient between original and reconstructed KSZ maps of 0.78, and an error of 5% on the standard deviation of the reconstructed KSZ maps. The KSZ reconstruction through minimisation depends on the minimisation criteria and therefore on our knowledge of the signals. The available CMB data seem to agree on the fact that primary CMB anisotropies are Gaussian distributed, at least at small scales [46, 47, 48]; see [49] for large scales. The KSZ effect is dominant at small scales since it is associated with galaxy clusters. We have tested our results against the relative amplitude of KSZ to primary signal. We find satisfactory results even when KSZ is twice as small (in RMS) as predicted.

The results above are for the case where only the two signals, CMB and SZ, are taken into account, which allows us to investigate the intrinsic limitations of the method. Additional astrophysical contributions should be partly treated in a first-step component separation (from which we obtain the observed y and δT maps). For example, if some

contribution from the TSZ signal remains in the δT map, it will act as a correlated noise. We can account for it at the interpolation stage with the additional constraint that the skewness should be zero (which is the case for the primary CMB anisotropies), or in the minimisation procedure using a generalised criterion including the skewness as well as the excess kurtosis. In the present study, we have tested for the presence of an instrumental white noise at the pixel scale with 6 µK RMS amplitude. We find that such a noise level affects the KSZ map reconstruction, making it difficult to recover the KSZ signal buried in the CMB. In theory, instrumental noise can be taken into account in the interpolation step by relaxing the parameter λ. Another way to deal with noise is to minimise not on the non-Gaussian character of the KSZ, but rather on the statistical properties of the remainder (i.e., CMB + noise + other components) at scales where the CMB dominates. We should then obtain an estimate of all the components except KSZ, which can then be subtracted from the total signal. These methods will need to be investigated in the future.

Another key element of our separation method is the use of a spatial template. The choice of a spatial template is an important issue since it is used to define the mask and hence the interpolated regions. The template should then be the closest possible to the signal. In our case, the optimal choice is the TSZ signal itself, as it allows us to evaluate the temperature fluctuations associated with KSZ in the map without resorting to the knowledge or the measurement of cluster parameters. However, the beam dilution caused by observation suppresses the signal at small scales and can significantly affect the results, especially for large beam sizes. (Note that our previous results are equivalent to a 1.5-arcmin beam size.) One way around the problem is to resort to multiscale minimisation criteria at the reconstruction step; we will investigate this question in the future.

Figure 11: Cuts across a typical reconstructed KSZ map (dashed line) and its original counterpart (solid line). The cuts have the same position in both maps. (Noise = 2 × 10⁻⁶, no convolution.)

ACKNOWLEDGMENT

The authors wish to thank the Editor C. Baccigalupi for his encouragement and the anonymous referees for their comments on a previous version.

REFERENCES

[1] A. D. Miller, R. Caldwell, M. J. Devlin, et al., "A measurement of the angular power spectrum of the CMB from l = 100 to 400," Astrophysical Journal, vol. 524, pp. L1–L4, 1999.
[2] P. de Bernardis, P. A. R. Ade, J. J. Bock, et al., "A flat universe from high-resolution maps of the cosmic microwave background radiation," Nature, vol. 404, no. 6781, pp. 955–959, 2000.
[3] S. Hanany, P. A. R. Ade, A. Balbi, et al., "MAXIMA-1: a measurement of the cosmic microwave background anisotropy on angular scales of 10′–5°," Astrophysical Journal, vol. 545, no. 1, pp. L5–L9, 2000.
[4] N. W. Halverson, E. M. Leitch, C. Pryke, et al., "Degree angular scale interferometer first results: a measurement of the cosmic microwave background angular power spectrum," Astrophysical Journal, vol. 568, no. 1, pp. 38–45, 2002.
[5] A. Benoit, P. A. R. Ade, A. Amblard, et al., "The cosmic microwave background anisotropy power spectrum measured by Archeops," Astronomy and Astrophysics, vol. 399, no. 3, pp. L19–L23, 2003.
[6] C. L. Bennett, M. Halpern, G. Hinshaw, et al., "First year Wilkinson Microwave Anisotropy Probe (WMAP) observations: preliminary maps and basic results," Astrophysical Journal Supplement Series, vol. 148, no. 1, pp. 1–27, 2003.
[7] P. F. Scott, P. Carreira, K. Cleary, et al., "First results from the very small array—III. The cosmic microwave background power spectrum," Monthly Notices of the Royal Astronomical Society, vol. 341, no. 4, pp. 1076–1083, 2003.
[8] T. J. Pearson, B. S. Mason, A. C. S. Readhead, et al., "The anisotropy of the microwave background to l = 3500: mosaic observations with the cosmic background imager," Astrophysical Journal, vol. 591, pp. 556–574, 2003.
[9] M. C. Runyan, P. A. R. Ade, J. J. Bock, et al., "First results from the arcminute cosmology bolometer array receiver," New Astronomy Reviews, vol. 47, no. 11-12, pp. 915–923, 2003.
[10] M. P. Hobson, A. W. Jones, A. N. Lasenby, and F. R. Bouchet, "Foreground separation methods for satellite observations of the cosmic microwave background," Monthly Notices of the Royal Astronomical Society, vol. 300, no. 1, pp. 1–29, 1998.
[11] F. R. Bouchet and R. Gispert, "Foregrounds and CMB experiments—I. Semi-analytical estimates of contamination," New Astronomy, vol. 4, no. 6, pp. 443–479, 1999.
[12] C. Baccigalupi, L. Bedini, C. Burigana, et al., "Neural networks and the separation of cosmic microwave background and astrophysical signals in sky maps," Monthly Notices of the Royal Astronomical Society, vol. 318, no. 3, pp. 769–780, 2000.
[13] J. Delabrouille, J.-F. Cardoso, and G. Patanchon, "Multidetector multicomponent spectral matching and applications for cosmic microwave background data analysis," Monthly Notices of the Royal Astronomical Society, vol. 346, no. 4, pp. 1089–1102, 2003.
[14] M. P. Hobson and C. McLachlan, "A Bayesian approach to discrete object detection in astronomical data sets," Monthly Notices of the Royal Astronomical Society, vol. 338, no. 3, pp. 765–784, 2003.
[15] O. Forni and N. Aghanim, "Separating the kinetic Sunyaev-Zel'dovich effect from primary cosmic microwave background fluctuations," Astronomy and Astrophysics, vol. 420, no. 1, pp. 49–60, 2004.

[16] D. J. Heeger and J. R. Bergen, "Pyramid-based texture analysis/synthesis," in Proc. 22nd Annual Conference on Computer Graphics (SIGGRAPH '95), pp. 229–238, Los Angeles, Calif, USA, August 1995.
[17] M. Kunz, A. J. Banday, P. G. Castro, P. G. Ferreira, and K. M. Gorski, "The trispectrum of the 4 year COBE DMR data," Astrophysical Journal, vol. 563, no. 2, pp. L99–L102, 2001.
[18] S. F. Shandarin, "Testing non-gaussianity in cosmic microwave background maps by morphological statistics," Monthly Notices of the Royal Astronomical Society, vol. 331, no. 4, pp. 865–874, 2002.
[19] D. Donoho and J. Jin, "Higher criticism for detecting sparse heterogeneous mixtures," Annals of Statistics, vol. 32, no. 3, pp. 962–994, 2004.
[20] R. A. Sunyaev and Y. B. Zel'dovich, "The observations of relic radiation as a test of the nature of X-Ray radiation from the clusters of galaxies," Comments on Astrophysics and Space Physics, vol. 4, pp. 173–178, 1972.
[21] R. A. Sunyaev and Y. B. Zel'dovich, "Microwave background radiation as a probe of the contemporary structure and history of the universe," Annual Review of Astronomy and Astrophysics, vol. 18, pp. 537–560, 1980.
[22] Y. Rephaeli, "Comptonization of the cosmic microwave background: the Sunyaev-Zel'dovich effect," Annual Review of Astronomy and Astrophysics, vol. 33, pp. 541–579, 1995.
[23] M. Birkinshaw, "The Sunyaev-Zel'dovich effect," Physics Reports, vol. 310, pp. 97–195, 1999.
[24] J. M. Lamarre, M. Giard, E. Pointecouteau, et al., "First measurement of the submillimeter Sunyaev-Zel'dovich effect," Astrophysical Journal, vol. 507, pp. L5–L8, 1998.
[25] B. A. Benson, S. E. Church, P. A. R. Ade, et al., "Peculiar velocity limits from measurements of the spectrum of the Sunyaev-Zel'dovich effect in six clusters of galaxies," Astrophysical Journal, vol. 592, no. 2, pp. 674–691, 2003.
[26] M. G. Haehnelt and M. Tegmark, "Using the kinematic Sunyaev-Zel'dovich effect to determine the peculiar velocities of clusters of galaxies," Monthly Notices of the Royal Astronomical Society, vol. 279, pp. 545–556, 1996.
[27] N. Aghanim, A. De Luca, F. R. Bouchet, R. Gispert, and J. L. Puget, "Cosmology with Sunyaev-Zel'dovich observations from space," Astronomy and Astrophysics, vol. 325, pp. 9–18, 1997.
[28] N. Aghanim, K. M. Gorski, and J.-L. Puget, "How accurately can the SZ effect measure peculiar cluster velocities and bulk flows?" Astronomy and Astrophysics, vol. 374, pp. 1–12, 2001.
[29] A. Diaferio, R. A. Sunyaev, and A. Nusser, "Large-scale motions in superclusters: their imprint in the cosmic microwave background," Astrophysical Journal, vol. 533, no. 2, pp. L71–L74, 2000.
[30] M. Sorel, N. Aghanim, and O. Forni, "Blind statistical indicators of the kinetic Sunyaev-Zel'dovich anisotropies," Astronomy and Astrophysics, vol. 395, no. 3, pp. 747–751, 2002.
[31] M. Unser, "Multigrid adaptive image processing," in Proc. IEEE International Conference on Image Processing (ICIP '95), vol. 1, pp. 49–52, Washington, DC, USA, October 1995.
[32] P. Wesseling, An Introduction to Multigrid Methods, John Wiley & Sons, Chichester, UK, 1992.
[33] G. Wahba, "Practical approximate solutions to linear operator equations when the data are noisy," SIAM Journal on Numerical Analysis, vol. 14, no. 4, pp. 651–667, 1977.
[34] S. J. Reeves, "Optimal space-varying regularization in iterative image restoration," IEEE Trans. Image Processing, vol. 3, no. 3, pp. 319–324, 1994.
[35] M. Unser, A. Aldroubi, and M. Eden, "B-spline signal processing: Part II—efficient design and applications," IEEE Trans. Signal Processing, vol. 41, no. 2, pp. 834–848, 1993.
[36] P. Thévenaz, T. Blu, and M. Unser, "Interpolation revisited," IEEE Trans. Med. Imag., vol. 19, no. 7, pp. 739–758, 2000.
[37] P. G. Ferreira, J. Magueijo, and J. Silk, "Cumulants as non-Gaussian qualifiers," Physical Review D, vol. 56, no. 8, pp. 4592–4603, 1997.
[38] J. Pando, D. Valls-Gabaud, and L.-Z. Fang, "Evidence for scale-scale correlations in the cosmic microwave background radiation," Physical Review Letters, vol. 81, no. 21, pp. 4568–4571, 1998.
[39] N. Aghanim and O. Forni, "Searching for the non-Gaussian signature of the CMB secondary anisotropies," Astronomy and Astrophysics, vol. 347, pp. 409–418, 1999.
[40] O. Forni and N. Aghanim, "Searching for non-gaussianity: statistical tests," Astronomy and Astrophysics Supplement Series, vol. 137, pp. 553–567, 1999.
[41] J.-L. Starck, N. Aghanim, and O. Forni, "Detection and discrimination of cosmological non-Gaussian signatures by multi-scale methods," Astronomy and Astrophysics, vol. 416, pp. 9–17, 2004.
[42] A. Cohen, I. Daubechies, and J. C. Feauveau, "Biorthogonal bases of compactly supported wavelets," Tech. Rep. TM 11217-900529-07, AT&T Bell Laboratories, Murray Hill, NJ, USA, 1990.
[43] J. D. Villasenor, B. Belzer, and J. Liao, "Wavelet filter evaluation for image compression," IEEE Trans. Image Processing, vol. 4, no. 8, pp. 1053–1060, 1995.
[44] J.-F. Cardoso and A. Souloumiac, "Blind beamforming for non Gaussian signals," IEE Proceedings-F, vol. 140, no. 6, pp. 362–370, 1993.
[45] A. Hyvärinen, "Fast and robust fixed-point algorithms for independent component analysis," IEEE Trans. Neural Networks, vol. 10, no. 3, pp. 626–634, 1999.
[46] L. Cayon, E. Martinez-Gonzalez, F. Argueso, A. J. Banday, and K. M. Gorski, "COBE-DMR constraints on the nonlinear coupling parameter: a wavelet based method," Monthly Notices of the Royal Astronomical Society, vol. 339, no. 4, pp. 1189–1194, 2003.
[47] E. Komatsu, A. Kogut, M. R. Nolta, et al., "First-year Wilkinson Microwave Anisotropy Probe (WMAP) observations: tests of gaussianity," Astrophysical Journal Supplement, vol. 148, pp. 119–134, 2003.
[48] M. G. Santos, A. Cooray, Z. Haiman, L. Knox, and C.-P. Ma, "Small-scale cosmic microwave background temperature and polarization anisotropies due to patchy reionization," Astrophysical Journal, vol. 598, pp. 756–766, 2003.
[49] P. Vielva, E. Martinez-Gonzalez, R. B. Barreiro, J. L. Sanz, and L. Cayon, "Detection of non-Gaussianity in the WMAP 1-year data using spherical wavelets," Astrophysical Journal, vol. 609, no. 1, pp. 22–34, 2004.

Olivier Forni is a planetologist at the Institut d'Astrophysique Spatiale in Orsay (France). His main research activities deal with the evolution of the planets and satellites of the solar system. Recently he has been working on the statistical properties of the cosmic microwave background (CMB) and of the secondary anisotropies by means of multiscale transform analysis. He also got involved in component separation techniques in order to improve the detection of low-power signatures and to analyse hyperspectral infrared data on Mars.

Nabila Aghanim is a cosmologist at the Institut d'Astrophysique Spatiale in Orsay (France). Her main research interests are the cosmic microwave background (CMB) and large-scale structure. She has been working during the last ten years on the statistical characterisation of CMB temperature anisotropies through power spectrum analyses and higher-order moments of wavelet coefficients. She naturally got involved and interested in signal processing techniques in order to improve the detection of low signal-to-noise ratios such as those associated with secondary anisotropies and separate them from the primary signal.

EURASIP Journal on Applied Signal Processing 2005:15, 2426–2436
© 2005 Hindawi Publishing Corporation

Detection of Point Sources on Two-Dimensional Images Based on Peaks

M. López-Caniego
Instituto de Física de Cantabria, CSIC-Universidad de Cantabria, and Departamento de Física Moderna, Universidad de Cantabria, avenida de los Castros s/n, 39005 Santander, Spain
Email: [email protected]

D. Herranz
Istituto di Scienze e Tecnologie dell'Informazione "A. Faedo," CNR, via Moruzzi 1, 56124 Pisa, Italy
Email: [email protected]

J. L. Sanz
Instituto de Física de Cantabria, CSIC-Universidad de Cantabria, avenida de los Castros s/n, 39005 Santander, Spain
Email: [email protected]

R. B. Barreiro
Instituto de Física de Cantabria, CSIC-Universidad de Cantabria, avenida de los Castros s/n, 39005 Santander, Spain
Email: [email protected]

Received 8 June 2004; Revised 7 February 2005

This paper considers the detection of point sources in two-dimensional astronomical images. The detection scheme we propose is based on peak statistics. We discuss the example of the detection of far galaxies in cosmic microwave background experiments throughout the paper, although the method we present is totally general and can be used in many other fields of data analysis. We consider sources with a Gaussian profile—that is, a fair approximation of the profile of a point source convolved with the detector beam in microwave experiments—on a background modeled by a homogeneous and isotropic Gaussian random field characterized by a scale-free power spectrum. Point sources are enhanced with respect to the background by means of linear filters. After filtering, we identify local maxima and apply our detection scheme, a Neyman-Pearson detector that defines our region of acceptance based on the a priori pdf of the sources and the ratio of number densities. We study the different performances of some linear filters that have been used in this context in the literature: the Mexican hat wavelet, the matched filter, and the scale-adaptive filter. We consider as well an extension to two dimensions of the biparametric scale-adaptive filter (BSAF). The BSAF depends on two parameters which are determined by maximizing the number density of real detections while fixing the number density of spurious detections. For our detection criterion the BSAF outperforms the other filters in the interesting case of white noise. Keywords and phrases: analytical methods, data analysis methods, image processing techniques.

1. INTRODUCTION

A very challenging aspect of data analysis in astronomy is the detection of pointlike sources embedded in one- and two-dimensional images. Some common examples are the separation of individual stars in crowded optical images, the identification of emission and absorption lines in noisy one-dimensional spectra, and the detection of faint extragalactic objects at microwave frequencies. This latter case, for example, is one of the most critical issues for the new generation of experiments that observe the cosmic microwave background (CMB).

The CMB is the remnant of the radiation that filled the universe immediately after the big bang. This weak radiation can provide us with answers to one of the most important sets of questions asked in modern science: how the universe began, how it evolved to the state we observe today, and how it will continue to evolve in the future. Unfortunately, we do not measure the CMB alone but a mixture of it with instrumental noise and other astrophysical radiations that are usually referred to as foregrounds.

Some foregrounds are due to our own galaxy, for example, the thermal emission due to dust grains in the galactic plane or the synchrotron emission by relativistic electrons moving along the galactic magnetic field. These foregrounds appear as diffuse emission in the sky, and their spectral behaviors (the way the emission scales from one wavelength of observation to another) are reasonably well known. Another foreground with a well-known spectral behavior is the Sunyaev-Zel'dovich effect, which is due to the hot gas contained in galaxy clusters that distorts the energy distribution of CMB photons. Foreground emissions carry information about the galaxy structure, composition, and physical parameters, as well as about the number, distribution, and evolution of galaxy clusters that map the distribution of matter in the universe. Therefore, the study of the different foregrounds has great scientific relevance by itself.

In order to properly study the CMB and the different foregrounds, it is mandatory to separate the signals (components) that are mixed in the observations. This can be done by observing the sky at a number of frequencies at least as big as the number of components and then applying some statistical component separation method in order to recover the different astrophysical signals. Several component separation techniques have been suggested, including blind (Baccigalupi et al. [1], Maino et al. [2], Delabrouille et al. [3]), semi-blind (Bedini et al. [4]), and nonblind (Hobson et al. [5], Bouchet and Gispert [6], Stolyarov et al. [7], Barreiro et al. [8]) approaches.

Another important foreground is due to the emission of far galaxies. Since the typical angular size of the galaxies in the sky is a few arcseconds and the angular resolution of the microwave detectors is typically greater than a few arcminutes,¹ galaxies appear as points to the detector, which is unable to resolve their inner structure. Therefore, they are usually referred to as extragalactic point sources (EPS) in the CMB jargon. Note, however, that they do not appear as points in the images but as the convolution of a pointlike impulse with the angular response of the detector (beam). The instruments (radiometers and bolometers) that are used in CMB experiments have angular responses that are approximately Gaussian, and therefore EPS appear as small Gaussian (or nearly Gaussian) spots in the images.²

¹ For example, the upcoming ESA Planck satellite will have angular resolutions ranging from 5 arcminutes (for the 217–857 GHz channels) to 33 arcminutes (for the 30 GHz channel).

² It is also common to speak of compact sources, describing a source that is comparable to the size of the beam being used. Non-pointlike sources (such as large galaxy clusters with arcminute angular scales) will have more complicated responses when convolved with a beam, but if the source profile is known, it is always possible to apply the methods presented in this work.

The problem with EPS is that galaxies are a very heterogeneous bundle of objects, from the radio galaxies that emit most of their radiation in the low-frequency part of the electromagnetic spectrum to the dusty galaxies that emit mainly in the infrared (Toffolatti et al. [9], Guiderdoni et al. [10], Tucci et al. [11]). This makes it impossible to consider all of them as a single foreground to be separated from the others by means of multiwavelength observations and statistical component separation techniques. EPS constitute an important contaminant in CMB studies at small angular scales (Toffolatti et al. [9]), affecting the determination of the CMB angular power spectrum and hampering the statistical study (e.g., the study of Gaussianity) of the CMB and other foregrounds at such scales. Moreover, while there are good galaxy surveys at radio and infrared frequencies, the microwave window of the electromagnetic spectrum is a practically unknown zone for extragalactic astronomy. Therefore, it is important to have detection techniques that are able to detect EPS with fluxes as low as possible.

One possibility is to consider the EPS emission at each frequency as an additional noise term in the equations of a statistical component separation method. Once the algorithm has separated the different components, the residual that is obtained by subtracting the output foregrounds from the original data should contain the EPS plus the instrumental noise and some amount of foreground residuals that remain due to a nonperfect separation. As an example, Figure 1 shows the residual at 30 GHz after applying a maximum entropy component separation algorithm (Hobson et al. [12]) to a 12.8 × 12.8 square degrees simulated sky patch as would be observed by the Planck satellite. The brightest point sources can be clearly observed over the residual noise. However, fainter point sources are still masked by a residual noise that is approximately Gaussian and must be detected somehow.

Figure 1: Residual map of a 12.8 × 12.8 square degrees sky patch at 30 GHz after the application of a maximum entropy component separation. The residual map is obtained by subtracting from the 30 GHz map the different components (CMB and foregrounds) given by the maximum entropy algorithm. Bright point sources appear as spots in the image, whereas faint point sources are masked by the residual noise.

Besides, the situation is more complex because the presence of bright EPS in the data affects the performance of the component separation algorithms, so the recovered components are contaminated by point sources in a way that is difficult to control. Therefore, any satisfactory method should detect and extract at least the bright sources before the component separation. Then, after separation, some additional low-intensity EPS could be detected from the residual maps such as the one in Figure 1.

Several techniques based on linear filters have been proposed in the literature for the detection of point sources in CMB data. Linear filtering techniques are suitable for this problem because they can isolate structures with a given characteristic scale, as is the case of pointlike sources, while canceling the contribution of diffuse foregrounds. Among the methods proposed in the literature, we emphasize the Mexican hat wavelet (MHW, Cayón et al. [13], Vielva et al. [14, 15, 16]), the classic matched filter (MF, Tegmark and de Oliveira-Costa [17]), the adaptive top hat filter (Chiang et al. [18]), and the scale-adaptive filter (SAF, Sanz et al. [19], Herranz et al. [20]). Moreover, linear filters can be used in combination with statistical component separation techniques in order to produce a more accurate separation of the different foregrounds (Vielva et al. [15]).

The goal of filtering is to enhance the contrast between the source to be detected and the background that masks it. For example, if we filter the image in Figure 1, assuming that the background can be characterized by white noise, with the well-known matched filter (see Section 4.1) at the scale of the 30 GHz detector beam (FWHM = 33 arcminutes), the signal-to-noise ratio of the sources increases by more than 25%. Therefore, a source whose signal-to-noise ratio was ∼ 3 before filtering becomes a source with signal-to-noise ratio ∼ 4 and will be easier to detect.

After filtering, a detection rule is applied to the data in order to decide whether the source is present or not. The usual detection approach in astronomy is thresholding: for any given candidate (e.g., a local peak in the data), a positive detection is considered if the candidate has a signal-to-noise ratio greater than a certain threshold (in many astronomical applications, a typical value of this threshold is 5σ). This naive approach works fine for bright sources, but weak sources can be easily missed.

More sophisticated detection schemes can use additional information in order to improve the detection. If the detection is performed by means of the study of the statistics of maxima in the images, such information includes not only the amplitude of the maxima but also spatial information related to the source profile, for example, the derivatives of the intensity. In our approach we will consider the amplitude, the curvature, and the shear of the sources (the last two quantities are given by the properties of the beam in the case of point sources) to discriminate between maxima of the background and real sources. Moreover, in some cases a priori information on the distribution of intensity of the sources is known. We will therefore use a Neyman-Pearson detector that uses the three above-mentioned elements of information (amplitude, curvature, and shear) of the maxima as well as the a priori probability distribution of the sources. This technique has been successfully tested in images of one-dimensional fields (López-Caniego et al. [21, 22]). In this work we will generalize it to two dimensions.

The overview of this work is as follows. In Section 2 we describe the statistics of the peaks for a two-dimensional Gaussian background in the absence and presence of a source. In Section 3 we introduce the detection problem, define the region of acceptance, and derive our detector. In Section 4 we briefly review some of the linear filters proposed in the literature. In Section 5 we describe a probability distribution of sources that is of interest and compare the performance of the filters, regarding our choice of detector. Finally, in Section 6 we summarize our results.

2. PEAK STATISTICS

In this section we will study the statistics of peaks for a two-dimensional Gaussian background in both the absence and presence of a source. We will focus on three quantities that define the properties of the peaks: the intensity of the field, the curvature, and the shear at the position of the peak. The first quantity gives the amplitude of the peak. The curvature and the shear give information about the spatial structure of the peak and are related to its sharpness and eccentricity, respectively.

2.1. Background

We consider a two-dimensional (2D) background represented by a Gaussian random field ξ(x) with average value ⟨ξ(x)⟩ = 0 and power spectrum P(q),

    ⟨ξ(Q) ξ*(Q′)⟩ = P(q) δ_D(Q − Q′),   q ≡ |Q|,   (1)

where ξ(Q) is the Fourier transform of ξ(x)³ and δ_D is the Dirac distribution in 2D.

³ Throughout this paper we will use the following notation for the Fourier transform: the same symbol will be used for the real space and the Fourier space versions of a given function. The argument of the function will specify in each case which space we are referring to. For instance, f(q) will be the Fourier transform of the function f(x).

We are interested in the distribution of maxima of the background with respect to the three variables already mentioned: intensity, curvature, and shear. We define the normalized field intensity ν, the normalized curvature κ, and the normalized shear ε as

    ν ≡ ξ/σ₀,   κ ≡ (λ₁ + λ₂)/σ₂,   ε ≡ (λ₁ − λ₂)/2σ₂,   (2)

where ν ∈ (−∞, ∞), κ ∈ [0, ∞), ε ∈ [0, κ/2), λ₁ and λ₂ are the eigenvalues of the negative Hessian matrix, and the σ_n are defined as

    σ_n² ≡ (1/2π) ∫₀^∞ dq q^(1+2n) P(q).   (3)

The moment σ₀ is equal to the dispersion of the field. The distribution of maxima of the background in one dimension (1D) with respect to the intensity and curvature (the shear is not defined in 1D) was studied by Rice [23].
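The spectral moments σ_n of Eq. (3) and the normalized peak variables (ν, κ, ε) of Eq. (2) translate directly into code. The following is a minimal sketch, not the authors' implementation: the function names, the finite-difference Hessian, and the unit pixel spacing are our own assumptions (NumPy assumed available).

```python
import numpy as np

def trapz1d(y, x):
    """Simple trapezoidal rule (kept explicit to avoid NumPy version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def sigma_n(q, P, n):
    """sigma_n from Eq. (3): sigma_n^2 = (1/2pi) * integral of q^(1+2n) P(q) dq."""
    return np.sqrt(trapz1d(q ** (1 + 2 * n) * P, q) / (2 * np.pi))

def peak_quantities(field, sigma0, sigma2):
    """Normalized intensity nu, curvature kappa, and shear eps (Eq. (2))
    at every pixel of a 2D field sampled on a unit-spaced grid."""
    fy, fx = np.gradient(field)
    fyy, fyx = np.gradient(fy)
    _, fxx = np.gradient(fx)
    # Eigenvalues lambda1 >= lambda2 of the negative Hessian matrix
    tr = -(fxx + fyy)
    det = fxx * fyy - fyx * fyx
    disc = np.sqrt(np.maximum(tr ** 2 - 4 * det, 0.0))
    lam1, lam2 = 0.5 * (tr + disc), 0.5 * (tr - disc)
    return field / sigma0, (lam1 + lam2) / sigma2, (lam1 - lam2) / (2 * sigma2)
```

In practice one would evaluate these quantities only at local maxima of the filtered map; here they are returned for every pixel for simplicity.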
If we generalize it to 2D, including the shear, the expected number density of maxima per intervals (x, x + dx), (ν, ν + dν), (κ, κ + dκ), and (ε, ε + dε) is given by

    n_b(ν, κ, ε) = (8√3 n_b / π√(1 − ρ²)) ε (κ² − 4ε²) exp[−(1/2)ν² − 4ε² − (κ − ρν)²/2(1 − ρ²)],   (4)

where n_b is the expected total number density of maxima (i.e., number of maxima per unit area dx),

    n_b ≡ 1/(4√3 π θ_m²),   (5)

and ρ and θ_m are defined as

    θ_m ≡ √2 σ₁/σ₂,   ρ ≡ σ₁²/(σ₀σ₂) = θ_m/θ_c,   θ_c ≡ √2 σ₀/σ₁.   (6)

In the previous equations θ_c and θ_m are the coherence scales of the field and of the maxima, respectively. The formula in (4) can be derived from previous works (Bond and Efstathiou [24], Barreiro et al. [25]).

2.2. Background plus point source

To the previous 2D background we add a source with a known spatial profile τ(x) and an amplitude A, so that the intensity due to the source at a given position x₀ is ξ_s(x) = Aτ(x − x₀). For simplicity, we will consider a spherical Gaussian profile given by

    τ(x) = exp(−x²/2R²),   x ≡ |x|,   (7)

where R is the Gaussian width (in the case of point sources convolved with a Gaussian beam, R is the beam width). We could easily consider other functional profiles without any loss of generality. The expected number density of maxima per intervals (x, x + dx), (ν, ν + dν), (κ, κ + dκ), and (ε, ε + dε), given a source of amplitude A in such spatial interval, is

    n(ν, κ, ε|ν_s) = (8√3 n_b / π√(1 − ρ²)) ε (κ² − 4ε²) exp[−(1/2)(ν − ν_s)² − 4ε² − (κ − κ_s − ρ(ν − ν_s))²/2(1 − ρ²)],   (8)

where ν_s = A/σ₀ is the normalized amplitude of the source, κ_s = −Aτ″/σ₂ is the normalized curvature of the source, and τ″ is the second derivative of the source profile τ with respect to x at the position of the source. Note that in (8) we are taking into account that the shear of the source is zero since we are considering a spherical profile. It is useful to define a quantity y_s that is related to the curvature of the source:

    y_s ≡ κ_s/ν_s,   so that   κ_s = ν_s y_s.   (9)

3. THE DETECTION PROBLEM

Equations (4) and (8) can be used to decide whether a source is present or not in a dataset. The tool that allows us to decide whether a point source is present or not in the data is called a detector. In this section we will describe the Neyman-Pearson detector (NPD). We will study its performance in terms of two quantities: the number of true detections and the number of false (spurious) detections that emerge from the detection process. Our approach fixes the number density of spurious detections and determines the number density of true detections in each case.

3.1. The region of acceptance

We consider a peak in the 2D dataset characterized by the normalized amplitude, curvature, and shear (ν, κ, ε). The number density of background maxima n_b(ν, κ, ε) represents the null hypothesis H₀ that the peak is due to the background in the absence of a source. Conversely, the local number density of maxima n(ν, κ, ε) represents the alternative hypothesis, that the peak is due to the source added to the background. The local number density of maxima n(ν, κ, ε) can be calculated as

    n(ν, κ, ε) = ∫₀^∞ dν_s p(ν_s) n(ν, κ, ε|ν_s).   (10)

In the last equation we have used the a priori probability p(ν_s) that gives the amplitude distribution of the sources.

We can associate to any region R∗(ν, κ, ε) in the (ν, κ, ε) parameter space two number densities n_b∗ and n∗,

    n_b∗ = ∫_{R∗} dν dκ dε n_b(ν, κ, ε),
    n∗ = ∫_{R∗} dν dκ dε n(ν, κ, ε),   (11)

where n_b∗ is the expected number density of spurious sources, that is, due to the background, in the region R∗(ν, κ, ε), whereas n∗ is the number density of maxima expected in the same region of the (ν, κ, ε) space in the presence of a local source. The region R∗ will be called the region of acceptance.

In order to define the region of acceptance R∗ that gives the highest number density of detections n∗ for a given number density of spurious detections n_b∗, we consider a Neyman-Pearson detector (NPD) using number densities instead of probabilities,

    L(ν, κ, ε) ≡ n(ν, κ, ε)/n_b(ν, κ, ε) ≥ L∗,   (12)

where L∗ is a constant.
The proof follows the same approach as for the standard Neyman-Pearson detector. If L ≥ L∗ we decide that the signal is present, whereas if L < L∗ we decide that it is absent.
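The two number densities and the ratio test of Eq. (12) can be transcribed directly. A sketch for a fixed source amplitude ν_s follows (our own function names; the full detector additionally integrates over the prior p(ν_s) as in Eq. (10); NumPy assumed available):

```python
import numpy as np

def density_background(nu, kappa, eps, nb, rho):
    """Eq. (4): expected density of background maxima at (nu, kappa, eps)."""
    pref = 8 * np.sqrt(3) * nb / (np.pi * np.sqrt(1 - rho ** 2))
    expo = (-0.5 * nu ** 2 - 4 * eps ** 2
            - (kappa - rho * nu) ** 2 / (2 * (1 - rho ** 2)))
    return pref * eps * (kappa ** 2 - 4 * eps ** 2) * np.exp(expo)

def density_with_source(nu, kappa, eps, nb, rho, nu_s, y_s):
    """Eq. (8): same density when a source of normalized amplitude nu_s
    (normalized curvature kappa_s = y_s * nu_s, Eq. (9)) sits at the peak."""
    kappa_s = y_s * nu_s
    pref = 8 * np.sqrt(3) * nb / (np.pi * np.sqrt(1 - rho ** 2))
    expo = (-0.5 * (nu - nu_s) ** 2 - 4 * eps ** 2
            - (kappa - kappa_s - rho * (nu - nu_s)) ** 2 / (2 * (1 - rho ** 2)))
    return pref * eps * (kappa ** 2 - 4 * eps ** 2) * np.exp(expo)

def likelihood_ratio(nu, kappa, eps, nb, rho, nu_s, y_s):
    """Eq. (12): accept the peak as a detection when the ratio exceeds L_star."""
    return (density_with_source(nu, kappa, eps, nb, rho, nu_s, y_s)
            / density_background(nu, kappa, eps, nb, rho))
```

Note that all the ε-dependent factors cancel in the ratio, which illustrates why the resulting detector is independent of the shear for a spherically symmetric source profile.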

In our case, for the number densities (4) and (8), the condition (12) is equivalent to a threshold on a linear combination of the amplitude and the curvature,

    ϕ(ν, κ) ≥ ϕ∗,   (13)

where ϕ∗ is a constant and ϕ(ν, κ) is given by

    ϕ(ν, κ) ≡ aν + bκ,   a ≡ (1 − ρy_s)/(1 − ρ²),   b ≡ (y_s − ρ)/(1 − ρ²).   (14)

We remark that the detector is independent of the shear ε. This is due to the fact that we are considering a source with a spherical profile with shear ε_s = 0. If the profile is not spherical, the detector may depend on the shear.

3.2. Spurious sources and real detections

Given a region of acceptance R∗, we can calculate the number density of spurious sources and the number density of detections as given by (11):

    n_b∗ = (√3 n_b/√(2π)) ∫₀^∞ dκ (κ² − 1 + e^(−κ²)) e^(−κ²/2) erfc(M),   M ≡ (ϕ∗ − y_s κ)/(a√(2(1 − ρ²)));   (15)

    n∗ = (√3 n_b/√(2π)) ∫₀^∞ dν_s p(ν_s) ∫₀^∞ dκ (κ² − 1 + e^(−κ²)) e^(−(1/2)(κ − ν_s y_s)²) erfc(Q),   (16)

    Q ≡ M + ν_s (ρy_s − 1)/√(2(1 − ρ²)).   (17)

Our approach is to fix the number density of spurious detections and then to determine the region of acceptance that gives the maximum number of true detections. This can be done by inverting (15) to obtain ϕ∗ = ϕ∗(n_b∗/n_b; ρ, y_s). Once ϕ∗ is known, we can calculate the number density of detections using (16).

4. THE FILTERS

Detection can, in principle, be performed on the raw data, but in most cases it is convenient to transform first the data in order to enhance the contrast between the distributions n_b(ν, κ, ε) and n(ν, κ, ε). Hopefully, such an enhancement will help the detector to give better results (namely, a higher number of true detections). In this paper we will focus on the use of linear filters as a means to transform the data in such a way. Filters are suitable for this task because background fluctuations that have variation scales different from the source scale can be easily filtered out while preserving the sources. Different filters will improve detection in different ways: this paper compares the performance of several filters. The filter that gives the highest number density of detections, for a fixed number density of spurious sources, will be the preferred filter among the considered filters.

We consider a filter Ψ(x; R, b), where R and b define a scaling and a translation, respectively. Since the sources we are considering are spherically symmetric and we assume that the background is statistically homogeneous and isotropic, we will consider spherically symmetric filters,

    Ψ(x; R, b) ≡ (1/R²) ψ(|x − b|/R).   (18)

If we filter our background with Ψ(x; R, b), the filtered field is

    w(R, b) = ∫ dx ξ(x) Ψ(x; R, b).   (19)

The filter is normalized such that the amplitude of the source is the same after filtering:

    ∫ dx τ(x) Ψ(x; R, 0) = 1.   (20)

For the filtered field, (3) becomes

    σ_n² ≡ 2π ∫₀^∞ dq q^(1+2n) P(q) ψ²(q).   (21)

The values of ρ, θ_m, θ_c, and all the derived quantities change accordingly. The curvature of the filtered source κ_s can be obtained through (9), taking into account that for the filtered source

    −τ″_ψ = π ∫₀^∞ dq q³ τ(q) ψ(q).   (22)

Note that the function ψ(q) will depend as well on the scaling R.

As an application of the previous ideas, we study the detection of point sources characterized by a Gaussian profile τ(x) = exp(−x²/2R²), x = |x|, and Fourier transform τ(q) = R² exp(−(qR)²/2). This is the case we find in CMB experiments, where the profile of the point source is given by the instrumental beam, which can be approximated by a Gaussian. This profile introduces in a natural way the scale of the source R, the scale at which we filter. However, previous works in 1D using the MHW, MF, SAF, and BSAF have shown that the use of a modified scale αR can significantly improve the number of detections (Cayón et al. [13], Vielva et al. [14, 15], López-Caniego et al. [21, 22]). Therefore, we generalize the functional form of these filters to 2D and allow for this additional degree of freedom α.

4.1. The matched filter

We introduce a circularly symmetric filter Ψ(x; R, b). The filtered field is given by (19). Now, we express the conditions to obtain a filter for the detection of the source s(x) = Aτ(x) at the origin, taking into account the fact that the source is characterized by a single scale R₀. We assume the following conditions:

(1) w(R₀, 0) = s(0) ≡ A, that is, w(R₀, 0) is an unbiased estimator of the amplitude of the source;
(2) the variance of w(R, b) has a minimum at the scale R₀, that is, it is an efficient estimator.

Then, the 2D filter satisfying these conditions is the so-called matched filter. As mentioned before, we will allow the filter scale to be modified by a factor α. If α = 1 we have the well-known standard matched filter used in the literature. For a source with a Gaussian profile, a scale-free power spectrum P(q) ∝ q^(−γ), and allowing the filter scale to vary through the α parameter, the modified matched filter is

    ψ_MF(q) = N(α) z^γ e^(−(1/2)z²),   z ≡ qαR,   (23)

where

    m ≡ (2 + γ)/2,   N(α) = (α²/∆^m)(1/π)(1/Γ(m)),   ∆ = 2α²/(1 + α²),   (24)

and Γ is the standard Gamma function. The parameters of the filtered background and source are

    ρ(α) = ρ = √(m/(1 + m)),   θ_m(α) = αR √(2/(1 + m)),   y_s(α) = ρ∆.   (25)

The corresponding threshold as compared to the standard matched filter (α = 1) is

    ν(α)/ν_MF(α=1) = α^(t−2) ∆^m,   (26)

where

    t ≡ (2 − γ)/2.   (27)

We remark that for the standard matched filter the curvature does not affect the region of acceptance and the linear detector ϕ(ν, κ) is reduced to ϕ = ν.

4.2. The scale-adaptive filter

The scale-adaptive filter (or optimal pseudofilter) has been proposed by Sanz et al. [19]. The filter is obtained by imposing an additional condition to the conditions that define the MF:

(3) w(R, 0) has a maximum at (R₀, 0).

Considering a scale-free power spectrum, P(q) ∝ q^(−γ), a modified scale αR, and a Gaussian profile for the source, the functional form of the filter in 2D is

    ψ_SAF(q) = N(α) z^γ e^(−(1/2)z²) (γ + z²),   z ≡ qαR,   (28)

where

    N(α) = (α²/∆^m)(1/(πΓ(m)))(1/(γ + (2t/m)∆)),   (29)

and where m and ∆ are defined as in (24) and t is defined as in (27). The parameters of the filtered background and source are

    ρ(α) = ρ = √(m/(1 + m)) H₁/√(H₂H₃),   θ_m(α) = αR √(2/(1 + m)) √(H₁/H₃),
    y_s(α) = √(m/(1 + m)) (H₂/H₃) ∆ (γ + c(1 + m)∆)/(γ + cm∆),   (30)

where c = 2t/m and

    H₁ = γ² + 2γc(1 + m) + c²(1 + m)(2 + m),
    H₂ = γ² + 2γcm + c²m(1 + m),
    H₃ = γ² + 2γc(2 + m) + c²(2 + m)(3 + m).   (31)

The corresponding threshold as compared to the standard matched filter (α = 1) is

    ν(α)/ν_MF(α=1) = α^(t−2) ∆^m (γ + cm∆)/√H₂.   (32)

4.3. The Mexican hat wavelet

The MHW is defined to be proportional to the Laplacian of the Gaussian function in 2D real space,

    ψ_MHW(x) ∝ (2 − x²) e^(−(1/2)x²),   x ≡ |x|.   (33)

Thus, in Fourier space we get the modified Mexican hat wavelet, introducing the α parameter, as follows:

    ψ_MHW(q) = N(α) z² e^(−(1/2)z²),   z ≡ qαR,   N(α) = (1/π)(α/∆)².   (34)

Thus, the filtered background and source parameters are

    ρ(α) = ρ = √((2 + t)/(3 + t)),   θ_m(α) = αR √(2/(3 + t)),
    y_s(α) = (2/√((2 + t)(3 + t))) ∆,   (35)

where m and ∆ are defined as in (24) and t is defined as in (27). The corresponding threshold is

    ν(α)/ν_MF(α=1) = α^(t−2) ∆²/√(Γ(m)Γ(2 + t)).   (36)
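In Fourier space the filters reviewed above differ only in their radial profiles ψ(q). A compact sketch follows (our own code, not the authors'; the normalizations N(α) are omitted here since they only rescale the output, and the proper amplitude calibration of Eq. (20) would have to be applied before estimating fluxes; NumPy assumed available):

```python
import numpy as np

def psi_mf(z, gamma):       # matched filter profile, cf. Eq. (23)
    return z ** gamma * np.exp(-0.5 * z ** 2)

def psi_saf(z, gamma):      # scale-adaptive filter profile, cf. Eq. (28)
    return z ** gamma * np.exp(-0.5 * z ** 2) * (gamma + z ** 2)

def psi_mhw(z):             # Mexican hat wavelet profile, cf. Eq. (34)
    return z ** 2 * np.exp(-0.5 * z ** 2)

def psi_bsaf(z, gamma, c):  # biparametric scale-adaptive filter, cf. Eq. (38)
    return z ** gamma * np.exp(-0.5 * z ** 2) * (1 + c * z ** 2)

def filter_map(image, psi, alphaR):
    """Filter a 2D map with a radial Fourier profile psi(q * alpha * R).

    Returns the unnormalized filtered field w; Eq. (20) fixes the amplitude."""
    ny, nx = image.shape
    qy = 2 * np.pi * np.fft.fftfreq(ny)
    qx = 2 * np.pi * np.fft.fftfreq(nx)
    q = np.hypot(*np.meshgrid(qy, qx, indexing="ij"))
    return np.fft.ifft2(np.fft.fft2(image) * psi(q * alphaR)).real
```

For γ = 0 (white noise), psi_saf reduces to the same profile as psi_mhw, which is the degeneracy noted in Section 5.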

Table 1: Number density of detections n∗ for the BSAF and the standard MF (α = 1) with optimal values of c and α for different values of n_b∗ and R. RD means relative difference in number densities in percentage: RD ≡ 100(−1 + n∗_BSAF/n∗_MF).

    R     n_b∗     α     c       n∗_BSAF   n∗_MF    RD (%)
    1.5   0.005   0.5   −0.44   0.0507    0.0484    4.7
    1.5   0.01    0.5   −0.46   0.0709    0.0620   14.3
    2     0.005   0.4   −0.54   0.0396    0.0335   18.2
    2     0.01    0.4   −0.54   0.0567    0.0406   39.6
    2.5   0.005   0.3   −0.64   0.0320    0.0245   30.6

4.4. The biparametric scale-adaptive filter

López-Caniego et al. [21] have shown that removing condition (3) defining the SAF and introducing instead the condition

(3′) w(R₀, b) has a maximum at (R₀, 0)

leads to the new filter

    ψ(q) ∝ (τ(q)/P(q)) (1 + c(qR)²),   (37)

where c is an arbitrary constant related to the curvature of the maximum. For the case of a scale-free spectrum, and allowing for a modified scale αR, the filter is given by the parameterized equation

    ψ_BSAF(q) = N(α) z^γ e^(−(1/2)z²) (1 + cz²),   z ≡ qαR,
    N(α) = (α²/∆^m)(1/π)(1/Γ(m))(1/(1 + cm∆)),   (38)

where m and ∆ are defined as in (24). We remark that c = 0 leads to the MF, and if c ≡ 2t/mγ, with t defined as in (27), the BSAF becomes the SAF. The parameters of the filtered background and source are

    ρ(α) = ρ = √(m/(1 + m)) D₁/√(D₂D₃),   θ_m(α) = αR √(2/(1 + m)) √(D₁/D₃),
    y_s(α) = √(m/(1 + m)) (D₂/D₃) ∆ (1 + c(1 + m)∆)/(1 + cm∆),   (39)

where

    D₁ = 1 + 2c(1 + m) + c²(1 + m)(2 + m),
    D₂ = 1 + 2cm + c²m(1 + m),
    D₃ = 1 + 2c(2 + m) + c²(2 + m)(3 + m).   (40)

The equivalent threshold is given by

    ν(α)/ν_MF(α=1) = α^(t−2) ∆^m (1 + cm∆)/√D₃.   (41)

5. ANALYTICAL RESULTS

In this section we will compare the performance of the different filters previously introduced. We use as an example the interesting case of white noise as background. This is a fair approximation to the case presented in Figure 1, where the sources are embedded in a background that is a combination of instrumental noise (approximately Gaussian) and a small contribution of residual foregrounds that have not been perfectly separated. For this example, we will consider sources with intensities distributed uniformly between zero and an upper cutoff.

The comparison of the filters is performed as follows. We fix the number density of spurious detections, the same for all the filters. Then, for any given filter we calculate the quantities σ_n, ρ, and y_s. Using (15) it is possible to calculate the value of ϕ∗ that defines the region of acceptance. Then we calculate the number density of real detections using (16). The filter that leads to the highest number density of detections will be the preferred one. We do this for different values of α in order to test how the variation of the filtering scale affects the number of detections.

5.1. A priori probability distribution

As mentioned before, we will test a pdf of source intensities that is uniform in the interval 0 ≤ A ≤ A_c. In terms of normalized intensities, we have the pdf

    p(ν_s) = 1/ν_c,   ν_s ∈ [0, ν_c].   (42)

We will consider a cutoff in the amplitude of the sources such that ν_c = 2 after filtering with the standard MF, that is, we will focus on the case of faint sources that would be very difficult to detect if no filtering was applied. Note that while the value ν_c is different for each filter (because each filter leads to a different dispersion σ₀ of the filtered field), the distribution in source intensities is the same for all the cases.

5.2. Results for white noise

We want to find the optimal filter in the sense of the maximum number of detections. For the sources, we use a uniform distribution with amplitudes in the interval A ∈ [0, 2]σ₀, where σ₀ is the dispersion of the map linearly filtered with the standard MF. We focus on the interesting case of white noise (γ = 0) and explore different values of n_b∗ and R. The results are summarised in Table 1.
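The comparison procedure (fix n_b∗, invert Eq. (15) for ϕ∗, then evaluate Eq. (16)) can be sketched numerically as follows. This is our own quadrature, with grids and integration cutoffs chosen for illustration; NumPy and SciPy are assumed available.

```python
import numpy as np
from scipy.special import erfc
from scipy.optimize import brentq

def trapz_uniform(y, dx, axis=-1):
    """Trapezoidal rule on a uniform grid."""
    edge = 0.5 * (np.take(y, 0, axis=axis) + np.take(y, -1, axis=axis))
    return (y.sum(axis=axis) - edge) * dx

def nb_star(phi_star, nb, rho, y_s, kmax=12.0, nk=4001):
    """Eq. (15): density of spurious detections in the acceptance region."""
    a = (1 - rho * y_s) / (1 - rho ** 2)
    k = np.linspace(0.0, kmax, nk)
    M = (phi_star - y_s * k) / (a * np.sqrt(2 * (1 - rho ** 2)))
    f = (k ** 2 - 1 + np.exp(-k ** 2)) * np.exp(-0.5 * k ** 2) * erfc(M)
    return np.sqrt(3) * nb / np.sqrt(2 * np.pi) * trapz_uniform(f, k[1] - k[0])

def n_star(phi_star, nb, rho, y_s, nu_c, kmax=12.0, nk=2001, ns=201):
    """Eq. (16) for the uniform prior p(nu_s) = 1/nu_c on [0, nu_c] (Eq. (42))."""
    a = (1 - rho * y_s) / (1 - rho ** 2)
    k = np.linspace(0.0, kmax, nk)
    nus = np.linspace(0.0, nu_c, ns)
    M = (phi_star - y_s * k) / (a * np.sqrt(2 * (1 - rho ** 2)))
    Q = M[None, :] + nus[:, None] * (rho * y_s - 1) / np.sqrt(2 * (1 - rho ** 2))
    f = ((k ** 2 - 1 + np.exp(-k ** 2))[None, :]
         * np.exp(-0.5 * (k[None, :] - nus[:, None] * y_s) ** 2) * erfc(Q))
    inner = trapz_uniform(f, k[1] - k[0], axis=1)
    return (np.sqrt(3) * nb / np.sqrt(2 * np.pi)
            * trapz_uniform(inner, nus[1] - nus[0]) / nu_c)

def phi_star_for(nb_target, nb, rho, y_s):
    """Invert Eq. (15): find phi_star giving the required spurious density."""
    return brentq(lambda p: nb_star(p, nb, rho, y_s) - nb_target, -10.0, 20.0)
```

With ϕ∗ → −∞ the whole (ν, κ) plane is accepted and nb_star tends to the total maxima density n_b, which is a useful sanity check on the quadrature.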

0.06 0.08

0.06 n∗ 0.04 n∗

0.04

0.02 0.02 0.4 0.6 0.8 1 1.2 1.4 . . . . . 0 4 0 6 0 8 1 1 2 1 4 α α MF MF SAF SAF BSAF BSAF

Figure 2: The expected number density of detections n∗ as a func- Figure 3: The expected number density of detections n∗ as a func- tion of α for γ = 0fortheBSAF(c has been obtained by maximizing tion of α for γ = 0fortheBSAF(c has been obtained by maximising the number of detections for each value of α), MF, and SAF filters. the number of detections for each value of α), MF, and SAF filters. ∗ R = n∗ = . We consider the case R = 1.5, nb = 0.01. We consider the case 2, b 0 01.

We study the performance of the different filters as a function of α. This allows us to test how the variation of the natural scale of the filters helps the detection. In the case of the BSAF, which has an additional free parameter, c in (38), for each value of α we determine numerically the value of c that gives the highest number of detections. Then the BSAF with such a c parameter (i.e., a function of α, n∗b, and R) is compared with the other filters.

In Figure 2, we plot the expected number density of detections n∗ for different values of α, R = 1.5 pixels, and n∗b = 0.01. Note that for the 2D case the MHW and SAF are the same filter for γ = 0, and we have only included the latter in our figures. In this case, the curve for the BSAF always goes above the other filters. The maximum number of detections is found for small values of α. In this region, the improvement of the BSAF with respect to the standard matched filter is of order 15%.

In Figure 3, we show the results for R = 2. We have increased the beam width as compared to the previous example and left unchanged the number density of false detections. The BSAF outperforms all the other filters, and for small values of α the improvement is of order 40%. Note that in this figure the MF takes values α ∈ [0, 1]. For greater values of α, with R = 2 and n∗b = 0.01, we cannot solve for ϕ∗ in the implicit equation (15) and cannot calculate n∗.

We remark that filtering at scales much smaller than the scale of the pixel does not make sense. This is due to the fact that we are not including the effect of the pixel in our theoretical calculations and, thus, the results would not exactly follow what would be found in a real image. Therefore, we present the results only for those values of α such that αR is at least ∼ 1.

6. CONCLUSIONS

Several techniques have been introduced in the literature to detect point sources in two-dimensional images. Examples of point sources in astronomy are distant galaxies as detected by CMB experiments. An approach that has been thoroughly used in the literature for this case consists in linearly filtering the data and applying detectors based on thresholding. Such an approach uses only information on the amplitude of the sources: the potentially useful information contained in the local spatial structure of the peaks is not used at all. In our work, we design a detector based on peak statistics that uses the information contained in the amplitude, curvature, and shear of the maxima. These quantities describe the local properties of the maxima and are used to distinguish statistically between peaks due to background fluctuations and peaks due to the presence of a source.

We derive a Neyman-Pearson detector (NPD) that considers number densities of peaks, which leads to a sufficient detector that, in the case of the spherically symmetric sources that we consider, is linear in the amplitude and curvature of the sources. For this particular case, then, the information on the shear of the peaks is not relevant. In other cases, however, it could be useful.
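The linear peak statistic of the NPD can be coded directly from its definition ϕ = aν + bκ, with the coefficients a and b as given in the appendix, (A.3). The sketch below uses invented numbers, not values from the paper: ρ stands for the ν-κ correlation of the filtered background and ys for the normalized source curvature.

```python
import numpy as np

def npd_statistic(nu, kappa, rho, ys):
    """Linear Neyman-Pearson statistic phi = a*nu + b*kappa of (A.3)."""
    a = (1.0 - rho * ys) / (1.0 - rho ** 2)
    b = (ys - rho) / (1.0 - rho ** 2)
    return a * nu + b * kappa

# three peaks with (normalized amplitude nu, curvature kappa);
# rho, ys, and the threshold phi_* are illustrative values only
peaks = np.array([[1.5, 0.9], [2.0, 0.5], [1.2, 1.4]])
phi = npd_statistic(peaks[:, 0], peaks[:, 1], rho=0.6, ys=0.8)
accepted = phi >= 1.6        # phi_* would be fixed by the target n*_b
print(np.round(phi, 2), accepted.tolist())
```

The statistic weighs curvature alongside amplitude, which is exactly the local peak information that plain thresholding discards.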

It is a common practice in astronomy to linearly filter the images in order to enhance very faint point sources and help the detection. The best filter would be the one that makes it easiest to distinguish between peaks coming from the background alone and those due to the presence of a source, according to the information used by the detector. In the case of simple thresholding, which considers only the amplitude of the peaks, the answer to the question of which is the best filter (in the previous sense) is well known: the standard matched filter. But in the case of the Neyman-Pearson detector, which considers other things apart from mere amplitudes, this is no longer true.

We have compared three filters commonly used in the literature in order to assess which one of them performs better when detecting sources with our scheme. In addition, we have designed a filter such that it optimizes the number of true detections for a fixed number of spurious sources. The optimization of the number of true detections is performed by using the a priori pdf of the amplitudes of the sources. This filter depends on two free parameters and is therefore called the biparametric scale-adaptive filter (BSAF). By construction, the functional form of the BSAF includes the standard MF as a particular case, and its performance in terms of number of true detections for a fixed number of spurious detections must be at least as good as the standard MF's.

Following the work done in the 1D case, we generalize the functional form of the filters to 2D and introduce an extra degree of freedom α that will allow us to filter at different scales αR, where R is the scale of the source. This significantly improves the results.

We have considered an interesting case, a uniform distribution of weak sources with amplitudes A ∈ [0, 2]σ0, where σ0 is the dispersion of the field filtered with the standard matched filter, embedded in white noise (γ = 0). We have tested different values of the source size R and of the number density of spurious detections n∗b that we fix. We find that the BSAF improves the number density of detections up to 40% with respect to the standard MF (α = 1) for certain cases. Note that since the Neyman-Pearson detector for the standard MF (α = 1) defaults to the classic thresholding detector that is commonly used in astronomy, the results of this work imply that it is possible, under certain circumstances, to detect more point sources than in the classical approach.

The generalization of these ideas to other source profiles and non-Gaussian backgrounds is relevant and will be discussed in a future work.

APPENDIX

We will show in this appendix that ϕ(ν, κ) ≥ ϕ∗ given in (13) is a sufficient linear detector, that is, the detector is linear in the threshold ν and the curvature κ, and the data it uses is a sufficient statistic to decide whether a peak is a source (independent of the a priori probability P(νs)). The ratio L(ν, κ|νs) ≡ n(ν, κ|νs)/nb(ν, κ) can be explicitly written as

L(ν, κ|νs) = exp[ϕνs − (1/2)(νs² + (ρνs − 2κs)²)].  (A.1)

The criterion for detection can be written as

L(ν, κ) ≡ ∫_0^∞ dνs p(νs) L(ν, κ|νs) ≥ L∗,  (A.2)

where L∗ is a constant. L is a function of ϕ,

ϕ(ν, κ) ≡ aν + bκ,  a = (1 − ρys)/(1 − ρ²),  b = (ys − ρ)/(1 − ρ²).  (A.3)

By differentiating L with respect to ϕ, we find that

∂L/∂ϕ = ∫_0^∞ dνs p(νs) νs exp[ϕνs − (1/2)(νs² + (ρνs − 2κs)²)] ≥ 0,  (A.4)

and therefore setting a threshold in L is equivalent to setting a threshold in ϕ:

L(ν, κ) ≥ L∗ ⇐⇒ ϕ(ν, κ) ≥ ϕ∗,  (A.5)

where ϕ(ν, κ) is given by (A.3) and ϕ∗ is a constant.

ACKNOWLEDGMENTS

The authors thank Patricio Vielva for useful discussions. López-Caniego thanks the Ministerio de Ciencia y Tecnología (MCYT) for a predoctoral FPI fellowship. Barreiro thanks the MCYT and the Universidad de Cantabria for a Ramón y Cajal contract. Herranz acknowledges support from the European Community's Human Potential Programme under contract HPRN-CT-2000-00124, CMBNET, and from an ISTI fellowship since September 2004. We acknowledge partial support from the Spanish MCYT project ESP2002-04141-C03-01 and from the EU Research Training Network "Cosmic Microwave Background in Europe for Theory and Data Analysis."
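The monotonicity argument behind (A.4)-(A.5) is easy to check numerically. The sketch below evaluates a discretized version of (A.2) for the uniform prior p(νs) = 1/νc; the ϕ-independent part of the exponent is lumped into a placeholder g(νs) (an assumed quadratic here), since the argument only uses that the integrand is positive and weighted by νs ≥ 0.

```python
import numpy as np

def likelihood_ratio(phi, nu_c=2.0, n_grid=2001):
    """Discretized version of (A.2) for the uniform prior p(nu_s) = 1/nu_c.

    g(nu_s) stands in for the phi-independent terms of the exponent; its
    exact form does not affect the monotonicity in phi.
    """
    nu_s = np.linspace(0.0, nu_c, n_grid)
    g = 0.5 * nu_s ** 2                    # illustrative placeholder
    integrand = np.exp(phi * nu_s - g) / nu_c
    return integrand.sum() * (nu_s[1] - nu_s[0])   # simple quadrature

phis = np.linspace(-1.0, 3.0, 9)
L = np.array([likelihood_ratio(p) for p in phis])
print(bool(np.all(np.diff(L) > 0)))        # L increases with phi
```

Because ∂L/∂ϕ > 0, any threshold L∗ on L maps to a unique threshold ϕ∗ on ϕ, which is the content of (A.5).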

M. López-Caniego received his M.S. degree in physics from the Universidad Autónoma de Madrid, Madrid, Spain, in 2000, after completing as part of his degree one year at the University of Frankfurt, Germany. He was a Research Fellow at Bell Labs, Lucent Technologies, New Jersey, USA, during 2001, before he continued his postgraduate studies at the Universidad Autónoma de Madrid in 2002. At the end of this year he joined a research group at the Instituto de Física de Cantabria (CSIC-UC), where he is currently pursuing his Ph.D. in the field of image processing under an MCYT FPI predoctoral fellowship. His research interests are in the areas of gravitational lensing and cosmic microwave background astronomy and the development and application of statistical signal processing techniques to astronomical data, in particular, linear and nonlinear filtering for the detection and extraction of extragalactic point sources.

D. Herranz received the B.S. degree in 1995 and the M.S. degree in 1995 from the Universidad Complutense de Madrid, Madrid, Spain, and the Ph.D. degree in astrophysics from the Universidad de Cantabria, Santander, Spain, in 2002. He was a CMBNET Postdoctoral Fellow at the Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo" (CNR), Pisa, Italy, from 2002 to 2004. He is currently at the Instituto de Física de Cantabria, Santander, Spain, under an MEC Juan de la Cierva contract. His research interests are in the areas of cosmic microwave background astronomy and extragalactic point source statistics as well as the application of statistical signal processing to astronomical data, including blind source separation, linear and nonlinear data filtering, and statistical modeling of heavy-tailed processes.

J. L. Sanz received the M.S. degree in 1971 from the Universidad Complutense de Madrid, Spain, and the Ph.D. degree in physical sciences from the Universidad Autónoma de Madrid, Spain, in 1976. He was a Postdoctoral Fellow at Queen Mary College, London, UK, during 1978. He is currently Professor of astrophysics at the Instituto de Física de Cantabria, Santander, Spain. His research interests are in the areas of cosmic microwave background astronomy (extragalactic point sources, component separation, and non-Gaussian studies) as well as the application of statistical signal processing and image analysis to astronomical data (linear and nonlinear data filtering, fusion).

R. B. Barreiro obtained her B.S. degree in 1995 from the Universidad de Santiago de Compostela, Spain, completing also as part of her degree one year at the University of Sheffield, UK. She completed her Ph.D. in astrophysics at the Universidad de Cantabria in 1999. After her Ph.D., she worked as a Research Associate at the Cavendish Laboratory of the University of Cambridge, UK, until the end of 2001. She is currently at the Instituto de Física de Cantabria (CSIC-UC), Spain, under a Ramón y Cajal contract. Her research interests are mainly focused in the field of the cosmic microwave background, including statistical data analysis, in particular the study of the Gaussianity of the CMB, and the development of component separation techniques for both diffuse emissions and compact sources.

EURASIP Journal on Applied Signal Processing 2005:15, 2437–2454
© 2005 Hindawi Publishing Corporation

Blind Component Separation in Wavelet Space: Application to CMB Analysis

Y. Moudden DAPNIA/SEDI-SAP, CEA/Saclay, 91191 Gif-sur-Yvette, France Email: [email protected]

J.-F. Cardoso CNRS, École Nationale Supérieure des Télécommunications, 46 rue Barrault, 75634 Paris, France Email: [email protected]

J.-L. Starck DAPNIA/SEDI-SAP, CEA/Saclay, 91191 Gif-sur-Yvette, France Email: [email protected]

J. Delabrouille CNRS/PCC, Collège de France, 11 place Marcelin Berthelot, 75231 Paris, France Email: [email protected]

Received 30 June 2004; Revised 22 November 2004

It is a recurrent issue in astronomical data analysis that observations are incomplete maps with missing patches or intentionally masked parts. In addition, many astrophysical emissions are nonstationary processes over the sky. All these effects impair data processing techniques which work in the Fourier domain. Spectral matching ICA (SMICA) is a source separation method based on spectral matching in Fourier space designed for the separation of diffuse astrophysical emissions in cosmic microwave background observations. This paper proposes an extension of SMICA to the wavelet domain and demonstrates the effectiveness of wavelet-based statistics for dealing with gaps in the data.

Keywords and phrases: blind source separation, cosmic microwave background, wavelets, data analysis, missing data.

1. INTRODUCTION

The detection of cosmic microwave background (CMB) anisotropies on the sky has been over the past three decades a subject of intense activity in the cosmology community. The CMB, discovered in 1965 by Penzias and Wilson, is a relic radiation emitted some 13 billion years ago, when the universe was about 370 000 years old. Small fluctuations of this emission, tracing the seeds of the primordial inhomogeneities which gave rise to present large scale structures as galaxies and clusters of galaxies, were first discovered in the observations made by COBE [1] and further investigated by a number of experiments among which Archeops [2], boomerang [3], maxima [4], and WMAP [5].

The precise measurement of these fluctuations is of utmost importance to cosmology. Their statistical properties (spatial power spectrum, Gaussianity) strongly depend on the cosmological scenarios describing the properties and evolution of our universe as a whole, and thus permit to constrain these models as well as to measure the cosmological parameters describing the matter content, the geometry, and the evolution of our universe [6].

Accessing this information, however, requires disentangling in the data the contributions of several distinct astrophysical sources, all of which emit radiation in the frequency range used for CMB observations [7]. This problem of component separation, in the field of CMB studies, has thus been the object of many dedicated studies in the past.

To first order, the total sky emission can be modeled as a linear superposition of a few independent processes. The observation of the sky in direction (ϑ, ϕ) with detector d is then a noisy linear mixture of Nc components:

xd(ϑ, ϕ) = Σ_{j=1}^{Nc} Adj sj(ϑ, ϕ) + nd(ϑ, ϕ),  (1)

where sj is the emission template for the jth astrophysical process, herein referred to as a source or a component.
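The observation model (1) is easy to exercise numerically. The sketch below is a toy version with made-up dimensions and an assumed mixing matrix (not values from any real experiment); stacking detectors gives the vector-matrix form used later, and with A and the noise known the component maps follow from least squares, as in the non-blind inversions of the early applications.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy dimensions: Nd = 4 detectors, Nc = 2 components, npix sky samples
Nd, Nc, npix = 4, 2, 20_000
A = np.array([[1.0, 0.2],        # assumed mixing matrix (emission laws);
              [0.8, 0.5],        # illustrative numbers only
              [0.5, 0.8],
              [0.2, 1.0]])
S = rng.normal(size=(Nc, npix))           # component templates s_j
N = 0.1 * rng.normal(size=(Nd, npix))     # detector noise n_d
X = A @ S + N                             # x_d = sum_j A_dj s_j + n_d

# non-blind inversion: recover the maps by least squares given A
S_hat, *_ = np.linalg.lstsq(A, X, rcond=None)
rms = np.sqrt(np.mean((S_hat - S) ** 2))
print(rms < 0.15)    # residual sits at the (slightly amplified) noise level
```

Blind separation, by contrast, must estimate A itself from X, which is what the methods surveyed next address.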

The coefficients Adj reflect emission laws while nd accounts for noise. When Nd detectors provide independent observations, this equation can be put in vector-matrix form:

X(ϑ, ϕ) = A S(ϑ, ϕ) + N(ϑ, ϕ),  (2)

where X and N are vectors of length Nd, S is a vector of length Nc, and A is the Nd × Nc mixing matrix.

Given the observations of such a set of independent detectors, component separation consists in recovering estimates of the maps of the sources sj(ϑ, ϕ). Explicit component separation has been investigated first in CMB applications by [7, 8, 9]. In these applications, recovering component maps is the primary target, and all the parameters of the model (mixing matrix Adj, noise levels, statistics of the components, including the spatial power spectra) are assumed to be known and are used to invert the linear system.

Recent research has addressed the case of an imperfectly known mixing matrix. It is then necessary to estimate it (or at least some of its entries) directly from the data. For instance, Tegmark et al. assume power law emission spectra for all components except CMB and SZ, and fit spectral indices to the observations [10].

More recently, blind source separation or independent component analysis (ICA) methods have been implemented specifically for CMB studies. The work of Baccigalupi et al. [11], further extended by Maino et al. [12], implements a blind source separation method exploiting the non-Gaussianity of the sources for their separation, which permits to recover the mixing matrix A and the maps of the sources. Accounting for spatially varying instrumental noise in the observation model is investigated by Kuruoglu et al. in [13], as well as the possible inclusion of prior information about the distributions of the components using a generic Gaussian mixture model.

Snoussi et al. [14] propose a Bayesian approach in the Fourier domain assuming known spectra for the components as well as possibly non-Gaussian priors for the Fourier coefficients of the components. A fully blind, maximum likelihood approach is developed in [15, 16], with the new point of view that spatial power spectra are actually the main unknown parameters of interest for CMB observations. A key benefit is that parameter estimation can then be based on a set of band-averaged spectral covariance matrices, considerably compressing the data size.

Working in the frequency domain offers several benefits but the nonlocality of the Fourier transform creates some difficulties. In particular, one may wish to avoid the averaging induced by the nonlocality of the Fourier transform when dealing with strongly nonstationary components or noise. In addition, in many experiments, only an incomplete sky coverage is available. Either the instrument observes only a fraction of the sky or some regions of the sky must be masked due to localized strong astrophysical sources of contamination: compact radio sources or galaxies, strong emitting regions in the galactic plane. These effects can be mitigated in a simple manner thanks to the localization properties of wavelets.

Blind component separation (and in particular estimation of the mixing matrix), as discussed by Cardoso [17], can be achieved in several different ways. The first of these exploits non-Gaussianity of all, but possibly one, components. The component separation method of Baccigalupi [11] and Maino [12] belongs to this set of techniques. In CMB data analysis, however, the main component of interest (the CMB itself) has a Gaussian distribution and the observed mixtures suffer from additive Gaussian noise, so that better performance can be expected from methods based on Gaussian models. A second set of techniques exploits spectral diversity and works in the Fourier domain. It has the advantage that detector-dependent beams can be handled easily, since the convolution with a point spread function in direct space becomes a simple product in Fourier space. SMICA follows this approach in the context of noisy observations. Finally, a third set of methods exploits nonstationarity. It is adapted to situations where components are strongly nonstationary in real space.

It is natural to investigate the possible benefits of exploiting both nonstationarity and spectral diversity for blind component separation using wavelets. Indeed wavelets are powerful tools in revealing the spectral content of nonstationary data. Although blind source separation in the wavelet domain has been previously examined, the setting here is different. We should mention, for instance, the separation method in [18], which is based on the non-Gaussianity of the source signals but after a sparsifying wavelet transform, and the Bayesian approach in [19], which adopts a similar point of view although with a richer source model accounting for correlations in the wavelet representation.

The paper is organized as follows. In Section 2, we first recall the principle of spectral matching ICA. Then, after a brief reminder of some properties of the à trous wavelet transform, we discuss in Section 3 the extension of SMICA to component separation in wavelet space in order to deal with nonstationary data. Considering the problem of incomplete data as a model case of practical significance for the comparison of SMICA and its extension wSMICA, numerical experiments and results are reported in Section 4.

2. SMICA

Spectral matching ICA, or SMICA for short, is a blind source separation technique which, unlike most standard ICA methods, is able to recover Gaussian sources in noisy contexts. It operates in the spectral domain and is based on spectral diversity: it is able to separate sources provided they have different power spectra. This section gives a brief account of SMICA. More details can be found in [16]; first applications to CMB analysis are in [16, 20].

2.1. Model and cost function

For a second-order stationary Nd-dimensional process, we denote by RX(ν) the Nd × Nd spectral covariance matrix at frequency ν, that is, the (i, i)th entry of RX(ν) is the power spectrum of the ith coordinate of X while the offdiagonal entries

of RX(ν) contain the cross-spectra between the entries of X. If X follows the linear model of (2) with independent additive noise, then its spectral covariance matrix is structured as

RX(ν) = A RS(ν) A† + RN(ν)  (3)

with RS(ν) and RN(ν) being the spectral covariance matrices of S and N, respectively. The assumption of independence between the underlying components implies that RS(ν) is a diagonal matrix. We will also assume independence of the noise processes between detectors: matrix RN(ν) also is a diagonal matrix.

In the definition of RX(ν), we have not explicitly defined the frequency ν. This is because SMICA can be applied for the separation of components in many contexts: each observation Xd can be a time series (one-dimensional), an image (two-dimensional random fields), a random field on the sphere (as in full-sky CMB studies). In each case, the appropriate notions of frequency, stationarity, and power spectrum should be used.

SMICA estimates all (or a subset of) the model parameters

θ = {RS(νq), RN(νq), A}  (4)

by minimizing a measure of "spectral mismatch" between sample estimates R̂X(νq) (defined below) of the spectral covariance matrices and their ensemble averages, which depend on the parameters according to (3). More specifically, an estimate θ̂ = {R̂S(νq), R̂N(νq), Â} is obtained as θ̂ = argmin_θ φ(θ), where the measure of spectral mismatch φ(θ) is defined by

φ(θ) = Σ_{q=1}^{Q} αq D(R̂X(νq), A RS(νq) A† + RN(νq)).  (5)

Here, {νq | 1 ≤ q ≤ Q} is a set of frequencies, {αq | 1 ≤ q ≤ Q} is a set of positive weights, and D(·, ·) is a measure of mismatch between two positive matrices.

This approach is a particular instance of moment matching. As such, if consistent estimates R̂X(νq) of the spectral covariance matrices RX(νq) are available and if the model is identifiable, then any reasonable choice of the weights αq and of the divergence measure D(·, ·) should lead to consistent estimates of the parameters. However, this does not mean that these choices should be arbitrary: in our standard implementation, we make specific choices (described next) in such a way that minimizing φ(θ) is identical to maximizing the likelihood of θ in a model of Gaussian stationary processes. Hence, these choices guarantee a good statistical efficiency when the underlying processes are well modeled as Gaussian stationary processes. When this is not the case, though, the performance of SMICA may not be as good as (but not necessarily worse than) the performance of other methods designed to capture other aspects of the statistical distribution of the data, such as non-Gaussian features, for instance.

Given a data set, denote by X(ν) its discrete Fourier transform at frequency ν and denote by {Fq | 1 ≤ q ≤ Q} a set of Q frequency domains with Fq centered around frequency νq. Spectral covariance matrices are estimated nonparametrically by

R̂X(νq) = (1/nq) Σ_{ν∈Fq} X(ν) X(ν)†,  (6)

where nq denotes the number of Fourier points X(ν) in the spectral domain Fq. We always use symmetric domains in the sense that frequency ν belongs to Fq if and only if −ν also does. This symmetry guarantees that R̂X(νq) is always a real-valued matrix when X itself is a real-valued process.

In its standard form, the SMICA technique uses positive weights αq = nq and a divergence D defined as

DKL(R1, R2) = (1/2)[trace(R1 R2⁻¹) − log det(R1 R2⁻¹) − m]  (7)

which is the Kullback-Leibler divergence between two m-variate zero-mean Gaussian distributions with covariance matrices R1 and R2. These choices stem from the Whittle approximation according to which each X(ν) has a zero-mean normal distribution with covariance matrix RX(ν) and is uncorrelated with X(ν′) for ν ≠ ν′. In this case, it is easily checked that −φ(θ) evaluated with αq = nq and D = DKL is (up to a constant) the log-likelihood for T data samples. This is actually true when the spectral domains are shrunk to just one DFT point (nq = 1 for all q); when the spectral domains Fq are chosen to contain several (usually many) DFT points, then −φ(θ) is the log-likelihood, in the Whittle approximation, of the Gaussian stationary model with constant power spectrum over each domain Fq. This approximation is at small statistical loss when the spectrum is smooth enough to show little variation over each spectral domain.

The major gain of assuming constant spectrum over each Fq is the resulting reduction of the data set to a small number of covariance matrices. This may be a crucial benefit in applications like astronomical imaging where very large data sets are frequent.

Regarding our application to CMB analysis, the hypothesized isotropy of the distribution of the sources leads to integrate over spectral domains with the corresponding symmetry. For sky maps small enough to be considered as flat, the spectral decomposition is the two-dimensional Fourier transform and the "natural" spectral domains are rings centered on the null frequency. For larger maps where curvature cannot be neglected, the spectral decomposition is over spherical harmonics and the natural spectral domains contain all the modes associated to a set of scales [20].

2.2. Parameter optimization

Minimizing the spectral mismatch φ(θ) can be achieved using any optimization technique. However, φ being a likelihood criterion in disguise, one can also resort to the EM algorithm.
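The band-averaged estimator (6) and the divergence (7) can be sketched in a few lines. The toy data below are assumed for illustration: two detectors observing independent white noise of different levels, so the true spectral covariance is diag(1, 4) in every band.

```python
import numpy as np

def band_covariances(X, bands):
    """Sample spectral covariances as in (6): average X(nu) X(nu)^H over
    symmetric frequency bands; X has shape (Nd, T)."""
    T = X.shape[1]
    Xf = np.fft.fft(X, axis=1)
    R_hat = []
    for lo, hi in bands:                            # positive bins lo..hi-1
        idx = np.r_[lo:hi, T - hi + 1:T - lo + 1]   # add the matching -nu bins
        F = Xf[:, idx]
        R_hat.append((F @ F.conj().T).real / (len(idx) * T))
    return R_hat

def kl_div(R1, R2):
    """Kullback-Leibler mismatch (7) between covariance matrices."""
    M = R1 @ np.linalg.inv(R2)
    return 0.5 * (np.trace(M) - np.log(np.linalg.det(M)) - R1.shape[0])

rng = np.random.default_rng(2)
X = np.diag([1.0, 2.0]) @ rng.normal(size=(2, 65536))  # white noise, sigmas 1, 2
bands = [(1, 8000), (8000, 16000), (16000, 32000)]
R_hat = band_covariances(X, bands)
R_true = np.diag([1.0, 4.0])

# each band estimate is close to the truth, so each KL term is tiny;
# SMICA minimizes the nq-weighted sum of such terms over the parameters
print([round(kl_div(Rq, R_true), 4) for Rq in R_hat])
```

Using symmetric ± bands, as in the text, is what keeps each estimated matrix real-valued for real data.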
This is detailed in [16] in the case of spatially white noise, that is, R_N(\nu) actually not depending on \nu. Actually, this latter algorithm was slightly modified in order to deal with the case of colored noise N in (2). Another useful enhancement was to allow for constraints to be set on the model parameters, so that prior information such as bounds on some entries of the mixing matrix A could be included. Details are given in the appendix.

The EM algorithm is straightforwardly implemented and does not require any tuning. It can quickly drive the spectral mismatch down to small values but is often unable to complete the optimization. Slow EM finishing is inherent to noisy models [21] and we have found it necessary to implement a mixed ad hoc strategy based on alternating EM steps and BFGS steps [16].

We have also found that initialization is critical: criterion (5) is probably multimodal for many data sets. This issue is not addressed in this paper though, since our prime interest is in the study of the statistical performances of different estimators of the model parameters \theta. In the simulations reported below, the minimization of \phi(\theta) is initialized at the true mixing matrix and with spectral covariance matrices estimated from the initial separate source and noise maps.

2.3. Component map estimation

When running SMICA, power spectral densities for the sources and detector noise are obtained along with the estimated mixing matrix. They are used to reconstruct the source maps via Wiener filtering in Fourier space: a Fourier mode X(\nu) in frequency band \nu \in F_q is used to reconstruct the maps according to

\hat{S}(\nu) = \left( A^\dagger R_N(\nu)^{-1} A + R_S(\nu)^{-1} \right)^{-1} A^\dagger R_N(\nu)^{-1} X(\nu).   (8)

In the limiting case where noise is small compared to the signal components, the Wiener filter reduces to

\hat{S}(\nu) = \left( A^\dagger R_N(\nu)^{-1} A \right)^{-1} A^\dagger R_N(\nu)^{-1} X(\nu).   (9)

Note, however, that the above Wiener filter is optimal only for stationary Gaussian processes. For weak, point-like sources such as galaxy clusters seen via the Sunyaev-Zel'dovich effect (defined in Section 4.1), much better reconstruction can be expected from nonlinear methods.

3. SPECTRAL MATCHING IN WAVELET SPACE

The SMICA method for spectral matching in Fourier space has already shown significant success for CMB spectral estimation in multidetector experiments. It is in particular able to identify and remove residuals of poorly known correlated systematics and astrophysical foreground emissions contaminating CMB maps. However, SMICA suffers from several practical difficulties when dealing with real data.

Indeed, actual components are known to depart slightly from the ideal linear mixture model (2). The mixing matrix (in particular those columns of A which correspond to galactic emissions) is known to depend somewhat on the direction of observation or on spatial frequency. Measuring the dependence A(\vartheta, \varphi) is of interest for future experiments such as Planck, and cannot be achieved directly with SMICA. Further, the components are known to be both correlated and nonstationary. For instance, galactic dust emissions are strongly peaked towards the galactic plane. A nonlocal spectral representation (via Fourier coefficients or via spherical harmonics) mixes contributions from the high galactic sky, nearly free of foreground contamination, and contributions from within the galactic plane. Noise levels themselves may be quite nonstationary, with high-SNR regions observed for a long time and low-SNR regions poorly observed.

When there are sharp edges on the maps or gaps in the data, corresponding to unobserved or masked regions, spectral estimation using the smooth periodogram of (6) is not the most satisfactory procedure. Although apodizing windows may help cope with edge effects in Fourier analysis, they are not very straightforward to use in the case of arbitrarily shaped maps with arbitrarily shaped gaps, such as those encountered in the Archeops experiment [2].

Clearly, the spectral analysis of gapped data requires tools different from those used to process full data sets, if only because the hypothesized stationarity of the data is greatly disturbed by the missing samples. Common such methods often amount to using standard spectral estimators after the gaps have been filled with estimates of the missing samples. However, the data interpolation stage is critical and cannot be completed without prior assumptions on the data. Another idea, applicable to CMB analysis, is to process gapped data as if they were complete but to correct the spectral estimates afterwards for the bias induced by the gaps [22]. We preferred to rely on methods intrinsically dedicated to the analysis of nonstationary data, such as the wavelet transform, widely used to reveal variations in the spectral content of time series or images, as it permits singling out regions in direct space while retaining localization in the frequency domain. We see next how to reformulate (5) in the wavelet domain in order to deal with missing data. Note that, in the following, the locations of the missing samples are assumed to be known.

3.1. Wavelet transform

The experiments described further down make use of the undecimated à trous algorithm with the 2D cubic B3 spline [23] as scaling function, for implementing a wavelet transform. Although, depending on the data analysis problem, it is possible that a different choice could lead to better results, for our specific application the à trous wavelet transform has several favorable properties. First, it is a shift-invariant transform, the wavelet coefficient maps on each scale are the same size as the initial image, and the wavelet and scaling functions have small compact supports on the data map. Hence, missing patches in the observed maps are easily handled. Second, the 2D wavelet and scaling functions are nearly isotropic, which is best for the analysis of an isotropic Gaussian field such as the CMB, or of data sets such as maps of galaxy clusters, which contain only isotropic features. The undecimated isotropic à trous wavelet transform has been shown to be well suited to the analysis of astrophysical data, where translation invariance is desirable and where the emphasis is seldom on data compression [23]. Further, with this choice of scaling function, the so-called scaling equation is satisfied, and therefore fast implementations of the decomposition and reconstruction steps of the à trous transform are available [23].

Given a 2D data set c_0(k,l), the à trous algorithm recursively produces a set of detail maps w_i(k,l) on a dyadic resolution scale and a smooth approximation c_J(k,l) [23]. We note that the lowest resolution J_max is obviously limited by the data map size. The transform is readily inverted by

c_0(k,l) = c_J(k,l) + \sum_{i=1}^{J} w_i(k,l),   (10)

which is a simple addition of the smooth array with the detail maps.

Figure 1: Magnitudes averaged over spectral rings of the nearly isotropic cubic spline wavelet filters \psi_1, \psi_2, ..., \psi_5 used in the simulations described further down. The vertical dotted lines at \nu = {0.013, 0.025, 0.045, 0.09, 0.2} delimit the five frequency bands used with SMICA in these simulations.

Component Separation in Wavelet Space for CMB Analysis

3.2. Spectral matching in wavelet space: wSMICA

In order to define a sensible wavelet version of SMICA, we first rewrite the SMICA criterion (5) in terms of covariance matrices in the initial domain, where, for instance, the gaps are best described, rather than in the Fourier domain. Consider a batch of T data samples X_{t=1,T}, where t is an appropriate index depending on the dimension of the data, and the set of Q ideal bandpass filters F_q associated with the nonoverlapping frequency domains \mathcal{F}_q used in SMICA. Denoting by X_q(t) the data filtered through F_q, we define sample covariance matrices

\hat{R}_{T,X}(q) = \frac{1}{T} \sum_{t=1}^{T} X_q(t) X_q(t)^\dagger   (11)

obtained by averaging in the original domain. Owing to the unitary property of the discrete Fourier transform, one has

\hat{R}_{T,X}(q) = \frac{n_q}{T} \hat{R}_X(\nu_q),   (12)

where n_q was defined as the number of Fourier modes in spectral band \mathcal{F}_q. These matrices are estimates of R_{T,X}(q) = E(X_q(t) X_q(t)^\dagger), the covariance matrix of X_q(t). Again, according to model (3), the covariance matrices are structured as

R_{T,X}(q) = A R_{T,S}(q) A^\dagger + R_{T,N}(q),   (13)

where R_{T,S}(q) and R_{T,N}(q) are defined similarly to R_{T,X}(q). Hence, minimizing the SMICA objective function (5) is equivalent to minimizing

\phi(\theta) = \sum_{q=1}^{Q} n_q D_{KL}\left( \hat{R}_{T,X}(q), \, A R_{T,S}(q) A^\dagger + R_{T,N}(q) \right)   (14)

with respect to the new set of parameters \theta = (A, R_{T,S}(q), R_{T,N}(q)).

We now consider using another set of filters in place of the ideal bandpass filters used by SMICA. In dealing with nonstationary data or, as a special case, with gapped data, it is especially attractive to consider finite impulse response (FIR) filters. Indeed, provided the response of such a filter is short enough compared to the data size T and the gap widths, most of the samples in the filtered signal will be unaffected by the presence of gaps. Using exclusively these samples yields estimated covariance matrices which are not biased by the missing data, at the cost of a slight increase of variance due to discarding some data samples. In the following, we use the filters \psi_1, \psi_2, ..., \psi_J, \phi_J (see Figure 1) and the à trous wavelet algorithm.

Consider again a batch of T regularly spaced data samples X_{t=1,T}. Possible gaps in the data are simply described with a mask \mu, that is, an array of zeroes and ones of the same size as the data X_{t=1,T}, with ones corresponding to samples outside the gaps. Denoting by W_1, W_2, ..., W_J and C_J the wavelet scales and the smooth approximation of X obtained with the à trous transform, and by \mu_1, ..., \mu_{J+1} the masks for the different scales determined from the original mask \mu knowing the different filter lengths, wavelet covariances are estimated as follows:

\hat{R}_{W,X}(i) = \frac{1}{l_i} \sum_{t=1}^{T} \mu_i(t) \, W_i(t) W_i(t)^\dagger   (1 \le i \le J),
\hat{R}_{W,X}(J+1) = \frac{1}{l_{J+1}} \sum_{t=1}^{T} \mu_{J+1}(t) \, C_J(t) C_J(t)^\dagger,   (15)

where l_i is the number of nonzero samples in \mu_i. With source and noise covariances R_{W,S}(i), R_{W,N}(i) defined in a similar way, the covariance model in wavelet space becomes

R_{W,X}(i) = A R_{W,S}(i) A^\dagger + R_{W,N}(i).   (16)
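To make the à trous decomposition concrete, here is a minimal numpy sketch of our own (not the authors' IDL implementation) of the undecimated transform with the B3 spline scaling function; boundary handling is periodic for brevity, and the final check mirrors the reconstruction formula (10).

```python
import numpy as np

# B3 cubic spline kernel used as scaling function
H = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def _smooth(c, step):
    """Separable convolution with the B3 kernel dilated by `step` (the 'holes')."""
    out = np.zeros_like(c)
    for m, h in zip((-2, -1, 0, 1, 2), H):
        out += h * np.roll(c, m * step, axis=0)
    out2 = np.zeros_like(out)
    for m, h in zip((-2, -1, 0, 1, 2), H):
        out2 += h * np.roll(out, m * step, axis=1)
    return out2

def atrous(c0, J):
    """Return detail maps w_1..w_J and the smooth approximation c_J."""
    details, c = [], c0
    for j in range(J):
        c_next = _smooth(c, 2 ** j)      # smoothing step at scale j
        details.append(c - c_next)       # w_{j+1} = c_j - c_{j+1}
        c = c_next
    return details, c

# Reconstruction is a plain sum, as in (10): c_0 = c_J + sum_i w_i
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))
w, cJ = atrous(x, 5)
assert np.allclose(x, cJ + sum(w))
```

Note that every detail map has the same size as the input map, which is what makes per-scale masking of gap-affected coefficients straightforward.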

Figure 2: Samples of simulated component maps of CMB, dust, and SZ.
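The masked covariance estimates of (15) amount to discarding, at each scale, the samples flagged by the per-scale mask before averaging outer products. A minimal sketch, with array shapes and names of our own choosing:

```python
import numpy as np

def masked_wavelet_covariances(W, masks):
    """Sketch of (15): W has shape (n_scales, m_channels, T) holding the
    wavelet coefficients of the m observed maps (flattened), and masks has
    shape (n_scales, T) with ones marking samples unaffected by the gaps."""
    covs = []
    for Wi, mu in zip(W, masks):
        keep = mu.astype(bool)
        li = keep.sum()                  # l_i: number of retained samples
        Wk = Wi[:, keep]                 # discard samples hit by the gaps
        covs.append((Wk @ Wk.T.conj()) / li)
    return covs

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 2, 1000))    # 3 scales, 2 channels, 1000 samples
masks = np.ones((3, 1000))
masks[:, 400:600] = 0.0                  # a single central gap, for illustration
covs = masked_wavelet_covariances(W, masks)
assert all(C.shape == (2, 2) for C in covs)
```

Discarding rather than zero-filling the gap samples is what keeps these estimates unbiased, at the cost of the slight variance increase discussed in the text.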

Our wavelet-based version of SMICA consists in minimizing the wSMICA criterion

\phi(\theta) = \sum_{i=1}^{J+1} \alpha_i D_{KL}\left( \hat{R}_{W,X}(i), \, A R_{W,S}(i) A^\dagger + R_{W,N}(i) \right)   (17)

with respect to \theta = (A, R_{W,S}(i), R_{W,N}(i)) for some sensible choice of the weights \alpha_i.

The weights in the spectral mismatch (17) should be chosen to reflect the variability of the estimate of the corresponding covariance matrix. Examining first (14), we see weights which are proportional to n_q, that is, to the number of DFT points used in computing the sample covariance matrix, because this is in fact the number of uncorrelated values of X(\nu) entering in the estimation of \hat{R}_X(\nu_q). It is also proportional to the size of the frequency domain over which R_X(\nu_q) is evaluated. Since wSMICA uses wavelet filters with only limited overlap, we choose the same strategy as in SMICA, since the latter amounts to using ideal bandpass filters. In other words, when no data points are missing, the weights for wSMICA are taken proportional to the size of the frequency domains covered at each scale. This is

(\alpha_1, \alpha_2, ..., \alpha_J, \alpha_{J+1}) = \left( \frac{1}{2}, \frac{1}{4}, ..., \frac{1}{2^J}, \frac{1}{2^J} \right)   (18)

in the one-dimensional case and

(\alpha_1, \alpha_2, ..., \alpha_J, \alpha_{J+1}) = \left( \frac{3}{4}, \frac{3}{16}, ..., \frac{3}{4^J}, \frac{1}{4^J} \right)   (19)

in the two-dimensional case.

In the case of data with gaps, we must further take into account that some wavelet coefficients are discarded. Let \beta_i denote the fraction of wavelet coefficients which are unaffected by the gaps at scale i. The number of effective points is reduced by this fraction and one should use the weights

(\alpha_1, \alpha_2, ..., \alpha_J, \alpha_{J+1}) = \left( \frac{\beta_1}{2}, \frac{\beta_2}{4}, ..., \frac{\beta_J}{2^J}, \frac{\beta_{J+1}}{2^J} \right)   (20)

in the one-dimensional case and

(\alpha_1, \alpha_2, ..., \alpha_J, \alpha_{J+1}) = \left( \frac{3\beta_1}{4}, \frac{3\beta_2}{16}, ..., \frac{3\beta_J}{4^J}, \frac{\beta_{J+1}}{4^J} \right)   (21)

in the two-dimensional case. The fraction 1 - \beta_i of discarded points depends on the scale i (even with the à trous algorithm) because the length of the wavelet filter itself depends on i. However, it is roughly scale independent if the missing data are large patches of much bigger size than the length of the wavelet filters used at any scale in the wavelet decomposition.

Before closing, we note that the different wavelet filter outputs W_i(t) are correlated due to the overlap between frequency responses (Figure 1). Optimal inference should take this correlation into account, but we have chosen not to do so and rather to stick to a simple criterion like (17), which ignores the correlations between sample covariance matrices. No big loss is expected from this choice because the wavelet bands do not overlap very much.

4. NUMERICAL EXPERIMENTS

4.1. Simulation of realistic maps

We have simulated observations consisting of m = 6 mixtures of n = 3 components, namely, CMB, galactic dust, and SZ emissions, for which templates were obtained as described in [16]; see Figure 2 for typical realizations.

Dust emission is the greybody emission of small dust particles in our own galaxy. The intensity of emission is strongly concentrated towards the galactic plane, although cirrus clouds at high galactic latitudes are present as well. The dust emission law is of the form \nu^\alpha B_\nu(T_dust), where \alpha \approx 1.7, B_\nu(T) is the blackbody emission law, and T_dust \approx 17 K is the typical dust temperature in the interstellar medium.

The Sunyaev-Zel'dovich (SZ) effect is a small distortion of the CMB blackbody emission law caused by inverse Compton scattering of CMB photons on free electrons in the hot ionized gas present mostly in clusters of galaxies. The energetic electron, in the interaction, gives a fraction of its energy to the scattered CMB photon, shifting its frequency to a higher value. As a result, the SZ effect causes a shift in the CMB photon

Figure 3: Simulated observation maps based on the templates shown in Figure 2 and the mixing matrix in Table 1 for the nominal Planck HFI noise levels.
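As an aside, the greybody dust emission law \nu^\alpha B_\nu(T_dust) of Section 4.1 is easy to evaluate at the HFI frequencies. The sketch below (our own, with standard SI constants and an arbitrary overall scale) only illustrates the steep rise of dust emissivity with frequency that is visible in the dust column of Table 1; it is not meant to reproduce the tabulated values.

```python
import numpy as np

# Standard SI constants: Planck, Boltzmann, speed of light
h, k, c = 6.626e-34, 1.381e-23, 2.998e8

def planck_bnu(nu, T):
    """Blackbody spectral radiance B_nu(T)."""
    return 2 * h * nu**3 / c**2 / np.expm1(h * nu / (k * T))

def dust_law(nu, T=17.0, alpha=1.7):
    """Greybody dust law nu^alpha * B_nu(T), arbitrary overall scale."""
    return nu**alpha * planck_bnu(nu, T)

# The six Planck HFI channel frequencies, in GHz
nu_ghz = np.array([100.0, 143.0, 217.0, 353.0, 545.0, 857.0])
e = dust_law(nu_ghz * 1e9)
e /= e.max()                      # normalize to the 857 GHz channel

# dust emission rises monotonically with frequency over the HFI range,
# consistent with the trend of the dust column in Table 1
assert np.all(np.diff(e) > 0)
```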

Table 1: Entries of A, the mixing matrix used in our simulations.

Channel    CMB             Dust            SZ
100 GHz    7.452 x 10^-1   3.654 x 10^-2   -8.733 x 10^-1
143 GHz    5.799 x 10^-1   7.021 x 10^-2   -4.689 x 10^-1
217 GHz    3.206 x 10^-1   1.449 x 10^-1   -2.093 x 10^-3
353 GHz    7.435 x 10^-2   3.106 x 10^-1    1.294 x 10^-1
545 GHz    6.009 x 10^-3   5.398 x 10^-1    2.613 x 10^-2
857 GHz    6.115 x 10^-5   7.648 x 10^-1    5.268 x 10^-4

energy distribution, depleting the occupation of low energy levels and populating high energy levels. The net effect, to first order, is a small additive emission, negative at frequencies below 217 GHz and positive at frequencies above. A review of the SZ effect can be found in [24].

The templates, and thus the mixtures in each simulated data set, consist of 300 x 300 pixel maps corresponding to a 12.5° x 12.5° field located at high galactic latitude. The six mixtures in each set mimic observations that will eventually be acquired in the six frequency channels of the Planck HFI (Figure 3). The entries of the mixing matrix A used in these simulations actually are estimated values of the electromagnetic emission laws of each component at 100, 143, 217, 353, 545, and 857 GHz; see Table 1.

White Gaussian noise is added to the mixtures according to model (2) in order to simulate instrumental noise. While the relative noise standard deviations between channels are set according to the nominal values of the Planck HFI, we also experiment with five global noise levels at -20, -6, -3, 0, and +3 dB from the nominal values. Table 2 gives the typical energy fractions contributed by each of the n = 3 original sources and by noise to the total energy of each of the m = 6 mixtures, considering the Planck nominal noise variance. In fact, because SMICA and wSMICA actually work on spectral bands, a much better indication of the signal-to-noise ratio in these simulations is given by Figure 4, which shows how the noise and source energy contributions distribute with respect to frequency in the six mixtures.

Finally, in order to investigate the impact of gaps in the data, and the benefits of using wSMICA in place of SMICA to deal with these gaps, the mask shown in Figure 5 was applied onto the mixture maps. The case where no data is missing

Table 2: Energy fraction contributed by each source to the total energy of each mixture, for the nominal noise variance on the Planck HFI channels.

Channel    CMB            Dust           SZ             Noise
100 GHz    9.91 x 10^-1   1.18 x 10^-4   7.92 x 10^-3   2.53 x 10^-6
143 GHz    9.97 x 10^-1   7.25 x 10^-4   3.79 x 10^-3   5.17 x 10^-7
217 GHz    9.98 x 10^-1   1.01 x 10^-2   2.48 x 10^-7   1.34 x 10^-7
353 GHz    5.55 x 10^-1   4.8 x 10^-1    9.78 x 10^-3   7.47 x 10^-8
545 GHz    2.5 x 10^-3    1.0            2.75 x 10^-4   3.78 x 10^-9
857 GHz    1.29 x 10^-7   1.0            5.56 x 10^-8   1.24 x 10^-10

was also considered as a reference case. Spectral matching with wSMICA is conducted using the output of the five wavelet filters \psi_1, ..., \psi_5 associated with the higher frequency details. For the sake of comparison, SMICA is run using five bands in Fourier space which are similar to the dyadic bands imposed by the wavelet transform, as shown in Figure 1. This latter choice of frequency bands is made to ease the comparison between SMICA and wSMICA.

4.2. Experiments with noise-free mixtures

Preliminary experiments were conducted in the case of vanishing instrumental noise variance. In this case, the blind component separation problem is "equivariant," entailing that the quality of separation on a given mixture does not depend at all on the mixing matrix A but only on the particular realization of the sources and on the algorithm used for separation. More specifically, in the case of SMICA and wSMICA, separation performance depends on the spectral diversity of the components and on the ability of both objective functions to exploit this diversity. Hence, the noise-free experiments in this section are indicative of the spectral diversity of the components, of the ability of (w)SMICA to capture it, and of the robustness of (w)SMICA with respect to missing data.

Note that in a noise-free model, the spectral matching objective boils down to an objective of joint diagonalization of the covariance matrices, as shown in [25]. Hence, spectral matching can be implemented using an efficient dedicated algorithm [26].

The estimated components are related to the true ones according to

\hat{S} = \mathcal{I} S,   (22)

where \mathcal{I} is the product of the mixing matrix used in the simulations and of the separating matrix obtained by joint diagonalization. It also includes any normalization needed for the components and their estimates to have total energy in all bands equal to 1. With this normalization, the square of any offdiagonal term \mathcal{I}_{ij} is directly related to the residual level of contamination by component j in the recovered component i. Since performance in separating noise-free mixtures is independent of the mixing matrix, the choice of A in the simulations is irrelevant: it does not change the distribution of \mathcal{I}. In practice, our noise-free experiments are conducted without any mixing, that is, we take A to be the 3 x 3 identity matrix. The following steps were repeated 1000 times.

(i) Randomly pick one of each component map out of the available 200 CMB maps, 30 dust maps, and 1500 SZ maps.
(ii) Compute covariance matrices in the five wavelet or Fourier bands, both with and without masking part of the maps.
(iii) Normalize each source so that its total energy over the five bands is equal to one.
(iv) Estimate a separating matrix by joint diagonalization of the covariance matrices.

These noise-free experiments are complemented using "surrogate" data in order to assess the effect of any non-Gaussianity or nonstationarity in the source templates. We repeat the simulations on Gaussian stationary maps generated with the same spectra as the CMB, dust, and SZ components. The resulting distribution of \mathcal{I} then only reflects the ability of (w)SMICA to exploit the spectral diversity of the components, independently of the other aspects of their distribution.

The histograms in Figure 6 are for the offdiagonal term corresponding to the residual corruption of CMB by Gaussian dust in the second set of experiments (using surrogate data). In Tables 3 and 4, the results obtained with the synthetic component maps are given, as well as those obtained with the surrogate Gaussian maps, in terms of the standard deviations of the offdiagonal entries \mathcal{I}_{ij} defined by (22).

When working on surrogate Gaussian maps without masks, using covariance matrices in Fourier space or in wavelet space gives similar performances. It is also satisfactory, when covariances in wavelet space are used with surrogate Gaussian maps, that each computed standard deviation only slightly increases when a mask is applied to the data. Indeed, as a consequence of incomplete coverage, there are fewer samples from which to estimate the covariances. This increase is also observed when covariance matrices in Fourier space are used with the surrogate Gaussian maps, but it can be as high as fivefold and it does not affect all the coefficients equally. Although this can again be attributed to the reduced data size, the lowered spectral diversity between components, because of the correlations and smoothing induced in Fourier space by the mask, is also part of the explanation. In fact, as shown in Figure 4, the CMB and dust spatial power

spectra are somewhat similar, that is, they show low spectral diversity, and further smoothing can only degrade the performance of the source separation algorithm based on Fourier covariances.

Figure 4: Energy contributed by each source and by noise in each bolometer as a function of frequency, for the nominal noise variance on the Planck HFI channels: (a) 100 GHz, (b) 143 GHz, (c) 217 GHz, (d) 353 GHz, (e) 545 GHz, and (f) 857 GHz. Note how SZ is expected to be always below nominal noise, that CMB and dust strongly dominate in different channels, and that the CMB and dust spectra, without being proportional, display the same general behavior dominated by low modes.

In the case of realistic component maps, we note first that the comparison of the performance of component separation using wavelet-based covariance matrices with and without mask again agrees with the different data sizes, which is not the case with covariances in Fourier space. Next, whether covariance matrices are computed in Fourier space or in wavelet space, we note that the terms coupling CMB and dust are again much higher than with surrogate data, even on complete maps. This is probably to be attributed to the nonstationarity and/or non-Gaussianity of the dust component. Another point is that the CMB and dust templates of Figure 2 exhibit sharp edges compared to SZ, and this inevitably disturbs spectral estimation using a simple DFT. To assess this effect, simulations were also conducted where the covariances in Fourier space were computed after an apodizing Hanning window was applied to the complete data maps. The results reported in Table 3, to be compared to Table 4, do indicate a slightly positive effect of windowing, but still the separation using wavelet-based statistics appears better. To further complete this preliminary study, we conducted similar experiments using JADE [27], an ICA algorithm based on fourth-order statistics. This algorithm does not use spectral information at all. As discussed earlier, such a method is not expected to work well on CMB data, and the results reported in Table 5 do show lower performance in comparison to Tables 3 and 4.

4.3. Realistic experiments

The results of the previous section show that, in the noiseless case, using wavelet-based covariance matrices provides a simple and efficient way to cancel the bad impact that gaps actually have on the performance of estimation using Fourier-based statistics. We move on to investigate the effect of additive noise on SMICA and wSMICA.

Picking at random one of each component map out of the available 200 CMB maps, 30 dust maps, and 1500 SZ maps, 1000 sets of six synthetic mixture maps were generated as previously described, for each of the 5 noise levels chosen. Then, component separation was conducted using the spectral matching algorithms SMICA and wSMICA, both with and without part of the maps being masked. A typical run of SMICA or wSMICA in the setting considered here (i.e., 300 by 300 maps, 6 mixtures, 3 sources, 5 wavelet scales, no constraints on the mixing matrix) takes only a few seconds on a 1.25 GHz Mac G4 when coded in IDL. The same optimization techniques are used for SMICA and wSMICA since the two criteria have the same form.

Figure 5: (a) Mask used to simulate a gap in the data. (b)-(f) Modified masks at scales 1 through 5. The discarded pixels are in black.

Each run of SMICA and wSMICA on the data returns estimates \hat{A}_f and \hat{A}_w of the mixing matrix. These estimates are subject to the indeterminacies inherent to the instantaneous linear mixture model (2). Indeed, in the case where optimization is over all parameters \theta, any simultaneous permutation of the columns of A and of the lines of S leaves the model unchanged. The same occurs when exchanging a scalar, possibly negative, factor between any column in A and the corresponding line in S. Therefore, columnwise comparison of \hat{A}_f and \hat{A}_w to the original mixing matrix A requires first fixing these indeterminacies. This is done "by hand" after \hat{A}_f and \hat{A}_w have been normalized columnwise.

The results we report next focus on the statistical properties of \hat{A}_f and \hat{A}_w as estimated from the 1000 runs of the two competing methods in the several configurations retained. In fact, the correct estimation of the mixing matrix in model (2) is a relevant issue, for instance, when it comes to dealing with the cross-calibration of the different detectors. Figures 7, 8, and 9 show the results obtained, using the quadratic norm

QE_j = \left( \sum_{i=1}^{m} \left( \hat{A}_{ij} - A_{ij} \right)^2 \right)^{1/2}   (23)

with \hat{A} = \hat{A}_f or \hat{A}_w and j = CMB, dust, or SZ, to assess the residual errors on the estimated emissivities of each component. The plotted curves show how the mean of the above positive error measure varies with increasing noise variance. For the particular case of the CMB, Table 6 gives the estimated standard deviations of the relative errors (\hat{A}_{ij} - A_{ij})/A_{ij} on the estimated CMB emission law in the six channels of Planck's HFI in the different configurations retained.

Closer to our source separation objective, a more significant way of assessing the quality of \hat{A}_f and \hat{A}_w as estimators of the mixing matrix A would be to use the following

Figure 6: Histograms of the offdiagonal term of \mathcal{I}, defined in (22), corresponding to the residual corruption of "CMB" by "dust" while separating Gaussian maps generated with the same power spectra as the astrophysical components, by joint diagonalization of covariance matrices in (a) Fourier and (b) wavelet spaces, with (black, which appears grey when seen through white) and without (white) masking part of the data. The dark widest histogram on the left highlights the impact of masking on source separation based on Fourier covariance matrices.
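The joint diagonalization step of these noise-free experiments can be illustrated with a simple two-matrix surrogate: for band covariances R_q = A D_q A^T with diagonal D_q, the generalized eigenvectors of the pencil (R_1, R_2) recover the inverse of the mixing matrix up to permutation and scaling. The sketch below is only a textbook stand-in for the dedicated multi-matrix algorithm [26] used in the paper; all names are our own.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))          # "unknown" mixing matrix
D1 = np.diag([1.0, 2.0, 5.0])            # band-1 source powers
D2 = np.diag([4.0, 1.0, 0.5])            # band-2 source powers (distinct ratios)
R1, R2 = A @ D1 @ A.T, A @ D2 @ A.T      # the two band covariances

# Solve R1 v = lambda R2 v: eigenvectors are columns of A^{-T} up to
# permutation/scale, so V.T is a separating matrix.
_, V = eigh(R1, R2)
B = V.T
G = B @ A                                # global matrix, playing the role of I in (22)

# Up to permutation and scaling, G is diagonal: each row is dominated
# by a single entry.
rows = np.abs(G) / np.linalg.norm(G, axis=1, keepdims=True)
assert np.all(rows.max(axis=1) > 0.99)
```

The separation succeeds because the power ratios d_{1,i}/d_{2,i} differ between sources, which is exactly the "spectral diversity" requirement discussed in the text.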

Table 3: Standard deviations of the offdiagonal entries \mathcal{I}_{ij} defined by (22), obtained while separating realistic component maps by joint diagonalization of covariance matrices in Fourier space, with (M) or without (NM) masking part of the data, or applying an apodizing Hanning window (Han). Components 1, 2, and 3, respectively, stand for CMB, dust, and SZ. The values in parentheses (set in italic in the original) were obtained with Gaussian maps; the Gaussian-map values of \mathcal{I}_{1,2} correspond to the histograms in Figure 6.

Offdiag. entry   NM                M                 Han
I_{1,2}          0.097 (0.0076)    0.074 (0.038)     0.024
I_{1,3}          0.0049 (0.0044)   0.005 (0.006)     0.0094
I_{2,1}          0.017 (0.0066)    0.018 (0.01)      0.017
I_{2,3}          0.0064 (0.0077)   0.0066 (0.0096)   0.011
I_{3,1}          0.0024 (0.0026)   0.0028 (0.0037)   0.0039
I_{3,2}          0.0054 (0.0071)   0.0054 (0.0079)   0.01

Table 4: Standard deviations of the offdiagonal entries \mathcal{I}_{ij} defined by (22), obtained while separating realistic component maps by joint diagonalization of covariance matrices in wavelet space, with (M) and without (NM) masking part of the data. Components 1, 2, and 3, respectively, stand for CMB, dust, and SZ. The values in parentheses (set in italic in the original) were obtained with Gaussian maps; the Gaussian-map values of \mathcal{I}_{1,2} correspond to the histograms in Figure 6.

Offdiag. entry   NM                M
I_{1,2}          0.015 (0.0071)    0.018 (0.0079)
I_{1,3}          0.0025 (0.0029)   0.0028 (0.0031)
I_{2,1}          0.016 (0.0077)    0.019 (0.0089)
I_{2,3}          0.0041 (0.0051)   0.0048 (0.0075)
I_{3,1}          0.0024 (0.0029)   0.003 (0.0039)
I_{3,2}          0.0039 (0.0054)   0.0053 (0.0085)

Table 5: Standard deviations of the offdiagonal entries I_ij defined by (22), obtained while separating realistic component maps using JADE, with (M) and without (NM) masking part of the data. Components 1, 2, and 3, respectively, stand for CMB, dust, and SZ.

Offdiag. entry   I_12    I_13   I_21    I_23    I_31   I_32
NM               0.021   0.25   0.022   0.02    0.31   0.02
M                0.023   0.29   0.025   0.018   0.34   0.018

interference-to-signal ratio:

    ISR_j = ( Σ_{i≠j} I_{ji}^2 σ_i^2 ) / ( I_{jj}^2 σ_j^2 ),   (24)

where the σ_j^2 are the source variances and

    I = ( Â† R_N^{-1} Â )^{-1} Â† R_N^{-1} A,   (25)

with Â the estimated mixing matrix and R_N the noise covariance. The plots in Figures 10, 11, and 12 show how the mean ISR from the 1000 runs of SMICA and wSMICA in different configurations varies with increasing noise. Figure 13 shows typical estimated component maps obtained using SMICA and wSMICA. For the sake of comparison, component maps estimated using the JADE source separation method are also included.

We note again that the performance of wSMICA behaves as expected when noise increases and if part of the data is missing. However, this is not always the case with SMICA. Finally, this set of simulations, conducted in a more realistic setting with respect to ESA's Planck mission, again confirms the higher performance, over Fourier analysis, that we indeed expected from the use of wavelets. The latter are able to correctly grab the spectral content of partly masked data maps and from there allow for better component separation.

5. CONCLUSION

This paper has presented an extension of the spectral matching ICA algorithm to the wavelet domain, motivated by the need to deal with components which exhibit spatial correlations and are nonstationary. Maps with gaps are a particular instance of practical significance. Substituting covariance matching in Fourier space by covariance matching in wavelet space makes it possible to cope with gaps of any shape in a very straightforward manner. Mainly, it is the finite length of the wavelet filters used here that allows the impact of edges and gaps on the estimated covariances, and hence on component separation, to be lowered. Optimally choosing the FIR filter bank for a particular application is a possible further enhancement.

Our numerical experiments, based on realistic simulations of the astrophysical data expected from the Planck mission, confirm the benefits of correctly processing existing gaps. Clearly, other possible types of nonstationarities in the collected data, such as spatially varying noise or component variance, could be dealt with very simply in a similar fashion using the wavelet extension of SMICA.

Regarding future work, a few points are in order. First, we note that possible correlations between the components are not accounted for in SMICA or wSMICA as presented here. However, it is not difficult in principle to handle such known or suspected correlations by adding offdiagonal parameters in the model spectral covariances of the sources. Still, in the case of CMB analysis from high frequency observations which contain only one galactic component (dust), as in our simulations, spatial correlations between components should not be a problem.

We note that the proposed wavelet-based approach, as implemented with the standard à trous wavelet transform, offers little flexibility in the spectral bands available for wSMICA, while the Fourier approach gives complete flexibility in this respect. But it is possible, even straightforward, to use other transforms such as the à trous wavelet packet transform, or the continuous wavelet transform, or in fact any set of linear filters, preferably FIR filters. This in turn raises the question of optimally choosing this set of filters, keeping in mind that higher resolution in Fourier space requires longer filters, which is not desirable in the case of incomplete or nonstationary data. In fact, the optimal selection of bands is clearly a meaningful question both for SMICA and wSMICA.

We also note that in the CMB application, the components have quite different statistical properties: some are expected to be very close to Gaussian (like the CMB) whereas others are strongly non-Gaussian (like SZ). The non-Gaussianity of some components does not affect the consistency of our estimator but, for a given spectrum, it does affect the distribution of the estimates, although this impact is not easily predicted. It is clear, however, that ignoring the strong non-Gaussianity of some components is a loss of information. Devising a method able, with reasonable complexity, to exploit jointly non-Gaussianity (as in traditional ICA techniques) and spectral information (as in Fourier or wavelet SMICA) appears as a difficult challenge.

APPENDIX

EM ALGORITHM WITH CONSTRAINTS ON THE MIXING MATRIX

Considering Q separate frequency bands of sizes n_q with Σ_q n_q = 1, the EM functional derived for the instantaneous mixing model (2) with independent Gaussian stationary sources S and noise N is

    Φ(θ, θ′) = E[ log p(X, S | θ) | θ′ ]   (A.1)

with θ = (A, R_{S,1}, ..., R_{S,Q}, R_{N,1}, ..., R_{N,Q}) and θ′ = (A′, R′_{S,1}, ..., R′_{S,Q}, R′_{N,1}, ..., R′_{N,Q}). The maximization step of the EM algorithm then seeks to maximize Φ(θ, θ′) with respect to θ, and the optimal θ is used as the value of θ′ at the next EM step, and so on until satisfactory convergence is reached. Explicit expressions are easily derived for the optimal θ in the white noise case, where an interesting decoupling occurs between the reestimating equations for noise variances, source variances, and the mixing matrix [15].
Component Separation in Wavelet Space for CMB Analysis 2449
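In the white-noise, single-band case the decoupled reestimates mentioned above can be sketched in a few lines of NumPy. This is an illustrative reimplementation of mine (not the authors' code), using the posterior statistics that appear later in the appendix. A useful sanity check: when the data covariance R_X equals the model covariance A R_S A† + R_N exactly, the true parameters are a fixed point of the update.

```python
import numpy as np

def em_update(R_X, A, R_S, R_N):
    """One EM reestimate of (A, R_S) for x = A s + n, white noise, one band."""
    RNi = np.linalg.inv(R_N)
    C = np.linalg.inv(A.T @ RNi @ A + np.linalg.inv(R_S))  # posterior source covariance
    W = C @ A.T @ RNi                                      # Wiener filter
    Rxs = R_X @ W.T
    Rss = W @ R_X @ W.T + C
    A_new = Rxs @ np.linalg.inv(Rss)
    R_S_new = np.diag(np.diag(Rss))
    return A_new, R_S_new

# toy model: 3 channels, 2 sources (values made up for illustration)
A = np.array([[1.0, 0.4], [0.2, 1.0], [0.5, 0.3]])
R_S = np.diag([2.0, 0.5])
R_N = 0.1 * np.eye(3)
R_X = A @ R_S @ A.T + R_N  # exact model covariance
A_new, R_S_new = em_update(R_X, A, R_S, R_N)
print(np.allclose(A_new, A), np.allclose(R_S_new, R_S))  # -> True True
```

With a sample covariance in place of the exact R_X, iterating `em_update` drifts toward a (local) maximum-likelihood fit instead of staying put.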

Figure 7: Comparison of the mean squared errors on the estimation of the emission law of CMB as a function of noise in five different configurations: wSMICA without mask, wSMICA with mask, fSMICA without mask, fSMICA with mask, and fSMICA with Hanning apodizing window.

Figure 9: Comparison of the mean squared errors on the estimation of the emission law of SZ as a function of noise in five different configurations: wSMICA without mask, wSMICA with mask, fSMICA without mask, fSMICA with mask, and fSMICA with Hanning apodizing window.

Figure 8: Comparison of the mean squared errors on the estimation of the emission law of dust as a function of noise in five different configurations: wSMICA without mask, wSMICA with mask, fSMICA without mask, fSMICA with mask, and fSMICA with Hanning apodizing window.

First, we exhibit the quadratic dependence of the EM functional Φ(θ, θ′) on A:

    Φ(θ, θ′) = −(1/2) Σ_q n_q trace[ A R_q^ss A† R_{N,q}^{-1} − A (R_q^xs)† R_{N,q}^{-1} − R_q^xs A† R_{N,q}^{-1} ] + const_A,   (A.2)

where

    C_q = ( A† R_{N,q}^{-1} A + R_{S,q}^{-1} )^{-1},
    W_q = ( A† R_{N,q}^{-1} A + R_{S,q}^{-1} )^{-1} A† R_{N,q}^{-1},
    R_q^xs = R_{X,q} W_q†,
    R_q^ss = W_q R_{X,q} W_q† + C_q.   (A.3)

In the white noise case, R_{N,q} = R_N, (A.2) becomes

    Φ(θ, θ′) = −(1/2) trace[ ( A − R^xs (R^ss)^{-1} ) R^ss ( A − R^xs (R^ss)^{-1} )† R_N^{-1} ] + const_A,   (A.4)

where

    R^xs = Σ_q n_q R_q^xs,   R^ss = Σ_q n_q R_q^ss.   (A.5)

Again, this can be rewritten as

    Φ(θ, θ′) = −(1/2) ( 𝒜 − ℳ )† 𝒬 ( 𝒜 − ℳ ) + const_A,   (A.6)

Linear equality constraints

When A is subject to linear constraints, the joint maximization of the EM functional with respect to all model parameters is no longer easily achieved in general. In fact, one cannot simply decouple the reestimating rules for the noise parameters and the mixing matrix, and these have to be optimized separately. We give next the modified reestimating equations for the mixing matrix and the source variances in the case of constant noise (i.e., θ = (A, R_{S,1}, ..., R_{S,Q})).
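As a quick numerical sanity check on the Wiener matrix W_q of (A.3): by the matrix inversion lemma it can equivalently be written R_{S,q} A† ( A R_{S,q} A† + R_{N,q} )^{-1}. A small NumPy check with arbitrary made-up matrices (real-valued, so transposes stand in for conjugate transposes):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))          # 4 channels, 2 sources (arbitrary)
R_S = np.diag([1.5, 0.7])                # source covariance for one band
R_N = np.diag([0.1, 0.2, 0.1, 0.3])      # noise covariance

RNi = np.linalg.inv(R_N)
# information form, as in (A.3)
W1 = np.linalg.solve(A.T @ RNi @ A + np.linalg.inv(R_S), A.T @ RNi)
# covariance form, via the matrix inversion lemma
W2 = R_S @ A.T @ np.linalg.inv(A @ R_S @ A.T + R_N)

print(np.allclose(W1, W2))  # -> True
```

The covariance form inverts a channels-by-channels matrix, the information form a sources-by-sources one; which is cheaper depends on the dimensions.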

Table 6: Standard deviations of the relative errors on the estimated emission laws A_i1 of CMB in Planck's HFI six channels. The column labels WNM, WM, FNM, FM, and FHan stand for the different configurations, respectively: wSMICA without mask, wSMICA with mask, fSMICA without mask, fSMICA with mask, and fSMICA with Hanning apodizing window. The five rows in each block are for noise variances −20, −6, −3, 0, and 3 dB from nominal Planck values.

Emission law   Noise     WNM       WM        FNM       FM        FHan
A11            −20 dB    4.4e−4    5.0e−4    6.2e−4    7.3e−4    7.2e−4
               −6 dB     5.4e−4    7.5e−4    7.1e−4    8.5e−4    9.5e−4
               −3 dB     6.6e−4    9.2e−4    8.2e−4    8.9e−4    1.3e−3
               0 dB      9.4e−4    1.2e−3    1.0e−3    1.0e−3    1.7e−3
               3 dB      1.2e−3    1.7e−3    1.2e−3    1.4e−3    2.3e−3
A21            −20 dB    1.6e−4    2.1e−4    2.1e−4    2.0e−4    2.7e−4
               −6 dB     5.3e−4    7.8e−4    5.6e−4    5.7e−4    1.0e−3
               −3 dB     7.0e−4    1.1e−3    7.6e−4    8.4e−4    1.4e−3
               0 dB      1.0e−3    1.6e−3    1.0e−3    1.0e−3    2.1e−3
               3 dB      1.4e−3    2.2e−3    1.5e−3    1.7e−3    3.1e−3
A31            −20 dB    1.5e−3    1.8e−3    2.2e−3    2.5e−3    2.3e−3
               −6 dB     1.7e−3    2.1e−3    2.3e−3    2.6e−3    2.9e−3
               −3 dB     2.1e−3    2.6e−3    2.6e−3    2.8e−3    3.7e−3
               0 dB      2.7e−3    3.0e−3    2.9e−3    3.0e−3    4.2e−3
               3 dB      3.3e−3    4.6e−3    3.3e−3    3.5e−3    6.1e−3
A41            −20 dB    1.8e−2    2.0e−2    2.7e−2    3.0e−2    2.5e−2
               −6 dB     1.9e−2    2.1e−2    2.7e−2    2.1e−2    2.7e−2
               −3 dB     2.1e−2    2.4e−2    2.8e−2    3.1e−2    2.9e−2
               0 dB      2.7e−2    2.8e−2    3.1e−2    3.0e−2    3.5e−2
               3 dB      3.0e−2    4.1e−2    2.5e−2    2.7e−2    4.9e−2
A51            −20 dB    4.0e−1    4.5e−1    6.1e−1    6.6e−1    5.6e−1
               −6 dB     4.2e−1    4.7e−1    6.1e−1    6.5e−1    5.8e−1
               −3 dB     4.5e−1    5.0e−1    6.1e−1    6.7e−1    6.4e−1
               0 dB      5.7e−1    5.9e−1    6.7e−1    6.7e−1    7.5e−1
               3 dB      6.2e−1    8.4e−1    5.0e−1    5.5e−1    1.0
A61            −20 dB    5.7e+1    6.2e+1    8.5e+1    9.2e+1    7.8e+1
               −6 dB     5.8e+1    6.5e+1    8.6e+1    9.1e+1    8.1e+1
               −3 dB     6.2e+1    6.9e+1    8.6e+1    9.4e+1    8.9e+1
               0 dB      7.9e+1    8.2e+1    9.3e+1    9.2e+1    1.0e+2
               3 dB      8.6e+1    1.2e+2    6.9e+1    7.7e+1    1.4e+2
The maximum of −1 ss A = vect A, Q = RN ⊗ nqRq , the EM functional with respect to θ subject to the specified q linear constraints is then reached for      −1 (A.7) −  xs  ss  A = M − Q−1C C†Q−1C 1C† M − A M = vect  nqRq nqRq  . 0 , q q (A.9) R = Rss S,q diag q , Here “vect” builds a column vector with the entries of a ma- trix taken along its rows. Now we consider linear constraints where “diag” returns a diagonal matrix with the same diago- on the mixing matrix, specified as follows: nal entries as its input argument. Inthefreenoisecase, things are quite similar except that † C A − A0 = 0, (A.8) the noise covariance matrices RN,q do not factor out as nicely. Component Separation in Wavelet Space for CMB Analysis 2451

Figure 10: Comparison of the mean ISR for CMB as a function of noise in five different configurations, namely, wSMICA without mask, wSMICA with mask, fSMICA without mask, fSMICA with mask, and fSMICA with Hanning apodizing window.

Figure 12: Comparison of the mean ISR for SZ as a function of noise in five different configurations, namely, wSMICA without mask, wSMICA with mask, fSMICA without mask, fSMICA with mask, and fSMICA with Hanning apodizing window.

Figure 11: Comparison of the mean ISR for dust as a function of noise in five different configurations, namely, wSMICA without mask, wSMICA with mask, fSMICA without mask, fSMICA with mask, and fSMICA with Hanning apodizing window.

The EM functional is again expressed as

    Φ(θ, θ′) = −(1/2) ( 𝒜 − ℳ )† 𝒬 ( 𝒜 − ℳ ) + const_A,   (A.10)

where in this case

    𝒬 = Σ_q n_q R_{N,q}^{-1} ⊗ R_q^ss,   ℳ = 𝒬^{-1} vect[ Σ_q n_q R_{N,q}^{-1} R_q^xs ].   (A.11)

Then, the maximum of the EM functional with respect to θ subject to the specified linear constraints is again reached for

    𝒜 = ℳ − 𝒬^{-1} C ( C† 𝒬^{-1} C )^{-1} C† ( ℳ − 𝒜_0 ),
    R_{S,q} = diag( R_q^ss ).   (A.12)

These expressions for the reestimates of the mixing matrix can become algorithmically very simple when, for instance, the linear constraints to be dealt with affect separate lines of A, or even simpler when the constraints are such that the entries of A are affected separately.

Positivity constraints on the entries of A

Suppose a subset of entries of A are constrained to be positive. The maximization step of the EM algorithm on A alone again has to be modified. We suggest dealing with such constraints in a combinatorial way, rephrasing the problem in terms of equality constraints. If the unconstrained maximum of the EM functional is not in the specified domain, then one has to look for a maximum on the borders of that domain: on a hyperplane, or on the intersection of two, three, or more hyperplanes. One important point is that the maximum of the EM functional with respect to 𝒜 subject to a set of equality constraints will necessarily be lower than the maximum of the same functional considering any subset of these equality constraints. Hence, not all combinations need be explored, and a branch-and-bound-type algorithm is well suited [28]. A straightforward extension allows one to deal with the case where a set of entries of the mixing matrix is constrained by upper and lower bounds.
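The combinatorial scheme just described can be sketched as follows. For clarity this toy version of mine enumerates every active set exhaustively rather than using branch-and-bound, which is fine for a handful of constrained entries (names such as `argmax_nonneg` are hypothetical, not from the paper):

```python
import numpy as np
from itertools import chain, combinations

def argmax_nonneg(M, Q, pos_idx):
    """Maximize -(1/2)(a - M)^T Q (a - M) s.t. a[i] >= 0 for i in pos_idx,
    by enumerating active sets of entries clamped to the boundary (zero)."""
    def solve(active):
        if not active:
            return M.copy()
        C = np.eye(len(M))[:, list(active)]     # equality constraints a[i] = 0
        QiC = np.linalg.solve(Q, C)
        return M - QiC @ np.linalg.solve(C.T @ QiC, C.T @ M)

    best, best_val = None, -np.inf
    subsets = chain.from_iterable(combinations(pos_idx, k) for k in range(len(pos_idx) + 1))
    for active in subsets:
        a = solve(active)
        if all(a[i] >= -1e-12 for i in pos_idx):     # feasible candidate?
            val = -0.5 * (a - M) @ Q @ (a - M)
            if val > best_val:
                best, best_val = a, val
    return best

# entry 1 of the unconstrained optimum is negative, so it gets clamped to 0
print(argmax_nonneg(np.array([1.0, -2.0]), np.eye(2), [0, 1]))
```

Branch-and-bound prunes this search using exactly the monotonicity property stated above: adding equality constraints can only lower the achievable maximum.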

Figure 13: (a)–(f) Estimated component maps obtained with SMICA and wSMICA, respectively. These estimates result from applying a Wiener filter in each frequency band or wavelet scale based on the optimized model parameters (see Section 2.3). (g)–(i) The initial source templates after applying the optimal Wiener filter obtained with SMICA, that is, the same as (a)–(c) but leaving out noise and residual contaminations. (j)–(l) Maps estimated using JADE [27]. The initial source maps are shown in Figure 2.

REFERENCES

[1] G. F. Smoot, C. L. Bennett, A. Kogut, et al., "Structure in the COBE differential microwave radiometer first-year maps," Astrophysical Journal Letters, vol. 396, no. 1, pp. L1–L5, 1992.
[2] A. Benoît, P. Ade, A. Amblard, et al., "Archeops: a high resolution, large sky coverage balloon experiment for mapping cosmic microwave background anisotropies," Astroparticle Physics, vol. 17, no. 2, pp. 101–124, 2002.
[3] P. de Bernardis, P. Ade, J. J. Bock, et al., "A flat universe from high-resolution maps of the cosmic microwave background radiation," Nature, vol. 404, no. 6781, pp. 955–959, 2000.
[4] S. Hanany, P. Ade, A. Balbi, et al., "MAXIMA-1: a measurement of the cosmic microwave background anisotropy on angular scales of 10 arcminutes to 5 degrees," Astrophysical Journal Letters, vol. 545, pp. L5–L9, 2000.
[5] C. L. Bennett, M. Halpern, G. Hinshaw, et al., "First year Wilkinson microwave anisotropy probe (WMAP) observations: preliminary maps and basic results," Astrophysical Journal Supplement Series, vol. 148, no. 1, pp. 1–27, 2003.
[6] G. Jungman, M. Kamionkowski, A. Kosowsky, and D. N. Spergel, "Cosmological-parameter determination with microwave background maps," Physical Review D, vol. 54, no. 2, pp. 1332–1344, 1996.
[7] F. R. Bouchet and R. Gispert, "Foregrounds and CMB experiments: I. Semi-analytical estimates of contamination," New Astronomy, vol. 4, no. 6, pp. 443–479, 1999.
[8] M. Tegmark and G. Efstathiou, "A method for subtracting foregrounds from multi-frequency CMB sky maps," Monthly Notices of the Royal Astronomical Society, vol. 281, no. 4, pp. 1297–1314, 1996.
[9] M. P. Hobson, A. W. Jones, A. N. Lasenby, and F. R. Bouchet, "Foreground separation methods for satellite observations of the cosmic microwave background," Monthly Notices of the Royal Astronomical Society, vol. 300, no. 1, pp. 1–29, 1998.
[10] M. Tegmark, D. J. Eisenstein, W. Hu, and A. de Oliveira-Costa, "Foregrounds and forecasts for the cosmic microwave background," The Astrophysical Journal, vol. 530, no. 1, pp. 133–165, 2000.
[11] C. Baccigalupi, L. Bedini, C. Burigana, et al., "Neural networks and the separation of cosmic microwave background and astrophysical signals in sky maps," Monthly Notices of the Royal Astronomical Society, vol. 318, no. 3, pp. 769–780, 2000.
[12] D. Maino, A. Farusi, C. Baccigalupi, et al., "All-sky astrophysical component separation with fast independent component analysis (FASTICA)," Monthly Notices of the Royal Astronomical Society, vol. 334, no. 1, pp. 53–68, 2002.
[13] E. E. Kuruoglu, L. Bedini, M. T. Paratore, E. Salerno, and A. Tonazzini, "Source separation in astrophysical maps using independent factor analysis," Neural Networks, vol. 16, no. 3-4, pp. 479–491, 2003.
[14] H. Snoussi, G. Patanchon, J. Macias-Perez, A. Mohammad-Djafari, and J. Delabrouille, "Bayesian blind component separation for cosmic microwave background observations," in Bayesian Inference and Maximum Entropy Methods, MaxEnt Workshops, pp. 125–140, American Institute of Physics, New York, NY, USA, August 2001.
[15] J.-F. Cardoso, H. Snoussi, J. Delabrouille, and G. Patanchon, "Blind separation of noisy Gaussian stationary sources. Application to cosmic microwave background imaging," in Proc. 11th European Signal Processing Conference (EUSIPCO '02), vol. 1, pp. 561–564, Toulouse, France, September 2002.
[16] J. Delabrouille, J.-F. Cardoso, and G. Patanchon, "Multidetector multicomponent spectral matching and applications for cosmic microwave background data analysis," Monthly Notices of the Royal Astronomical Society, vol. 346, no. 4, pp. 1089–1102, 2003.
[17] J.-F. Cardoso, "The three easy routes to independent component analysis; contrasts and geometry," in Proc. 3rd International Conference on Independent Component Analysis (ICA '01), San Diego, Calif, USA, December 2001.
[18] M. Zibulevsky and B. A. Pearlmutter, "Blind source separation by sparse decomposition in a signal dictionary," Neural Computation, vol. 13, no. 4, pp. 863–882, 2001.
[19] M. M. Ichir and A. Mohammad-Djafari, "Wavelet domain blind image separation," in Wavelets: Applications in Signal and Image Processing X, vol. 5207 of Proceedings of SPIE, pp. 361–370, San Diego, Calif, USA, November 2003.
[20] G. Patanchon, J. Delabrouille, and J.-F. Cardoso, "Source separation on astrophysical data sets from the WMAP satellite," in Proc. 5th International Conference on Independent Component Analysis (ICA '04), pp. 1221–1228, Granada, Spain, September 2004.
[21] J.-F. Cardoso and D.-T. Pham, "Optimization issues in noisy Gaussian ICA," in Proc. 5th International Conference on Independent Component Analysis (ICA '04), pp. 41–48, Granada, Spain, September 2004.
[22] E. Hivon, K. M. Górski, C. B. Netterfield, B. P. Crill, S. Prunet, and F. Hansen, "MASTER of the cosmic microwave background anisotropy power spectrum: A fast method for statistical analysis of large and complex cosmic microwave background data sets," The Astrophysical Journal, vol. 567, no. 1, pp. 2–17, 2002.
[23] J.-L. Starck, F. Murtagh, and A. Bijaoui, Image and Data Analysis: The Multiscale Approach, Cambridge University Press, Cambridge, UK, 1998.
[24] M. Birkinshaw, "The Sunyaev-Zel'dovich effect," Physics Reports, vol. 310, no. 2-3, pp. 97–195, 1999.
[25] D.-T. Pham, "Blind separation of instantaneous mixture of sources via the Gaussian mutual information criterion," Signal Processing, vol. 81, no. 4, pp. 855–870, 2001.
[26] D.-T. Pham, "Joint approximate diagonalization of positive definite hermitian matrices," SIAM Journal on Matrix Analysis and Applications, vol. 22, no. 4, pp. 1136–1152, 2001.
[27] J.-F. Cardoso, "High-order contrasts for independent component analysis," Neural Computation, vol. 11, no. 1, pp. 157–192, 1999.
[28] P. Narendra and K. Fukunaga, "A branch and bound algorithm for feature subset selection," IEEE Trans. Comput., vol. 26, no. 9, pp. 917–922, 1977.

Y. Moudden graduated in electrical engineering from SUPELEC, Gif-sur-Yvette, France, and obtained an M.S. degree in physics from the Université de Paris VII, France, in 1997. He received a Ph.D. degree in signal processing from the Université de Paris XI, Orsay, France. He was a Visitor at UCLA in 2004 and is currently a Postdoctoral Fellow with the CEA, Saclay, France, working on applications of signal processing to astronomy. His research interests include signal and image processing, data analysis, statistics, and information theory.

J.-F. Cardoso is with the French "Centre National de la Recherche Scientifique" (CNRS), in the Signal and Image Processing Department, École Nationale Supérieure des Télécommunications (ENST, Paris). His research area is statistical signal processing. Since 1989, he has been extensively working on all aspects of blind source separation and independent component analysis. In 2001, he started a collaboration with astronomers and cosmologists for the statistical analysis of astronomic data.

J.-L. Starck has a Ph.D. degree from University Nice-Sophia Antipolis and a Habilitation from University Paris XI. He was a Visitor at the European Southern Observatory (ESO) in 1993, at UCLA in 2004, and at Stanford's Statistics Department in 2000 and 2005. He has been a researcher at CEA since 1994. His research interests include image processing and statistical methods in astrophysics and cosmology. He is also the author of two books entitled Image Processing and Data Analysis: The Multiscale Approach (Cambridge University Press, 1998) and Astronomical Image and Data Analysis (Springer, 2002).

J. Delabrouille graduated in 1991 from the "École Nationale Supérieure des Télécommunications" with a major in aerospace telecommunication. He then obtained in 1998 a Ph.D. degree in physics from the University of Chicago and a Ph.D. degree in astrophysics and space techniques from the University of Orsay. He is currently employed by CNRS in the Astroparticle and Cosmology (APC) Laboratory in Paris, where he is the Head of the Data Analysis and Simulation Group. His main research interests are in cosmology and gravitation, cosmic microwave background observations, and developing methods for complex data analysis.

EURASIP Journal on Applied Signal Processing 2005:15, 2455–2469
© 2005 Hindawi Publishing Corporation

Analysis of the Spatial Distribution of Galaxies by Multiscale Methods

J.-L. Starck
DAPNIA/SEDI-SAP, Service d'Astrophysique, CEA-Saclay, 91191 Gif-sur-Yvette, France
Email: [email protected]

V. J. Martínez
Observatori Astronòmic de la Universitat de València, Edifici d'Instituts de Paterna, Apartat de Correus 22085, 46071 València, Spain
Email: [email protected]

D. L. Donoho
Department of Statistics, Stanford University, Sequoia Hall, Stanford, CA 94305, USA
Email: [email protected]

O. Levi
Department of Statistics, Stanford University, Sequoia Hall, Stanford, CA 94305, USA
Email: [email protected]

P. Querre
DAPNIA/SEDI-SAP, Service d'Astrophysique, CEA-Saclay, 91191 Gif-sur-Yvette, France
Email: [email protected]

E. Saar
Department of Cosmology, Tartu Observatory, Tõravere 61602, Estonia
Email: [email protected]

Received 17 June 2004; Revised 17 February 2005

Galaxies are arranged in interconnected walls and filaments forming a cosmic web encompassing huge, nearly empty regions between the structures. Many statistical methods have been proposed in the past in order to describe the galaxy distribution and discriminate between different cosmological models. We present in this paper multiscale geometric transforms sensitive to clusters, sheets, and walls: the 3D isotropic undecimated wavelet transform, the 3D ridgelet transform, and the 3D beamlet transform. We show that statistical properties of transform coefficients measure, in a coherent and statistically reliable way, the degree of clustering, filamentarity, sheetedness, and voidedness of a data set.

Keywords and phrases: galaxy distribution, large-scale structures, wavelet, ridgelet, beamlet, multiscale methods.

1. INTRODUCTION

Galaxies are not uniformly distributed throughout the universe. Voids, filaments, clusters, and walls of galaxies can be observed, and their distribution constrains our cosmological theories. Therefore we need reliable statistical methods to compare the observed galaxy distribution with theoretical models and cosmological simulations.

The standard approach for testing models is to define a point process which can be characterized by statistical descriptors. This could be the distribution of galaxies of a specific type in deep redshift surveys of galaxies (or of clusters of galaxies).(1) In order to compare models of structure formation, the different distribution of dark matter particles in N-body simulations could be analyzed as well, with the same statistics.

(1) Making 3D maps of galaxies requires knowing how far away each galaxy is from Earth. One way to get this distance is to use Hubble's law for the expansion of the universe and to measure the shift, called redshift, to redder colors of spectral features in the galaxy spectrum. The greater the redshift, the larger the velocity, and, by Hubble's law, the larger the distance.

The two-point correlation function ξ(r) has been the primary tool for quantifying large-scale cosmic structure [1]. Assuming that the galaxy distribution in the Universe is a realization of a stationary and isotropic random process, the two-point correlation function can be defined from the probability δP of finding an object within a volume element δV at distance r from a randomly chosen object or position inside the volume: δP = n(1 + ξ(r))δV, where n is the mean density of objects. The function ξ(r) measures the clustering properties of objects in a given volume. It is zero for a uniform random distribution, and positive (resp., negative) for a more (resp., less) clustered distribution. For a hierarchical clustering or fractal process, 1 + ξ(r) follows a power-law behavior with exponent D_2 − 3. Since ξ(r) ∼ r^{−γ} for the observed galaxy distribution, the correlation dimension for the range where ξ(r) ≫ 1 is D_2 ≃ 3 − γ. The Fourier transform of the correlation function is the power spectrum. The direct measurement of the power spectrum from redshift surveys is of major interest because model predictions are made in terms of the power spectral density. It seems clear that the real-space power spectrum departs from a single power law, ruling out simple unbounded fractal models [2]. The two-point correlation function can be generalized to the N-point correlation function [3, 4], and all the hierarchy can be related with the physics responsible for the clustering of matter. Nevertheless they are difficult to measure, and therefore other related statistical measures have been introduced as a complement in the statistical description of the spatial distribution of galaxies [5], such as the void probability function [6], the multifractal approach [7], the minimal spanning tree [8, 9, 10], the Minkowski functionals [11, 12], or the J function [13, 14], which is defined as the ratio J(r) = (1 − G(r))/(1 − F(r)), where F is the distribution function of the distance r of an arbitrary point in R^3 to the nearest object in the catalog, and G is the distribution function of the distance of an object to the nearest object. Wavelets have also been used for analyzing the projected 2D or the 3D galaxy distribution [15, 16, 17, 18, 19].

New geometric multiscale methods have recently emerged: the beamlet transform [20, 21] and the ridgelet transform [22]. These allow us to represent data containing, respectively, filaments and sheets, while wavelets represent well isotropic features (i.e., clusters in 3D). As each of these three transforms is tuned to a specific kind of feature, all of them are useful and should be combined to describe a given catalog.

Sections 2, 3, and 4 describe, respectively, the 3D wavelet transform, the 3D ridgelet transform, and the 3D beamlet transform. It is shown in Section 5, through a set of experiments, how these three 3D transforms can be combined in order to describe statistically the distribution of galaxies.

2. THE 3D WAVELET TRANSFORM

2.1. The undecimated isotropic wavelet transform

For each a > 0 and b_1, b_2, b_3 ∈ R, the wavelet is defined by

    ψ_{a,b1,b2,b3} : R^3 → R,
    ψ_{a,b1,b2,b3}(x_1, x_2, x_3) = a^{−3/2} · ψ( (x_1 − b_1)/a, (x_2 − b_2)/a, (x_3 − b_3)/a ).   (1)

Given a function f ∈ L^2(R^3), we define its wavelet coefficients by

    W_f : R^4 → R,
    W_f(a, b_1, b_2, b_3) = ∫ ψ_{a,b1,b2,b3}(x) f(x) dx.   (2)

Figure 1 shows an example of a 3D wavelet function.

It is standard to digitize the transform for data c(x, y, z) with x, y, z ∈ {1, ..., N} as follows. The wavelet transform of a signal produces, at each scale j, a set of zero-mean coefficient values {w_j}. Let φ be a lowpass filter, and define φ_j(x) = φ(2^j x) and c_j = c ∗ φ_j. Using an undecimated isotropic wavelet decomposition [23], the set {w_j} has the same number of pixels as the signal, and this wavelet transform is redundant. Furthermore, using a wavelet defined as the difference between the scaling functions of two successive scales,

    (1/8) ψ(x/2, y/2, z/2) = φ(x, y, z) − (1/8) φ(x/2, y/2, z/2),   (3)

the original cube c = c_0 can be expressed as the sum of all the wavelet scales and the smoothed array c_J:

    c_0(x, y, z) = c_J(x, y, z) + Σ_{j=1}^{J} w_j(x, y, z).   (4)

The set w = {w_1, w_2, ..., w_J, c_J} represents the wavelet transform of the data. If we let W denote the wavelet transform operator and N the number of pixels in c, the wavelet transform w (w = Wc) has (J + 1)N pixels, for a redundancy factor of J + 1. The scaling function φ is generally chosen as a spline of degree 3, and the 3D implementation is based on three sets of 1D (separable) convolutions. Like the scaling function φ, the wavelet function ψ is isotropic (see Figure 2). More details can be found in [23, 24].

3. THE 3D RIDGELET TRANSFORM

3.1. The 2D ridgelet transform

The 2D continuous ridgelet transform of a function f ∈ L^2(R^2) was defined in [22] as follows.
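As an aside on the two-point statistics of Section 1: the estimator implied by δP = n(1 + ξ(r))δV can be sketched with the simplest "natural" estimator DD/RR − 1. This toy Monte Carlo version is mine (box size, binning, and the helper name `xi_natural` are illustrative choices, not from the paper); it uses periodic distances in a cubic box:

```python
import numpy as np

def xi_natural(data, box, edges, n_random=1500, seed=0):
    """Two-point correlation estimate via the 'natural' estimator DD/RR - 1,
    with periodic (wrapped) pair distances in a cubic box of side `box`."""
    rng = np.random.default_rng(seed)
    rand = rng.uniform(0, box, size=(n_random, 3))

    def pair_fracs(pts):
        d = pts[:, None, :] - pts[None, :, :]
        d -= box * np.round(d / box)                  # periodic wrap
        r = np.sqrt((d ** 2).sum(-1))
        iu = np.triu_indices(len(pts), k=1)           # distinct pairs only
        counts = np.histogram(r[iu], bins=edges)[0]
        return counts / (len(pts) * (len(pts) - 1) / 2)

    return pair_fracs(data) / pair_fracs(rand) - 1

# a uniform (unclustered) sample should give xi ~ 0 in every bin
pts = np.random.default_rng(1).uniform(0, 100.0, size=(500, 3))
xi = xi_natural(pts, 100.0, np.linspace(5.0, 30.0, 6))
print(np.abs(xi).max() < 0.2)  # -> True (up to sampling noise)
```

Production codes use better estimators (e.g., Landy–Szalay) and tree-based pair counting; the O(n^2) pair matrix here is only workable for small catalogs.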

Figure 1: Example of wavelet function.
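The decomposition (4) can be sketched in a few lines of NumPy. This is a toy reimplementation of mine of the undecimated ("à trous") scheme — the B3-spline kernel (1, 4, 6, 4, 1)/16 and periodic borders are assumptions commonly made with this transform, not prescribed in the text above:

```python
import numpy as np

H = np.array([1, 4, 6, 4, 1]) / 16.0  # B3-spline smoothing kernel (assumed)

def smooth(c, j):
    """Separable convolution with the kernel upsampled by 2^j (the 'holes'),
    using periodic borders via np.roll."""
    out = c
    for axis in range(c.ndim):
        acc = np.zeros_like(out)
        for k, h in zip((-2, -1, 0, 1, 2), H):
            acc += h * np.roll(out, k * 2 ** j, axis=axis)
        out = acc
    return out

def atrous3d(c, J):
    """Return wavelet scales {w_1..w_J} and the smooth array c_J, as in (4)."""
    scales, cj = [], c.astype(float)
    for j in range(J):
        cj1 = smooth(cj, j)
        scales.append(cj - cj1)   # wavelet scale w_{j+1}
        cj = cj1
    return scales, cj

cube = np.random.default_rng(2).standard_normal((16, 16, 16))
w, cJ = atrous3d(cube, 3)
print(np.allclose(sum(w) + cJ, cube))  # exact reconstruction of eq. (4) -> True
```

Reconstruction is exact by construction (the scales telescope), which is the property eq. (4) states.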

Figure 2: Definition of the angles θ_1 and θ_2 in (a) R^2 (2D case) and (b) R^3 (3D case).

Select a smooth function ψ ∈ L^2(R) satisfying the admissibility condition

    ∫ |ψ̂(ξ)|^2 / |ξ|^2 dξ < ∞,   (5)

which holds if ψ has sufficient decay and a vanishing mean, ∫ ψ(t) dt = 0 (ψ can be normalized so that it has unit energy, (1/2π) ∫ |ψ̂(ξ)|^2 dξ = 1). For each a > 0, b ∈ R, and θ_1 ∈ [0, 2π[, we define the ridgelet by

    ψ_{a,b,θ1} : R^2 → R,
    ψ_{a,b,θ1}(x_1, x_2) = a^{−1/2} · ψ( (x_1 cos θ_1 + x_2 sin θ_1 − b)/a ).   (6)

Given a function f ∈ L^2(R^2), we define its ridgelet coefficients by

    R_f : R^3 → R,
    R_f(a, b, θ_1) = ∫ ψ_{a,b,θ1}(x) f(x) dx.   (7)

It has been shown [22] that the ridgelet transform is precisely the application of a 1D wavelet transform to the slices of the Radon transform (where the angular variable θ_1 is constant). This method is in a sense optimal to detect lines of the size of the image (the integration increases with the length of the line). More details on the implementation of the digital ridgelet transform can be found in [25].
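Equation (6)'s geometry is easy to check numerically: a ridgelet is constant along any direction orthogonal to (cos θ_1, sin θ_1). A tiny sketch of mine, using a Mexican-hat profile for ψ (an arbitrary admissible choice, not one prescribed by the paper):

```python
import numpy as np

def psi(t):
    """Mexican-hat profile: zero mean, well localized (illustrative choice)."""
    return (1 - t ** 2) * np.exp(-t ** 2 / 2)

def ridgelet(x1, x2, a, b, theta1):
    """2D ridgelet of eq. (6)."""
    t = (x1 * np.cos(theta1) + x2 * np.sin(theta1) - b) / a
    return a ** -0.5 * psi(t)

a, b, th = 2.0, 0.5, 0.3
p = np.array([0.2, -0.1])
ridge_dir = np.array([-np.sin(th), np.cos(th)])   # orthogonal to (cos th, sin th)
vals = [ridgelet(*(p + s * ridge_dir), a, b, th) for s in np.linspace(-5, 5, 11)]
print(np.allclose(vals, vals[0]))  # constant along the ridge -> True
```

Moving transverse to the ridge instead (along (cos θ_1, sin θ_1)) traces out the 1D wavelet profile ψ, which is exactly the behavior Figure 3 illustrates.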

Figure 3: Example of 2D ridgelet function.
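The Radon-slice view of the ridgelet transform mentioned above rests on the projection-slice theorem: a line of the Fourier transform passing through the origin, inverse-transformed, is a projection (a Radon slice) of the data. For an axis-aligned direction in 3D this is a one-liner to verify (a sketch of mine; arbitrary directions require the interpolation step of the digital algorithm):

```python
import numpy as np

c = np.random.default_rng(3).standard_normal((8, 8, 8))
chat = np.fft.fftn(c)

# line of the 3D FFT through the origin along k1 (k2 = k3 = 0)
line = chat[:, 0, 0]
proj_via_fft = np.fft.ifft(line).real

# the same projection computed directly: integrate the cube over x2, x3
proj_direct = c.sum(axis=(1, 2))

print(np.allclose(proj_via_fft, proj_direct))  # -> True
```

Applying a 1D wavelet transform to such projections, direction by direction, is exactly the structure of the 3D ridgelet algorithm described next.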

Figure 3 (left) shows an example ridgelet function. This function is constant along lines x_1 cos θ_1 + x_2 sin θ_1 = const. Transverse to these ridges it is a wavelet (see Figure 3(b)).

3.2. From 2D to 3D

The 3D continuous ridgelet transform of a function f ∈ L^2(R^3) is given by

    R_f : R^4 → R,
    R_f(a, b, θ_1, θ_2) = ∫ ψ_{a,b,θ1,θ2}(x) f(x) dx,   (8)

where a > 0, b ∈ R, θ_1 ∈ [0, 2π[, and θ_2 ∈ [0, π[. The ridgelet function is defined by

    ψ_{a,b,θ1,θ2} : R^3 → R,
    ψ_{a,b,θ1,θ2}(x_1, x_2, x_3) = a^{−1/2} · ψ( (x_1 cos θ_1 cos θ_2 + x_2 sin θ_1 cos θ_2 + x_3 sin θ_2 − b)/a ).   (9)

Figure 4 shows an example of a ridgelet function. It is a wavelet function in the direction defined by the line (θ_1, θ_2), and it is constant along the plane orthogonal to this line.

Figure 4: Example of ridgelet function.

As in the 2D case, the 3D ridgelet transform can be built by extracting lines in the Fourier domain. Let c(i_1, i_2, i_3) be a cube of size (N, N, N); the steps are given in Algorithm 1.

(1) 3D-FFT. Compute ĉ(k_1, k_2, k_3), the 3D FFT of the cube c(i_1, i_2, i_3).
(2) Cartesian-to-spherical conversion. Using an interpolation scheme, substitute the sampled values of ĉ obtained on the Cartesian coordinate system (k_1, k_2, k_3) with sampled values in a spherical coordinate system (θ_1, θ_2, ρ).
(3) Extract lines. Extract the 3N^2 lines (of size N) passing through the origin and the boundary of ĉ.
(4) 1D-IFFT. Compute the 1D inverse FFT on each line.
(5) 1D-WT. Compute the 1D wavelet transform on each line.

Algorithm 1: The 3D ridgelet transform algorithm.

Figure 5 shows the 3D ridgelet transform flowgraph. The 3D ridgelet transform allows us to detect sheets in a cube.

Figure 5: 3D ridgelet transform flowgraph.

Local 3D ridgelet transform

The ridgelet transform is optimal to find sheets of the size of the cube. To detect smaller sheets, a partitioning must be introduced [26]. The cube c is decomposed into blocks of side-length b so that, for an N × N × N cube, we count N/b blocks in each direction. After the block partitioning, the transform is tuned for sheets of size b × b and of thickness a_j, a_j corresponding to the different dyadic scales used in the transformation.

4. THE 3D BEAMLET TRANSFORM

4.1. Definition

The X-ray transform of a continuum function f(x, y, z) with (x, y, z) ∈ R^3 is defined by

    (Xf)(L) = ∫_L f(p) dp,   (10)

where L is a line in R^3 and p is a variable indexing points in the line. The transformation contains all line integrals of f. The beamlet transform (BT) can be seen as a multiscale digital X-ray transform. It is a multiscale transform because, in addition to the multiorientation and multilocation line integral calculation, it also integrates over line segments at different lengths. The 3D BT is an extension of the 2D BT proposed in [20].

4.2. The beamlet system

A dyadic cube C(k_1, k_2, k_3, j) ⊂ [0, 1]^3 is the collection of 3D points

    { (x_1, x_2, x_3) : (x_1, x_2, x_3) ∈ [k_1/2^j, (k_1 + 1)/2^j] × [k_2/2^j, (k_2 + 1)/2^j] × [k_3/2^j, (k_3 + 1)/2^j] },   (11)

where 0 ≤ k_1, k_2, k_3 < 2^j for an integer j ≥ 0, called the scale. Such cubes can be viewed as descended from the unit cube C(0, 0, 0, 0) = [0, 1]^3 by recursive partitioning. Hence, the result of splitting C(0, 0, 0, 0) in half along each axis is the eight cubes C(k_1, k_2, k_3, 1) where k_i ∈ {0, 1} (see Figure 6); splitting those in half along each axis, we get the 64 subcubes C(k_1, k_2, k_3, 2) where k_i ∈ {0, 1, 2, 3}; and if we decompose the unit cube into n^3 voxels using a uniform n-by-n-by-n grid with n = 2^J dyadic, then the individual voxels are the n^3 cells C(k_1, k_2, k_3, J), 0 ≤ k_1, k_2, k_3 < n.
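The dyadic-cube bookkeeping of (11) is easy to enumerate. A small sketch (the helper name `dyadic_cubes` is mine, for illustration):

```python
from itertools import product

def dyadic_cubes(j):
    """All cubes C(k1, k2, k3, j) in [0, 1]^3, as (corner, side-length) pairs."""
    side = 1.0 / 2 ** j
    return [((k1 * side, k2 * side, k3 * side), side)
            for k1, k2, k3 in product(range(2 ** j), repeat=3)]

# 1 unit cube, 8 half-cubes, 64 quarter-cubes, ... (8^j cubes at scale j)
print(len(dyadic_cubes(0)), len(dyadic_cubes(1)), len(dyadic_cubes(2)))  # -> 1 8 64
```

At scale J (with n = 2^J) the enumeration reaches the n^3 individual voxels, matching the recursion described above.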


Figure 6: Dyadic cubes.


Figure 7: Examples of beamlets at two different scales: (a) scale 0 (coarsest scale) and (b) scale 1 (next finer scale).

A dyadic cube at scale j has a side-length of 2^(J−j) voxels; we get O(2^(4(J−j))) beamlets associated with the dyadic cube, and a total of O(2^(4J−j)) = O(n⁴/2^j) beamlets at scale j. If we sum the number of beamlets at all scales, we get O(n⁴) beamlets. This gives a multiscale arrangement of line segments in 3D with controlled cardinality of O(n⁴). The scale of a beamlet is defined as the scale of the dyadic cube it belongs to, so lower scales correspond to longer line segments and finer scales correspond to shorter line segments. Figure 7 shows beamlets at different scales.

To index the beamlets in a given dyadic cube, we use slope-intercept coordinates. For a data cube of n × n × n voxels, consider a coordinate system with the cube center of mass at the origin and a unit length for a voxel. Hence, for (x, y, z) in the data cube we have |x|, |y|, |z| ≤ n/2. We can consider three kinds of lines: x-driven, y-driven, and z-driven, depending on which axis provides the shallowest slopes. An x-driven line takes the form

z = s_z x + t_z,   y = s_y x + t_y,   (12)

with slopes s_z, s_y and intercepts t_z, t_y. Here the slopes satisfy |s_z|, |s_y| ≤ 1. y- and z-driven lines are defined with an interchange of roles between x and y or z, as the case may be. The slopes and intercepts run through equispaced sets:

s_x, s_y, s_z ∈ {2ℓ/n : ℓ = −n/2, ..., n/2 − 1},
t_x, t_y, t_z ∈ {−n/2, ..., n/2 − 1}.   (13)

Beamlets in a data cube of side n have lengths between n/2 and √3·n (the main diagonal).

Computational aspects

Beamlet coefficients are line integrals over the set of beamlets. A digital 3D image can be regarded as a 3D piecewise constant function, and each line integral is just a weighted sum of the voxel intensities along the corresponding line segment. Donoho and Levi [21] discuss in detail different approaches for computing line integrals in a 3D digital image. Computing the beamlet coefficients for real application data sets can be a challenging computational task, since for a data cube with n × n × n voxels we have to compute O(n⁴) coefficients. By developing efficient cache-aware algorithms, we are able to handle 3D data sets of size up to n = 256 on a typical desktop computer in less than a day of running time.

Analysis of the Spatial Distribution of Galaxies 2461
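One simple (deliberately not cache-aware) way to approximate a single coefficient is nearest-voxel accumulation along the x-driven line (12). The sketch below is illustrative only, assuming numpy; it is not the algorithm of [21]:

```python
import numpy as np

def xdriven_line_integral(cube, sy, ty, sz, tz):
    """Approximate integral of an n^3 cube along the x-driven line
    y = sy*x + ty, z = sz*x + tz (centered coordinates, |sy|, |sz| <= 1),
    by summing nearest-voxel intensities."""
    n = cube.shape[0]
    total = 0.0
    for i in range(n):
        x = i - n // 2                         # centered x coordinate
        j = int(round(sy * x + ty)) + n // 2   # nearest voxel in y
        k = int(round(sz * x + tz)) + n // 2   # nearest voxel in z
        if 0 <= j < n and 0 <= k < n:
            total += cube[i, j, k]
    return total

# A cube that is 1 exactly on the main diagonal voxels.
n = 8
cube = np.zeros((n, n, n))
for i in range(n):
    cube[i, i, i] = 1.0

# The diagonal line (sy = sz = 1, ty = tz = 0) picks up all n unit voxels.
assert xdriven_line_integral(cube, 1.0, 0.0, 1.0, 0.0) == n
```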

(1) 3D-FFT. Compute ĉ(k1, k2, k3), the three-dimensional FFT of the cube c(i1, i2, i3).
(2) Cartesian-to-spherical conversion. Using an interpolation scheme, substitute the sampled values of ĉ obtained on the Cartesian coordinate system (k1, k2, k3) with sampled values in a spherical coordinate system (θ1, θ2, ρ).
(3) Extract planes. Extract the 3N² planes (of size N × N) passing through the origin (each line used in the 3D ridgelet transform defines a set of orthogonal planes; we take the one including the origin).
(4) 2D-IFFT. Compute the 2D inverse FFT on each plane.
(5) 2D-WT. Compute the 2D wavelet transform on each plane.

Algorithm 2: The 3D beamlet transform algorithm.
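As an illustration of this pipeline, the sketch below runs steps (1), (3), and (4) for the three axis-aligned directions only, where the plane through the origin is obtained by plain slicing so that the spherical interpolation of step (2) is not needed, and substitutes a one-level 2D Haar average for the full 2D wavelet transform of step (5). Everything here (numpy, the `haar2d` helper) is an illustrative simplification under those assumptions, not the authors' implementation:

```python
import numpy as np

def central_planes(cube):
    """Steps (1), (3), (4) restricted to the three axis-aligned directions:
    the plane through the origin is simply the k = 0 slice of the 3D FFT."""
    chat = np.fft.fftn(cube)                                  # (1) 3D FFT
    planes = [chat[:, :, 0], chat[:, 0, :], chat[0, :, :]]    # (3) extract
    return [np.fft.ifft2(p) for p in planes]                  # (4) 2D IFFT

def haar2d(plane):
    """(5) one level of a 2D Haar transform (LL band), a stand-in for the
    full 2D wavelet transform."""
    a = plane.reshape(plane.shape[0] // 2, 2, plane.shape[1] // 2, 2)
    return a.sum(axis=(1, 3)) / 2.0

c = np.random.default_rng(0).normal(size=(8, 8, 8))
p0, p1, p2 = central_planes(c)
# Projection-slice property: the k3 = 0 plane of the 3D FFT is the 2D FFT
# of the projection of c along axis 2, so step (4) recovers that projection.
assert np.allclose(p0, c.sum(axis=2))
coeffs = haar2d(p0.real)
assert coeffs.shape == (4, 4)
```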

Figure 8: Example of beamlet function.

4.3. The FFT-based transformation

Let ψ ∈ L²(R²) be a smooth function satisfying a 2D variant of the admissibility condition. The 3D continuous beamlet transform of a function f ∈ L²(R³) is given by

B f : R⁵ → R,
B f(a, b1, b2, θ1, θ2) = ∫ ψ_{a,b1,b2,θ1,θ2}(x) f(x) dx,   (14)

where a > 0, b1, b2 ∈ R, θ1 ∈ [0, 2π[, and θ2 ∈ [0, π[.

We note that in many cases there is no interest in the coarsest-scale coefficients, which consume most of the computation time; in such cases the overall running time can be significantly shorter. The algorithms can also easily be implemented on a parallel machine or a computer cluster using a system such as MPI in order to solve bigger problems.

The beamlet function is defined by

ψ_{a,b1,b2,θ1,θ2} : R³ → R,

ψ_{a,b1,b2,θ1,θ2}(x1, x2, x3) = a^(−1/2) · ψ( (−x1 sin θ1 + x2 cos θ1 + b1)/a , (x1 cos θ1 cos θ2 + x2 sin θ1 cos θ2 − x3 sin θ2 + b2)/a ).   (15)

Figure 8 shows an example of a beamlet function. It is constant along lines of direction (θ1, θ2), and is a 2D wavelet function along planes orthogonal to this direction.

The 3D beamlet transform can be built using the "generalized projection-slice theorem" [27]. Let f(x) be a function on Rⁿ, and let Rad_m f denote the m-dimensional partial Radon transform along the first m directions, m < n. The Fourier transform of the m-dimensional partial Radon transform Rad_m f is related to the Fourier transform of f (F f) by the projection-slice relation

{F_{n−m+1} Rad_m f}(k, k_{m+1}, ..., k_n) = {F_n f}(k μ_m, k_{m+1}, ..., k_n).   (16)
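The geometric claim that ψ_{a,b1,b2,θ1,θ2} is constant along the (θ1, θ2) line can be checked numerically: both arguments in (15) are invariant under translation by the unit vector (cos θ1 sin θ2, sin θ1 sin θ2, cos θ2). A sketch with an illustrative Mexican-hat choice for ψ (the specific wavelet and parameter values are assumptions, not from the paper), assuming numpy:

```python
import numpy as np

def beamlet_fn(x, a, b1, b2, t1, t2, psi2d):
    """The beamlet function of (15) for a given 2D wavelet psi2d."""
    x1, x2, x3 = x
    u = (-x1 * np.sin(t1) + x2 * np.cos(t1) + b1) / a
    v = (x1 * np.cos(t1) * np.cos(t2) + x2 * np.sin(t1) * np.cos(t2)
         - x3 * np.sin(t2) + b2) / a
    return a ** -0.5 * psi2d(u, v)

# Illustrative 2D Mexican-hat wavelet standing in for an admissible psi.
psi = lambda u, v: (2 - u**2 - v**2) * np.exp(-(u**2 + v**2) / 2)

t1, t2 = 0.7, 1.1
x = np.array([0.3, -0.2, 0.5])
# Unit vector of the (theta1, theta2) line; the beamlet function is
# constant when x is translated along it.
u_dir = np.array([np.cos(t1) * np.sin(t2), np.sin(t1) * np.sin(t2), np.cos(t2)])
v0 = beamlet_fn(x, 1.0, 0.1, -0.3, t1, t2, psi)
v1 = beamlet_fn(x + 2.5 * u_dir, 1.0, 0.1, -0.3, t1, t2, psi)
assert np.isclose(v0, v1)
```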

[Figure 9 residue: in Fourier space, each (θ1, θ2) line defines the orthogonal plane through the origin; each plane is brought back to the Euclidean domain at position (ρ1, ρ2) by a 2D inverse FFT, and a 2D wavelet transform is applied on each plane across scales 1, ..., j.]

Figure 9: 3D beamlet transform flowgraph.

5. EXPERIMENTS

5.1. Experiment 1

We have simulated three data sets containing, respectively, a cluster, a plane, and a line. To each data set, Poisson noise has been added with eight different background levels. We applied the three transforms to the 24 simulated data sets. The coefficient distribution from each transformation was normalized using twenty realizations of a Poisson noise having the same number of counts as in the data.

Figure 10 shows the maximum value of the normalized distribution versus the noise level for our three simulated data sets. As expected, wavelets, ridgelets, and beamlets are, respectively, the best for detecting clusters, sheets, and lines. A feature can typically be detected with a very high signal-to-noise ratio in a matched transform, while remaining undetectable in some other transforms. For example, the wall is detected at more than 60σ by the ridgelet transform, but at less than 5σ by the wavelet transform. The line is detected at almost 10σ by the beamlet transform, but with a detection level worse than 3σ by wavelets. These results show the importance of using several transforms for an optimal detection of all features contained in a data set.

5.2. Experiment 2

We use here two simulated data sets to illustrate the discriminative power of multiscale methods. The first one is a simulation from stochastic geometry, based on a Voronoi model. The second one is a mock catalog of the galaxy distribution drawn from a Λ-CDM N-body cosmological model [29]. Both processes have very similar two-point correlation functions at small scales, although they look quite different and have been generated following completely different algorithms.

(i) The first comes from a Voronoi simulation. We locate a point in each of the vertices of a Voronoi tessellation of 1500 cells defined by 1500 nuclei distributed following a binomial process. There are 10 085 vertices lying within a box of 141.4 h⁻¹ Mpc side.
(ii) The second point pattern represents the galaxy positions extracted from a cosmological Λ-CDM N-body simulation. The simulation has been carried out by the Virgo consortium and related groups (see http://www.mpa-garching.mpg.de/Virgo). The simulation is a low-density (Ω = 0.3) model with cosmological constant Λ = 0.7. It is, therefore, an approximation to the real galaxy distribution [29]. There are 15 445 galaxies within a box with side 141.3 h⁻¹ Mpc. Galaxies in this catalog have stellar masses exceeding 2 × 10¹⁰ M☉.

Figure 11 shows the two simulated data sets, and Figure 12 shows the two-point correlation function curve for the two point processes. The two point fields are different, but as can be seen in Figure 12, both have very similar two-point correlation functions over a huge range of scales (2 decades).

We have applied the three transforms to each data set, and we have calculated the skewness vector S_j = (s_w^j, s_r^j, s_b^j) and the kurtosis vector K_j = (k_w^j, k_r^j, k_b^j) at each scale j. Here s_w^j, s_r^j, s_b^j are, respectively, the skewness at scale j of the wavelet coefficients, the ridgelet coefficients, and the beamlet coefficients, and k_w^j, k_r^j, k_b^j are, respectively, the kurtosis at scale j of the wavelet coefficients, the ridgelet coefficients, and the beamlet coefficients. Figure 13 shows the kurtosis and the skewness vectors of the two data sets at the first two scales. In contrast to the case of the two-point correlation function, this figure shows strong differences between the two data sets, particularly on the wavelet axis, which indicates that the second data set contains more, or higher-density, clusters than the first one.
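The entries of the skewness and kurtosis vectors are just third and fourth standardized moments of each per-scale coefficient array. A minimal numpy sketch, where the random per-scale arrays are stand-ins for actual wavelet/ridgelet/beamlet coefficients:

```python
import numpy as np

def skew_kurt(coeffs):
    """Sample skewness and excess kurtosis of a coefficient array."""
    c = np.asarray(coeffs, dtype=float).ravel()
    c = (c - c.mean()) / c.std()
    return (c**3).mean(), (c**4).mean() - 3.0

# Toy multiscale decomposition: stand-in coefficient arrays, one per scale.
rng = np.random.default_rng(1)
scales = [rng.normal(size=4096) for _ in range(3)]
S = [skew_kurt(c)[0] for c in scales]   # skewness vector over scales
K = [skew_kurt(c)[1] for c in scales]   # kurtosis vector over scales

# For Gaussian coefficients, both statistics are near zero at every scale.
assert all(abs(s) < 0.2 for s in S)
assert all(abs(k) < 0.4 for k in K)
```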

[Figure 10 residue: each part pairs a 3D Poisson realization with a curve of the normalized maximum value versus noise level for the wavelet, ridgelet, and beamlet transforms.]

Figure 10: Poisson realization for a low noise level: simulation of cubes containing (a) a cluster, (b) a plane, and (c) a line.

Figure 11: Simulated data sets. (a) The Voronoi vertices point pattern and (b) the galaxies of the GIF Λ-CDM N-body simulation. (c), (d) One 10 h⁻¹ Mpc width slice of each data set.

Figure 12: The two-point correlation function of the Voronoi vertices process and the GIF Λ-CDM N-body simulation. They are very similar in the range [0.02, 2] h⁻¹ Mpc.

5.3. Experiment 3

In this experiment, we have used a Λ-CDM simulation based on the N-body hydrodynamical code RAMSES [30]. The simulation uses an adaptive mesh refinement (AMR) technique, with a tree-based data structure allowing recursive grid refinements on a cell-by-cell basis. The simulated data were obtained using 256³ particles and 4.1 × 10⁷ cells in the AMR grid, reaching a formal resolution of 8192³. The box size was set to 100 h⁻¹ Mpc, with the following cosmological parameters:

Ω_m = 0.3, Ω_λ = 0.7, Ω_b = 0.039, h = 0.7, σ_8 = 0.92.   (17)

We used the results of this simulation at six different redshifts (z = 5, 3, 2, 1, 0.5, 0). Figure 14 shows a projection of the simulated cubes along one axis. We applied the 3D wavelet transform, the 3D beamlet transform, and the 3D ridgelet transform to the six data sets. Let σ²_{W,z,j}, σ²_{R,z,j}, and σ²_{B,z,j} denote the variance of the wavelet, the ridgelet, and the beamlet coefficients at scale j and redshift z.

[Figure 13 residue: four 3D bar charts over (wavelet, ridgelet, beamlet) axes, comparing simulated file 1 and simulated file 2.]
Figure 13: Skewness and kurtosis for the two simulated data sets: (a) skewness, scale 1, (b) skewness, scale 2, (c) kurtosis, scale 1, and (d) kurtosis, scale 2.

Figure 15 shows, respectively, from top to bottom, the wavelet spectrum P_w(z, j) = σ²_{W,z,j}, the beamlet spectrum P_b(z, j) = σ²_{B,z,j}, and the ridgelet spectrum P_r(z, j) = σ²_{R,z,j}. In order to see the evolution of the matter distribution with redshift and scale, we calculate the ratios M_{w/b}(j, z) = P_w(z, j)/P_b(z, j) and M_{w/r}(j, z) = P_w(z, j)/P_r(z, j). Figure 16 shows the M_{w/b} and M_{w/r} curves as a function of z, and Figure 17 shows the M_{w/b}⁻¹ and M_{w/r}⁻¹ curves as a function of the scale number j.

The M_{w/b} curve does not show much evolution, while the M_{w/r} curve presents a significant slope. This shows that the beamlet transform is more sensitive to clustering than the ridgelet transform. This is not surprising, since the support of beamlets is much smaller than the support of ridgelets. M_{w/r} increases with z, reflecting the cluster formation. The combination of multiscale transformations gives clear information about the degree of clustering, filamentarity, and sheetedness.
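The spectra and ratio curves are straightforward to compute once the per-scale coefficients are available. A sketch with stand-in coefficient arrays (numpy assumed; the arrays below are hypothetical, not actual transform outputs):

```python
import numpy as np

def ratio_curves(coeffs_w, coeffs_b, coeffs_r):
    """M_{w/b}(j) and M_{w/r}(j) from per-scale coefficient variances.
    Each argument is a list of coefficient arrays, one per scale j."""
    Pw = np.array([np.var(c) for c in coeffs_w])   # wavelet spectrum
    Pb = np.array([np.var(c) for c in coeffs_b])   # beamlet spectrum
    Pr = np.array([np.var(c) for c in coeffs_r])   # ridgelet spectrum
    return Pw / Pb, Pw / Pr

rng = np.random.default_rng(2)
w = [2.0 * rng.normal(size=1000) for _ in range(3)]   # larger variance
b = [1.0 * rng.normal(size=1000) for _ in range(3)]
r = [1.0 * rng.normal(size=1000) for _ in range(3)]
Mwb, Mwr = ratio_curves(w, b, r)
assert Mwb.shape == (3,) and np.all(Mwb > 1.0)
```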

[Figure 15 residue: log-scale panels of the spectra versus scale number for z = 5, 3, 2, 1, 0.5, 0.]
Figure 14: Λ-CDM simulation at different redshifts.

Figure 15: (a) Wavelet spectrum, (b) beamlet spectrum, and (c) ridgelet spectrum at different redshifts.

6. CONCLUSION

We have introduced in this paper a new method to analyze catalogs of galaxies based on the distribution of coefficients obtained by several geometric multiscale transforms. We have introduced two new multiscale decompositions, the 3D ridgelet transform and the 3D beamlet transform, matched to sheetlike and filamentary features, respectively. We described fast implementations using FFTs. We showed that combining the information related to wavelet, ridgelet, and beamlet coefficients leads to a new description of point catalogs. In this paper, we described transform coefficients using skewness and kurtosis, but other recent statistical estimators such as the higher criticism [31] could be used as well. Each multiscale transform is very sensitive to one kind of feature: wavelets to clusters, beamlets to filaments, and ridgelets to walls. A similar method has been proposed for analyzing CMB maps [32], where both the curvelet and the wavelet transforms were used for the detection and the discrimination of non-Gaussianities. This combined multiscale statistic is very powerful, and we have shown that two data sets with identical two-point correlation functions are clearly distinguished by our approach. These new tools lead to better constraints on cosmological models.

[Figures 16 and 17 residue: ratio curves on log scales, with legends for scales 1-3 and redshifts z = 5, 3, 2, 1, 0.5, 0.]
Figure 16: (a) Wavelet/beamlet M_{w/b}(z, j) and (b) wavelet/ridgelet M_{w/r}(z, j) curves for the scale number j equal to 1, 2, and 3.

Figure 17: (a) Beamlet/wavelet 1/M_{w/b}(z, j) and (b) ridgelet/wavelet 1/M_{w/r}(z, j) curves at different redshifts.

ACKNOWLEDGMENTS

We wish to thank Romain Teyssier for giving us the Λ-CDM simulated data used in the third experiment. This work has been supported by the Spanish MCyT project AYA2003-08739-C02-01 (including FEDER), the Generalitat Valenciana project GRUPOS03/170, the National Science Foundation Grant DMS-01-40587 (FRG), and the Estonian Science Foundation Grant 4695.

REFERENCES

[1] P. J. E. Peebles, The Large-Scale Structure of the Universe, Princeton University Press, Princeton, NJ, USA, 1980.
[2] M. Tegmark, M. R. Blanton, M. A. Strauss, et al., "The three-dimensional power spectrum of galaxies from the Sloan Digital Sky Survey," The Astrophysical Journal, vol. 606, no. 2, part 1, pp. 702-740, 2004.
[3] S. Szapudi and A. S. Szalay, "A new class of estimators for the N-point correlations," Astrophysical Journal Letters, vol. 494, no. 1, pp. L41-L44, 1998.
[4] P. J. E. Peebles, "The galaxy and mass N-point correlation functions: a blast from the past," in Historical Development of Modern Cosmology, V. J. Martínez, V. Trimble, and M. J. Pons-Bordería, Eds., vol. 252 of ASP Conference Series, Astronomical Society of the Pacific, San Francisco, Calif, USA, 2001.
[5] V. J. Martínez and E. Saar, Statistics of the Galaxy Distribution, Chapman & Hall/CRC Press, Boca Raton, Fla, USA, 2002.
[6] S. Maurogordato and M. Lachieze-Rey, "Void probabilities in the galaxy distribution—scaling and luminosity segregation," The Astrophysical Journal, vol. 320, pp. 13-25, September 1987.

[7] V. J. Martínez, B. J. T. Jones, R. Domínguez-Tenreiro, and R. van de Weygaert, "Clustering paradigms and multifractal measures," The Astrophysical Journal, vol. 357, no. 1, pp. 50-61, 1990.
[8] S. P. Bhavsar and R. J. Splinter, "The superiority of the minimal spanning tree in percolation analyses of cosmological data sets," Monthly Notices of the Royal Astronomical Society, vol. 282, no. 4, pp. 1461-1466, 1996.
[9] L. G. Krzewina and W. C. Saslaw, "Minimal spanning tree statistics for the analysis of large-scale structure," Monthly Notices of the Royal Astronomical Society, vol. 278, no. 3, pp. 869-876, 1996.
[10] A. G. Doroshkevich, D. L. Tucker, R. Fong, V. Turchaninov, and H. Lin, "Large-scale galaxy distribution in the Las Campanas Redshift Survey," Monthly Notices of the Royal Astronomical Society, vol. 322, no. 2, pp. 369-388, 2001.
[11] K. R. Mecke, T. Buchert, and H. Wagner, "Robust morphological measures for large-scale structure in the universe," Astronomy & Astrophysics, vol. 288, no. 3, pp. 697-704, 1994.
[12] M. Kerscher, "Statistical analysis of large-scale structure in the universe," in Statistical Physics and Spatial Statistics: The Art of Analyzing and Modeling Spatial Structures and Pattern Formation, K. Mecke and D. Stoyan, Eds., vol. 554 of Lecture Notes in Physics, pp. 36-71, Springer, Berlin, Germany, 2000.
[13] M. N. M. Van Lieshout and A. J. Baddeley, "A nonparametric measure of spatial interaction in point patterns," Statistica Neerlandica, vol. 50, no. 3, pp. 344-361, 1996.
[14] M. Kerscher, M. J. Pons-Bordería, J. Schmalzing, et al., "A global descriptor of spatial pattern interaction in the galaxy distribution," The Astrophysical Journal, vol. 513, no. 2, part 1, pp. 543-548, 1999.
[15] E. Escalera, E. Slezak, and A. Mazure, "New evidence for subclustering in the Coma cluster using the wavelet analysis," Astronomy & Astrophysics, vol. 264, no. 2, pp. 379-384, 1992.
[16] E. Slezak, V. de Lapparent, and A. Bijaoui, "Objective detection of voids and high-density structures in the first CfA redshift survey slice," The Astrophysical Journal, vol. 409, no. 2, pp. 517-529, 1993.
[17] V. J. Martínez, S. Paredes, and E. Saar, "Wavelet analysis of the multifractal character of the galaxy distribution," Monthly Notices of the Royal Astronomical Society, vol. 260, no. 2, pp. 365-375, 1993.
[18] A. Pagliaro, V. Antonuccio-Delogu, U. Becciani, and M. Gambera, "Substructure recovery by three-dimensional discrete wavelet transforms," Monthly Notices of the Royal Astronomical Society, vol. 310, no. 3, pp. 835-841, 1999.
[19] T. Kurokawa, M. Morikawa, and H. Mouri, "Scaling analysis of galaxy distribution in the Las Campanas Redshift Survey data," Astronomy & Astrophysics, vol. 370, no. 2, pp. 358-364, 2001.
[20] D. L. Donoho and X. Huo, "Beamlets and multiscale image analysis," in Multiscale and Multiresolution Methods, vol. 20 of Lecture Notes in Computational Science and Engineering, pp. 149-196, Springer, New York, NY, USA, 2001.
[21] D. L. Donoho and O. Levi, "Fast X-ray and beamlet transforms for three-dimensional data," in Modern Signal Processing, D. Rockmore and D. Healy, Eds., vol. 46 of Mathematical Sciences Research Institute Publications, Cambridge University Press, Cambridge, UK, March 2002.
[22] E. J. Candès and D. L. Donoho, "Ridgelets: the key to high-dimensional intermittency?" Philosophical Transactions of the Royal Society of London A, vol. 357, pp. 2495-2509, September 1999.
[23] J.-L. Starck, F. Murtagh, and A. Bijaoui, Image Processing and Data Analysis: The Multiscale Approach, Cambridge University Press, Cambridge, UK, 1998.
[24] J.-L. Starck and F. Murtagh, Astronomical Image and Data Analysis, Springer, Berlin, Germany, 2002.
[25] J.-L. Starck, E. J. Candès, and D. L. Donoho, "The curvelet transform for image denoising," IEEE Transactions on Image Processing, vol. 11, no. 6, pp. 670-684, 2002.
[26] E. J. Candès, "Harmonic analysis of neural networks," Applied and Computational Harmonic Analysis, vol. 6, no. 2, pp. 197-218, 1999.
[27] P. C. Lauterbur and Z.-P. Liang, Principles of Magnetic Resonance Imaging, IEEE Press, New York, NY, USA, 2000.
[28] D. L. Donoho, O. Levi, J.-L. Starck, and V. J. Martínez, "Multiscale geometric analysis for 3D catalogs," in Astronomical Data Analysis II, J.-L. Starck and F. Murtagh, Eds., vol. 4847 of Proceedings of SPIE, pp. 101-111, Waikoloa, Hawaii, USA, August 2002.
[29] G. Kauffmann, J. M. Colberg, A. Diaferio, and S. D. M. White, "Clustering of galaxies in a hierarchical universe—I. Methods and results at z = 0," Monthly Notices of the Royal Astronomical Society, vol. 303, no. 1, pp. 188-206, 1999.
[30] R. Teyssier, "Cosmological hydrodynamics with adaptive mesh refinement—a new high resolution code called RAMSES," Astronomy & Astrophysics, vol. 385, no. 1, pp. 337-364, 2002.
[31] D. L. Donoho and J. Jin, "Higher criticism for detecting sparse heterogeneous mixtures," Tech. Rep., Statistics Department, Stanford University, Stanford, Calif, USA, 2002.
[32] J.-L. Starck, N. Aghanim, and O. Forni, "Detection and discrimination of cosmological non-Gaussian signatures by multi-scale methods," Astronomy & Astrophysics, vol. 416, no. 1, pp. 9-17, 2004.

J.-L. Starck has a Ph.D. degree from the University of Nice-Sophia Antipolis and a Habilitation degree from the University Paris XI. He was a Visitor at the European Southern Observatory (ESO) in 1993, at UCLA in 2004, and at the Statistics Department, Stanford University, in 2000 and 2005. He has been a researcher at the Service d'Astrophysique, CEA-Saclay, since 1994. His research interests include image processing, statistical methods in astrophysics, and cosmology. He is also the author of two books entitled Image Processing and Data Analysis: The Multiscale Approach and Astronomical Image and Data Analysis.

V. J. Martínez received his Ph.D. degree in mathematics from the University of Valencia in 1989, after preparing his thesis at NORDITA, Copenhagen, where his adviser was Bernard Jones. In 1991, he got a permanent position at the University of Valencia as an Associate Professor in astronomy and astrophysics. He is currently the Director of the Observatori Astronòmic de la Universitat de València. He works on the statistical properties of the large-scale structure of the Universe. He is also involved in observational projects to study the mass and extent of dark halos around elliptical galaxies. He is the coauthor, with E. Saar, of the book Statistics of the Galaxy Distribution.

D. L. Donoho is Anne T. and Robert M. Bass Professor in the humanities and sciences at Stanford University. He received his A.B. degree in statistics at Princeton University, where his thesis adviser was John W. Tukey, and his Ph.D. degree in statistics at Harvard University, where his thesis adviser was Peter J. Huber. He is a Member of the US National Academy of Sciences and of the American Academy of Arts and Sciences.

O. Levi is a faculty member in the Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Beer Sheva, Israel. He received the B.S. degree in mathematics and industrial engineering and the M.S. degree in industrial engineering from Ben-Gurion University, Israel. He earned his Ph.D. degree in scientific computing and computational mathematics from Stanford University in 2004. His thesis was entitled "Multiscale geometric analysis of 3-D data sets," and he worked under the supervision of Professor D. Donoho. His fields of interest include scientific computing, matrix computation, discrete Fourier analysis, and MSG algorithms.

P. Querre has an Engineering degree from the Institut National Polytechnique de Toulouse. He has been working for the last five years at the Service d'Astrophysique, Saclay, on image and signal processing applications. He got involved in the development of new methods of 2D and 3D redundant multiscale transforms. His main research interests are signal and image processing algorithms.

E. Saar received his Cand. Sci. (Ph.D.) degree in theoretical physics at Tartu University in 1971, and his Dr. Astr. degree at the same university in 1991. He has worked all his life at Tartu Observatory, at positions ranging from a Programmer to a Vice Director. Currently he is the Head of the Cosmology Department. He has written papers on general relativity (inhomogeneous cosmologies), physics of galaxies (spiral structure, gaseous halos), atmospheric physics (space experiments), and cosmology (dark matter, large-scale structure). His main interests at present are the statistics of the cosmological large-scale structure and of the CMB, and numerical modelling of the formation of this structure. He is a coauthor, with V. J. Martínez, of the book Statistics of the Galaxy Distribution.

EURASIP Journal on Applied Signal Processing 2005:15, 2470-2485 © 2005 Hindawi Publishing Corporation

Cosmological Non-Gaussian Signature Detection: Comparing Performance of Different Statistical Tests

J. Jin Department of Statistics, Purdue University, 150 N. University Street, West Lafayette, IN 47907, USA Email: [email protected]

J.-L. Starck DAPNIA/SEDI-SAP, Service d’Astrophysique, CEA-Saclay, 91191 Gif-sur-Yvette Cedex, France Email: [email protected]

D. L. Donoho Department of Statistics, Stanford University, Sequoia Hall, Stanford, CA 94305, USA Email: [email protected]

N. Aghanim
IAS-CNRS, Université Paris Sud, Bâtiment 121, 91405 Orsay Cedex, France
Email: [email protected]
Division of Theoretical Astronomy, National Astronomical Observatory of Japan, Osawa 2-21-1, Mitaka, Tokyo 181-8588, Japan

O. Forni
IAS-CNRS, Université Paris Sud, Bâtiment 121, 91405 Orsay Cedex, France
Email: [email protected]

Received 30 June 2004

Currently, it appears that the best method for non-Gaussianity detection in the cosmic microwave background (CMB) consists in calculating the kurtosis of the wavelet coefficients. We know that wavelet-kurtosis outperforms other methods such as the bispectrum, the genus, ridgelet-kurtosis, and curvelet-kurtosis on an empirical basis, but relatively few studies have compared other transform-based statistics, such as extreme values, or more recent tools such as higher criticism (HC), or proposed "best possible" choices for such statistics. In this paper, we consider two models for transform-domain coefficients: (a) a power-law model, which seems suited to the wavelet coefficients of simulated cosmic strings, and (b) a sparse mixture model, which seems suitable for the curvelet coefficients of filamentary structure. For model (a), if power-law behavior holds with finite 8th moment, excess kurtosis is an asymptotically optimal detector, but if the 8th moment is not finite, a test based on extreme values is asymptotically optimal. For model (b), if the transform coefficients are very sparse, a recent test, higher criticism, is an optimal detector, but if they are dense, kurtosis is an optimal detector. Empirical wavelet coefficients of simulated cosmic strings have power-law character and infinite 8th moment, while curvelet coefficients of the simulated cosmic strings are not very sparse. In all cases, excess kurtosis seems to be an effective test in moderate-resolution imagery.

Keywords and phrases: cosmology, cosmological microwave background, non-Gaussianity detection, multiscale method, wavelet, curvelet.

1. INTRODUCTION as measured by the FIRAS experiment on board COBE satel- lite [2]. The DMR experiment, again on board COBE, de- The cosmic microwave background (CMB), discovered in tected and measured angular small fluctuations of this tem- 1965 by Penzias and Wilson [1], is a relic of radiation emit- perature, at the level of a few tens of microkelvins, and at ted some 13 billion years ago, when the universe was about angular scale of about 10 degrees [3]. These so-called tem- 370 000 years old. This radiation exhibits characteristic of an perature anisotropies were predicted as the imprints of the almost perfect blackbody at a temperature of 2.726 initial density perturbations which gave rise to present large Comparing Different Statistics in Non-Gaussian 2471

scale structures as galaxies and clusters of galaxies. This relation between the present-day universe and its initial conditions has made the CMB radiation one of the preferred tools of cosmologists for understanding the history of the universe, the formation and evolution of the cosmic structures, and the physical processes responsible for them and for their clustering.

Figure 1: All-sky map of the CMB anisotropies measured by the WMAP satellite. Courtesy of the WMAP team.

As a consequence, the last several years have been a particularly exciting period for observational cosmology focusing on the CMB. With CMB balloon-borne and ground-based experiments such as TOCO [4], BOOMERanG [5], MAXIMA [6], DASI [7], and Archeops [8], a firm detection of the so-called "first peak" in the CMB anisotropy angular power spectrum at the degree scale was obtained. This detection has very recently been confirmed by the WMAP satellite [9, 10] (see Figure 1), which also detected the second and third peaks. The WMAP satellite mapped the CMB temperature fluctuations with a resolution better than 15 arcminutes and very good accuracy, marking the starting point of a new era of precision cosmology that enables us to use the CMB anisotropy measurements to constrain the cosmological parameters and the underlying theoretical models.

In the framework of adiabatic cold dark matter models, the position, amplitude, and width of the first peak indeed provide strong evidence for the inflationary predictions of a flat universe and a scale-invariant primordial spectrum for the density perturbations. Furthermore, the presence of the second and third peaks confirms the theoretical prediction of acoustic oscillations in the primeval plasma and sheds new light on various cosmological and inflationary parameters, in particular the baryonic content of the universe. The accurate measurements of both the temperature anisotropies and the polarised emission of the CMB will enable us in the very near future to break some of the degeneracies that still affect parameter estimation. They will also allow us to probe more directly the inflationary paradigm favored by the present observations.

Testing the inflationary paradigm can also be achieved through a detailed study of the statistical nature of the CMB anisotropy distribution. In the simplest inflation models, the distribution of CMB temperature fluctuations should be Gaussian, and this Gaussian field is completely determined by its power spectrum. However, many models, such as multifield inflation (e.g., [11] and the references therein), superstrings, or topological defects, predict non-Gaussian contributions to the initial fluctuations [12, 13, 14]. The statistical properties of the CMB should therefore discriminate models of the early universe. Nevertheless, secondary effects like inverse Compton scattering, the Doppler effect, lensing, and others add their own contributions to the total non-Gaussianity.

All these sources of non-Gaussian signatures might have different origins and thus different statistical and morphological characteristics. It is therefore not surprising that a large number of studies have recently been devoted to the detection of non-Gaussian signatures. Many approaches have been investigated: Minkowski functionals and morphological statistics [15, 16], the bispectrum (3-point estimator in the Fourier domain) [17, 18, 19], the trispectrum (4-point estimator in the Fourier domain) [20], wavelet transforms [21, 22, 23, 24, 25, 26, 27], and the curvelet transform [27]. Different wavelet methods have been studied, such as the isotropic à trous algorithm [28] and the biorthogonal wavelet transform [29]. (The biorthogonal wavelet transform was found to be the most sensitive to non-Gaussianity [27].) In [27, 30], it was shown that the wavelet transform is a very powerful tool for detecting non-Gaussian signatures. Indeed, the excess kurtosis (4th moment) of the wavelet coefficients outperformed all the other methods (when the signal is characterised by a nonzero 4th moment).

Nevertheless, a major issue of the non-Gaussian studies in CMB remains our ability to disentangle all the sources of non-Gaussianity from one another. Recent progress has been made on the discrimination between different possible origins of non-Gaussianity. Namely, it was possible to separate the non-Gaussian signatures associated with topological defects (cosmic strings (CS)) from those due to the Doppler effect of moving clusters of galaxies (both dominated by a Gaussian CMB field) by combining the excess kurtosis derived from both the wavelet and the curvelet transforms [27]. This success argues for constructing a "toolkit" of well-understood and sensitive methods for probing different aspects of the non-Gaussian signatures.

In that spirit, the goal of the present study is to consider the advantages and limitations of detectors which apply kurtosis to transform coefficients of image data. We will study plausible models for transform coefficients of image data and compare the performance of tests based on kurtosis of transform coefficients to other types of statistical diagnostics.

At the center of our analysis are the following two facts:

(A) the wavelet/curvelet coefficients of the CMB are Gaussian (we implicitly assume the simplest inflationary scenario);

(B) the wavelet/curvelet coefficients of topological defect and Doppler effect simulations are non-Gaussian.

We develop tests for non-Gaussianity for two models of the statistical behavior of transform coefficients. The first, better suited for wavelet analysis, models the transform coefficients of cosmic strings as following a power law.
The second, theoretically better suited for curvelet coefficients, assumes that the salient features of interest are actually filamentary (they can be residual strips due to a nonperfect calibration), which gives the curvelet coefficients a sparse structure.

We review some basic ideas from detection theory, such as likelihood ratio detectors, and explain why we prefer nonparametric detectors, valid across a broad range of assumptions.

In the power-law setting, we consider two kinds of nonparametric detectors. The first, based on kurtosis, is asymptotically optimal in the class of weakly dependent symmetric non-Gaussian contamination with finite 8th moments. The second, the Max, is shown to be asymptotically optimal in the class of weakly dependent symmetric non-Gaussian contamination with infinite 8th moment. While the evidence seems to be that the wavelet coefficients of CS have about 6 existing moments, indicating a decisive advantage for extreme-value statistics, the performance of kurtosis-based tests and Max-based tests at moderate sample sizes (e.g., 64 K transform coefficients) does not follow the asymptotic theory; excess kurtosis works better at these sample sizes.

In the sparse-coefficients setting, we consider kurtosis, the Max, and a recent statistic called higher criticism (HC) [31]. Theoretical analysis suggests that curvelet coefficients of filamentary features should be sparse, with about n^(1/4) substantial nonzero coefficients out of n coefficients in a subband; this level of sparsity would argue in favor of Max/HC. However, empirically, the curvelet coefficients of actual CS simulations are not very sparse. It turns out that kurtosis outperforms Max/HC in simulation.

Summarizing, the work reported here seems to show that for all transforms considered, the excess kurtosis outperforms the alternative methods despite their strong theoretical motivation. A reanalysis of the theory supporting those methods shows that the case for kurtosis can also be justified theoretically, based on observed statistical properties of the transform coefficients not used in the original theoretical analysis.

2. DETECTING FAINT NON-GAUSSIAN SIGNALS SUPERPOSED ON A GAUSSIAN SIGNAL

The superposition of a non-Gaussian signal with a Gaussian signal can be modeled as Y = N + G, where Y is the observed image, N is the non-Gaussian component, and G is the Gaussian component. We are interested in using transform coefficients to test whether N ≡ 0 or not.

2.1. Hypothesis testing and likelihood ratio test

Transform coefficients of various kinds (Fourier, wavelet, etc.) have been used for detecting non-Gaussian behavior in numerous studies. Let X_1, X_2, ..., X_n be the transform coefficients of Y; we model these as

    X_i = sqrt(1 − λ) · z_i + sqrt(λ) · w_i,  0 < λ < 1,    (1)

where λ > 0 is a parameter, z_i iid∼ N(0, 1) are the transform coefficients of the Gaussian component G, w_i iid∼ W are the transform coefficients of the non-Gaussian component N, and W is some unknown symmetrical distribution. Here, without loss of generality, we assume the standard deviation of both z_i and w_i is 1.

Phrased in statistical terms, the problem of detecting the existence of a non-Gaussian component is equivalent to discriminating between the hypotheses

    H_0 : X_i = z_i,    (2)
    H_1 : X_i = sqrt(1 − λ) · z_i + sqrt(λ) · w_i,  0 < λ < 1,    (3)

and N ≡ 0 is equivalent to λ ≡ 0. We call H_0 the null hypothesis and H_1 the alternative hypothesis.

When both W and λ are known, the optimal test for problem (2)-(3) is simply the Neyman-Pearson likelihood ratio test (LRT) [32, page 74]. The size of λ = λ_n for which reliable discrimination between H_0 and H_1 is possible can be derived using asymptotics. If we assume that the tail probability of W decays algebraically,

    lim_{x→∞} x^α · P{ |W| > x } = C_α,  C_α a constant    (4)

(we say W has a power-law tail), and we calibrate λ to decay with n, so that increasing amounts of data are offset by increasingly hard challenges:

    λ = λ_n = n^(−r),    (5)

then there is a threshold effect for the detection problem (2)-(3). In fact, define

    ρ_1^*(α) = 2/α,  α ≤ 8,
    ρ_1^*(α) = 1/4,  α > 8;    (6)

then, as n → ∞, LRT is able to reliably detect for large n when r < ρ_1^*(α), and is unable to detect when r > ρ_1^*(α); this is proved in [33]. Since LRT is optimal, it is not possible for any statistic to reliably detect when r > ρ_1^*(α). We call the curve r = ρ_1^*(α) in the α-r plane the detection boundary; see Figure 2.

In fact, when r < 1/4, asymptotically LRT is able to reliably detect whenever W has a finite 8th moment, even without the assumption that W has a power-law tail. Of course, the case that W has an infinite 8th moment is more complicated, but if W has a power-law tail, then LRT is also able to reliably detect if r < 2/α.

Despite its optimality, LRT is not a practical procedure. To apply LRT, one needs to specify the value of λ and the distribution of W, which seem unlikely to be available. We need nonparametric detectors, which can be implemented without any knowledge of λ or W, and which depend on the X_i's only. In the sections below, we introduce two nonparametric detectors: excess kurtosis and Max; later, in Section 4.3, we will introduce a third nonparametric detector: higher criticism (HC).

Comparing Different Statistics in Non-Gaussian 2473
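For concreteness, the mixture model (1) and the boundary (6) can be sketched in a few lines of code. This is our illustration, not the authors' implementation; in particular, the Student-t choice for W below is an assumption made purely to have a concrete symmetric, power-law-tailed distribution rescaled to unit standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_coefficients(n, lam, draw_w):
    """Model (1): X_i = sqrt(1-lam)*z_i + sqrt(lam)*w_i.
    lam = 0 recovers the null hypothesis H0 (pure Gaussian)."""
    z = rng.standard_normal(n)
    return np.sqrt(1.0 - lam) * z + np.sqrt(lam) * draw_w(n)

def draw_t6(n, df=6):
    """An example power-law-tailed W (Student-t, here df=6), unit sd."""
    return rng.standard_t(df, size=n) / np.sqrt(df / (df - 2.0))

def rho1_star(alpha):
    """Detection boundary (6): LRT detects reliably iff r < rho1*(alpha)."""
    return 2.0 / alpha if alpha <= 8 else 0.25

x_null = draw_coefficients(10_000, 0.0, draw_t6)  # an H0 sample
x_alt = draw_coefficients(10_000, 0.1, draw_t6)   # an H1 sample, lambda = 0.1
```

Note that for α ≤ 8 the boundary 2/α decreases with heavier tails, matching the intuition that heavier-tailed contamination is easier to detect at a given calibration r.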

Figure 2: Detectable regions in the α-r plane. With (α, r) in the white region at the top (the undetectable region), all methods completely fail. With (α, r) in the white region at the bottom, both excess kurtosis and Max/HC are able to detect reliably. In the shaded region to the left, Max/HC is able to detect reliably but excess kurtosis completely fails; in the shaded region to the right, excess kurtosis is able to detect reliably but Max/HC completely fails.

2.2. Excess kurtosis and Max

We pause to review the concept of p-value briefly. For a statistic T_n, the p-value is the probability of seeing equally extreme results under the null hypothesis:

    p = P_{H_0}{ T_n ≥ t_n(X_1, X_2, ..., X_n) };    (7)

here P_{H_0} refers to probability under H_0, and t_n(X_1, X_2, ..., X_n) is the observed value of the statistic T_n. Notice that the smaller the p-value, the stronger the evidence against the null hypothesis. A natural decision rule based on p-values rejects the null when p < α for some selected level α, and a convenient choice is α = 5%. When the null hypothesis is indeed true, the p-values of any statistic are distributed as uniform U(0, 1). This implies that p-values provide a common scale for comparing different statistics.

We now introduce two statistics for comparison.

Excess kurtosis (κ_n)

Excess kurtosis is a widely used statistic, based on the 4th moment. For any (symmetrical) random variable X, the kurtosis is

    κ(X) = E[X^4] / (E[X^2])^2 − 3.    (8)

The kurtosis measures a kind of departure of X from Gaussianity, as κ(z) = 0. Empirically, given n realizations of X, the excess kurtosis statistic is defined as

    κ_n(X_1, X_2, ..., X_n) = sqrt(n/24) · [ (1/n) Σ_i X_i^4 / ( (1/n) Σ_i X_i^2 )^2 − 3 ].    (9)

When the null is true, the excess kurtosis statistic is asymptotically normal:

    κ_n(X_1, X_2, ..., X_n) →_w N(0, 1),  n → ∞;    (10)

thus for large n, the p-value of the excess kurtosis is approximately

    p̃ = Φ̄( κ_n(X_1, X_2, ..., X_n) ),    (11)

where Φ̄(·) is the survival function (upper-tail probability) of N(0, 1).

It is proved in [33] that the excess kurtosis is asymptotically optimal for the hypothesis testing of (2)-(3) if

    E[W^8] < ∞.    (12)

However, when E[W^8] = ∞, even though the kurtosis is well defined (E[W^4] < ∞), there are situations in which LRT is able to reliably detect but excess kurtosis completely fails. In fact, assuming (4)-(5) with an α < 8, if (α, r) falls into the shaded region to the left of Figure 2, then LRT is able to reliably detect; however, excess kurtosis completely fails. This shows that in such cases excess kurtosis is not optimal; see [33].

Max (M_n)

The largest (absolute) observation is a classical and frequently used nonparametric statistic:

    M_n = max( |X_1|, |X_2|, ..., |X_n| );    (13)

under the null hypothesis,

    M_n ≈ sqrt(2 log n),    (14)

and, moreover, by normalizing M_n with constants c_n and d_n, the resulting statistic converges to the Gumbel distribution E_v, whose cdf is exp(−e^(−x)):

    (M_n − c_n) / d_n →_w E_v,    (15)

where approximately

    d_n = sqrt(6) · S_n / π,  c_n = X̄ − 0.5772 · d_n;    (16)

here X̄ and S_n are the sample mean and sample standard deviation of simulated null values of M_n, respectively. Thus a good approximation of the p-value for M_n is

    p̃ = exp( −exp( −(M_n − c_n) / d_n ) ).    (17)

We have tried the above experiment for n = 244^2, and found that taking c_n = 4.2627, d_n = 0.2125 gives a good approximation.
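Both detectors and their approximate p-values (11) and (17) are a few lines each. The sketch below is our illustration, not the paper's code; in particular, it reads (16) as a moment match against simulated null maxima, and it reports the Gumbel upper-tail probability, which is the natural p-value for a large observed maximum.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)

def kappa_n(x):
    """Excess kurtosis statistic (9); approximately N(0,1) under the null."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m2, m4 = np.mean(x**2), np.mean(x**4)
    return sqrt(n / 24.0) * (m4 / m2**2 - 3.0)

def kappa_pvalue(x):
    """Approximate p-value (11): survival function of N(0,1) at kappa_n."""
    return 0.5 * erfc(kappa_n(x) / sqrt(2.0))

def max_stat(x):
    """Max statistic (13)."""
    return float(np.max(np.abs(x)))

def gumbel_constants(n, reps=2000):
    """Moment-match c_n, d_n as in (16) from simulated null maxima."""
    m = np.array([np.max(np.abs(rng.standard_normal(n))) for _ in range(reps)])
    d = sqrt(6.0) * m.std() / np.pi
    c = m.mean() - 0.5772 * d
    return c, d

def max_pvalue(x, c, d):
    """Gumbel upper-tail probability of M_n, in the spirit of (17)."""
    return 1.0 - np.exp(-np.exp(-(max_stat(x) - c) / d))
```

For n = 244^2 the paper reports c_n = 4.2627 and d_n = 0.2125; a simulation along these lines should land close to those values.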

Assuming (4)-(5) with α < 8, that is, λ = n^(−r) and W has a power-law tail with α < 8, it is proved in [33] that Max is optimal for the hypothesis testing (2)-(3). If we further assume 1/4 < r < 2/α, then Max is able to reliably detect, but excess kurtosis completely fails; the situation is reversed for the case α > 8. In that case Max is not optimal: in fact, if we further assume 2/α < r < 1/4, then excess kurtosis is able to reliably detect, but Max will completely fail. In Figure 2, we compare the detectable regions of the excess kurtosis and Max in the α-r plane.

To conclude this section, we mention an alternative way to approximate the p-values of any statistic T_n. This alternative is important in case an asymptotic (theoretic) approximation is poor for moderately large n; an example is the statistic HC_n^* that we will introduce in Section 4.3. The alternative is helpful even when the asymptotic approximation is accurate. The idea is that, under the null hypothesis, we simulate a large number (N = 10^4 or more) of realisations of T_n: T_n^(1), T_n^(2), ..., T_n^(N), and tabulate them. For the observed value t_n(X_1, X_2, ..., X_n), the p-value is then well approximated by

    p̃ = (1/N) · #{ k : T_n^(k) ≥ t_n(X_1, X_2, ..., X_n) },    (18)

and the larger the N, the better the approximation.

2.3. Heuristic approach

We have exhibited a phase-change phenomenon, where the asymptotically optimal test changes depending on the power-law index α. In this section, we develop a heuristic analysis of detectability and phase change.

The detection property of Max follows from comparing the ranges of the data. Recall that X_i = sqrt(1 − λ_n) · z_i + sqrt(λ_n) · w_i; the range of {z_i}_{i=1}^n is roughly (−sqrt(2 log n), sqrt(2 log n)), and the range of {sqrt(λ_n) · w_i}_{i=1}^n is sqrt(λ_n) · (−n^(1/α), n^(1/α)) = (−n^(1/α − r/2), n^(1/α − r/2)); so, heuristically,

    M_n ≈ max( sqrt(2 log n), n^(1/α − r/2) );    (19)

for large n, notice that

    n^(1/α − r/2) ≫ sqrt(2 log n),  if r < 2/α,
    n^(1/α − r/2) ≪ sqrt(2 log n),  if r > 2/α;    (20)

thus if and only if r < 2/α, M_n for the alternative will differ significantly from M_n for the null, and so the criterion for detectability by Max is r < 2/α.

Now we study detection by excess kurtosis. Heuristically,

    κ_n ≈ sqrt(n/24) · κ( sqrt(1 − λ_n) · z_i + sqrt(λ_n) · w_i ) = (1/sqrt(24)) · sqrt(n) · λ_n^2 · κ(W) = O( n^(1/2 − 2r) );    (21)

thus if and only if r < 1/4, κ_n for the alternative will differ significantly from κ_n under the null, and so the criterion for detectability by excess kurtosis is r < 1/4.

When Max succeeds, the evidence against the null is contained in the tails of the data set, which M_n is indeed using. However, when (α, r) moves from the shaded region on the left of Figure 2 to the shaded region on the right, n^(1/α − r/2) ≪ sqrt(2 log n), and the tails no longer contain any important evidence against the null; instead, the central part of the data set contains the evidence. By symmetry, the 1st and the 3rd moments vanish, and the 2nd moment is 1 by the normalization; so the excess kurtosis is in fact the most promising candidate among detectors based on moments.

This heuristic analysis is the essence of the theoretic proofs as well as of the empirical experiments. Later, in Section 3.4, we will continue the comparison of the excess kurtosis with Max in this vein.

3. WAVELET COEFFICIENTS OF COSMIC STRINGS

3.1. Simulated astrophysical signals

The temperature anisotropies of the CMB contain the contributions of both the primary cosmological signal, directly related to the initial density perturbations, and the secondary anisotropies. The latter are generated after matter-radiation decoupling [34]. They arise from the interaction of the CMB photons with the neutral or ionised matter along their path [35, 36, 37].

In the present study, we assume that the primary CMB anisotropies are dominated by the fluctuations generated in the simple single-field inflationary cold-dark-matter model with a nonzero cosmological constant. The CMB anisotropies therefore have a Gaussian distribution. We allow for a contribution to the primary signal from topological defects, namely, cosmic strings (CS), as suggested in [38, 39]. See Figures 3 and 4.

We use for our simulations the cosmological parameters obtained from the WMAP satellite [10] and a normalization parameter σ_8 = 0.9. Finally, we obtain the so-called "simulated observed map," D, that contains the two previous astrophysical components. It is obtained from D_λ = sqrt(1 − λ)·CMB + sqrt(λ)·CS, where CMB and CS are, respectively, the CMB and the cosmic string simulated maps. λ = 0.18 is an upper limit constant derived by [38]. All the simulated maps have 500 × 500 pixels with a resolution of 1.5 arcminutes per pixel.

3.2. Evidence for E[W^8] = ∞

For the wavelet coefficients on the finest scale of the cosmic string map in Figure 3b, by throwing away all the coefficients related to pixels on the edge of the map, we have n = 244^2 coefficients; we then normalize these coefficients so that the empirical mean and standard deviation are 0 and 1, respectively; we denote the resulting dataset by {w_i}_{i=1}^n.
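The tabulation idea behind (18) can be sketched as follows (our illustration; the statistic is passed in as a callable, and the null is taken to be pure N(0,1) coefficients):

```python
import numpy as np

rng = np.random.default_rng(1)

def empirical_pvalue(t_obs, stat, n, reps=10_000):
    """Monte-Carlo p-value (18): simulate `stat` on N(0,1) samples of
    size n under the null, and report the fraction of simulated values
    at least as extreme as the observed value t_obs."""
    null = np.array([stat(rng.standard_normal(n)) for _ in range(reps)])
    return float(np.mean(null >= t_obs))
```

Usage: e.g. `empirical_pvalue(2.1, lambda x: float(np.max(np.abs(x))), n=1024, reps=2000)` approximates the null tail probability of the Max statistic without relying on the Gumbel asymptotics.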

Figure 3: (a) Primary cosmic microwave background anisotropies and (b) simulated cosmic string map.

Figure 4: Simulated observation containing the CMB and the CS (λ = 0.18).

Assuming {w_i}_{i=1}^n are independent samples from a distribution W, we have seen in Section 2 that whether excess kurtosis is better than Max depends on the finiteness of E[W^8]. We now analyze {w_i}_{i=1}^n to learn about E[W^8]. Let

    m_8(n) = (1/n) Σ_{i=1}^n w_i^8    (22)

be the empirical 8th moment of W using n samples. In theory, if E[W^8] < ∞, then m_8(n) → E[W^8] as n → ∞. So one way to see whether E[W^8] is finite is to observe how m_8(n) changes with n. Technically, since we only have n = 244^2 samples, we compare

    m_8(n/2^k),  k = 0, 1, 2, 3, 4;    (23)

if these values are roughly the same, then there is strong evidence for E[W^8] < ∞; otherwise, if they increase with sample size, that is evidence for E[W^8] = ∞. Here m_8(n/2^k) is an estimate of E[W^8] using n/2^k subsamples of {w_i}_{i=1}^n. For k = 1, 2, 3, 4, to obtain m_8(n/2^k), we randomly draw subsamples of size n/2^k from {w_i}_{i=1}^n, and then take the average of the 8th power of this subsequence; we repeat this process 50 000 times, and we let m_8(n/2^k) be the median of these 50 000 average values. Of course, when k = 0, m_8(n) is obtained from all n samples.

The results corresponding to the first wavelet band are summarized in Table 1. From the table, we see that m_8(n) is significantly larger than m_8(n/8) and m_8(n/16); this supports E[W^8] = ∞. Similar results were obtained from the other bands. In comparison, in Table 1, we also list the 4th, 5th, 6th, and 7th moments. It seems that the 4th, 5th, and 6th moments are finite, but the 7th and 8th moments are infinite.

Table 1: Empirical estimates of the 4th, 5th, 6th, 7th, and 8th moments calculated using subsamples of size n/2^k of {|w_i|}_{i=1}^n, with k = 0, 1, 2, 3, 4. The table suggests that the 4th, 5th, and 6th moments are finite, but the 7th and 8th moments are infinite.

    Size of subsample   4th moment   5th moment   6th moment       7th moment       8th moment
    n                   30.0826      262.6756     2.7390 × 10^3    3.2494 × 10^4    4.2430 × 10^5
    n/2                 29.7100      256.3815     2.6219 × 10^3    2.9697 × 10^4    3.7376 × 10^5
    n/2^2               29.6708      250.0520     2.4333 × 10^3    2.6237 × 10^4    3.0239 × 10^5
    n/2^3               29.4082      246.3888     2.3158 × 10^3    2.4013 × 10^4    2.3956 × 10^5
    n/2^4               27.8039      221.9756     1.9615 × 10^3    1.9239 × 10^4    1.8785 × 10^5

3.3. Power-law tail of W

Typical models for heavy-tailed data include exponential tails and power-law tails. We now compare such models to the data on the wavelet coefficients for W; the Gaussian model is also included for comparison.

We sort the |w_i|'s in descending order, |w|_(1) > |w|_(2) > ··· > |w|_(n), and take the 50 largest samples |w|_(1) > |w|_(2) > ··· > |w|_(50). For a power-law tail with index α, we expect that for some constant C_α,

    log(i/n) ≈ log C_α − α · log( |w|_(i) ),  1 ≤ i ≤ 50,    (24)

so there is a strong linear relationship between log(i/n) and log(|w|_(i)). Similarly, for the exponential model, we expect a strong linear relationship between log(i/n) and |w|_(i), and for the Gaussian model, we expect a strong linear relationship between log(i/n) and |w|^2_(i).

For each model, to measure whether the "linearity" is sufficient to explain the relationship between log(i/n) and log(|w|_(i)) (or |w|_(i), or |w|^2_(i)), we introduce the following z-score:

    Z_i = sqrt(n) · ( p_i − i/n ) / sqrt( (i/n)·(1 − i/n) ),    (25)

where p_i is the linear fit using each of the three models. If the resulting z-scores are random and have no specific trend, the model is appropriate; otherwise, the model may need improvement.

The results are summarized in Figure 5. The power-law tail model seems the most appropriate: the relationship between log(i/n) and log(|w|_(i)) looks very close to linear, the z-scores look very small, and the range of the z-scores is much narrower than for the other two models. For the exponential model, the linearity is fine at first glance; however, the z-score is decreasing with i, which implies that the tail is heavier than estimated. The Gaussian model fits much worse than the exponential. To summarize, there is strong evidence that the tail follows a power law.

Now we estimate the index α of the power-law tail. A widely used method for estimating α is the Hill estimator [40]:

    α_H^(l) = (l + 1) / Σ_{i=1}^{l} i · log( |w|_(i) / |w|_(i+1) ),    (26)

where l is the number of (the largest) |w|_(i) to include in the estimation. In our situation, l = 50 and

    α = α_H^(50) = 6.134;    (27)
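Both diagnostics above are easy to reproduce. The following sketch (our illustration, with `reps` kept small for speed) implements the subsample-median moment estimate of (22)-(23) and the Hill estimator (26):

```python
import numpy as np

rng = np.random.default_rng(0)

def subsample_moment(w, p=8, k=0, reps=1000):
    """Median over `reps` random subsamples of size n/2^k of the
    empirical p-th absolute moment (the diagnostic of Section 3.2)."""
    w = np.abs(np.asarray(w, dtype=float))
    if k == 0:
        return float(np.mean(w**p))
    m = w.size // 2**k
    vals = [np.mean(rng.choice(w, size=m, replace=False) ** p)
            for _ in range(reps)]
    return float(np.median(vals))

def hill_estimator(w, l=50):
    """Hill estimator (26) of the power-law index, using the l largest
    order statistics of |w| (sorted in descending order)."""
    a = np.sort(np.abs(np.asarray(w, dtype=float)))[::-1]
    i = np.arange(1, l + 1)
    return float((l + 1) / np.sum(i * np.log(a[:l] / a[1:l + 1])))
```

On Pareto-tailed data with index α, `hill_estimator` recovers α up to a sampling error of order α/sqrt(l), consistent with the standard deviation ≈ 0.9 quoted below for l = 50.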

−7 0.5

−8

−9 0 -score z

log probability −10

−11 −0.5 22.22.42.62.83 0 1020304050 log(w(i)) i

(a) (b)

−7 1

−8 0.5

−9 0 -score z

log probability −10 −0.5

−11 −1 8 1012141618 0 1020304050 w(i) i

(c) (d)

−7 2

−8 1

−9 0 -score z

log probability −10 −1

− 11 − 0 100 200 300 400 2 0 1020304050 w2 (i) i

(e) (f)

Figure 5: Plots of the log probability log(i/n) versus (a) log(|w|_(i)), (c) |w|_(i), and (e) |w|^2_(i) for 1 ≤ i ≤ 50, corresponding to the power-law, exponential, and Gaussian models introduced in Section 3; w are the wavelet coefficients of the finest scale (i.e., highest frequencies). Normalized z-scores as defined in (25) for the (b) power-law, (d) exponential, and (f) Gaussian models, again for 1 ≤ i ≤ 50.

we also found the standard deviation of this estimate to be ≈ 0.9. Table 2 gives estimates of α for each band of the wavelet transform. This shows that α is likely to be only slightly less than 8; this means the performance of excess kurtosis and Max might be very close empirically.

Table 2: Table of α values for the different wavelet bands of the CS map.

    Multiscale method       Alpha
    Biorthogonal wavelet
    Scale 1, horizontal     6.13
    Scale 1, vertical       4.84
    Scale 1, diagonal       4.27
    Scale 2, horizontal     5.15
    Scale 2, vertical       4.19
    Scale 2, diagonal       3.83
    Scale 3, horizontal     4.94
    Scale 3, vertical       4.99
    Scale 3, diagonal       4.51
    Scale 4, horizontal     3.26
    Scale 4, vertical       3.37
    Scale 4, diagonal       3.76

3.4. Comparison of excess kurtosis and Max with simulation

To test the results of Section 3.3, we now perform a small simulation experiment. A complete cycle includes the following steps (n = 244^2 and {w_i}_{i=1}^n are the same as in Section 3.3).

(1) Let λ range from 0 to 0.1 with increment 0.0025.
(2) Draw (z_1, z_2, ..., z_n) independently from N(0, 1) to represent the transform coefficients of the CMB.
(3) For each λ, let

    X_i = X_i^(λ) = sqrt(1 − λ)·z_i + sqrt(λ)·w_i,  λ = 0, 0.0025, ..., 0.1,    (28)

represent the transform coefficients of CMB + CS.
(4) Apply the detectors κ_n and M_n to the X_i^(λ)'s, and obtain the p-values.

We repeated steps (3)-(4) independently 500 times. Based on these simulations, first, we estimated the probability of detection under various λ, for each detector:

    Fraction of detections = ( number of cycles with a p-value ≤ 0.05 ) / 500.    (29)

Results are summarized in Figure 6. Second, we pick out the simulated values for λ = 0.05 alone and plot the ROC curves for each detector. The ROC curve is a standard way to evaluate detectors [41]: the x-axis gives the fraction of false alarms (the fraction of detections when the null is true, i.e., λ = 0), and the y-axis gives the corresponding fraction of true detections. Results are shown in Figure 6. The figure suggests that the excess kurtosis is slightly better than M_n. We also show an adaptive test, HC_n, in two forms (HC_n^* and HC_n^+); these will be described later.

We now interpret. As our analysis predicts that W has a power-law tail with E[W^8] = ∞, it is surprising that excess kurtosis still performs better than Max. In Section 2.3, we compared excess kurtosis and Max in a heuristic way; here we continue that discussion, using now empirical results. Notice that for the data set (w_1, w_2, ..., w_n), the largest (absolute) observation is

    M_n = M = 17.48,    (30)

and the excess kurtosis is

    κ_n = κ = (1/n) Σ_i w_i^4 − 3 = 27.08.    (31)

In the asymptotic analysis of Section 2.3, we assumed κ(W) is a constant. However, for n = 244^2, we get a very large excess kurtosis, 27.08 ≈ n^0.3; this makes excess kurtosis very favorable in the current situation. Now, in order for M_n to work successfully, we have to take λ large enough that

    sqrt(λ) · M > sqrt(2 log n),    (32)

so λ > 0.072. The p-value of M_n is then

    exp( −exp( −( sqrt(λ)·M − 4.2627 ) / 0.2125 ) );    (33)

moreover, the p-value for excess kurtosis is, heuristically,

    Φ̄( sqrt(n/24) · λ^2 · κ );    (34)

setting them equal, we can solve for κ in terms of M:

    κ = κ_0(M).    (35)

The curve κ = κ_0(M) separates the M-κ plane into two regions: the region above the curve is favorable to the excess kurtosis, and the region below the curve is favorable to Max; see Figure 7. In the current situation, the point (M, κ) = (17.48, 27.08) falls far above the curve; this explains why excess kurtosis is better than Max for the current data set.

3.5. Experiments on wavelet coefficients

3.5.1. CMB + CS

We study the relative sensitivity of the different wavelet-based statistical methods when the signals are added to a dominant Gaussian noise, that is, the primary CMB.
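The simulation cycle above is straightforward to sketch. This illustration (ours, not the authors' code) plugs in any p-value function, such as the kurtosis or Max p-values of Section 2.2:

```python
import numpy as np

rng = np.random.default_rng(0)

def detection_fraction(w, lam, pvalue_fn, cycles=500, level=0.05):
    """Steps (2)-(4) of Section 3.4, repeated `cycles` times: mix the
    fixed non-Gaussian coefficients w with fresh Gaussian z as in (28),
    and report the fraction of cycles with p-value <= level, as in (29)."""
    w = np.asarray(w, dtype=float)
    hits = 0
    for _ in range(cycles):
        z = rng.standard_normal(w.size)
        x = np.sqrt(1.0 - lam) * z + np.sqrt(lam) * w
        if pvalue_fn(x) <= level:
            hits += 1
    return hits / cycles
```

Sweeping `lam` over a grid and plotting the resulting fractions reproduces detection curves of the kind shown in Figure 6.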

Figure 6: (a) The fraction of detections for the excess kurtosis, HC^*, and Max; the x-axis is the corresponding λ. (b) The fraction of detections for the excess kurtosis, HC^+, and Max. (c) ROC curves for the excess kurtosis, HC^*, and Max. (d) ROC curves for the excess kurtosis, HC^+, and Max.

We ran 5000 simulations by adding the 100 CMB realisations to the CS (D(λ, i) = sqrt(1 − λ)·CMB_i + sqrt(λ)·CS, i = 1 ··· 100), using 50 different values of λ, ranging linearly between 0 and 0.18. Then we applied the biorthogonal wavelet transform, using the standard 7/9 filter [42], to these 5000 maps. On each band b of the wavelet transform, for each dataset D(λ, i), we calculate the kurtosis value K_D(b, λ, i). In order to calibrate and compare the departures from a Gaussian distribution, we have simulated for each image D(λ, i) a Gaussian random field G(λ, i) which has the same power spectrum as D(λ, i), and we derive its kurtosis values K_G(b, λ, i). For a given band b and a given λ, we derive for each kurtosis K_D(b, λ, i) its p-value p_K(b, λ, i) under the null hypothesis (i.e., no CS) using the distribution of K_G(b, λ, ∗). The mean p-value p̄_K(b, λ) is obtained by taking the mean of p_K(b, λ, ∗). For a given band b, the curve of p̄_K(b, λ) versus λ shows the sensitivity of the method for detecting CS. Then we do the same operation, but replacing the kurtosis by HC and Max.

Figure 8 shows the mean p-value versus λ for the nine finest-scale subbands of the wavelet transform. The first three subbands correspond to the finest scale (high frequencies) in the three directions, respectively horizontal, vertical, and diagonal. Bands 4, 5, and 6 correspond to the second resolution level, and bands 7, 8, and 9 to the third. Results are clearly in favor of the excess kurtosis.

The same experiments have been repeated, but replacing the biorthogonal wavelet transform by the undecimated isotropic à trous wavelet transform. Results are similarly in favor of the excess kurtosis. Table 3 gives the λ values (multiplied by 100) for which the CS are detected at a 95% confidence level. Only bands where this level is achieved are given. The smaller the λ, the better the sensitivity of the method for detecting the CS. These results show that the excess kurtosis clearly outperforms HC and Max, whatever the chosen multiscale transform and the analyzed scale.

No method is able to detect the CS at a 95% confidence level after the second scale in these simulations. In practice, the presence of noise makes the detection even more difficult, especially in the finest scales.

Figure 7: The M-κ plane and the curve κ = κ_0(M), where M is the largest (absolute) observation of the w_i's, and κ is the empirical excess kurtosis of the w_i's; the w_i's are the wavelet coefficients of the simulated cosmic string map. Heuristically, if (M, κ) falls above the curve, excess kurtosis will perform better than Max. The red bullet represents the point (M, κ) = (17.48, 27.08) for the current data set, which is far above the curve.

3.5.2. CMB + SZ

We now consider a totally different contamination. Here, we take into account the secondary anisotropies due to the kinetic Sunyaev-Zel'dovich (SZ) effect [35]. The SZ effect represents the Compton scattering of CMB photons by the free electrons of the ionised and hot intracluster gas. When the galaxy cluster moves with respect to the CMB rest frame, the Doppler shift induces additional anisotropies; this is the so-called kinetic SZ (KSZ) effect. The kinetic SZ maps are simulated following Aghanim et al. [43], and the simulated observed map D is obtained from D_λ = CMB + λ·KSZ, where CMB and KSZ are, respectively, the CMB and the kinetic SZ simulated maps. We ran 5000 simulations by adding the 100 CMB realisations to the KSZ (D(λ, i) = CMB_i + λ·KSZ, i = 1 ··· 100), using 50 different values of λ, ranging linearly between 0 and 1. The p-values are calculated just as in the previous section.

Table 4 gives the λ values for which the SZ is detected at a 95% confidence level for the three multiscale transforms. Only bands where this level is achieved are given. Results are again in favor of the kurtosis.

4. CURVELET COEFFICIENTS OF FILAMENTS

Curvelet analysis was proposed by Candès and Donoho.

4.1. Curvelet coefficients of filaments

Suppose we have an image I which contains within it a single filament, that is, a smooth curve of appreciable length L. We analyse it using the curvelet frame. Applying analysis techniques described carefully in [49], we can make precise the following claim: at scale s = 2^(−j), there will be about O(L·2^(j/2)) significant coefficients caused by this filamentary feature, and they will all be of roughly similar size. The remaining O(4^j) coefficients at that scale will be much smaller, basically zero in comparison.

The pattern continues in this way if there is a collection of m filaments of individual lengths L_i and total length L = L_1 + ··· + L_m. Then we expect roughly O(L·2^(j/2)) substantial coefficients at level j, out of 4^j total.

This suggests a rough model for the analysis of non-Gaussian random images which contain apparent "edge-like" phenomena. If we identify the edges with filaments, then we expect to see, at a scale with n coefficients, about L·n^(1/4) nonzero coefficients. Assuming all the edges are equally "pronounced," this suggests that we view the curvelet coefficients of I at a given scale as consisting of a fraction ε = L/n^(3/4) of nonzeros, with the remainder zeros. Under this model, the curvelet coefficients of a superposition of the non-Gaussian component with a Gaussian random image should behave like

    X_i = (1 − ε)·N(0, 1) + (ε/2)·N(−µ, 1) + (ε/2)·N(µ, 1),    (36)

where ε is the fraction of large curvelet coefficients corresponding to filaments, and µ is the amplitude of these coefficients of the non-Gaussian component N.

The problem of detecting the existence of such a non-Gaussian mixture is equivalent to discriminating between the hypotheses

    H_0 : X_i iid∼ N(0, 1),    (37)
    H_1^(n) : X_i iid∼ (1 − ε_n)·N(0, 1) + (ε_n/2)·N(−µ_n, 1) + (ε_n/2)·N(µ_n, 1),    (38)

and N ≡ 0 is equivalent to ε_n ≡ 0.

4.2. Optimal detection of sparse mixtures

When both ε and µ are known, the optimal test for problem (37)-(38) is simply the Neyman-Pearson likelihood ratio test (LRT) [32, page 74]. Asymptotic analysis shows the following [50, 51]. Suppose we let ε_n = n^(−β) for some exponent β ∈ (1/2, 1), and

    µ_n = sqrt( 2·s·log n ),  0 < s < 1;
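As a sketch (our illustration), samples from the sparse-mixture alternative (38) can be generated by choosing at random which coefficients carry a filament contribution and flipping the sign of µ at random:

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_mixture(n, beta, s):
    """Sample the alternative (38) with eps_n = n^(-beta) and
    mu_n = sqrt(2*s*log(n)); beta in (1/2, 1), s in (0, 1)."""
    eps = n ** (-beta)
    mu = np.sqrt(2.0 * s * np.log(n))
    x = rng.standard_normal(n)
    spikes = rng.random(n) < eps          # which coefficients are "filament"
    signs = rng.choice([-1.0, 1.0], size=n)
    return x + spikes * signs * mu        # shift by +/- mu where spiked

x_alt = sparse_mixture(100_000, beta=0.6, s=0.8)
```

With β > 1/2 the shifted fraction is o(1/sqrt(n)) relative to n, which is why simple moment statistics lose power and an adaptive statistic such as HC becomes attractive.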


Comparing Different Statistics in Non-Gaussian 2481

Figure 8: For the nine first bands of the wavelet transform, the mean p-value versus λ. The solid, dashed, and dotted lines correspond, respectively, to the excess kurtosis, the HC, and Max ((a) band 1, (b) band 2, (c) band 3, (d) band 4, (e) band 5, (f) band 6, (g) band 7, (h) band 8, and (i) band 9).

Then, for large n, LRT is able to reliably detect when s > ρ_2*(β), and is unable to detect when s < ρ_2*(β) [31, 50, 51]. Since LRT is optimal, it is not possible for any statistic to reliably detect when s < ρ_2*(β). We call the curve s = ρ_2*(β) in the β-s plane the detection boundary; see Figure 9.

We also remark that if the sparsity parameter β < 1/2, it is possible to discriminate merely using the value of the empirical variance of the observations or some other simple moments, and so there is no need for advanced theoretical approaches.

4.3. Adaptive testing using higher criticism

The higher criticism statistic (HC) was proposed in [31], where it was proved to be asymptotically optimal in detecting (37)-(38). To define HC, first we convert the individual X_i's into p-values for individual z-tests. Let p_i = P{N(0, 1) > X_i} be the ith p-value, and let p_(i) denote the p-values sorted in increasing order; the higher criticism statistic is defined as

    HC*_n = max_i  √n [i/n − p_(i)] / √(p_(i) (1 − p_(i))),   (41)

or in a modified form:

    HC+_n = max_{i : 1/n ≤ p_(i) ≤ 1 − 1/n}  √n [i/n − p_(i)] / √(p_(i) (1 − p_(i)));   (42)
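Definitions (41)-(42) translate directly into a few lines of numpy. A minimal sketch (the function name is ours; the normal survival function is computed with math.erfc rather than any particular statistics package):

```python
import numpy as np
from math import erfc, sqrt

def higher_criticism(x, plus=False):
    """HC statistic of (41); plus=True gives the HC+ variant of (42),
    restricting the maximum to 1/n <= p_(i) <= 1 - 1/n."""
    n = len(x)
    # p_i = P{N(0,1) > X_i}, then sorted in increasing order
    p = np.sort([0.5 * erfc(v / sqrt(2.0)) for v in x])
    i = np.arange(1, n + 1)
    hc = sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    if plus:
        hc = hc[(p >= 1.0 / n) & (p <= 1.0 - 1.0 / n)]
    return hc.max()

x = np.random.default_rng(1).standard_normal(5000)
print(higher_criticism(x, plus=True))   # moderate under H0, large under sparse H1
```

Since HC+ maximizes over a subset of the indices used by HC*, HC+ can never exceed HC* on the same data.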

Table 3: Table of λ values (multiplied by 100) for CS detections at 95% confidence.

Multiscale method            Excess kurtosis   HC      Max
Biorthogonal wavelet
  Scale 1, horizontal        0.73              0.73    0.73
  Scale 1, vertical          0.73              0.73    0.73
  Scale 1, diagonal          0.38              0.38    0.38
  Scale 2, horizontal        8.01              9.18    8.81
  Scale 2, vertical          6.98              8.44    10.65
  Scale 2, diagonal          2.20              2.94    2.57
À trous wavelet transform
  Scale 1                    1.47              1.47    1.47
  Scale 2                    9.91              12.85   16.53
Curvelet
  Scale 1, band 1            1.47              2.20    3.30
  Scale 1, band 2            13.59             16.90   —
  Scale 2, band 1            11.38             14.32   —

Table 4: Table of λ values for which SZ is detected at 95% confidence.

Multiscale method            Excess kurtosis   HC      Max
Biorthogonal wavelet
  Scale 1, horizontal        0.30              0.32    —
  Scale 1, vertical          0.32              0.32    —
  Scale 1, diagonal          0.06              0.06    0.24
  Scale 2, horizontal        —                 —       —
  Scale 2, vertical          —                 —       —
  Scale 2, diagonal          0.65              0.71    —
À trous wavelet transform
  Scale 1                    0.41              0.47    —
Curvelet
  Scale 1, band 1            0.59              0.69    0.83

we let HC_n refer either to HC*_n or HC+_n whenever there is no confusion. The above definition is slightly different from [31], but the ideas are essentially the same.

With an appropriate normalization sequence

    a_n = √(2 log log n),
    b_n = 2 log log n + 0.5 log log log n − 0.5 log(4π),   (43)

the distribution of HC*_n converges to the Gumbel distribution E_v^4, whose cdf is exp(−4 exp(−x)) [52]:

    a_n HC*_n − b_n →_w E_v^4,   (44)

so the p-values of HC*_n are approximately

    1 − exp(−4 exp(−(a_n HC*_n − b_n))).   (45)

For moderately large n, in general, the approximation in (45) is accurate for HC+_n, but not for HC*_n. For n = 244², taking a_n = 2.2536 and b_n = 3.9407 in (45) gives a good approximation for the p-value of HC+_n.

Figure 9: The detection boundary separates the square in the β-s plane into the detectable region and the undetectable region. When (β, s) falls into the estimable region, it is possible not only to reliably detect the presence of the signals, but also to estimate them.

2482 EURASIP Journal on Applied Signal Processing
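The p-value approximation (43)-(45) can be sketched as follows, using the closed-form a_n and b_n; note that the text quotes slightly tuned constants (a_n = 2.2536, b_n = 3.9407 for n = 244²) that differ a little from the closed-form values, so the numbers below are only indicative:

```python
import numpy as np

def hc_pvalue(hc, n):
    """Approximate p-value of an observed HC via the Gumbel limit (43)-(45)."""
    lln = np.log(np.log(n))
    a_n = np.sqrt(2.0 * lln)
    b_n = 2.0 * lln + 0.5 * np.log(np.log(np.log(n))) - 0.5 * np.log(4.0 * np.pi)
    x = a_n * hc - b_n
    return 1.0 - np.exp(-4.0 * np.exp(-x))   # survival function of E_v^4

n = 244 ** 2
p3, p5 = hc_pvalue(3.0, n), hc_pvalue(5.0, n)
print(p3, p5)   # p-value shrinks as the observed HC grows
```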


Figure 10: (a) The image of the bar, (b) the log-histogram of the curvelet coefficients of the bar, and (c) the qq-plot of the curvelet coefficients versus normal distribution.

Figure 11: For the curvelet coefficients v_i's of the simulated CS map in Figure 3, (a) the log-histogram of v_i's, (b) the qq-plot of v_i's versus normal distribution, and (c) the qq-plot of sign(v_i)|v_i|^0.815 versus double exponential.
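The qq-comparison of Figure 11(c) can be mimicked on synthetic data. The sketch below uses no real curvelet coefficients: the stand-in array `v` is constructed so that sign(v)|v|^0.815 is double exponential (Laplace) by construction, purely to illustrate the distributional check:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic stand-in for the curvelet coefficients v_i: built so that
# sign(v)|v|**0.815 is exactly Laplace -- illustration only.
y = rng.laplace(size=50000)
v = np.sign(y) * np.abs(y) ** (1.0 / 0.815)

# qq-comparison of the transformed coefficients against Laplace quantiles
t = np.sort(np.sign(v) * np.abs(v) ** 0.815)
q = np.sort(rng.laplace(size=t.size))
print(f"qq correlation: {np.corrcoef(t, q)[0, 1]:.3f}")   # close to 1 for a good match
```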

A brief remark comparing Max and HC: Max only takes into account the few largest observations; HC takes into account those outliers, but also moderately large observations. As a result, in general, HC is better than Max, especially when we have unusually many moderately large observations. However, when the actual evidence lies in the middle of the distribution, both HC and Max will be very weak.

4.4. Curvelet coefficients of cosmic strings

In Section 3, we studied wavelet coefficients of simulated cosmic strings. We now study the curvelet coefficients of the same simulated maps.

We now discuss empirical properties of curvelet coefficients of (simulated) cosmic strings. The curvelet transform was first deployed on a test image showing a simple "bar" extending vertically across the image. The result, seen in Figure 10, shows the image, the histogram of the curvelet coefficients at the next-to-finest scale, and the qq-plot against the normal distribution. The display matches in general terms the sparsity model of Section 4. That display also shows the result of superposing Gaussian noise on the image; the curvelet coefficients clearly have the general appearance of a mixture of normals with sparse fractions at nonzero mean, just as in the model.

We also applied the curvelet transform to the simulated cosmic string data. Figure 11 shows the results, which suggest that the coefficients do not match the simple sparsity model. Extensive modelling efforts, not reported here, show that the curvelet coefficients transformed by |v|^0.815 have an exponential distribution.

This discrepancy from the sparsity model has two explanations. First, cosmic string images contain (to the naked eye) both point-like features and curve-like features. Because curvelets are not specially adapted to sparsifying point-like features, the coefficients contain extra information not expressible by our geometric model. Second, cosmic string images might contain filamentary features at a range of length scales and a range of density contrasts. If those contrasts exhibit substantial amplitude variation, the simple mixture model must be replaced by something more complex. In any event, the curvelet coefficients of cosmic strings do not have the simple structure proposed in Section 4.

When applying various detectors of non-Gaussian behavior to curvelet coefficients, as in the simulation of Section 3.5, we find that, despite the theoretical ideas backing the use of HC as an optimal test for sparse non-Gaussian phenomena, the kurtosis consistently has better performance. The results are included in Tables 3 and 4. Note that, although the curvelet coefficients are not as sensitive detectors as wavelets in this setting, that can be an advantage, since they are relatively immune to point-like features such as SZ contamination. Hence, they are more specific to CS as opposed to SZ effects.

5. CONCLUSION

The kurtosis of the wavelet coefficients is very often used in astronomy for the detection of non-Gaussianities in the CMB.

It has been shown [27] that it is also possible to separate the non-Gaussian signatures associated with cosmic strings from those due to the SZ effect by combining the excess kurtosis derived from both the curvelet and the wavelet transforms. We have studied in this paper several other transform-based statistics, the Max and the higher criticism, and we have compared them theoretically and experimentally to the kurtosis. We have shown that kurtosis is asymptotically optimal in the class of weakly dependent symmetric non-Gaussian contamination with finite 8th moments, while HC and Max are asymptotically optimal in the class of weakly dependent symmetric non-Gaussian contamination with infinite 8th moment. Hence, depending on the nature of the non-Gaussianity, one statistic can be better than another. This is a motivation for using several statistics rather than a single one for analysing CMB data. Finally, we have studied in detail the case of cosmic string contaminations on simulated maps. Our experimental results show clearly that kurtosis outperforms Max/HC.

ACKNOWLEDGMENT

The cosmic string maps were kindly provided by F. R. Bouchet. The authors would also like to thank Inam Rahman for help in simulations.

REFERENCES

[1] A. A. Penzias and R. W. Wilson, "Measurement of the flux density of CAS A at 4080 Mc/s," Astrophysical Journal, vol. 142, no. 1, pp. 1149–1155, 1965.
[2] D. J. Fixsen, E. S. Cheng, D. A. Cottingham, et al., "A balloon-borne millimeter-wave telescope for cosmic microwave background anisotropy measurements," Astrophysical Journal, vol. 470, pp. 63–63, 1996.
[3] G. F. Smoot, C. L. Bennett, A. Kogut, et al., "Structure in the COBE differential microwave radiometer first-year maps," Astrophysical Journal Letters, vol. 396, no. 1, pp. L1–L5, 1992.
[4] A. D. Miller, R. Caldwell, M. J. Devlin, et al., "A measurement of the angular power spectrum of the cosmic microwave background from l = 100 to 400," Astrophysical Journal Letters, vol. 524, no. 1, pp. L1–L4, 1999.
[5] P. de Bernardis, P. A. R. Ade, J. J. Bock, et al., "A flat universe from high-resolution maps of the cosmic microwave background radiation," Nature, vol. 404, no. 6781, pp. 955–959, 2000.
[6] S. Hanany, P. A. R. Ade, A. Balbi, et al., "MAXIMA-1: a measurement of the cosmic microwave background anisotropy on angular scales of 10′–5°," Astrophysical Journal Letters, vol. 545, no. 1, pp. L5–L9, 2000.
[7] N. W. Halverson, E. M. Leitch, C. Pryke, et al., "Degree angular scale interferometer first results: a measurement of the cosmic microwave background angular power spectrum," Astrophysical Journal, vol. 568, no. 1, pp. 38–45, 2002.
[8] A. Benoît, P. A. R. Ade, A. Amblard, et al., "The cosmic microwave background anisotropy power spectrum measured by Archeops," Astronomy & Astrophysics, vol. 399, no. 3, pp. L19–L23, 2003.
[9] C. L. Bennett, R. S. Hill, G. Hinshaw, et al., "First-year Wilkinson microwave anisotropy probe (WMAP) observations: foreground emission," Astrophysical Journal Supplement Series, vol. 148, no. 1, pp. 97–117, 2003.
[10] C. L. Bennett, M. Halpern, G. Hinshaw, et al., "First-year Wilkinson microwave anisotropy probe (WMAP) observations: preliminary maps and basic results," Astrophysical Journal Supplement Series, vol. 148, no. 1, pp. 1–27, 2003.
[11] F. Bernardeau and J. Uzan, "Non-Gaussianity in multifield inflation," Physical Review D, vol. 66, no. 10, 103506, 14 pages, 2002.
[12] X. Luo, "The angular bispectrum of the cosmic microwave background," Astrophysical Journal Letters, vol. 427, no. 2, pp. L71–L74, 1994.
[13] A. H. Jaffe, "Quasilinear evolution of compensated cosmological perturbations: the nonlinear σ model," Physical Review D, vol. 49, no. 8, pp. 3893–3909, 1994.
[14] A. Gangui, F. Lucchin, S. Matarrese, and S. Mollerach, "The three-point correlation function of the cosmic microwave background in inflationary models," Astrophysical Journal, vol. 430, no. 2, pp. 447–457, 1994.
[15] D. Novikov, J. Schmalzing, and V. F. Mukhanov, "On non-Gaussianity in the cosmic microwave background," Astronomy & Astrophysics, vol. 364, pp. 17–25, 2000.
[16] S. F. Shandarin, "Testing non-Gaussianity in cosmic microwave background maps by morphological statistics," Monthly Notices of the Royal Astronomical Society, vol. 331, no. 4, pp. 865–874, 2002.
[17] B. C. Bromley and M. Tegmark, "Is the cosmic microwave background really non-Gaussian?" Astrophysical Journal Letters, vol. 524, no. 2, pp. L79–L82, 1999.
[18] L. Verde, L. Wang, A. F. Heavens, and M. Kamionkowski, "Large-scale structure, the cosmic microwave background and primordial non-Gaussianity," Monthly Notices of the Royal Astronomical Society, vol. 313, no. 1, pp. 141–147, 2000.
[19] N. G. Phillips and A. Kogut, "Statistical power, the bispectrum, and the search for non-Gaussianity in the cosmic microwave background anisotropy," Astrophysical Journal, vol. 548, no. 2, pp. 540–549, 2001.
[20] M. Kunz, A. J. Banday, P. G. Castro, P. G. Ferreira, and K. M. Górski, "The trispectrum of the 4 year COBE DMR data," Astrophysical Journal Letters, vol. 563, no. 2, pp. L99–L102, 2001.
[21] N. Aghanim and O. Forni, "Searching for the non-Gaussian signature of the CMB secondary anisotropies," Astronomy & Astrophysics, vol. 347, pp. 409–418, 1999.
[22] O. Forni and N. Aghanim, "Searching for non-Gaussianity: statistical tests," Astronomy and Astrophysics Supplement Series, vol. 137, no. 3, pp. 553–567, 1999.
[23] M. P. Hobson, A. W. Jones, and A. N. Lasenby, "Wavelet analysis and the detection of non-Gaussianity in the cosmic microwave background," Monthly Notices of the Royal Astronomical Society, vol. 309, no. 1, pp. 125–140, 1999.
[24] R. B. Barreiro and M. P. Hobson, "The discriminating power of wavelets to detect non-Gaussianity in the cosmic microwave background," Monthly Notices of the Royal Astronomical Society, vol. 327, no. 3, pp. 813–828, 2001.
[25] L. Cayón, J. L. Sanz, E. Martínez-González, et al., "Spherical mexican hat wavelet: an application to detect non-Gaussianity in the COBE-DMR maps," Monthly Notices of the Royal Astronomical Society, vol. 326, no. 4, pp. 1243–1248, 2001.
[26] J. Jewell, "A statistical characterization of galactic dust emission as a non-Gaussian foreground of the cosmic microwave background," Astrophysical Journal, vol. 557, no. 2, pp. 700–713, 2001.
[27] J.-L. Starck, N. Aghanim, and O. Forni, "Detection and discrimination of cosmological non-Gaussian signatures by multi-scale methods," Astronomy & Astrophysics, vol. 416, no. 1, pp. 9–17, 2004.
[28] J.-L. Starck, F. Murtagh, and A. Bijaoui, Image Processing and Data Analysis: the Multiscale Approach, Cambridge University Press, Cambridge, UK, 1998.
[29] S. G. Mallat, A Wavelet Tour of Signal Processing, Academic Press, San Diego, Calif, USA, 1998.
[30] N. Aghanim, M. Kunz, P. G. Castro, and O. Forni, "Non-Gaussianity: comparing wavelet and Fourier based methods," Astronomy & Astrophysics, vol. 406, no. 3, pp. 797–816, 2003.
[31] D. L. Donoho and J. Jin, "Higher criticism for detecting sparse heterogeneous mixtures," Annals of Statistics, vol. 32, no. 3, pp. 962–994, 2004.
[32] E. L. Lehmann, Testing Statistical Hypotheses, John Wiley & Sons, New York, NY, USA, 2nd edition, 1986.
[33] D. L. Donoho and J. Jin, "Optimality of excess kurtosis for detecting a non-Gaussian component in high-dimensional random vectors," Tech. Rep., Stanford University, Stanford, Calif, USA, 2004.
[34] M. White and J. D. Cohn, "TACMB-1: The theory of anisotropies in the cosmic microwave background," American Journal of Physics, vol. 70, no. 2, pp. 106–118, 2002.
[35] R. A. Sunyaev and I. B. Zeldovich, "Microwave background radiation as a probe of the contemporary structure and history of the universe," Annual Review of Astronomy and Astrophysics, vol. 18, pp. 537–560, 1980.
[36] J. P. Ostriker and E. T. Vishniac, "Generation of microwave background fluctuations from nonlinear perturbations at the ERA of galaxy formation," Astrophysical Journal, vol. 306, pp. L51–L54, 1986.
[37] E. T. Vishniac, "Reionization and small-scale fluctuations in the microwave background," Astrophysical Journal, vol. 322, pp. 597–604, 1987.
[38] F. R. Bouchet, P. Peter, A. Riazuelo, and M. Sakellariadou, "Evidence against or for topological defects in the BOOMERanG data?" Physical Review D, vol. 65, no. 2, 021301(R), 4 pages, 2002.
[39] F. R. Bouchet, D. P. Bennett, and A. Stebbins, "Patterns of the cosmic microwave background from evolving string networks," Nature, vol. 335, no. 6189, pp. 410–414, 1988.
[40] B. M. Hill, "A simple general approach to inference about the tail of a distribution," Annals of Statistics, vol. 3, no. 5, pp. 1163–1174, 1975.
[41] C. E. Metz, "Basic principles of ROC analysis," Seminars in Nuclear Medicine, vol. 8, no. 4, pp. 283–298, 1978.
[42] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform," IEEE Trans. Image Processing, vol. 1, no. 2, pp. 205–220, 1992.
[43] N. Aghanim, K. M. Górski, and J.-L. Puget, "How accurately can the SZ effect measure peculiar cluster velocities and bulk flows?" Astronomy & Astrophysics, vol. 374, no. 1, pp. 1–12, 2001.
[44] E. J. Candès and D. L. Donoho, "Curvelets—a surprisingly effective nonadaptive representation for objects with edges," in Curve and Surface Fitting: Saint-Malo 1999, A. Cohen, C. Rabut, and L. L. Schumaker, Eds., Vanderbilt University Press, Nashville, Tenn, USA, 1999.
[45] D. L. Donoho and A. G. Flesia, "Can recent developments in harmonic analysis explain the recent findings in natural scene statistics?" Network: Computation in Neural Systems, vol. 12, no. 3, pp. 371–393, 2001.
[46] E. J. Candès and D. L. Donoho, "Edge-preserving denoising in linear inverse problems: optimality of curvelet frames," Annals of Statistics, vol. 30, no. 3, pp. 784–842, 2002.
[47] J.-L. Starck, E. Candès, and D. L. Donoho, "The curvelet transform for image denoising," IEEE Trans. Image Processing, vol. 11, no. 6, pp. 670–684, 2002.
[48] J.-L. Starck, F. Murtagh, E. J. Candès, and D. L. Donoho, "Gray and color image contrast enhancement by the curvelet transform," IEEE Trans. Image Processing, vol. 12, no. 6, pp. 706–717, 2003.
[49] E. J. Candès and D. L. Donoho, "Frames of curvelets and optimal representations of objects with piecewise C² singularities," Communications on Pure and Applied Mathematics, vol. 57, no. 2, pp. 219–266, 2004.
[50] Yu. I. Ingster, "Minimax detection of a signal for ln-balls," Mathematical Methods of Statistics, vol. 7, no. 4, pp. 401–428, 1999.
[51] J. Jin, "Detecting a target in very noisy data from multiple looks," IMS Lecture Notes Monograph, vol. 45, pp. 1–32, 2004.
[52] G. R. Shorack and J. A. Wellner, Empirical Processes with Applications to Statistics, John Wiley & Sons, New York, NY, USA, 1986.

J. Jin received the B.S. and M.S. degrees in mathematics from Peking University, China, the M.S. degree in applied mathematics from the University of California at Los Angeles (UCLA), California, and the Ph.D. degree in statistics from Stanford University, Stanford, California, where D. L. Donoho served as his adviser. He is an Assistant Professor of statistics at Purdue University, Indiana. His research interests are in the area of large-scale multiple hypothesis testing, statistical estimation, and their applications to protein mass spectroscopy and astronomy.

J.-L. Starck has a Ph.D. degree from University Nice-Sophia Antipolis and a Habilitation from University Paris XI. He was a visitor at the European Southern Observatory (ESO) in 1993, at UCLA in 2004, and at Stanford's Statistics Department in 2000 and 2005. He has been a researcher at CEA since 1994. His research interests include image processing and statistical methods in astrophysics and cosmology. He is also author of two books entitled Image Processing and Data Analysis: the Multiscale Approach (Cambridge University Press, 1998) and Astronomical Image and Data Analysis (Springer, 2002).

D. L. Donoho is Anne T. and Robert M. Bass Professor in the humanities and sciences at Stanford University. He received his A.B. degree in statistics from Princeton University, where his thesis adviser was John W. Tukey, and his Ph.D. degree in statistics from Harvard University, where his thesis adviser was Peter J. Huber. He is a Member of the US National Academy of Sciences and of the American Academy of Arts and Sciences.

N. Aghanim is a Cosmologist at the Institut d'Astrophysique Spatiale in Orsay (France). Her main research interests are the cosmic microwave background (CMB) and large-scale structure. She has been working during the last ten years on the statistical characterisation of CMB temperature anisotropies through power spectrum analyses and higher-order moments of wavelet coefficients. She naturally got involved and interested in signal processing techniques in order to improve the detection of low signal-to-noise features such as those associated with secondary anisotropies and to separate them from the primary signal.

O. Forni is a Planetologist at the Institut d'Astrophysique Spatiale in Orsay (France). His main research activities deal with the evolution of the planets and satellites of the solar system. Recently he has been working on the statistical properties of the cosmic microwave background (CMB) and of the secondary anisotropies by means of multiscale transform analysis. He also got involved in component separation techniques in order to improve the detection of low-power signatures and to analyse hyperspectral infrared data on Mars.

EURASIP Journal on Applied Signal Processing 2005:15, 2486–2499
© 2005 C. Thiebaut and S. Roques

Time-Scale and Time-Frequency Analyses of Irregularly Sampled Astronomical Time Series

C. Thiebaut
Centre d'Etude Spatiale des Rayonnements, 9 avenue du Colonel Roche, Boîte postale 4346, 31028 Toulouse Cedex 4, France
Email: [email protected]

S. Roques
Laboratoire d'Astrophysique de l'Observatoire Midi-Pyrénées, 14 avenue Edouard Belin, 31400 Toulouse, France
Email: [email protected]

Received 27 May 2004; Revised 21 January 2005

We evaluate the quality of spectral restoration in the case of irregularly sampled signals in astronomy. We study in detail a time-scale method leading to a global wavelet spectrum comparable to the Fourier period, and a time-frequency matching pursuit allowing us to identify the frequencies and to control the error propagation. In both cases, the signals are first resampled with a linear interpolation. Both results are compared with those obtained using Lomb's periodogram and using the weighted wavelet Z-transform developed in astronomy for unevenly sampled variable star observations. These approaches are applied to simulations and to light variations of four variable stars. This leads to the conclusion that the matching pursuit is more efficient for recovering the spectral contents of a pulsating star, even with a preliminary resampling. In particular, the results are almost independent of the quality of the initial irregular sampling.

Keywords and phrases: astronomical time series, irregular sampling, time-scale methods, time-frequency methods, wavelets, matching pursuit.

1. INTRODUCTION

Nonuniform sampling problems arise in many astronomical fields [3, 22], particularly in stellar physics when one observes the light curves of variable stars (asteroseismology) or spectroscopic variabilities. The frequencies deduced from the light variations of such stars represent an important source of information. In particular, they can help constrain stellar evolution models, because the structure of the vibration modes and their frequency separations may yield physical parameters of the star, such as the rotation period or the composition of its layers [1, 16]. Another field of application concerns the development of automatic classifiers for variable stars, where the period is a very discriminating parameter [27]. Of course, observations have to cover a long enough time span for the best possible resolution of the power density spectra. The difficulty in obtaining such complete observations is well known: the lack of information is essentially due to diurnal cuts, poor weather conditions, or equipment malfunctions. Generally, such astronomical data are of two types. First, evenly spaced time series separated by wide gaps [9] (typically day/night alternation for observations of short-period stars taken over several days). In that case, many different methods have been proposed to deal with this problem, for example, autoregressive models which predict data for the gaps [21], combined with observing campaigns with telescopes at several different longitudes [18]. Second, unequally spaced time series with samples missing almost everywhere. Here, the data under study come from several years of observations (long-period stars) with a mean sampling rate of a few days (here, telescope failures or bad weather conditions are the main causes of the gaps [4]). This second case is considered in this paper. Of course, problems of this kind do not arise only when processing astronomical signals. In an astrophysical context, it is of capital importance to solve them in order to carry out a physical interpretation of the observations; no other experimental alternative is possible.

Another problem often arises in complement: searching for oscillations that are characteristic of the structural properties of the star (i.e., those that arise almost everywhere in the signal) thanks to a frequency analysis. One then understands the necessity of getting information about the lifetime of a given peak of the resulting power density spectrum.

In this context, wavelet analysis [6] and time-frequency analysis [10], which have the ability to decompose the signal into contributions localized both in time and in scale (or frequency), are thus especially attractive to obtain this information. These analyses are widely described for simulated data, for example, in Szatmáry et al. [26] and in research in Kiss and Szatmáry [14]. These authors conclude that such methods of decomposition, labeled with a scale and a position parameter, provide interpretable visual representations of astronomical data, as an alternative to the standard spectral analysis.

Unfortunately, wavelet and time-frequency analyses are generally not directly applicable to the particular case of irregularly sampled data [24], thus one often uses standard techniques like periodograms [13]. However, with such a simple spectral technique, intervals including low-amplitude peaks are hard to identify in the processed data because each feature is contaminated by noise and convolved by a function whose nature closely depends on the irregular distribution of the data. As the corresponding aliases can be of substantial amplitude, they can lead to the confusion of features due to real oscillations with those arising from the segmented nature of the observing window. If one deals only with signals whose spectra are dominated by a small number of components at discrete frequencies, nonlinear deconvolution methods like the widely used CLEAN technique are efficient [20]. Obviously, in the absence of any a priori information on the spectrum to be recovered, the case considered here, this technique becomes unreliable.

Time-frequency methods are efficient, particularly when an examination by eye of the periodogram leads to a failure (see Section 4). If data sampling is not equidistant, one must first resample the signal to build a regular sampling, before applying a time-frequency method. Several techniques exist to do this, and the associated errors are widely discussed in the literature. de Waele and Broersen [8] divide them into simple and complex methods. In particular, they conclude that linear interpolation is a robust resampling method, although it provides a signal whose standard deviation is biased with a systematic error. However, this error can be corrected by replacing the standard deviation obtained with linear interpolation by the value given with the method they propose: the nearest neighbor resampling. Foster [11] proposed a rescaled wavelet technique called the weighted wavelet Z-transform (WWZ). It is developed specifically for unevenly sampled data in the context of observation of variable stars: here, the wavelet is rescaled to satisfy the admissibility condition on such irregular sampling. One can, for example, read the paper of Haubold [12] for an interesting analysis of this method. In this paper, our results will also be compared with those obtained by the WWZ technique.

The signals under consideration in this paper being irregularly sampled, we have opted for a processing method in two stages: (1) we have resampled the data by using a linear interpolation (with a sampling rate typically equal to one day and without applying any additional smoothing or filtering, contrary to preprocessing techniques recommended by Buchler et al. [2]); (2) we have then applied two appreciably different types of time-frequency analyses: a global wavelet transform and the associated wavelet spectrum [28], which is described in Section 2, and a matching pursuit decomposition [17], developed in Section 3. The results are discussed and compared (Section 4) to those found using a periodogram and WWZ, for simulated signals and for light curves of four variable stars: T Camelopardis, a star of period 373.2 days; S Persei, a Type C semiregular star of period 822 days; AC Herculis, a Type A RV Tauri variable star of period 75.01 days; and RV Tauri, a Type B RV Tauri star of period 78.73 days. These periods are those specified in the fourth edition of the General Catalog of Variable Stars, see http://www.sai.msu.su/groups/cluster/gcvs/gcvs. This leads us in particular to forecast a chaotic light curve of AC Herculis as predicted by Kollath et al. [15].

2. GLOBAL WAVELET SPECTRA

We recall that a wavelet decomposition is an expansion of an arbitrary function into smoothed localized contributions labeled by a scale and a time parameter. Its aim is to expand a signal into a series of coefficients of specified energy and then to capture fine and coarse features at different scales. Moreover, it provides an easily interpretable visual representation of the signal (see, e.g., the book of Daubechies [6] for more details). Wavelets are generated by a function ψ(t) named the analyzing wavelet. This function should have a finite energy, and its integral should vanish. These two conditions mean that the wavelet should oscillate like a short wave. The analyzing wavelet is the mother of the wavelet family. The wavelet family {ψ} is generated by translating and dilating the analyzing wavelet. Then, one can write any function as a linear combination of the elements of the family.

Here we consider the continuous wavelet transform of a real signal s(t) with respect to the analyzing wavelet ψ(t). The wavelet transform is defined as the function

    C(b, a) = (1/√a) ∫ ψ*((t − b)/a) s(t) dt   (1)

on the time-scale plane. Here, a is a dilation scale and b a translation parameter; the asterisk denotes the complex conjugate. The qualitative information given by the visual output supplements the information obtained by inspection of the signal itself, or its Fourier transform. The wavelet transform displays information in a wide range of scale parameters on a single picture.

The choice of the analyzing wavelet is generally guided by a compromise between time and frequency resolutions and by its ability to capture localized features of the signal. As we are essentially interested in wavelet power spectra, the wavelet we used here is the Morlet wavelet (a complex exponential modulated by a Gaussian)

    ψ_0(t) = π^(−1/4) exp(iω_0 t) exp(−t²/2),  with ω_0 = 6,   (2)

which offers high frequency resolution because it is very well localized in frequencies. On the contrary, using a derivative-of-Gaussian wavelet would result in a good time localization, but a poor one in frequency.

The global wavelet transform used in our analysis corresponds to a continuous wavelet approach allowing the definition of a global wavelet spectrum as the square of the modulus of the wavelet coefficients for each scale, together with statistical significance tests [28]. The aim of the method proposed by these authors is to provide quantitative tools associated with wavelet analysis. This leads in particular to an equivalent Fourier period (which can be derived analytically for each wavelet function) which can be easily compared to the Fourier power spectrum or to the periodogram.

3. MATCHING PURSUIT ALGORITHM

The matching pursuit algorithm, introduced by Mallat and Zhang [17], allows us to choose, in a given redundant finite dictionary of time-frequency waveforms, a set of vectors that match the signal as well as possible.
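The two-stage pipeline described above (linear interpolation onto a one-day grid, then Morlet wavelet power averaged over time to obtain a global wavelet spectrum) can be sketched as follows. The signal, sampling pattern, and scale grid below are invented for illustration, and the transform evaluates (1) with the wavelet (2) by direct summation rather than by FFT:

```python
import numpy as np

def morlet_power(sig, dt, scales, w0=6.0):
    """|C(b,a)|^2 for the Morlet wavelet of (2), with C(b,a) as in (1),
    computed by direct summation (slow but transparent)."""
    t = np.arange(sig.size) * dt
    power = np.empty((scales.size, sig.size))
    for k, a in enumerate(scales):
        u = (t[None, :] - t[:, None]) / a           # (t - b)/a for every b
        psi = np.pi ** -0.25 * np.exp(1j * w0 * u - u**2 / 2)
        c = (np.conj(psi) * sig[None, :]).sum(axis=1) * dt / np.sqrt(a)
        power[k] = np.abs(c) ** 2
    return power

rng = np.random.default_rng(0)
# Stage 1: irregular light curve, resampled on a one-day grid by linear interpolation
t_obs = np.sort(rng.uniform(0.0, 512.0, size=300))
y_obs = np.sin(2 * np.pi * t_obs / 64.0)            # hypothetical 64-day oscillation
t_reg = np.arange(np.ceil(t_obs[0]), np.floor(t_obs[-1]) + 1.0)
y_reg = np.interp(t_reg, t_obs, y_obs)
y_reg -= y_reg.mean()

# Stage 2: global wavelet spectrum = time average of the wavelet power at each scale
scales = np.array([8.0, 16.0, 32.0, 64.0, 128.0])
gws = morlet_power(y_reg, 1.0, scales).mean(axis=1)
print("best scale:", scales[np.argmax(gws)])        # near the 64-day period for w0 = 6
```

For ω_0 = 6 the equivalent Fourier period is close to the scale itself (roughly 1.03a), which is why the peak scale can be read almost directly as a period.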
The dictionary D is defined as a family (not a basis) of time-frequency functions obtained by dilating, modulating, and translating a single real even function k(t) ∈ L²(R). The atoms (elements) of the dictionary are defined by

    k_\nu(t) = \frac{1}{\sqrt{a}} k\!\left(\frac{t - b}{a}\right) e^{i\omega t},    (3)

where a is the dilation scale, b the translation parameter, and ω a frequency modulation. One defines ν = (a, b, ω) as the atom index in the dictionary. Note that the factor 1/\sqrt{a} normalizes the L²(R) norm of k_ν(t) to unity. If the window k(t) is Gaussian, the joint time-frequency localization of all the atoms is a minimum, and in this case k_ν(t) is a Gabor function. Note that the family {k_ν} is not a wavelet family, in the sense that a given dilation allows several analyzing frequency values. It can be seen as a superposition of a wavelet transform and a short-term Fourier transform. In particular, the underlying family is nonorthogonal. In practice, the atoms of the family are thus oscillating functions modulated by window functions. They are generated by two mother functions that satisfy localization properties: a window function k(t) (a "Spline 0" in this paper; see Figure 1) and a wavelet-like function. For each value of the atom index ν, one has a new atom of the family, until obtaining a complete collection of atomic waveforms.

A matching pursuit algorithm computes adaptive signal representations: it expands any signal into a set of atoms selected among the redundant dictionary D, to match its components as well as possible, through iterated one-dimensional projections. In our particular case, it is the resampled light curve s(t) (from which the mean value has been subtracted after resampling, so that it becomes zero mean) that is approximated by a single vector k_{ν_1} chosen from the dictionary D such that |⟨s(t), k_{ν_1}(t)⟩| is as large as possible. Here, ⟨·, ·⟩ denotes the scalar product in L²(R). In practice, one has to compute a scalar product for all the values of ν. The vector k_ν giving the largest one is k_{ν_1}. The light curve is then decomposed into the form

    s(t) = \langle s(t), k_{\nu_1}(t)\rangle k_{\nu_1}(t) + Rs(t),    (4)

where Rs(t) is the residual vector after approximating s(t) in the "direction" k_{ν_1}(t). Clearly, k_{ν_1}(t) is orthogonal to Rs(t), and hence one has the relation

    \|s(t)\|^2 = |\langle s(t), k_{\nu_1}(t)\rangle|^2 + \|Rs(t)\|^2,    (5)

where \|\cdot\| denotes the Euclidean norm. Note that even if the family {k_ν} is nonorthogonal, the vector k_{ν_1}(t) is orthogonal to the residual Rs(t). This important property allows the construction of the algorithm: the main idea of the matching pursuit is to subdecompose the residue Rs(t) by finding a vector k_{ν_2}(t) that matches it as well as possible, as was done for s(t). Each time, the procedure is repeated on the obtained residue:

    s(t) = \langle s(t), k_{\nu_1}(t)\rangle k_{\nu_1}(t) + Rs(t),
    Rs(t) = \langle Rs(t), k_{\nu_2}(t)\rangle k_{\nu_2}(t) + R^2 s(t),
    \vdots
    R^n s(t) = \langle R^n s(t), k_{\nu_n}(t)\rangle k_{\nu_n}(t) + R^{n+1} s(t).    (6)

It is easy to determine a convergence criterion for the algorithm by examining the decrease of the norm of the residue.

Figure 1: Reconstruction error versus number of atoms of the matching pursuit decomposition, for each shape of the even real function k(t) (in logarithmic scale).

Time-Scale Analyses for Unevenly Sampled Time Series
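The iteration (6) and the energy relation (5) can be checked on a toy problem. The following is our own minimal NumPy sketch, with plain unit-norm Fourier atoms standing in for the windowed atoms of (3):

```python
import numpy as np

def matching_pursuit(s, dictionary, n_atoms=10, tol=1e-3):
    """Iterated one-dimensional projections of eq. (6): at each step,
    pick the atom with the largest |<R s, k_nu>| and subtract its
    contribution; stop once the residue norm has dropped below tol."""
    residue = s.astype(complex)
    chosen = []                                  # (atom index, coefficient)
    for _ in range(n_atoms):
        coeffs = dictionary.conj() @ residue     # <R s, k_nu> for every nu
        i = int(np.argmax(np.abs(coeffs)))
        chosen.append((i, coeffs[i]))
        residue = residue - coeffs[i] * dictionary[i]
        if np.linalg.norm(residue) < tol * np.linalg.norm(s):
            break                                # convergence criterion
    return chosen, residue

# unit-norm complex-exponential atoms and a two-component test signal
n = 256
t = np.arange(n)
freqs = np.arange(1, 40)
D = np.exp(2j * np.pi * np.outer(freqs, t) / n) / np.sqrt(n)
s = np.cos(2 * np.pi * 8 * t / n) + 0.5 * np.cos(2 * np.pi * 3 * t / n)
chosen, residue = matching_pursuit(s, D, n_atoms=4)
```

The two injected frequencies are peeled off first, and, by iterating (5), the total energy splits exactly into the selected coefficients plus the final residue.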

Finally, the signal is decomposed into

    s(t) = \sum_{i=1}^{\infty} \langle R^i s(t), k_{\nu_i}(t)\rangle k_{\nu_i}(t),    (7)

where the atoms k_{ν_i}(t) are the ones that match the signal structures as well as possible. We can then build a hierarchy of the main signal structures (k_{ν_1}(t), k_{ν_2}(t), ..., k_{ν_n}(t)) yielding a time-frequency energy distribution. An energy conservation theorem results from (5):

    \|s(t)\|^2 = \sum_{i=1}^{\infty} |\langle R^i s(t), k_{\nu_i}(t)\rangle|^2.    (8)

The energy density of s(t) in the time-frequency plane (t, u) is defined by

    (Es)(t, u) = \sum_{i=1}^{\infty} |\langle R^i s(t), k_{\nu_i}(t)\rangle|^2 (W k_{\nu_i})(t, u),    (9)

where (W k_{ν_i})(t, u) is the Wigner-Ville distribution [29], defined as follows:

    (W k_\nu)(t, u) = \int_{-\infty}^{\infty} k_\nu\!\left(t + \frac{\tau}{2}\right) k_\nu^*\!\left(t - \frac{\tau}{2}\right) e^{-iu\tau}\, d\tau.    (10)

In practice, it is this energy density Es (9) that is represented in the time-frequency diagrams. Note that this density does not include the interference terms of the Wigner-Ville distributions, because it is computed from an atomic decomposition of s(t).

Decomposition onto an orthonormal basis, or the method of Coifman and Wickerhauser [5], selects in a global way the basis that is best adapted to the signal properties. The results associated with these methods are hardly interpretable. On the contrary, the matching pursuit decomposition is a constructive process which allows one to detect and characterize the time-frequency components one by one, from the most energetic to the least.

The shape of the even real function k(t) can be chosen according to various criteria, while remaining in agreement with the mathematical conditions authorizing the matching pursuit decomposition. These criteria depend on the adaptability of k(t) to the studied signal and on some oversensitivity to errors. Eight different window shapes have been considered: Gaussian, Hamming, Hanning, Blackman, Spline 0, Spline 1, Spline 2, and Spline 3. Their descriptions can be found in Oppenheim and Schafer [19] and in de Boor [7]. Figure 1 presents the evolution of the quadratic reconstruction error versus the number of atoms used to decompose a simulated signal with these eight different windows. The tested signal is a regularly spaced 1200-point-long signal, the sum of two cosines with periods of 33.3 and 100 days. The quadratic reconstruction error is defined as the norm of the residual signal between the simulated and the reconstructed ones, divided by the norm of the simulated signal. The Blackman window always gives the largest error. The Gaussian and Spline 3 windows also show large errors whatever the number of atoms. Up to 50 atoms, the Spline 0 window is clearly the best. For a decomposition with more than 50 atoms, all the windows, except the Blackman window, present similar reconstruction errors. Following this, as the signals we analyze here are of quite constant amplitude, the adaptation of k(t) to the data leads us to choose the Spline 0 window. Indeed, the rectangular shape of this window does not create an amplitude modulation.

In a matching pursuit diagram, we can choose to select only some atoms representing the structures of interest. In our case, these are the most coherent ones. The corresponding atoms appear as long (in time) elements in the time-frequency plane (horizontal atoms). As the noise of the time series does not correlate well with any long lifetime dictionary element, its information is diluted and then subdecomposed into several "stains" localized in a short time interval. The peaks, even of large amplitude, which do not correspond to star oscillations but are artifacts due to the sampling or to the gaps of the observing window, or which correspond to highly transient phenomena, appear in the time-frequency diagram localized in a very short time period, over a large frequency range (vertical atoms). A simple operation, keeping only the long lifetime atoms (whatever their frequency), allows us to eliminate spurious information (e.g., corresponding to noise).

4. APPLICATION EXAMPLES AND THE COMPARISON WITH THE PERIODOGRAM AND WITH THE WWZ

Although we have analyzed the light curves of four stars, we focus on only two of them (AC Herculis and RV Tauri; see http://www.kusastro.kyoto-u.ac.jp/vsnet/index.html), and the results are summarized in Table 1. The data set of the irregularly sampled observations spans JD 2 440 000–2 450 000 for AC Herculis and JD 2 432 223–2 452 270 for RV Tauri (see Figure 2a). JD is the Julian day number, the number of days that have elapsed since noon on January 1, 4713 B.C. of our civil calendar. In addition, we have created two artificial signals (sums of two cosines with periods of 100 and 33.3 days) with the same nonequidistant sampling schemes as those of AC Herculis and RV Tauri, and without added noise. The choice not to add noise is linked to the concern of analyzing only the effect of the nonuniform sampling and how the methods address this question.

The periodograms from these variable star observations and those from the simulated signals made from the same irregular sampling are presented in Figures 3 and 4. In these figures (top), the two 100-day and 33.3-day periods are visible. The clearly identifiable aliases essentially correspond to the annual cycle of the observations, proving that it is effectively present in the sampling. We also used the WWZ and experimented with different values of the parameter c [11], which defines the trade-off between time resolution and frequency resolution. After accepting the default parameters proposed by Foster

Table 1: Results for the four variable stars (column 1). The sampling quality is defined by the mean number of observations per day (column 2). The reconstruction error computed for the matching pursuit analysis is presented (column 3). The known period and those found with the different analyses are indicated (columns 4–8): General Catalog of Variable Stars (PGCVS), periodogram (PPerio), GWS (PGWS), matching pursuit decomposition (PMP), and weighted wavelet Z-transform (PWWZ). In the WWZ, c = 0.005 for the data of S Per. and T Cam., c = 0.04 for AC Her., and c = 0.0125 for RV Tau. In brackets: double periods are also found, corresponding to the periods indicated in the fourth edition of the General Catalog of Variable Stars.

Star      O/day    Error    PGCVS    PPerio         PGWS           PMP            PWWZ
AC Her.   0.757     3.71%    75.01   37.74 [75.19]  37.00 [75.00]  37.73 [75.19]  38.24 [71.43]
S Per.    0.112    11.22%   822.00   806.40         826.00         819.00         831.90
RV Tau.   0.190     8.53%    78.73   39.26          39.00          39.29 [78.77]  38.8
T Cam.    0.051    10.98%   373.20   371.75         373.00         372.30         371.10
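The reconstruction error of column 3 (defined in Section 3 and plotted in Figure 1) is the residue norm divided by the signal norm after a matching pursuit decomposition. A minimal sketch of its computation, ours, with unit-norm cosine/sine atoms standing in for the eight windowed dictionaries:

```python
import numpy as np

# the Figure 1 test signal: 1200 regular samples, cosines of 33.3 and 100 d
n = 1200
t = np.arange(n)
s = np.cos(2 * np.pi * t / 33.3) + np.cos(2 * np.pi * t / 100.0)

# unit-norm real cosine/sine atoms (a stand-in for the windowed atoms)
freqs = np.arange(1, 200)
D = np.vstack([np.sqrt(2.0 / n) * np.cos(2 * np.pi * np.outer(freqs, t) / n),
               np.sqrt(2.0 / n) * np.sin(2 * np.pi * np.outer(freqs, t) / n)])

residue = s.copy()
errors = []
for m in range(30):                        # greedy matching pursuit steps
    coeffs = D @ residue                   # <R, k> for every (real) atom
    i = int(np.argmax(np.abs(coeffs)))
    residue = residue - coeffs[i] * D[i]
    errors.append(np.linalg.norm(residue) / np.linalg.norm(s))  # ||R^m||/||s||
```

By construction the error can only decrease as atoms are added, which is the behavior of all eight curves in Figure 1.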

(c = 0.0125) for time and frequency grids, we settled on c = 0.04 for the time sampling of AC Herculis and kept c = 0.0125 for the time sampling of RV Tauri, as better compromises in the sense that they were the first lowest values for which we obtained the target periods. The WWZ applied to these test data reveals peaks at periods of 100.5 and 33.2 days for the time sampling of AC Herculis, and at 100.0 and 32.9 days for the time sampling of RV Tauri. The same constants will be used for the application to the observations.

In order to perform time-frequency and time-scale analyses, the data (simulated and real) have also been linearly interpolated with a time step equal to one day. As the known periods of the studied variable stars are longer than 50 days, Shannon's theorem is satisfied. The so-interpolated signals are presented in Figures 5 and 6 (simulated) and Figures 7 and 8 (real). Note that, contrary to the case of AC Herculis, whose missing samples are well distributed throughout the observation, the RV Tauri data contain very large gaps at the beginning of the run.

We then define two "sampling" signals, for which each value is equal to the time step of the AC Herculis and RV Tauri data, respectively. Let t_i be the times when the data are available. The sampling signal values are t_{i+1} - t_i (the wider the gap, the larger the value; see Figure 2b). The "sampling" signals are normalized so that their variances are the same as those of the corresponding simulated signals. These sampling signals are used in the two following sections: their wavelet transforms are compared to those of the simulated and real signals.

Figure 2: RV Tauri sampling signal (modified Julian date = JD - 2 400 000.5).

Figure 3: Periodograms from the AC Herculis light curve (b) and from the corresponding simulated signal (a), presented on a logarithmic scale.

Figure 4: Periodograms from the RV Tauri light curve (b) and from the corresponding simulated signal (a), presented on a logarithmic scale.

4.1. Results from the simulations

The wavelet power spectrum (WPS) and the global wavelet spectrum (GWS) obtained for both simulations are presented in Figures 5 and 6. The Torrence and Compo [28] approach provides statistical tools for establishing the validity of the results. This allows us to show in the GWS the 95% confidence level that would be obtained for white noise, which is the noise present in observations of periodic variable stars such as those studied here (the natural assumption of red noise is not adequate here, because these data were obtained from different observers). This level (shown by the dashed line) is quasi-superposed on the x-axis (the linear interpolation induces errors comparable to white noise). In the WPS, the continuous white line indicates the cone of influence (the zone where the edge effects are important). Information outside this cone is not relevant.

From an examination by eye of the WPS of the simulated signal on the AC Herculis sampling (Figure 5b), one can identify the two simulated periods at 100.5 and 33.1 days. The corresponding GWS (Figure 5c) shows these two periods, which are over the 95% confidence level. In the WPS of the simulated signal on the RV Tauri sampling (Figure 6b), one is also able to identify the two periods, again at 100.5 and 33.1 days, although in a more indistinct way, and a third one at 230 days. The large zone due to the bad sampling at the beginning of the run, around t = 36 000 modified Julian days (MJD = JD - 2 400 000.5), appears as a large stain in the low frequencies. Although it is inside the cone of influence, it overshadows the presence of both periods. The GWS (Figure 6c) highlights the same problem.

Figures 9 and 10 present the GWS from the simulated signals compared with those of the corresponding sampling signals. The purpose of this comparison is to discriminate, from the results of the GWS, whether the presence of a peak of significant amplitude can be explained or not by the irregularity of the sampling. In these simple cases, we can verify that the irregular sampling of RV Tauri is responsible for the peaks above roughly 500 days detected in the GWS (Figure 10). Their nonvalidity was already confirmed by their position outside the cone of influence. However, the GWS from the sampling signal does not explain the 230-day period (see Section 5). Note also that for AC Herculis (Figure 9), the two structures visible at 365 days and 183 days obviously correspond to the annual cycle of the observations.

To complete our analysis, we examine the matching pursuit decomposition of the same simulations. The results presented here use the free graphical user interface developed at our institute (http://webast.ast.obs-mip.fr/people/fbracher), based on the LastWave software of Bacry, available at http://www.cmap.polytechnique.fr/~bacry/LastWave/. The linearly interpolated simulated signals are decomposed into functions from the dictionary D with a Spline 0 window. The energy density of the first 100 atoms (k_{ν_1}(t), k_{ν_2}(t), ..., k_{ν_100}(t)) is shown in Figures 11 and 12. The long atoms represent the most coherent structures of the signal. The peaks which do not correspond to stellar oscillations but are artifacts due to the sampling appear localized on a short time or cover a large frequency range (vertical atoms). In the decomposition of the signal built from the AC Herculis sampling (Figure 11), we can identify two long atoms at exactly 0.03 day^-1 and 0.01 day^-1, characteristic of the two frequencies introduced into the simulations. These are labeled 1 and 2 in the decomposition (cf. (6)).

Figure 5: (a) Simulated signal built from the AC Herculis sampling (light curve). (b) Wavelet power spectrum. The continuous white line indicates the cone of influence. (c) Global wavelet spectrum of the simulated signal with a Morlet wavelet. The 95% confidence level that could be obtained for a white noise is shown by the dashed line.

Figure 6: (a) Simulated signal built from the RV Tauri sampling (light curve). (b) Wavelet power spectrum. The continuous white line indicates the cone of influence. (c) Global wavelet spectrum of the simulated signal with a Morlet wavelet. The 95% confidence level that could be obtained for a white noise is shown by the dashed line.

The decomposition of the signal built from the RV Tauri sampling (Figure 12) presents an atom at 0.03 day^-1 (with aliases corresponding to the annual cycle) in the second part of the data. Another atom appears at 0.01 day^-1 with the same lifetime and aliases. As these atoms are among the most energetic in the decomposition (numbers 2 and 3; number 1, the very first, being a low-frequency atom of short lifetime centered on t = 4000 MJD), the simulated frequencies are also perfectly highlighted here.

Figure 7: (a) AC Herculis light curve. (b) Wavelet power spectrum. The continuous white line indicates the cone of influence. (c) Global wavelet spectrum of the simulated signal with a Morlet wavelet. The 95% confidence level that could be obtained for a white noise is shown by the dashed line.

Figure 8: (a) RV Tauri light curve. (b) Wavelet power spectrum. The continuous white line indicates the cone of influence. (c) Global wavelet spectrum of the simulated signal with a Morlet wavelet. The 95% confidence level that could be obtained for a white noise is shown by the dashed line.

4.2. Application to the observations

The analysis of the AC Herculis and the RV Tauri light curves was conducted with the same methods. The AC Herculis periodogram (Figure 3b) presents four important peaks above 30 mag²: the first frequency (in terms of largest power) is centered at 0.0265 day^-1 (37.736 days), the following one at 0.0133 day^-1 (75.188 days), and the two others at 0.0398 day^-1 (25.126 days) and 0.0531 day^-1 (18.832 days). The first two frequencies present aliases corresponding to the annual cycle of the observations. The known period of AC Herculis (T0 = 75.01 days) is correctly identified, although it appears less energetic than the one at T0/2, and its harmonics at T0/3 and T0/4 are also well detected. Finding half the known period, revealed by the wavelet analysis and the matching pursuit as well, will be discussed further (Section 5).

The RV Tauri periodogram (Figure 4b) reveals a very noisy behavior, but the results indicate a first peak at 0.025471 day^-1 (39.26 days), corresponding to roughly half of the known period of this star, and a second one at 0.00082 day^-1 (1219.5 days), which is the known long period of RV Tauri. Both peaks present aliases corresponding to the annual cycle of the observations.

The WWZ analysis for AC Herculis reveals a high-amplitude peak at 38.24 days and a less energetic one at 71.43 days. For RV Tauri, one can surprisingly identify the most prominent peak at 1299 days and two others of the same amplitude at 502.51 and 38.8 days (as expected). In both cases, the value of the constant c was chosen as for the simulated signals: c = 0.04 for AC Herculis and c = 0.0125 for RV Tauri. However, the heuristic choice of c makes it difficult to find evidence for exact periods. For example, if one chooses a lower or a higher value for c, periods and/or amplitudes appear to be slightly different. It seems that this point is not discussed in the literature.

Figure 9: Global wavelet spectrum of the simulated signal made from the AC Herculis sampling (solid line) compared to the GWS from the corresponding sampling signal (dashed line).

Figure 10: Global wavelet spectrum of the simulated signal made from the RV Tauri sampling (solid line) compared to the GWS from the corresponding sampling signal (dashed line). The solid line with diamonds represents the dashed line multiplied by a factor 10, for clarity.

Figure 11: Time-frequency decomposition (b) of the simulated signal (a) made from the AC Herculis sampling. Each grey level represents the energy density of each atom.

Figure 12: Time-frequency decomposition (b) of the simulated signal (a) made from the RV Tauri sampling. Each grey level represents the energy density of each atom.

Figure 13: Global wavelet spectrum of the AC Herculis signal (solid line) compared to the GWS from the corresponding sampling signal (dashed line).

Figure 14: Global wavelet spectrum of the RV Tauri signal (solid line) compared to the GWS from the corresponding sampling signal (dashed line). The solid line with diamonds represents the dashed line multiplied by a factor 20, for clarity.

Figures 7 and 8 present the WPS and the GWS from AC Herculis and RV Tauri, respectively, obtained with the same wavelet (Morlet) as for the simulated signals. In the AC Herculis WPS (Figure 7b), one is able to determine two periods: 37 days and 75 days. Both of them are present throughout the observation. Once again, the 37-day period appears just before (in terms of power) the 75-day period. In the corresponding GWS, these two periods appear over the 95% confidence level, and two other less energetic peaks also appear above this level, at 141 and 374 days. Note that the harmonics at T0/3 and T0/4 detected in the periodogram do not appear in the WPS. The corresponding GWS is analyzed in Figure 13, superposed on that of the "sampling" signal. In fact, the nature of the sampling explains the last peak (374 days) and even the 141-day one, which are not real.

In the RV Tauri WPS (Figure 8b), only a single period, at 39 days in the second part of the time interval, can be identified. In the corresponding GWS (Figure 8c), it is also observable, together with some others above 500 days, but essentially outside the cone of influence. The GWS from RV Tauri is presented in Figure 14. The superposition on that of the corresponding sampling signal explains the peaks above 500 days, due to the sampling, which confirms that they are not real (as was already revealed by the cone of influence).

The matching pursuit analyses of AC Herculis and RV Tauri are computed with a Spline 0 window, keeping the first 100 atoms of the decomposition, as was done for the simulated signals (Figures 15 and 16). Several long lifetime atoms appear in the AC Herculis decomposition: the two most energetic ones are located at the same frequency, 0.0265 day^-1 (37.73 days), and the third oscillates at 0.0133 day^-1 (75.19 days). The known period of AC Herculis (75.01 days) is thus well determined, but, as in the wavelet analysis, half the known period appears first in terms of energy. The harmonics revealed in the periodogram are also present with a high energy in the decomposition (atom 10, frequency 0.0398 day^-1 (25.12 days); atom 12, frequency 0.0531 day^-1 (18.83 days)).

Among the frequencies that can be identified in the RV Tauri diagram (Figure 16), the first long lifetime atom (order 4) is at 0.0255 day^-1 (39.29 days) and is centered around 12 000 days. It represents the most energetic structure of the signal and is half the known oscillating period of this star (78.73 days). Two atoms (orders 15 and 20) are centered at 0.0227 day^-1 (44.02 days) and 0.0282 day^-1 (35.46 days). They correspond to aliases due to the annual cycle of the observations. The 0.012695 day^-1 frequency (78.77 days) is in a more distant position (order 28) from an energy point of view.

Figure 15: Time-frequency decomposition (b) of the AC Herculis signal (a).

Figure 16: Time-frequency decomposition (b) of the RV Tauri signal (a).
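For completeness, the kind of periodogram computed throughout this section (Figures 3 and 4) can be reproduced on a synthetic, unevenly sampled two-cosine signal with SciPy's Lomb-Scargle implementation. This is our sketch, not the code used for the paper:

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 2000.0, 500))        # irregular epochs (days)
y = np.cos(2 * np.pi * t / 100.0) + np.cos(2 * np.pi * t / 33.3)
y = y - y.mean()

periods = np.linspace(20.0, 200.0, 2000)
ang_freqs = 2 * np.pi / periods                    # lombscargle takes rad/day
power = lombscargle(t, y, ang_freqs)
best_period = periods[np.argmax(power)]            # near 100 or 33.3 days
```

Note that scipy.signal.lombscargle expects angular frequencies; the conversion from trial periods is the main pitfall when comparing against period-axis plots.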

Figure 17 represents the results of a matching pursuit de- The application of WWZ to the real data leads to the composition of S Persei detailed in another paper [23], show- identification of known periods in the four stars. However, in ing that this type of method can be applied to longer period the case of RV Tauri, the prominent period is 1299 days and stars. In that particular case, no long lifetime frequency can not 38.8daysasexpected.Different choices of c parameter be identified by eye in the diagram, but the first atom of the did not allow us to improve the result. Concerning AC Her- decomposition is at 0.00122 day−1 (819 days) and centered culis, S Persei and T Camelopardis, the most obvious features around 10 000 days. This atom is not as long as the strong are consistent oscillations, as they are indicated in the fourth apparent periodicity in the last part of the data because the edition of the General Catalog of Variable Stars. We were not energy of a longer lifetime atom (double lifetime) would be able to compute associated errors since, as indicated by Fos- smaller (it would take into account the noise on both sides of ter [11], their determination is “extraordinary complex.” this part of the light curve). While neither the periodogram The 95% confidence level associated with the WPS and nor the wavelet analysis was able to find S Persei known pe- the cone of influence are efficient tools to check if the high- riod (822 days), the matching pursuit analysis offers the pos- lighted frequencies are meaningful. However, in the regions sibility of clearly detecting it, since it appears as the first atom of interpolated large gaps, neither the cone of influence nor of the decomposition. the 95% confidence level is unable to discriminate spurious frequencies. But this method reveals problems for low fre- quencies: in the GWS, high-amplitude peaks are not always 5. DISCUSSION explained by those from the sampling signal. 
The poorest the Lomb’s periodogram provides well-resolved power density sampling quality, the more important the problems at low spectra but it is oversensitive to irregular sampling; in some frequencies. These problems are due to the linear interpola- of our cases (e.g., S Persei), the periodogram was too noisy to tion, but other interpolation methods (e.g., cubic spline) lead be analyzed without a preliminary resampling. Moreover, al- to the same problem. This is what prevents us from carrying though there exists a large literature concerning the statistical out an associated error analysis. properties of the periodogram, its intrinsic nature does not As for the periodogram, the matching pursuit analysis allow us to implement a reconstruction error analysis com- is oversensitive to the annual cycle of the observations, but parable to the error provided by matching pursuit analysis. the atom hierarchy provides quantitative information on the Time-Scale Analyses for Unevenly Sampled Time Series 2497

3 The matching pursuit decomposition appears to be par- 2 1 ticularly suitable for a deeper analysis of AC Herculis. In- 0 −1 deed, this star’s light curve is supposed to be chaotic [15]. −2 The matching pursuit analysis is probably the first step to Magnitude −3 use, towards a simple nonlinear dynamical analysis. Based on 0 10000 certain physical properties of such variable stars, a relevant Time (d) atom selection allows us to reconstruct a signal characteristic of their structural properties. Note that the WWZ, also high- (a) lighting two periods, would not allow us to select atoms in the same simple manner. Our ongoing work should provide 0.02 support for the results of Kollath et al. [15]. In the partic- ular case of AC Herculis, our work could also explain why thefoundperiod(37.73 days) turns out to be half the pe- riod indicated in the literature (75.01 days) and why these

) periods are so close from an energetics point of view. In 1 − fact, we are probably facing a period-doubling phenomenon: 0.01 if the star oscillates with a stable fundamental period, say T0 = 37.73 days, when some parameters vary, a period- doubling bifurcation may occur, leading to another stable pe- Frequency (d riod of 2T0 = 75.01 days. Both periods can be observed, with a variable amplitude which depends on the run.

dominating frequencies of the signal. We can easily define the associated error as the quadratic reconstruction error after decomposition on a large number of atoms (e.g., 400 in our case). Table 1 presents this error for simulated signals (third column) associated with the four variable stars presented in Section 1. The mean number of observations per day is used to estimate the quality of the sampling (second column in Table 1). One can note that the reconstruction error is not correlated with the sampling quality. This is explained by the large choice of atoms of different lifetimes and frequencies offered by the dictionary, which can solve the problem caused by a very incomplete sampling (see, e.g., the reconstruction error for T Camelopardis, compared to its sampling). Table 1 also presents the four variable stars' periods and the ones found by the periodogram (P_Perio), the GWS (P_GWS), the matching pursuit decomposition (P_MP), and the weighted wavelet Z-transform (P_WWZ).

We have noted throughout the paper the fact that for AC Herculis, we systematically found half the known period of this star. For RV Tauri, half the known period was also found. However, with the matching pursuit decomposition, we were also able to identify the known period (78.73 days). These two periods, also found by Zsoldos [30], are characteristic of the double-wave shape of RV Tauri-like stars.

The known period of AC Herculis at 75.01 days was found (in second position) at 75.19 days by the periodogram and the matching pursuit analysis, at 75.00 days by the GWS, and at 71.43 days by the WWZ. Harmonics at T0/2, T0/3, and T0/4 have been clearly identified by the periodogram and the matching pursuit analysis. The two last (resp., three last) harmonics do not appear in the GWS (resp., the WWZ). The period of RV Tauri at 78.73 days was found neither by the periodogram, nor by the global wavelet spectrum, nor by the weighted wavelet Z-transform. However, half of this period could be found at 39.26 days by the periodogram, at 39.00 days by the GWS, and at 38.80 days by the WWZ. The matching pursuit decomposition reveals both periods: 39.29 days and 78.77 days. This last method also permits us to conduct an error analysis.

The matching pursuit algorithm thus appears well suited for the spectral investigation of irregularly sampled variable star signals.

Figure 17: Time-frequency decomposition (b) of the S Persei signal (a).

6. CONCLUSION

We have used a wavelet analysis and a matching pursuit decomposition to investigate the role of irregular sampling with linear interpolation in the determination of the spectral contents of variable stars' light curves, and we have compared them with the results given by a Lomb periodogram and by the weighted wavelet Z-transform (a time-scale method that can be compared in its principle to the Lomb periodogram). The proposed algorithm is composed of two steps: first, a preprocessing is done by interpolating the irregularly sampled light curve; second, a time-scale or time-frequency analysis is applied to the resampled data.

Of course, the ability to analyze the frequency content of a data set, together with its time dependence, is by itself a powerful tool. To analyze unevenly spaced signals, stationary methods, although technically suited, fail to determine frequencies; nonstationary methods are better suited, even at the price of a preliminary resampling. This incites a better use of time-scale or time-frequency methods.

In this context, the results yielded by the matching pursuit analysis are significantly better in terms of the ability not only to recover the right frequency (see Table 1), but also to conduct an error analysis and to remain independent of the sampling quality. It is important to notice at this stage that if one had only kept the well-sampled parts of the signals, the results would of course have been immediate. Unfortunately, this procedure is impossible in the case of systematic astronomical observations, which produce a large amount of data. This work is the first attempt to apply a matching pursuit algorithm to light curves of variable stars. We paid particular attention to the interpolation procedure, which at first sight could appear very simple. This is also the first time in astronomy that a periodogram is compared to global wavelet power spectra (i.e., 2D spectra) and to the WWZ technique. Lastly, this is a first step towards a chaotic behavior analysis in the continuity of Kiss and Szatmáry's work [14].

This study, moreover, offers the new benefit of simply requiring a linear interpolation of the data and allows us to propose a simple guideline for processing such signals:

(1) resample the signal with a linear interpolation;
(2) choose a time step compatible with the searched frequency range;
(3) save this new signal as a column ASCII file (zero mean);
(4) download the matching-pursuit graphical user interface guimauve at webast.ast.obs-mip.fr/people/fbracher (Linux version preferred) and install it with the rpm command;
(5) execute the command guimauve;
(6) open the signal file, set the right time step (menu Signal), decompose the signal (menu Matching), choose the atom number to be investigated (100 by default) and the window (Gaussian by default); options are activated by the mouse and the scroll bar;
(7) information on the atoms is written at the bottom of the window: order in the hierarchy, lifetime of the oscillation ("octave" defines the length of the atom as 2^octave), time, and frequency. The energy is quantitatively accessible thanks to the menus File and Save Decomposition. A reconstruction is possible from an atom or parameter selection.

Processing a simple linear interpolation before applying a time-frequency analysis offers advantages over a Fourier transform or a periodogram computed from nonresampled data. However, we plan to investigate, in a forthcoming work, a comparison of the matching pursuit results with linear interpolation of the data versus the interpolation technique proposed by Strohmer [25].

ACKNOWLEDGMENTS

Wavelet Matlab software was provided by C. Torrence and G. Compo, and is available at paos.colorado.edu/research/wavelets/. The corresponding article [28] is also downloadable at this URL. The matching pursuit software uses a free graphical user interface developed in our institute and available for Linux (and Windows for a simpler version) at http://webast.ast.obs-mip.fr/people/fbracher/. The code is based on the LastWave software of E. Bacry (http://www.cmap.polytechnique.fr/~bacry/LastWave/). The weighted wavelet Z-transform was performed using the computer program WWZ developed by the American Association of Variable Star Observers (http://www.aavso.org). The authors are grateful to the staff of the Variable Star Network (VSNET) for their online database. The authors wish to thank Hervé Carfantan and Laurent Koechlin for helpful discussions on early versions of this paper and for their comments and suggestions. We gratefully thank Natalie Webb for her essential corrections.

REFERENCES

[1] P. A. Bradley, "Theoretical models for asteroseismology of DA white dwarf stars," Astrophysical Journal, vol. 468, pp. 350–368, 1996.
[2] J. R. Buchler, Z. Kollath, T. Serre, and J. Mattei, "Nonlinear analysis of the light curve of the variable star R Scuti," Astrophysical Journal, vol. 462, pp. 489–504, 1996.
[3] M. Carbonell, R. Oliver, and J. L. Ballester, "Power spectra of gapped time series: a comparison of several methods," Astronomy & Astrophysics, vol. 264, pp. 350–360, 1992.
[4] C. Catala, J.-F. Donati, T. Böhm, et al., "Short-term spectroscopic variability in the pre-main sequence Herbig Ae star AB Aur during the MUSICOS 96 campaign," Astronomy & Astrophysics, vol. 345, pp. 884–904, 1999.
[5] R. R. Coifman and M. V. Wickerhauser, "Entropy-based algorithms for best basis selection," IEEE Trans. Inform. Theory, vol. 38, no. 2, pp. 713–718, 1992.
[6] I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, Pa, USA, 1992.
[7] C. de Boor, A Practical Guide to Splines, Springer, New York, NY, USA, 1978.
[8] S. de Waele and P. M. T. Broersen, "Error measures for resampled irregular data," IEEE Trans. Instrum. Meas., vol. 49, no. 2, pp. 216–222, 2000.
[9] G. G. Fahlman and T. J. Ulrych, "A new method for estimating the power spectrum of gapped data," Monthly Notices of the Royal Astronomical Society, vol. 199, pp. 53–65, 1982.
[10] P. Flandrin, Time-Frequency/Time-Scale Analysis, Academic Press, San Diego, Calif, USA, 1998.
[11] G. Foster, "Wavelets for period analysis of unevenly sampled time series," Astronomical Journal, vol. 112, no. 4, pp. 1709–1729, 1996.
[12] H. J. Haubold, "Wavelet analysis of the new solar neutrino capture rate data for the Homestake experiment," Astrophysics and Space Science, vol. 258, no. 1-2, pp. 201–218, 1997.
[13] J. H. Horne and S. L. Baliunas, "A prescription for period analysis of unevenly sampled time series," Astrophysical Journal, vol. 302, no. 2, pp. 757–763, 1986.
[14] L. L. Kiss and K. Szatmáry, "Period-doubling events in the light curve of R Cygni: evidence for chaotic behaviour," Astronomy & Astrophysics, vol. 390, no. 2, pp. 585–596, 2002.
[15] Z. Kollath, J. R. Buchler, T. Serre, and J. Mattei, "Analysis of the irregular pulsations of AC Herculis," Astronomy & Astrophysics, vol. 329, no. 1, pp. 147–155, 1998.

[16] D. Koester and G. Chanmugam, "Physics of white dwarf stars," Reports on Progress in Physics, vol. 53, no. 7, pp. 837–915, 1990.
[17] S. G. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Processing, vol. 41, no. 12, pp. 3397–3415, 1993.
[18] R. E. Nather, D. E. Winget, J. C. Clemens, C. J. Hansen, and B. P. Hine, "The whole earth telescope—a new astronomical instrument," Astrophysical Journal, vol. 361, pp. 309–317, September 1990.
[19] A. Oppenheim and R. Schafer, Discrete-Time Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, USA, 1989.
[20] D. H. Roberts, J. Lehar, and J. W. Dreher, "Time series analysis with CLEAN. I. Derivation of a spectrum," Astronomical Journal, vol. 93, no. 4, pp. 968–988, 1987.
[21] S. Roques, A. Schwarzenberg-Czerny, and N. Dolez, "Parametric spectral analysis applied to gapped time-series of variable stars," Baltic Astronomy, vol. 9, no. 1, pp. 463–477, 2000.
[22] S. Roques, B. Serre, and N. Dolez, "Band-limited interpolation applied to the time series of rapidly oscillating stars," Monthly Notices of the Royal Astronomical Society, vol. 308, no. 3, pp. 876–886, 1999.
[23] S. Roques and C. Thiebaut, "Spectral contents of astronomical unequally spaced time-series: contribution of time-frequency and time-scale analyses," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '03), vol. 6, pp. 473–476, Hong Kong, China, April 2003.
[24] J. D. Scargle, "Wavelet methods in astronomical time series analysis," in Applications of Time Series Analysis in Astronomy and Meteorology, T. S. Rao, M. B. Priestley, and O. Lessi, Eds., pp. 226–248, Chapman & Hall, New York, NY, USA, 1997.
[25] T. Strohmer, "Computationally attractive reconstruction of band-limited images from irregular samples," IEEE Trans. Image Processing, vol. 6, no. 4, pp. 540–548, 1997.
[26] K. Szatmáry, J. Vinkó, and J. Gál, "Application of wavelet analysis in variable star research. I. Properties of the wavelet map of simulated variable star light curves," Astronomy and Astrophysics Supplement Series, vol. 108, pp. 377–394, 1994.
[27] C. Thiebaut, M. Boer, and S. Roques, "Steps towards the development of an automatic classifier for astronomical sources," in Astronomical Data Analysis II, J.-L. Starck and F. D. Murtagh, Eds., vol. 4847 of Proceedings of SPIE, pp. 379–390, Waikoloa, Hawaii, USA, August 2002.
[28] C. Torrence and G. P. Compo, "A practical guide to wavelet analysis," Bulletin of the American Meteorological Society, vol. 79, no. 1, pp. 61–78, 1998.
[29] J. Ville, "Théorie et applications de la notion de signal analytique," Câbles et Transmission, vol. 2A, no. 1, pp. 61–74, 1948 (French).
[30] E. Zsoldos, "RV Tauri and the RVB phenomenon. I. Photometry of RV Tauri," Astronomy and Astrophysics Supplement Series, vol. 119, no. 3, pp. 431–437, 1996.

C. Thiebaut was born in the North of France in 1977. She received an Engineering degree in electronics and signal processing in 2000 from the Institut National Polytechnique de Toulouse (ENSEEIHT). She received her Ph.D. degree in September 2003 in signal and image processing with a fellowship from the Centre National de la Recherche Scientifique (CNRS). During her three years in the Center for the Study of Radiation in Space (CESR), she led studies on astronomical image analysis, object detection, automatic classification using neural networks, and analysis of astrophysical signals with time-frequency methods. With the support of a postdoctoral fellowship from the French Space Agency (CNES), she started working on optical observations of space debris in October 2003 for the CNES. Since September 2004, she has been working on onboard data processing at CNES.

S. Roques was born in Narbonne (Aude), France, in 1956. She attended the University Paul Sabatier at Toulouse, where she received the M.S. degree in solid-state physics in 1980 and a Ph.D. degree in electron microscopy in 1982. She carried on her research at the Centre National de la Recherche Scientifique (CNRS), the French National Science Council. She has been a Director of Research at the CNRS since 2000. She has published about 75 papers in electron microscopy, optics, astronomy, and image processing. Her current research deals with reconstruction problems in signal and image processing with particular reference to universe sciences applications. She is presently the Managing Director of the "Laboratoire d'Astrophysique" of the Midi-Pyrénées Observatory (Toulouse, France).

EURASIP Journal on Applied Signal Processing 2005:15, 2500–2513
© 2005 Hindawi Publishing Corporation

Restoration of Astrophysical Images—The Case of Poisson Data with Additive Gaussian Noise

H. Lantéri
Laboratoire d'Astrophysique, Université de Nice Sophia Antipolis, CNRS UMR6525, 06108 Nice Cedex 2, France
Email: [email protected]

C. Theys
Laboratoire d'Astrophysique, Université de Nice Sophia Antipolis, CNRS UMR6525, 06108 Nice Cedex 2, France
Email: [email protected]

Received 28 May 2004; Revised 28 October 2004

We consider the problem of restoring astronomical images acquired with charge coupled device cameras. The astronomical object is first blurred by the point spread function of the instrument-atmosphere set. The resulting convolved image is corrupted by a Poissonian noise due to low light intensity; then, a Gaussian white noise is added during the electronic read-out operation. We show first that the split gradient method (SGM) previously proposed can be used to obtain maximum likelihood (ML) iterative algorithms adapted to such a noise combination. However, when ML algorithms are used for image restoration, whatever the noise process, instabilities due to noise amplification appear as the iteration number increases. To avoid this drawback and to obtain physically meaningful solutions, we introduce various classical penalization-regularization terms to impose a smoothness property on the solution. We show that the SGM can be extended to such penalized ML objective functions, allowing us to obtain new algorithms leading to maximum a posteriori stable solutions. The proposed algorithms are checked on typical astronomical images, and the choice of the penalty function is discussed depending on the kind of object.

Keywords and phrases: restoration, astronomical images, Poisson transformation, MAP estimation, regularization, iterative algorithms.

1. INTRODUCTION

The image restoration problem, and particularly image deconvolution, is an inverse problem, ill posed in the sense of Hadamard, whose solution is unstable when the data is corrupted by noise. For astronomical data, two noise processes are generally considered separately. The first one is an additive Gaussian noise appearing in high intensity measurements; in this case, the maximum likelihood estimator (MLE) under positivity constraint is obtained, for example, using the ISRA multiplicative iterative algorithm [1]. The second one, dedicated to low intensity data, is a Poisson noise process; in this case the MLE is obtained from the expectation maximization (EM) Richardson-Lucy iterative algorithm [2]. In a more realistic but less used model, both noise processes must be taken into account simultaneously. We will describe this model, previously analyzed by Snyder et al. [3, 4, 5, 6] and by Llacer and Núñez [7, 8, 9]. We will show that the corresponding MLE iterative algorithm can be easily obtained using the split gradient method (SGM) we have previously proposed for Poisson or Gaussian noise [10, 11]. However, in all the ML methods, whatever the noise process considered, only the adequacy of the solution to the data is taken into account; so, when iterative algorithms are used, instabilities appear due to noise amplification when the iteration number increases. In this context, to obtain physically satisfactory solutions, the iterative process must be interrupted before instabilities appear. Another way to avoid this drawback is to perform an explicit regularization of the problem, that is, to introduce a priori knowledge to impose, for example, a smoothness property on the solution; then, a maximum a posteriori (MAP) solution is searched for.

In previous papers of Snyder, the regularization is performed by means of sieve functions. In the papers of Llacer and Núñez, the proposed penalized iterative algorithm ensures neither the convergence nor the positivity. We will show that the SGM can be used to obtain regularized iterative algorithms for a Poisson plus Gaussian noise model. In so doing, we exhibit regularized iterative multiplicative algorithms with constraints typical of astronomical imagery, and we show their effectiveness for some classical penalty functions.

1.1. Optical astronomy imagery: Poisson data with additive Gaussian noise

The light emanating from the object of interest propagates through a turbulent atmosphere and is focused onto the charge coupled device (CCD) by an imperfect optical system that limits the resolution and introduces aberrations. The overall effect of the atmosphere and optical system can be mathematically described by a convolution operation between the object x and the point spread function (PSF) w of the whole system. The response of the CCD detector being nonuniform, a function T proportional to the quantum detector efficiency, called flat field, must be introduced in the model [5, 6], giving a first deterministic transformation f of the object:

f(r, s) = T(r, s) [ w(r, s) ⊗ x(r, s) ], (1)

where r and s are spatial coordinates and ⊗ denotes the two-dimensional convolution.

Moreover, in astronomical imaging, x is generally the sum of the sky background m and of the astronomical object field u, then

x(r, s) = u(r, s) + m(r, s). (2)

Consequently, a natural constraint appears for the object: x ≥ m. In each pixel of the sensor, the interaction between the incident photons and the photosensitive material of the CCD creates photoelectrons in proportion to the number of photons, plus extraneous electrons due to heat and bias effects. This photoconversion process is classically characterized by a Poisson transformation of mean f + d, where d is the mean of the Poisson process for the extraneous electrons. Finally, the detector is read by an electronic process which adds a white Gaussian read-out noise b ∼ N(g, σ²), independent of the Poisson process. We then get the observed image y(r, s) of mean f(r, s) + d(r, s) + g.

1.2. Imaging model and problem statement

In the following, we use capital letters for N × N arrays and bold letters for N × 1 vectors; subscript i denotes the pixel i of the image lexicographically ordered. From the description in Section 1.1, a realization of the value of the image in the pixel i can be modeled as

y_i = n_i + b_i, (3)

where n_i is a realization of a Poisson process of mean

t_i (Wx)_i + d_i = (Hx)_i + d_i = z_i, (4)

where h_{i,j} = t_i w_{i,j} are the elements of H satisfying

Σ_j w_{j,i} = 1,  Σ_j h_{j,i} = a_i = 1  for all i. (5)

In matrix notation,

y = n + b, (6)

where n is a Poisson process of mean

Hx + d = z, (7)

H = TW, T = diag(t_1, t_2, ..., t_N), and W is the classical block Toeplitz matrix for the convolution in matrix form. Note that if there is no flat field, T = I and H = W.

In the deconvolution problem for astronomical imaging, the object is generally composed of bright objects on a sky background, assumed constant. This particularity of the astrophysical object must be taken into account to avoid the "ringing" phenomena appearing in the vicinity of abrupt intensity variations. Moreover, in the convolution operation with normalized kernels, the total intensity of the object is maintained.

The problem is then to restore the object x from the data y with the constraint x > m, see (2), and the total intensity conservation. W is generally obtained via separate calibration measurements, as well as the flat field table T; d can be measured, and the parameters of the Gaussian noise are generally known characteristics of the CCD [5, 6].

1.3. Image restoration method

The restoration method generally proposed for the previous model [5, 6, 7, 8, 9] is founded on the MLE, and the corresponding iterative algorithms are developed from the EM method [12, 13]. As the deconvolution problem is an ill-posed problem, instability in the solution appears as the number of iterations increases. The problem is then to stop the iterations to get a physically satisfactory solution and, for that, to determine the optimal iteration number [14, 15].

Another way to avoid instabilities is to regularize the problem explicitly. Numerous relevant methods have been used in the literature (see [16] and references therein). The method of sieves, for example, was proposed by Grenander [17] and applied to image restoration by Snyder et al. [3, 5, 18]. However, as mentioned in [19], penalized MLE outperforms sieves. Therefore, we focus on the basic principles of the regularization by a penalty function. In this case, a penalty term is added to the likelihood term with the aim of introducing prior information into the solution, generally a smoothness property (see, e.g., Bertero [20, 21, 22], Demoment [23], Titterington [24]). The relative weight of the penalty versus the likelihood allows us to "pull" the solution either towards the ML or towards the prior, changing the MLE into the MAP estimation.

This approach has been followed by Llacer and Núñez [7, 8, 9], for the model (6) and (7), using as penalty function the Shannon cross entropy between the solution and a constant prior for the object. However, as mentioned in [25], the proposed algorithm has a significant computational burden, the convergence is not guaranteed, and the positivity of the solution is not always ensured.

In the present work, we first show that the MLE algorithm can be obtained using the SGM previously proposed [25]; then the SGM is extended to the regularized approach, in a rigorous way, using various convex penalty functions.
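The acquisition chain (1)-(7) is straightforward to simulate. The sketch below is a hypothetical 1D instance: the PSF width, flat-field range, and the values of d, g, and σ are all made-up numbers, used only to show the structure of the model.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64

# Object x = astronomical field u on a constant sky background m, eq. (2).
m = 1.0
x = np.full(N, m); x[N//2] += 500.0

# Normalized PSF w and flat field T = diag(t), so H = T W, eqs. (1), (4), (7).
w = np.exp(-0.5*((np.arange(N) - N//2)**2)/9.0)
w /= w.sum()                                 # normalized kernel
t = rng.uniform(0.9, 1.1, N)                 # flat-field table (diagonal of T)
Wx = np.real(np.fft.ifft(np.fft.fft(np.fft.ifftshift(w))*np.fft.fft(x)))
z = t*Wx + 0.5                               # z = Hx + d, with d = 0.5 assumed

# Poisson photoconversion plus Gaussian read-out noise, eqs. (3)/(6).
g, sigma = 0.1, 1.0
y = rng.poisson(z) + rng.normal(g, sigma, N)
```

The `ifftshift` centers the kernel at index 0 so that the circular convolution introduces no spatial shift; a 2D block Toeplitz W would replace the FFT product in the real image case.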

The paper is organized as follows. In Section 2 the MAP estimation problem is given for the model Poisson plus Gaussian noise. The general algorithmic method is described for various penalty functions in Section 3. In Section 4 simulations on astronomical images are presented and discussed. Finally, Section 5 proposes a discussion on the generalization of the proposed method to other applications.

2. POISSON NOISE PROCESS WITH ADDITIVE GAUSSIAN NOISE

2.1. Likelihood function

Following the model (3) and (4), we denote by z_i the mean of the Poisson process in the pixel i. The number n_i of photoelectrons generated by this process has the classical probability law

P(n_i | z_i) = (z_i^{n_i} / n_i!) exp(−z_i). (8)

During the read-out step, the integer n_i is corrupted by a Gaussian additive noise of mean g and variance σ², giving a process y_i with a Gaussian probability law

P(y_i | n_i) = (1/(σ√(2π))) exp(−(y_i − n_i − g)² / (2σ²)). (9)

Then

P(y_i | z_i) = Σ_{n_i} P(y_i | n_i) P(n_i | z_i)
            = Σ_{n_i=0}^{∞} (1/(σ√(2π))) exp(−(y_i − n_i − g)² / (2σ²)) (z_i^{n_i} / n_i!) exp(−z_i). (10)

With the assumption of independence between pixels, the likelihood function for the image y is

L(y) = P(y | z) = Π_i Σ_{n_i} (1/(σ√(2π))) exp(−(y_i − n_i − g)² / (2σ²)) (z_i^{n_i} / n_i!) exp(−z_i). (11)

Taking the negative log and dropping the terms independent of x, we obtain the objective function

J1(x) = Σ_i [ z_i − log( Σ_{n_i} (z_i^{n_i} / n_i!) exp(−(y_i − n_i − g)² / (2σ²)) ) ]. (12)

Then the MLE is obtained by minimizing J1(x) versus x with the lower bound and the intensity conservation constraints, as explained in Section 1.3. The gradient of J1, for the pixel i, is

(∇J1(x))_i = Σ_j h_{j,i} − Σ_j h_{j,i} (r_j / z_j), (13)

with

r_j = p_j / q_j,
p_j = Σ_{n_i} exp(−(y_j − n_i − g)² / (2σ²)) n_i (z_j^{n_i} / n_i!),
q_j = Σ_{n_i} exp(−(y_j − n_i − g)² / (2σ²)) (z_j^{n_i} / n_i!), (14)

or, in matrix notation, for the whole vector x,

∇J1(x) = H^T diag(1/(Hx + d)) (Hx + d − r). (15)

As indicated in Section 1, to avoid instability of the ML estimator, we propose to regularize the problem, that is, to add to the objective function J1(x) a penalty term J2(x).

2.2. Regularization

We will consider here two kinds of regularization functions: quadratic ones and entropic ones.

2.2.1. Quadratic regularization

We consider a regularization in the sense of Tikhonov [26]:

J2(x) = (1/2) ‖x − x̄‖². (16)

It is a quadratic distance between x and an a priori solution x̄, called the "default image" [15, 27]. If there is no a priori information, the default solution can be chosen as a constant value p for a basic smoothness constraint; the choice of the value of p is not critical, and it is frequently chosen equal to zero. However, we choose to take p = Σ_i x_i / N, such that the total intensity conservation constraint is fulfilled also for the prior, see Section 3.1. In this case, the gradient of J2(x) is

∇J2(x) = x − p. (17)

Another common choice for J2 is built on the Laplacian operator:

J2(x) = (1/2) ‖Cx‖², (18)

which can be expressed in the form (16) as

J2(x) = (1/2) ‖x − Ax‖², (19)

where A is deduced from the mask

0    1/4  0
1/4  0    1/4
0    1/4  0
(20)

and Ax is the matrix form notation for the convolution operation between x and this mask. The corresponding gradient

is then

∇J2(x) = (I + A^T A − A − A^T) x. (21)

In this case, the solution is implicitly biased towards a smoothed version of itself. We note that A can be generalized to any form of lowpass filter.

2.2.2. Entropic distances

We use for the entropic regularization the Csiszár directed divergence [28], a generalized form of the Kullback-Leibler (KL) distance between the solution and an a priori or default solution x̄:

J2(x) = KL(x, x̄) = Σ_i [ x_i log(x_i / x̄_i) + x̄_i − x_i ]. (22)

In the same way as for the quadratic distance, x̄ can be a constant p, with p > 0, or a smoothed version of the solution. We will consider here only the case of a constant prior, in order to compare the results with those of [9]. The corresponding gradient is

∇J2(x) = log(x) − log(p). (23)

Note that we can also consider the distance between x̄ and x, KL(x̄, x), which gives more or less the same practical results [29].

3. GENERAL ALGORITHMIC METHOD

3.1. SGM method

The regularized problem is to solve a problem of minimization under constraints, that is,

(i) minimize with respect to x:

J(x, γ) = J1(x) + γ J2(x); (24)

(ii) with the constraints
  (1) lower bound: x_i − m ≥ 0 for all i;
  (2) energy conservation: Σ_i x_i = Σ_i (y_i − d_i − g) / t_i.

J(x) consists of two terms: J1(x), which expresses the consistency with the data, and J2(x), the term of regularization; γ is the regularization parameter, which tunes the relative weight between the two terms. We note that the considered functions J1(x) and J2(x) are convex.

We propose to devise algorithms deduced from the Kuhn-Tucker (KT) conditions in the general modified gradient form [10, 11, 30]:

x_i^{k+1} = C^k [ x_i^k + α_i^k f_i(x^k, γ) (x_i^k − m) (−∇J(x^k, γ))_i ]. (25)

The initial estimate x^0 is chosen as a constant value, such that all the constraints are fulfilled; C^k is a normalization factor for the total intensity conservation; subscript i is for the pixel i; α_i^k > 0 is the relaxation factor; k is the iteration index; f(x) is a function having positive values when x satisfies the constraints. To obtain "product form" algorithms, the split gradient method (SGM) is used. It can be summarized as follows: the convex function J(x^k, γ) admits a finite unconstrained global minimum given by ∇J(x^k, γ) = 0; then we can write

−∇J(x^k, γ) = U(x^k, γ) − V(x^k, γ), (26)

where U(x^k, γ) and V(x^k, γ) are two positive functions for all x^k ≥ m. From (24), the total gradient can be decomposed as

−∇J(x^k, γ) = −∇J1(x^k) − γ ∇J2(x^k). (27)

Splitting −∇J1(x^k) and −∇J2(x^k) as in (26), we have

−∇J(x^k, γ) = U1(x^k) − V1(x^k) + γ [ U2(x^k) − V2(x^k) ],
U(x^k, γ) = U1(x^k) + γ U2(x^k),
V(x^k, γ) = V1(x^k) + γ V2(x^k). (28)

Taking

f_i(x^k, γ) = 1 / V_i(x^k, γ) > 0, (29)

(25) becomes

x_i^{k+1} = C^k [ x_i^k + α_i^k ((x_i^k − m) / V_i(x^k, γ)) (U_i(x^k, γ) − V_i(x^k, γ)) ]. (30)

The maximum stepsize that ensures x_i^{k+1} − m ≥ 0, for all i and all k, is given by

α_m^k = min_{i ∈ C} [ 1 / (1 − U_i(x^k, γ) / V_i(x^k, γ)) ], (31)

where C is the set of indices i such that (∇J(x^k, γ))_i > 0 and x_i^k > m; clearly α_m^k > 1, so for α = 1 the constraint is always fulfilled. The optimal step size α_c^k, independent of i and ensuring convergence, must be computed in the range ]0, α_m^k] (or ]0, α_m^k[ if a strict inequality constraint is required) by a line search procedure (see, e.g., [31, 32, 33, 34]) with the descent direction

ρ^k = diag( (x_i^k − m) / V_i(x^k, γ) ) [ U(x^k, γ) − V(x^k, γ) ]. (32)

This direction is no longer the negative gradient, but it remains a descent direction for J(x).

Table 1: Functions U and V following the kind of regularization.

Function   Likelihood                       Quadratic, x̄ = p   Quadratic, x̄ = Ax      Entropic, x̄ = p
U          U1 = H^T diag(1/(Hx + d)) r      U2 = γ p            U2 = γ (A^T + A) x      U2 = −γ log(x_i)
V          V1 = a, see (5)                  V2 = γ x            V2 = γ (I + A^T A) x    V2 = −γ log(p_i)
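As a concrete, simplified instance of the split entries above, the sketch below runs the unit-step update (33) with a pure-Poisson likelihood split, U1 = H^T(y/(Hx + d)) and V1 = a = 1 (normalized H), combined with the quadratic penalty and constant prior p, U2 = γp and V2 = γx. For the Poisson plus Gaussian model of this paper, y would be replaced by the posterior-mean vector r of (14). The 1D problem, sizes, and parameter values are invented.

```python
import numpy as np

def sgm_step(x, y, H, d, m, p, gamma):
    """One unit-step SGM update, eq. (33): x <- C^k [m + (x - m) U/V].

    Pure-Poisson likelihood split (U1 = H^T(y/(Hx+d)), V1 = 1 for normalized
    H) with quadratic penalty and constant prior p (U2 = gamma*p, V2 = gamma*x).
    """
    U = H.T @ (y/(H@x + d)) + gamma*p
    V = 1.0 + gamma*x
    x_new = m + (x - m)*U/V
    return x_new*(x.sum()/x_new.sum())     # C^k: total-intensity normalization

# Invented 1D toy problem: circulant blur, point source on a background m.
n = 32
H = sum(c*np.roll(np.eye(n), k, axis=1) for c, k in [(0.25, -1), (0.5, 0), (0.25, 1)])
rng = np.random.default_rng(7)
m, d = 1.0, 0.2
x_true = np.full(n, m); x_true[8] += 60.0
y = rng.poisson(H @ x_true + d).astype(float)
x = np.full(n, max(y.mean(), m + 1.0))     # feasible constant initial estimate
total0 = x.sum()
for _ in range(100):
    x = sgm_step(x, y, H, d, m, p=total0/n, gamma=0.05)
```

Since U ≥ 0 and V > 0, the bracketed update keeps x above m before the rescaling by C^k, which here simply holds the total intensity constant, as in (25)-(30).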

With a unit step size, the algorithm (30) takes an interesting simple form:

x_i^{k+1} = C^k [ m + (x_i^k − m) U_i(x^k, γ) / V_i(x^k, γ) ]. (33)

The normalization, that is, the computation of C^k, is performed following [10]. The convergence of (33) is not theoretically ensured but has always been observed in practice. To ensure the theoretical convergence of (30) without a dramatic increase of the computational cost, an economic line search using the Armijo rule [31] or the Goldstein rule [35] can be used to compute α_c^k, as mentioned in [32].

3.2. Application to the Poisson Gaussian model

Table 1 gives the expressions of the functions U and V for the likelihood (12) and for several penalty functions, (16), (19), and (22). Algorithms can be obtained in a closed form by replacing these expressions in (30) or (33), using the decomposition (28).

Figure 1: (a) HST picture, (b) PSF, (c) star, (d) point sources.

4. NUMERICAL ILLUSTRATIONS

4.1. Object

The proposed algorithm has been illustrated on a picture taken from the Hubble Space Telescope (HST) site, http://hubblesite.org/gallery/. It shows a star nearing the end of its life, Figure 1a. Two 128 × 128 subpictures have been extracted from the HST image, one centered on the main structure, Figure 1c, the other containing bright point sources, Figure 1d; the same constant background has been added to both pictures. These two parts have been selected in order to study the effectiveness of the various penalty functions depending on the kind of picture: more or less diffuse.

4.2. PSF

The data has been blurred with the normalized space-invariant PSF, Figure 1b. It is a realistic representation of the PSF of a ground-based telescope including the effects of

Figure 2: Normalized star: (a) 20 000 photons and (b) 10 000 photons. Normalized point sources: (c) 20 000 photons and (d) 10 000 photons.

the atmospheric turbulence. The telescope aperture P(r) is simply given by an array of points "1" inside a circle and "0" outside; the wavefront error δ(r) is an array of random numbers smoothed by a lowpass filter. The quantity P(r) exp(2iπδ(r)/λ) represents the telescope aperture with the phase error; for the simulation, the peak-to-peak phase variation is small (less than π) and may correspond to typical telescope aberrations, see [10] for more details. The main interest of such a PSF simulation is that the optical transfer function is a lowpass filter, limited in spatial frequencies to the extent of the aperture autocorrelation function; the blurred image is then strictly band limited, corresponding to a realistic situation.

4.3. Noise

A Poisson transform is applied on the blurred image, and finally a Gaussian noise is added. Two sets of noise parameters have been chosen: the first one, with 20 000 photons in the image, m = 0.01, σ² = 0.1 (set A), corresponding to a fairly noisy image, shown in Figures 6a and 10a; the other one, with 10 000 photons, m = 0.1, σ² = 1 (set B), represented in Figures 8a and 12a, corresponding to a highly noisy image; see Table 2 for a summary of the characteristics.

Table 2: Image characteristics.

    Set                   A         B
    Photons               20 000    10 000
    Gaussian noise g      0.01      0.1
    Gaussian noise σ²     0.1       1

Figure 2 shows the objects normalized to the intensity of the noisy images, allowing a quantitative comparison of the results (normalized to the same quantity) with the objects.

4.4. Implementation

The HST images have been reconstructed using the algorithm (30); the behavior of the algorithm for the two pictures is studied as a function of the regularization function and of the "level" of noise in the raw images. The summation over the number of photons ni in the expressions of pj and qj (14) has been reduced to max(ni) + g + 3σ, corresponding to the significant values of the exponential term.

To stop the iterative procedure before noise amplification and/or to check the quality of the restoration process, we use a criterion based on the Euclidean distance ε(k, γ) between the true object x* and the reconstructed one, computed as a function of k and γ:

    ε(k, γ) = ‖x_γ^k − x*‖² / ‖x*‖².    (34)
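For simulated data, where the true object x* is available, criterion (34) is straightforward to evaluate at each iteration. A minimal sketch in plain Python (the function names and the flattened-list image representation are illustrative, not the authors' code):

```python
def reconstruction_error(x_k, x_true):
    """Criterion (34): squared Euclidean distance between the iterate
    x_k and the true object x*, normalized by ||x*||^2 (images flattened)."""
    num = sum((a - b) ** 2 for a, b in zip(x_k, x_true))
    den = sum(b ** 2 for b in x_true)
    return num / den

def best_stopping_iteration(iterates, x_true):
    """Iteration index minimizing the error: where a semiconvergent
    algorithm should be stopped, before noise amplification sets in."""
    errors = [reconstruction_error(x, x_true) for x in iterates]
    return min(range(len(errors)), key=errors.__getitem__)
```

In practice such a rule is only available in simulation; for real data the true object is unknown, which is exactly the limitation discussed next.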

Such a comparison cannot be made for a real case, since the true object is not known. However, it allows a good characterization of the behavior and performance of the algorithm for simulated data, and it is useful to choose the regularization parameter.

Indeed, the search for the correct regularization parameter remains an open problem. Some methods, such as "L-curves" or "generalized cross-validation," have been proposed, typically for pure Gaussian noise [36, 37, 38]. In fact, in the regularization scheme, three elements must be chosen: the penalty function, the prior, and the regularization parameter. The choice of these elements depends on the properties of the image, on the amount of noise, and on the expected objective. The regularization factor is heuristically chosen here by selecting the one corresponding to an asymptotic minimum value of ε(k, γ); see Figures 3, 4, and 5 for the behavior of ε(k, γ). The best reconstructed images are given, following the regularization function, in Figures 6, 7, 8, and 9 for the extended object and in Figures 10, 11, 12, and 13 for the bright point sources.

Note that a statistical analysis including many noise realizations should be carried out to obtain significant quantitative results, implying an explosive computational cost; consequently, only qualitative remarks and comparative conclusions with regard to the penalty functions are given.

Figure 3: Reconstruction error ε(k, γ) versus iteration number k for the quadratic penalty with variable prior; extended object, set A (curves for γ = 0, 1, 2, 3).

Figure 4: Reconstruction error ε(k, γ) versus iteration number k for the quadratic penalty with constant prior; extended object, set A (curves for γ = 0, 0.01, 0.02, 0.03).

Figure 5: Reconstruction error ε(k, γ) versus iteration number k for the entropic penalty with constant prior; bright points object, set B (curves for γ = 0, 0.02, 0.03, 0.04, 0.05).

4.5. Results for extended objects

4.5.1. Set A

Results obtained for the extended object are shown in Figures 6 and 7 for the set A and must be quantitatively compared to Figure 2a. The restoration is first performed using the nonpenalized algorithm (J(x) = J1(x)); in this case, due to the semiconvergence of the algorithm, the iterations must be stopped before the noise increases. The result corresponding to the minimum of ε(k, 0) is given in Figure 6b.

A constant prior is then used in conjunction with a quadratic penalty function (16) or an entropic penalty function (22); the corresponding behaviors of the restoration error for several values of γ are given in Figures 4 and 5. The minimum reconstruction error is always reached for γ = 0; as γ increases, the curve tends to become monotonic and the corresponding error increases. The best reconstructed objects are shown in Figures 6c and 6d, respectively, for the quadratic

Figure 6: (a) Raw picture, 20 000 photons, m = 0.01, σ² = 0.1. (b) γ = 0, k = 5, Emin = 0.2956. (c) Quadratic penalty with constant prior, γ = 0.03, k = 5, Emin = 0.3099. (d) Entropic penalty with constant prior, γ = 0.03, k = 6, Emin = 0.2965.

Figure 7: (a) Quadratic penalty with variable prior, γ = 3, k = 100, Emin = 0.3315. (b) γ = 0, k = 100, E = 1.4318. (c) Quadratic penalty with constant prior, γ = 0.03, k = 100, E = 0.404. (d) Entropic penalty with constant prior, γ = 0.03, k = 100, E = 0.5286.

Figure 8: (a) Raw picture, 10 000 photons, m = 0.1, σ² = 1. (b) γ = 0, k = 6, Emin = 0.3490. (c) Quadratic penalty with constant prior, γ = 0.04, k = 7, Emin = 0.3608. (d) Entropic penalty with constant prior, γ = 0.04, k = 8, Emin = 0.3544.

Figure 9: (a) Quadratic penalty with variable prior, γ = 5, k = 100, E = 0.3857. (b) γ = 0, k = 100, E = 1.5989. (c) Quadratic penalty with constant prior, γ = 0.04, k = 100, E = 0.4977. (d) Entropic penalty with constant prior, γ = 0.04, k = 100, E = 0.5006.

Figure 10: (a) Raw picture, 20 000 photons, m = 0.01, σ² = 0.1. (b) γ = 0, k = 9, Emin = 0.4147. (c) Quadratic penalty with constant prior, γ = 0.005, k = 8, Emin = 0.4469. (d) Entropic penalty with constant prior, γ = 0.03, k = 13, Emin = 0.4277.

Figure 11: (a) Quadratic penalty with variable prior, γ = 0.8, k = 100, Emin = 0.5435. (b) γ = 0, k = 100, E = 1.209. (c) Quadratic penalty with constant prior, γ = 0.005, k = 100, E = 0.6622. (d) Entropic penalty with constant prior, γ = 0.03, k = 100, E = 0.5449.

Figure 12: (a) Raw picture, 10 000 photons, m = 0.1, σ² = 1. (b) γ = 0, k = 9, Emin = 0.47847. (c) Quadratic penalty with constant prior, γ = 0.03, k = 8, Emin = 0.5495. (d) Entropic regularization, γ = 0.05, k = 9, Emin = 0.5043.

Figure 13: (a) Quadratic penalty with variable prior, γ = 4, k = 100, Emin = 0.6131. (b) γ = 0, k = 100, E = 1.1896. (c) Quadratic penalty with constant prior, γ = 0.03, k = 100, E = 0.6432. (d) Entropic regularization, γ = 0.05, k = 100, Emin = 0.5256.

penalty and the entropic one. Results are very similar, as confirmed by the values of the reconstruction error.

Figures 7b, 7c, and 7d give the asymptotic reconstructed images (k = 100) for, respectively, γ = 0, a quadratic regularization, and an entropic regularization. Figure 7b exhibits the typical behavior of the MLE when the iteration number increases, due to the noise amplification; see the color table in Figure 7. The effect of the regularization clearly appears in Figures 7c and 7d, especially in the case of a quadratic penalty, for an asymptotic iteration number. The result obtained from the entropic penalty is unsatisfactory and is not improved by an increase of the regularization factor. It has been observed in [10] that the speckled effect obtained is typical of the constant prior. In order to confirm this observation, the result obtained with a quadratic penalty using the Laplacian operator (18) and (19) is shown in Figure 7a. The corresponding reconstruction error is given in Figure 3; for γ greater than 3, ε(k, γ) decreases monotonically and the value reached by the error at high iteration number (k = 100) is very close to the minimum error obtained for γ = 0 and k = 5. The reconstructed image is clearly better than those obtained with the constant prior.

4.5.2. Set B

The raw data with the set of parameters B is clearly much noisier, Figure 8a. The results obtained must be compared to Figure 2b. They are very similar to those obtained with the less noisy image; the number of iterations and the regularization parameter corresponding to the best reconstructed image are slightly larger, Figures 8b, 8c, and 8d. The error ε(k, γ) presents the same behavior with higher values. The Laplacian operator gives better results than the other penalty functions for 10 000 photons as well, Figure 9a. Asymptotic results are given in Figure 9.

4.6. Results for bright points object

4.6.1. Set A

The same algorithms have been applied on the bright points source for the set A, Figure 10a. The results of the reconstruction must be compared to Figure 2c. Note that the photons are more concentrated into small circular parts. The first result is for the nonpenalized algorithm; the best result is shown in Figure 10b. The best results in the sense of the minimization of ε(k, γ) are given in Figures 10c and 10d, respectively, for the quadratic and the entropic regularizations with a constant prior. The regularization parameters and the optimal iteration number are significantly different, but the reconstruction error is approximately the same; this is a first difference with regard to the extended image.

For a large iteration number, results are given in Figures 11b, 11c, and 11d. As expected from the error curves, the nonpenalized algorithm gives a bad reconstruction and corresponds to a large error, while the results for the penalized algorithms are correct, with a stabilization of the reconstruction error. The Laplacian regularization gives a poorer result, with a spreading effect on the point sources.

4.6.2. Set B

Data with the noise of the set B are shown in Figure 12a. The contrast between the background and the point sources is reduced. The nonpenalized algorithm gives the best restoration, shown in Figure 12b. The best results in the sense of the minimization of ε(k, γ) are given in Figures 12c and 12d, respectively, for the quadratic and the entropic penalties with constant prior. All the results are to be compared to the objects of Figure 2d. The regularization parameter is more difficult to find than for the extended image and changes from one function to another.

For a large iteration number, results are given in Figures 13b, 13c, and 13d. As expected from the error curves, the nonpenalized algorithm leads to a bad reconstruction and corresponds to a large error, while the results for the penalized algorithms are correct, with a stabilization of the reconstruction error. As for the set A, the Laplacian regularization gives a less satisfactory result, since the regularization has spread the point sources.

In conclusion, for point-like objects, the results obtained with penalized algorithms are good for the Tikhonov regularization whatever the prior image: constant or a lowpass version of the solution. The entropy penalty function leads to an excessive resolution, even if the reconstruction error is low.

5. DISCUSSION

The method proposed here can be applied to all the inverse problems described by a first-kind Fredholm integral. Two points must be outlined. The first is the exact kind of noise: Gaussian, Poisson, or, as considered here, Poisson plus Gaussian. The second point deals with the properties of the kernel of the integral equation: space invariant or not, separable or not.

Concerning the first point, the case of pure Gaussian noise corresponds to high-intensity imagery as in the infrared case, while the pure Poisson noise corresponds to low light levels with perfect detectors. These two cases have been analyzed in previous papers [10, 11]. The case of Poisson data with additive Gaussian noise, considered here, corresponds to the more realistic situation of low-light-level imagery with imperfect CCD detectors.

Concerning the second point, the matrix notation used in this paper is general and can be applied whatever the properties of the kernel are. But in the case of imagery, the matrix form implies that the object and the image are lexicographically ordered vectors; then for n × n objects and images, the corresponding matrix W will be n² × n² (recall that Hx = T(Wx)). If the kernel w does not exhibit any specific property, the computations must be performed in matrix form, with a heavy computational cost. Fortunately, for space-invariant kernels (convolution), the matrix W has a Toeplitz or block-Toeplitz structure, simplifying the computations. Indeed, the matrix expression Wx corresponds to the convolution operation w(r, s) ⊗ x(r, s), which is performed by means of the fast Fourier transform. The operation (T) implies only point-to-point products. The proposed algorithm is therefore of general use whatever the exact properties of the kernel w(r, s).

6. CONCLUSION

In this paper, we consider mainly the problem of restoring astronomical images acquired with CCD cameras. The nonuniform sensitivity of the detector elements (flat field) is taken into account, as are the various noise effects, such as the statistical Poisson effect during the image formation process and the additive Gaussian read-out noise. We first show that, applying the split gradient method (SGM), maximum likelihood algorithms can be obtained in a rigorous way; the relaxed convergent form of such algorithms is exhibited, and it has been demonstrated that the EM algorithms are nonrelaxed versions of the proposed algorithm. The proposed method can be systematically applied in a rigorous way (ensuring convergence and positivity of the solution) to MAP estimation for various convex penalty functions. In this paper, three penalty functions have been developed: the quadratic one with either a constant or a smooth prior, and the entropic one with a constant prior. Previous attempts at regularization for this model using sieves have been proposed by Snyder et al. [3, 4, 6]; however, the penalty method proposed here outperforms "sieves" [19]. Another approach, proposed by Llacer and Nuñez [7, 8], uses the penalized ML approach with an entropy penalty function, but there the convergence and the positivity constraint are not always ensured [25], contrary to our method. The proposed algorithm has been checked on typical astrophysical images with antagonistic characteristics: diffuse or bright point objects. We have shown that a Laplacian operator gives satisfactory results for extended objects and an over-smoothed solution for bright point objects. The reverse seems to occur when the prior is a constant.

REFERENCES

[1] M. E. Daube-Witherspoon and G. Muehllehner, "An iterative image space reconstruction algorithm suitable for volume ECT," IEEE Trans. Med. Imag., vol. 5, no. 2, pp. 61–66, 1986.
[2] L. B. Lucy, "An iterative technique for the rectification of observed distributions," Astronomical Journal, vol. 79, pp. 745–754, 1974.
[3] D. L. Snyder and M. I. Miller, "The use of sieves to stabilize the images produced with the EM algorithm for emission tomography," IEEE Trans. Nucl. Sci., vol. 32, no. 5, pp. 3864–3871, 1985.
[4] D. L. Snyder, "Modification of the Richardson-Lucy iteration for restoring Hubble Space Telescope imagery," in Restoration of Hubble Space Telescope Images, R. L. White and R. J. Allen, Eds., The Space Telescope Science Institute, Baltimore, Md, USA, 1990.
[5] D. L. Snyder, A. M. Hammoud, and R. L. White, "Image recovery from data acquired with a charge-coupled-device camera," Journal of the Optical Society of America A, vol. 10, no. 5, pp. 1014–1023, 1993.
[6] D. L. Snyder, C. W. Helstrom, A. D. Lanterman, M. Faisal, and R. L. White, "Compensation for readout noise in CCD images," Journal of the Optical Society of America A, vol. 12, no. 2, pp. 272–283, 1995.
[7] J. Nuñez and J. Llacer, "Bayesian image reconstruction with space-variant noise suppression," Astronomy and Astrophysics Supplement Series, vol. 131, no. 1, pp. 167–180, 1998.
[8] J. Nuñez and J. Llacer, "A general Bayesian image reconstruction algorithm with entropy prior: preliminary application to HST data," Publications of the Astronomical Society of the Pacific, vol. 105, no. 692, pp. 1192–1208, 1993.
[9] J. Llacer and J. Nuñez, "Iterative maximum likelihood and Bayesian algorithms for image reconstruction in astronomy," in The Restoration of Hubble Space Telescope Images, R. L. White and R. J. Allen, Eds., pp. 62–69, The Space Telescope Science Institute, Baltimore, Md, USA, 1990.
[10] H. Lantéri, M. Roche, and C. Aime, "Penalized maximum likelihood image restoration with positivity constraints: multiplicative algorithms," Inverse Problems, vol. 18, no. 5, pp. 1397–1419, 2002.
[11] H. Lantéri, M. Roche, O. Cuevas, and C. Aime, "A general method to devise maximum-likelihood signal restoration multiplicative algorithms with non-negativity constraints," Signal Processing, vol. 81, no. 5, pp. 945–974, 2001.
[12] A. D. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society—Series B, vol. 39, no. 1, pp. 1–38, 1977.
[13] Y. Vardi, L. A. Shepp, and L. Kaufmann, "A statistical model for positron emission tomography," Journal of the American Statistical Association, vol. 80, no. 389, pp. 8–37, 1985.
[14] H. Lantéri, R. Soummer, and C. Aime, "Comparison between ISRA and RLA algorithms. Use of a Wiener filter based stopping criterion," Astronomy and Astrophysics Supplement Series, vol. 140, no. 2, pp. 235–246, 1999.
[15] L. B. Lucy, "Optimum strategy for inverse problems in statistical astronomy," Astronomy and Astrophysics, vol. 289, no. 3, pp. 983–994, 1994.
[16] J. A. Fessler and A. O. Hero, "Penalized maximum-likelihood image reconstruction using space-alternating generalized EM algorithms," IEEE Trans. Image Processing, vol. 4, no. 10, pp. 1417–1429, 1995.
[17] U. Grenander, Abstract Inference, John Wiley & Sons, New York, NY, USA, 1981.
[18] D. L. Snyder, T. J. Schultz, and J. A. O'Sullivan, "Deblurring subject to nonnegativity constraints," IEEE Trans. Signal Processing, vol. 40, no. 5, pp. 1143–1150, 1992.
[19] C. S. Butler and M. I. Miller, "Maximum a posteriori estimation for SPECT using regularization techniques on massively parallel computers," IEEE Trans. Med. Imag., vol. 12, no. 1, pp. 84–89, 1993.
[20] M. Bertero, P. Boccacci, and F. Maggio, "Regularization methods in image restoration: an application to HST images," International Journal of Imaging Systems and Technology, vol. 6, pp. 376–386, 1995.
[21] M. Bertero, "Linear inverse and ill-posed problems," Advances in Electronics and Electron Physics, vol. 75, pp. 1–121, 1989.
[22] M. Bertero and P. Boccacci, Introduction to Inverse Problems in Imaging, IOP Publishing, London, UK, 1998.
[23] G. Demoment, "Image reconstruction and restoration: overview of common estimation structures and problems," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 37, no. 12, pp. 2024–2036, 1989.
[24] D. M. Titterington, "General structure of regularization procedures in image reconstruction," Astronomy and Astrophysics, vol. 144, no. 2, pp. 381–387, 1985.
[25] C. H. Wu and J. M. M. Anderson, "Novel deblurring algorithms for images captured with CCD cameras," Journal of the Optical Society of America A, vol. 14, no. 7, pp. 1421–1430, 1997.

[26] A. Tikhonov and V. Arsenin, Méthodes de résolution des problèmes mal posés, Mir, Moscow, Russia, 1976.
[27] K. Horne, "Images of accretion discs—the eclipse mapping method," Monthly Notices of the Royal Astronomical Society, vol. 213, pp. 129–141, 1985.
[28] I. Csiszár, "Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems," The Annals of Statistics, vol. 19, no. 4, pp. 2032–2066, 1991.
[29] C. L. Byrne, "Iterative image reconstruction algorithms based on cross-entropy minimization," IEEE Trans. Image Processing, vol. 2, no. 1, pp. 96–103, 1993.
[30] H. Lantéri, M. Roche, P. Gaucherel, and C. Aime, "Ringing reduction in image restoration algorithms using a constraint on the inferior bound of the solution," Signal Processing, vol. 82, no. 10, pp. 1481–1504, 2002.
[31] L. Armijo, "Minimization of functions having continuous derivatives," Pacific Journal of Mathematics, vol. 16, pp. 1–3, 1966.
[32] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, Mass, USA, 1995.
[33] M. Minoux, Programmation mathématique—théorie et algorithmes, Collection technique et scientifique des télécommunications, vol. 1, Dunod, Paris, France, 1983.
[34] D. G. Luenberger, Introduction to Linear and Nonlinear Programming, Addison-Wesley, Reading, Mass, USA, 1973.
[35] A. A. Goldstein, Constructive Real Analysis, Harper, New York, NY, USA, 1967.
[36] A. Thompson, J. C. Brown, J. W. Kay, and D. M. Titterington, "A study of methods of choosing the smoothing parameter in image restoration by regularization," IEEE Trans. Pattern Anal. Machine Intell., vol. 13, no. 4, pp. 326–339, 1991.
[37] P. Hansen, "Analysis of discrete ill-posed problems by means of the L-curve," SIAM Review, vol. 34, no. 4, pp. 561–580, 1992.
[38] G. H. Golub, M. Heath, and G. Wahba, "Generalized cross-validation as a method for choosing a good ridge parameter," Technometrics, vol. 21, no. 2, pp. 215–223, 1979.

H. Lantéri received the Ph.D. degree in electronics in 1968 and the Doctorat d'état in 1978. He is a Professor of electrical engineering at the University of Nice Sophia Antipolis and is currently with the Laboratoire Universitaire d'Astrophysique de Nice (LUAN). His research interests include inverse problems and information theory with applications in astrophysics, image restoration, and image processing.

C. Theys was born in Paris, France, in 1967. She received the Ph.D. degree in electronic engineering in 1993 from the University of Nice Sophia Antipolis (UNSA), France. She is currently an Assistant Professor of electrical engineering at the Institut Universitaire de Technologie de Nice Côte d'Azur, France. From 1995 to 2003, she worked on digital signal processing for detection and estimation at the Laboratoire d'Informatique, Signaux et Systèmes de Sophia Antipolis (I3S). Since 2003, she has been a member of the Laboratoire Universitaire d'Astrophysique de Nice (LUAN), and her research interests are inverse problems with applications to astrophysics.

EURASIP Journal on Applied Signal Processing 2005:15, 2514–2520
© 2005 Hindawi Publishing Corporation

A Data-Driven Multidimensional Indexing Method for Data Mining in Astrophysical Databases

Marco Frailis Dipartimento di Fisica, Universita` degli Studi di Udine, Via delle Scienze 208, 33100 Udine, Italy Email: frailis@fisica.uniud.it

Alessandro De Angelis INFN, Sezione di Trieste, Gruppo Collegato di Udine, Via delle Scienze 208, 33100 Udine, Italy Email: de angelis@fisica.uniud.it

Vito Roberto Dipartimento di Matematica e Informatica, Universita` degli Studi di Udine, Via delle Scienze 208, 33100 Udine, Italy Email: [email protected]

Received 1 June 2004; Revised 2 March 2005

Large archives and digital sky surveys with dimensions of 10^12 bytes currently exist, while in the near future they will reach sizes of the order of 10^15. Numerical simulations are also producing comparable volumes of information. Data mining tools are needed for information extraction from such large datasets. In this work, we propose a multidimensional indexing method, based on a static R-tree data structure, to efficiently query and mine large astrophysical datasets. We follow a top-down construction method, called VAMSplit, which recursively splits the dataset on a near median element along the dimension with maximum variance. The obtained index partitions the dataset into nonoverlapping bounding boxes, with volumes proportional to the local data density. Finally, we show an application of this method for the detection of point sources from a gamma-ray photon list.

Keywords and phrases: multidimensional indexing, VAMSplit R-tree, nearest-neighbor query, one-class SVM, point sources.

1. INTRODUCTION

At present, several projects for the multiwavelength observation of the universe are underway, for example, SDSS, GALEX, POSS2, DENIS, and so forth [1]. In the next years, new spatial missions will be launched (e.g., GLAST, Swift [2, 3]), surveying the whole sky at different wavelengths (gamma-ray, X-ray, optical).

In the astroparticle and astrophysical fields, data are mostly characterized by multidimensional arrays. For instance, in X-ray and gamma-ray astronomy, the data gathered by detectors are lists of detected photons whose properties include position (RA, DEC), arrival time, energy, error measures for both the position and the energy estimates (dependent on the instrument response), and quality measures of the events. Source catalogs, produced by the analysis of the raw data, are lists of point and extended sources characterized by coordinates, magnitude, spectral indexes, flux, and so forth.

Data mining applied to multidimensional data analyzes the relationships between the attributes of a multidimensional object stored into the database and the attributes of the neighboring ones. Typical queries required by this kind of analysis are the following: (i) point queries, to find all objects overlapping the query point; (ii) range queries, to find all objects having at least one common point with a query window; and (iii) nearest-neighbor queries, to find all objects that have a minimum distance from the query object. Another important operation is the spatial join, which in the astrophysical field is needed to search multiple source catalogs and cross-identify sources from different wavebands.

These multidimensional (spatial) data tend to be large (sky maps can reach sizes of terabytes), requiring the integration of the secondary storage, and there is no total ordering on spatial objects preserving spatial proximity [4]. This characteristic makes it difficult to use traditional indexing methods, like B+-trees or linear hashing.

2. AN OPTIMIZED R-TREE

The R-tree is a data structure meant to efficiently index multidimensional point data or objects with a spatial extent [5].
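The R-tree organizes data through minimum bounding boxes (MBBs), and the point and range queries listed in the introduction reduce to per-dimension interval tests on those boxes. A small illustrative sketch (the helper names are ours, not from the paper's implementation):

```python
def mbb(points):
    """Minimum bounding box of a set of n-dimensional points, returned
    as the two endpoints (S, T) of its major diagonal, S the lower corner."""
    dims = range(len(points[0]))
    S = tuple(min(p[i] for p in points) for i in dims)
    T = tuple(max(p[i] for p in points) for i in dims)
    return S, T

def overlaps(box_a, box_b):
    """True if two bounding boxes share at least one point; this is the
    test a range query applies to each node entry while descending."""
    (sa, ta), (sb, tb) = box_a, box_b
    return all(sa[i] <= tb[i] and sb[i] <= ta[i] for i in range(len(sa)))
```

A range query descends only into child entries whose MBB overlaps the query window, which is why nonoverlapping sibling MBBs (as produced by the construction below) keep the number of visited nodes small.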

The structure of an R-tree is the following:

(i) an internal node of the R-tree has a number of entries of the form (cp, MBB), where cp is the address of a child node and MBB is the n-dimensional minimum bounding box of all entries in that child node;
(ii) a leaf node has a number of entries of the form (Oid, MBB), where Oid refers to a record in the database describing a particular object and MBB is the minimum bounding box of that object. For point data, the leaf entries can also have the form (point, attributes), where point is a coordinate in the n-dimensional space and attributes are data associated to that point.

A bounding box R is defined by the two endpoints S and T of its major diagonal in the n-dimensional data space:

    R = (S, T),    (1)

where

    S = (s1, s2, ..., sn),  T = (t1, t2, ..., tn),  si ≤ ti  ∀ 1 ≤ i ≤ n.    (2)

The level (or depth) of a node x of the tree is the length (the number of nodes) of the path from the root r to the node x. The fanout of a node x is the maximum number of entries a node can have. The internal fanout is the fanout of nonleaf nodes (to be distinguished from the leaf fanout or capacity). Analogous to the B+-tree, the R-tree is a balanced tree, and each node has a fanout dependent on the disk page size.

The dynamic R-tree (and its variant, the R*-tree [6]) defines particular insertion, deletion, and update operations to reduce the overlapping between sibling nodes and guarantee a minimum filling rate.

Usually, the analysis of astrophysical data is performed on a static dataset. In this case, using multiple insertions to build the index on the entire dataset is very slow: the cost is O(N log_B N) I/O operations, where N is the number of input MBBs and B is the number of MBBs fitting into a disk block. An optimized index, in terms of construction time, memory occupied, and query performance, can be built using a priori information on the dataset by means of bulk loading algorithms. Several bulk loading techniques have been proposed in the literature [7]. With these algorithms, the index can be built with O((N/B) log_{M/B}(N/B)) I/Os, where M is the number of MBBs fitting into main memory. The result is a near complete and balanced R-tree. The basic idea used in these algorithms is the following: input MBBs or point data are sorted or partially sorted according to a criterion that preserves spatial proximity between adjacent elements in the ordering; then they are placed in the leaves in that order. The rest of the tree is then built recursively in a bottom-up manner.

We have followed a top-down construction method, called the VAMSplit algorithm, described in [8], to build an optimized R-tree. This method preserves the spatial proximity between sibling nodes, resulting in a partition of the dataset with no overlapping between MBBs. Moreover, the volume of the data space covered by each node (at a particular level) is variable and dependent on data density. The main idea of this method is to recursively split the dataset on a near median element along the dimension with maximum variance. In particular, following the formalism given in [9], the index construction algorithm comprises the following subtasks:

(i) determine the tree topology: height and fanout of the internal nodes, and so forth;
(ii) compute the split strategy based on the tree topology;
(iii) use an external selection algorithm to bisect the data on secondary storage;
(iv) construct the index directory.

2.1. Determination of the tree topology

The topology of a tree includes the height of the tree, the fanout of the internal nodes in each tree level, the capacity of the leaf nodes, and the number of data objects (i.e., records) stored in each subtree. The topology of the tree only depends on static information which is invariant during the construction, such as the number of objects, the number of dimensions indexed, and the page capacity.

Let B be the maximum number of data objects in a data page (i.e., a page storing a leaf node) and F the fanout of a directory page (i.e., a page storing a nonleaf node). Then, using the floor and ceiling operations, respectively indicated by ⌊·⌋ and ⌈·⌉, we have that

    B = ⌊page size / sizeof(data object)⌋,
    F = ⌊page size / sizeof(node entry)⌋.    (3)

The maximum number of data objects in a tree with height h is

    Cmax(h) = B · F^h.    (4)

Therefore, knowing the number N of data objects to be indexed, the height of the tree must be determined such that Cmax is greater than N. More formally,

    h = ⌈log_F (N/B)⌉ = ⌈log_F ⌈N/B⌉⌉.    (5)

This corresponds to the height of the root node. The fanout of the root node is evaluated considering its subtrees as complete trees with height h − 1 (the target index is a balanced tree). Hence,

    fanout(h, N) = ⌈N / Cmax(h − 1)⌉.    (6)

2.2. The split strategy

Given the topology of the target disk-based index tree, the split strategy is represented by a linear-space tree. For the VAMSplit algorithm, the split strategy is implicitly represented by a binary tree, where, at each level, the dimension with maximum variance is chosen as the split dimension. Then, a near median element is selected as the split value and is computed by

med = gscap · ⌈(1/2) · ⌈N/gscap⌉⌉   if N ≤ 2 · cscap and gscap > 0,
med = cscap · ⌈(1/2) · ⌈N/cscap⌉⌉   otherwise,   (7)

where cscap stands for child subtree capacity and gscap for grandchild subtree capacity:

cscap = C_max(h − 1),   gscap = C_max(h − 2) = cscap / F. (8)

Hence, when N > 2 · cscap, the split value is selected so that the left subtrees of the target index are fully utilized. When N < 2 · cscap, the split based on the cscap value would generate a strongly biased split; thus in this case the near median element is evaluated by means of the grandchild subtree capacity, but without introducing any extra page in the target index.

For large datasets, not fitting into main memory, an external selection algorithm using the secondary storage for partitioning the data around the median element is necessary. Our implementation uses a sampling strategy given by [10] to find a good pivot value and reduce the number of I/O operations; a caching strategy explained in [9] has been adopted to partition the data into the secondary storage. When the number of records covered by a subtree fits into main memory, its construction is continued in main memory, further reducing the number of I/O operations.

3. TESTS ON A PHOTON DATASET

To test the behavior of this method, we built the index on a list of gamma-rays simulated for the GLAST project. In particular, the optimized R-tree was built on the RA and DEC values, while the other columns of each photon were considered as attribute data.

Figure 1 represents the structure of the R-tree built on the first two days of simulated photons (for a total of 1 847 588 photons). The background image represents the projection on the RA-DEC plane of the photon counts. The root node contains only two rectangles (child nodes) splitting the dataset on the RA value of the median element. For the rectangle on the right, the image shows the partition generated by the second level of the tree; for the left rectangle, instead, the partition at the third level is shown. As one can notice, in regions where the flux is higher, the decomposition is finer.

Figure 1: Minimum bounding boxes at different levels of the R-tree.

Then, to test query performances, we built an optimized R-tree on the photons generated by a fast simulation of an entire year of observation, for a total of 40.1 million photons. The size of each photon is 165 bytes. The indexed attributes are again RA and DEC. The system on which we ran the tests is a Pentium IV 2400 MHz with 512 MB (DDR 266 MHz) and an 80 GB 7200 rpm Ultra ATA/100 hard disk. The operating system is a standard Red Hat 9.0 and the page size is 4096 bytes.

The building of the R-tree index on the entire dataset required 4 hours and 35 minutes. The result of the construction is a single index file with a size of 6.7 GB (it contains both the directory nodes and the data itself).

We performed 25 circular queries on the optimized R-tree, each one repeated 4 times. Each query is defined by a coordinate in RA and DEC together with a radius (of 15 degrees). Circular queries, on the R-tree, require a particular handling. We performed two types of queries. In the first type, the program converts a circular query into a rectangular query:

(RA, DEC, radius) → ((min RA, min DEC), (max RA, max DEC)), (9)

where the rectangle sides are tangent to the circular region. This way, the photons obtained by the rectangular query are a superset of the ones obtained by the circular query.

The second type of query adds a filtering step to the first one, in which only photons inside the circular region are accepted. A particular handling is required for circular queries intersecting the poles, but none of the 25 queries required it. The performances obtained are

(i) rectangular query average time: 10.06 seconds,
(ii) circular query average time: 10.47 seconds,
(iii) average number of elements retrieved by a rectangular query: 1 210 800,
(iv) average number of elements retrieved by a circular query: 973 239.

The hierarchical triangular mesh (HTM) [11] is another access method to index data characterized by a spherical distribution, which is used in several astrophysical experiments, like the Sloan Digital Sky Survey (SDSS) [12] and GLAST. To compare its performances with our indexing method, we used an HTM with 5 levels (the same configuration adopted
An Indexing Method for Data Mining in Astrophysics 2517

Figure 2: Adjacency between bounding boxes in the count map example.

in the SDSS project) to partition the photon dataset. A level-5 HTM decomposes the sky into 8 192 spherical triangles, each one associated to a different file on disk. The building of the HTM index required 1 hour and 27 minutes. Then we performed the same 25 circular queries used to test our R-tree index. Given a circular query, the HTM library returns a list of HTM IDs, each one identifying a spherical triangle intersecting the query region. The performances obtained are

(i) circular query average time: 140.72 seconds,
(ii) average number of level-5 triangles intersecting the query: 104.

4. NEIGHBORHOOD AND "WEAK" ADJACENCY

The structure of the optimized R-tree can help exploring the data and finding regions of interest. For this purpose, other information can be added to each node: the total number of data points covered by the node, their mean and variance, and other statistical moments.

Data mining techniques include clustering, classification, and density estimation tasks. The application of these techniques to large datasets involves the execution of multiple queries. Typical queries used for these tasks are nearest-neighbor or similarity queries and adjacency queries. In particular, for cluster analysis or density estimation, it can be useful to define neighborhood or adjacency relations not only between data objects but also between the internal nodes of the optimized R-tree storing sufficient statistics. We use the definition of minimum distance between a point and a bounding box given by Roussopoulos et al. in [13]:

MINDIST(P, R) = Σ_{i=1}^{n} (p_i − r_i)², (10)

where

r_i = s_i   if p_i < s_i,
r_i = t_i   if p_i > t_i,   (11)
r_i = p_i   otherwise,

which corresponds to the distance from the point to the nearest edge of the bounding box. Given a bounding box, its nearest neighbors are found by means of the MINDIST from its barycenter. An optimal algorithm, visiting only the nodes necessary for obtaining the nearest neighbors, is designed in

Figure 3: Local maxima obtained by bounding box sorting in the galactic anticenter.

[14]. This algorithm is also incremental, that is, the number of nearest neighbors to be retrieved is not known in advance.

Differently from space-partitioning data structures (like the kd-tree or the HTM), the R-tree has no adjacency relation between its nodes (i.e., usually edges are not shared between their bounding boxes). The adjacency relation is generally used in cluster analysis to find connected components. For point data characterized by an isotropic noise or background distribution, we define a weak adjacency between the R-tree bounding boxes as follows.

Definition 1 (weak adjacency). Two bounding boxes U = (S, T) and V = (S′, T′) are weakly adjacent if there exists k ∈ {1, ..., n} such that
(i) ¬(s_i ≥ t′_i ∨ t_i ≤ s′_i) for all 1 ≤ i ≤ n, i ≠ k;
(ii) there does not exist Z = (S″, T″) such that Z and U satisfy (i), Z and V satisfy (i), and (t_k ≤ s″_k ≤ s′_k ∨ t′_k ≤ s″_k ≤ s_k).

In case of a regular grid in two dimensions, the above definition is equivalent to the 4-connectivity. Given an R-tree bounding box, the algorithm to find all its weakly adjacent bounding boxes is based on the incremental nearest-neighbor algorithm. Figure 2 shows an example of weakly adjacent bounding boxes found with this method.

The ratio n/V between the number of elements covered by a node and the volume of its bounding box approximates the data density in that region. We use this information to find local maxima in the dataset. Given a rectangular (n-dimensional) region Q and a level l of the tree (chosen on the basis of the node resolution), the bounding boxes at level l overlapping Q are sorted in decreasing order of the n/V value. In Figure 3, the partition of the simulated photons in the galactic anticenter is shown: the first

90 bounding boxes in the ordering are filled, highlighting 4 densest areas which correspond to point sources in that region.

5. A STRATEGY FOR THE DETECTION OF POINT SOURCES

One of the major tasks, in the analysis of the data gathered by X-ray or gamma-ray detectors working in survey mode, is to distinguish point sources from diffuse background or extended sources. Point sources are mostly characterized by a stronger flux, with respect to the surrounding, focused on a small angular region. The area covered by a point source depends also on the instrument point spread function.

An optimized R-tree index can be built on a dataset including photons gathered in a certain range of time (we are using, for the analysis, a minimum interval of 6 days). To find static or strongly variable sources (e.g., gamma-ray bursts), only a bidimensional indexing on the RA and DEC values is needed.

In the following, we propose a point source detection algorithm based on kernel methods [15], and in particular on the one-class SVM [16]. Standard kernel methods have memory and computational requirements that make them impractical for large datasets. In this work we show how to speed up the training process by reducing the number of training data with the partitioning generated by our optimized R-tree.

5.1. One-class SVM

The one-class SVM algorithm estimates the support of a multidimensional distribution, that is, a binary function such that most of the data will live in the region where the function is nonzero. Given a dataset X = {x_1, ..., x_ℓ}, its strategy is to implicitly map the data into a high-dimensional feature space F using a kernel function, that is, a function k such that

k(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩, (12)

where φ is the mapping from X to the (inner product) feature space F. Then, in F, it separates the data from the origin with maximum margin, solving the following problem:

min_{w, ρ, ξ} (1/2)‖w‖² − ρ + (1/(νℓ)) Σ_i ξ_i   s.t. ⟨w, φ(x_i)⟩ ≥ ρ − ξ_i, ξ_i ≥ 0, i = 1, ..., ℓ, (13)

where ν ∈ (0, 1] is a parameter of the problem. Constructing the Lagrangian and setting the derivatives to zero, we obtain the dual problem

min_α (1/2) Σ_{i,j} α_i α_j k(x_i, x_j)   s.t. Σ_i α_i = 1, 0 ≤ α_i ≤ 1/(νℓ), i = 1, ..., ℓ, (14)

where the α_i are the Lagrange multipliers. This is a quadratic programming problem, solved by standard optimization techniques. After solving the dual problem, the support of the distribution is given by

f(x) = sgn( Σ_i α_i k(x_i, x) − ρ ). (15)

5.2. Scaling the one-class method with the optimized R-tree

To scale the one-class method to a large dataset, the idea is to partition the data into pairwise disjoint convex subsets and use, in the one-class training, only one representative for each subset. An analogous method has been applied for classification tasks with the support vector machines (SVMs) [17]. Substituting the data in a subset X_0 with a representative is equivalent to adding the following constraint to the dual problem:

α_{0,i} = 0   ∀i = 1, ..., ℓ_0, (16)

where ℓ_0 = |X_0|. Using the Gaussian kernel k = exp(−λ‖x − z‖²), it can be shown that the best representative we can choose is the value, in the input space, satisfying

min_x (1/ℓ_0) Σ_{i=1}^{ℓ_0} ‖φ(x) − φ(x_{0,i})‖² = min_x (1/ℓ_0) Σ_{i=1}^{ℓ_0} (2 − 2 exp(−λ‖x − x_{0,i}‖²)), (17)

that is, x̄ = (1/ℓ_0) Σ_{i=1}^{ℓ_0} x_{0,i}. The partitioning we adopt is the one generated by one level of the optimized R-tree. We reduce the elements covered by a node of the partition to their mean value and train the one-class algorithm on such representatives. With respect to the standard one-class method, an approximate solution is found, with a speedup that can be of two orders of magnitude (depending on the level of detail in the partitioning).

5.3. Tests on the anticenter region

Instead of trying to detect the point sources directly, we use the accelerated one-class method to estimate the support of the background and the diffuse emission distribution. Being able to estimate such a distribution, point sources are detected as outliers of the support found.

Our approach is to associate to each photon's coordinates the density of its surrounding area. We use the partition generated by indexing the data with the optimized R-tree to accelerate the training phase and get an approximate solution. Moreover, at a certain level of the R-tree, the decomposition is finer in the areas with higher density. Hence, we use the ratio between the volume of a bounding box and the number of photons it covers to approximate the density associated to each photon in that node.

Figure 4: Point source detection applying the one-class SVM to the partition generated by the optimized R-tree. (a) A counts map of the anticenter region. (b) Outliers with respect to the diffuse emission. (c) Densities of the outliers in increasing order. (d) Result after filtering out the outliers with low density.
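The filtering of panel (d) — discarding the outliers whose local density estimate falls below a threshold, so that only point-source candidates survive — can be sketched as follows (function name and threshold are illustrative, not from the paper):

```python
def filter_low_density(outliers, densities, threshold):
    # keep an outlier only if its associated density exceeds the threshold:
    # low-density outliers trace the diffuse background, while high-density
    # ones are point-source candidates
    return [o for o, d in zip(outliers, densities) if d > threshold]
```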

Putting together position and local density information generates some redundancy. In fact, in areas where the density is higher, the mean distance between the photons is smaller. A solution for removing redundancy in a dataset is to perform the principal component analysis (PCA) [18], which gives, in a dataset, the directions of maximum variance. Generally, only a subset of the eigenvectors is kept, that is, the ones corresponding to the directions capturing most of the variance. Hence, the data are first projected into the subspace found:

x̃_i = U_k^T x_i, i = 1, ..., ℓ, (18)

where k is the number of eigenvectors used.

To test this method we have applied it again to the GLAST photon dataset and in particular to the anticenter region (including 25 890 photons). We have used the partition generated with the last level of the R-tree. The parameter values adopted are ν = 0.14 and λ = 0.0003 (the width of the Gaussian kernel). The training has required 0.44 second.

The outliers detected are shown in Figure 4b. Apart from the four stronger sources, also areas with low density are highlighted as outliers. This is due to the one-class method itself: it finds the most dissimilar objects on the boundary of the decision function. In this particular task, the most dissimilar elements are the areas with a very high density with respect to the surrounding and the areas with very low density. This can be seen also by plotting the histogram of the density values for the outliers (Figure 4c). Hence, a simpler task remains: to filter out the areas with low density (see Figure 4d).

6. CONCLUSIONS

In this work, we have realized a multidimensional indexing method to efficiently access and mine large multidimensional astrophysical data. The index is based on a static version of the R-tree data structure, the VAMSRtree. We have fixed the original algorithm and adapted it to very large datasets, for which the partial sort cannot be performed in main memory. We have adopted an efficient incremental nearest-neighbor algorithm and defined a weak adjacency relation between the R-tree nodes. These algorithms, together with the optimized R-tree structure, allow us to efficiently query the data (with point and n-dimensional range queries) and perform data mining tasks like clustering and density estimation. A fast novelty detection algorithm, based on the one-class SVM method, has been shown. We have used, as a running example, photon data gathered from a simulation for the Gamma-ray Large Area Space Telescope (GLAST).

REFERENCES

[1] R. J. Brunner, S. G. Djorgovski, T. A. Prince, and A. S. Szalay, "Massive datasets in astronomy," in Invited Review for the Handbook of Massive Datasets, J. Abello, P. Pardalos, and M. Resende, Eds., pp. 931–979, Kluwer Academic, Norwell, Mass, USA, 2002.
[2] GLAST—Gamma-ray Large Area Space Telescope, http://glast.gsfc.nasa.gov/.
[3] Swift mission, http://www.nasa.gov/mission_pages/swift/main/index.html.
[4] V. Gaede and O. Günther, "Multidimensional access methods," ACM Computing Surveys, vol. 30, no. 2, pp. 170–231, 1998.

[5] A. Guttman, "R-trees: a dynamic index structure for spatial searching," in Proc. ACM SIGMOD International Conference on Management of Data, pp. 47–57, Boston, Mass, USA, June 1984.
[6] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, "The R*-tree: an efficient and robust access method for points and rectangles," in Proc. ACM SIGMOD International Conference on Management of Data, pp. 322–331, Atlantic City, NJ, USA, May 1990.
[7] L. Arge, K. Hinrichs, J. Vahrenhold, and J. S. Vitter, "Efficient bulk operations on dynamic R-trees," in Proc. 1st International Workshop on Algorithm Engineering and Experimentation (ALENEX '99), vol. 1619 of Lecture Notes in Computer Science, pp. 328–348, Springer, Baltimore, Md, USA, January 1999.
[8] D. A. White and R. Jain, "Similarity indexing: algorithms and performance," in Storage and Retrieval for Still Image and Video Databases IV, vol. 2670 of Proceedings of SPIE, pp. 62–73, San Diego, Calif, USA, 1996.
[9] C. Böhm and H.-P. Kriegel, "Efficient bulk loading of large high-dimensional indexes," in Proc. 1st International Conference on Data Warehousing and Knowledge Discovery (DaWaK '99), vol. 31, pp. 251–260, Florence, Italy, August–September 1999.
[10] C. Martínez and S. Roura, "Optimal sampling strategies in quicksort and quickselect," SIAM Journal on Computing, vol. 31, no. 3, pp. 683–705, 2001.
[11] P. Z. Kunszt, A. S. Szalay, and A. R. Thakar, "The hierarchical triangular mesh," in Proc. of the MPA/ESO/MPE Workshop on Mining the Sky, A. J. Banday, S. Zaroubi, and M. Bartelmann, Eds., pp. 631–637, Garching, Germany, 2001.
[12] Sloan Digital Sky Survey, http://www.sdss.org/.
[13] N. Roussopoulos, S. Kelley, and F. Vincent, "Nearest neighbor queries," in Proc. ACM-SIGMOD International Conference on Management of Data, pp. 71–79, San Jose, Calif, USA, May 1995.
[14] G. R. Hjaltason and H. Samet, "Distance browsing in spatial databases," ACM Transactions on Database Systems, vol. 24, no. 2, pp. 265–318, 1999.
[15] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, UK, 2004.
[16] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, and A. J. Smola, "Estimating the support of a high-dimensional distribution," Neural Computation, vol. 13, no. 7, pp. 1443–1471, 2001.
[17] D. Boley and D. Cao, "Training support vector machine using adaptive clustering," in Proc. 4th SIAM International Conference on Data Mining (SIAM DM '04), Lake Buena Vista, Fla, USA, April 2004.
[18] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2001.

Alessandro De Angelis, a Professor of experimental physics at the University of Udine and at the IST of Lisboa, chairs the M.S. program in computational physics in Udine, and is a Member of INFN, CIFS, and SIF. After classical high school, he graduated (cum laude) in physics in Padova in 1983, worked in the group of Marcello Cresti, and was a Technical Officer at the Terrestrial Weapons headquarters, Rome, in 1983/1984. From 1984 to 1992, he studied properties of charmed particles with bubble-chamber detectors and worked in Padova and Udine on the preparation of the DELPHI experiment at the CERN LEP electron-positron collider. From 1993 to 1999 at CERN, Geneva, he worked in the group of Ugo Amaldi, as a Research Associate and Staff Member, coordinating the data analysis software of DELPHI and the QCD Group, and was responsible for the software for the INFN project on artificial NN. Back in Italy in 1999, he founded in Udine a group on astroparticle physics, working on GLAST and MAGIC (detection of high-energy gamma rays with satellite and ground-based instruments, respectively), and giving a primary contribution to simulation, event display, and data acquisition. He is the author of more than 400 publications, referee for leading scientific journals, and organizer of several conferences in the field of astroparticle physics.

Vito Roberto was born in 1951, got his "Laurea" degree in physics in 1973, and is now a Full Professor in computer science. Since 1979, he has been working at the University of Udine, Italy, where he founded the Machine Vision Laboratory (http://mvl.dimi.uniud.it) and is now the Director of the Department of Mathematics and Computer Science. His main research interests are pattern recognition systems and network-based systems. Professor Roberto is the author of a book and over 80 scientific publications. He is a Member of the American Physical Society (APS) and the International Association of Pattern Recognition (IAPR).

Marco Frailis graduated in computer science in Udine in 2001. In November 2001, he started his Ph.D. degree at the Department of Mathematics and Computer Science, the University of Udine. His research work was carried out in collaboration with the Department of Physics and within the GLAST project (NASA). He received his Ph.D. degree in computer science in 2005 with a thesis entitled "Data Management and Mining in Astrophysical Databases" and he has a Postdoc position at the Department of Physics. His main research interests are pattern recognition and data mining techniques.

EURASIP Journal on Applied Signal Processing 2005:15, 2521–2535
© 2005 Cinzia Lastri et al.

Virtually Lossless Compression of Astrophysical Images

Cinzia Lastri
Institute of Applied Physics "Nello Carrara," Italian National Research Council (IFAC-CNR), 64 via Panciatichi, 50127 Florence, Italy
Email: [email protected]

Bruno Aiazzi
Institute of Applied Physics "Nello Carrara," Italian National Research Council (IFAC-CNR), 64 via Panciatichi, 50127 Florence, Italy
Email: [email protected]

Luciano Alparone
Department of Electronics and Telecommunications (DET), University of Florence, 3 via di Santa Marta, 50139 Florence, Italy
Email: [email protected]fi.it

Stefano Baronti
Institute of Applied Physics "Nello Carrara," Italian National Research Council (IFAC-CNR), 64 via Panciatichi, 50127 Florence, Italy
Email: [email protected]

Received 10 June 2004; Revised 31 January 2005

We describe an image compression strategy potentially capable of preserving the scientific quality of astrophysical data, simultaneously allowing a consistent bandwidth reduction to be achieved. Unlike strictly lossless techniques, by which moderate compression ratios are attainable, and conventional lossy techniques, in which the mean square error of the decoded data is globally controlled by users, near-lossless methods are capable of locally constraining the maximum absolute error, based on the user's requirements. An advanced lossless/near-lossless differential pulse code modulation (DPCM) scheme, recently introduced by the authors and relying on a causal spatial prediction, is adjusted to the specific characteristics of astrophysical image data (high radiometric resolution, generally low noise, etc.). The background noise is preliminarily estimated to drive the quantization stage for high quality, which is the primary concern in most astrophysical applications. Extensive experimental results of lossless, near-lossless, and lossy compression of astrophysical images acquired by the Hubble Space Telescope show the advantages of the proposed method compared to standard techniques like JPEG-LS and JPEG2000. Finally, the rationale of virtually lossless compression, that is, a noise-adjusted lossless/near-lossless compression, is highlighted and found to be in accordance with concepts well established for the astronomers' community.

Keywords and phrases: astrophysical images, differential pulse code modulation, lossless compression, near-lossless compression, noise estimation, statistical context modeling.

1. INTRODUCTION

The volume of astrophysical data that is acquired and exchanged among users, either scientists or not, is rapidly increasing. This is partly owing to large digitized sky surveys in the visible and near-infrared spectral intervals. These surveys are made possible by the development of digital imaging arrays, such as charge-coupled devices (CCDs). The size of digital arrays is also increasing, pushed by astronomical research's demands for more data in less time. Compression of such images can reduce the volume of data that it is necessary to store (a concern for large-scale sky surveys) and can shorten the time required to transmit images. The latter issue is useful for remote observing of, or remote access to, data archives [1].

Astronomical images have some rather unusual characteristics that make many existing image compression techniques ineffective [2]. A typical image consists of a nearly flat background sprinkled with point sources and occasional extended sources. Depending on acquisition bandwidths and exposure times, images may be more or less noisy; in the former case, lossless compression is ineffective for transmission bandwidth reduction because the coding bit rate is lower-bounded by the entropy of the noise [3]. Furthermore, the images are usually subjected to stringent quantitative analysis, so any lossy compression method must be proven not to discard useful information, but should in principle discard only the noise [4].

Data compression methods can be classified as either reversible, that is, lossless, or irreversible, that is, lossy, depending on whether the original data may be exactly reconstructed from the compressed data, or the decompressed data is not exactly the same as the original, because some distortion has been introduced by compression. Astronomers often insist that they can accept only lossless compression, in part because of conservatism, and in part because the familiar lossy compression methods sacrifice some information that is needed for accurate analysis of image data. In fact, for an astronomer a scientific frame is not simply a scene to be reproduced with a more or less high fidelity, but a 2D measure of a scalar field representing fluxes. Then, as for any other measure, random and systematic errors must be carefully assessed, quantified, and kept under strict control [5]. A common practice to achieve this goal, given the root mean square (RMS) of the noise introduced by the analog instrument, is that the step size of the uniform threshold quantizer (UTQ) is chosen accordingly, based on application requirements, for example, target detection, and the outcome quantization levels are transmitted without further loss. However, since all astronomical images contain noise, which is inherently incompressible, lossy compression methods may produce much better compression results and are thus worth being investigated, provided that a deep quantitative analysis of the impact of information loss on the scientific products expected from the observation is preliminarily carried out.

The classical image compression scheme consists of a decorrelator, followed by a quantizer and an entropy coding stage [6]. The decorrelator has the purpose of removing spatial redundancy; hence it must be tailored to the specific characteristics of the data to be compressed. Examples are orthogonal transforms, for example, the discrete cosine transform (DCT) [6] and Mallat's discrete wavelet transform (DWT) [7], and differential pulse code modulation (DPCM) [6]. The quantizer introduces a distortion to allow a decrement in entropy rate to be achieved. Once an image has been decorrelated and possibly quantized, it is necessary to find a compact representation of its coefficients, which may be sparse. Thus, an entropy coding algorithm maps such coefficients into codewords, aiming at minimizing the average code length.

Decorrelation is crucial for compression. DWT calculated on the whole image (full-frame DWT) allows long-range correlation to be effectively removed, unlike DCT, in which full-frame processing leads to a spread of energy in the transformed plane, because DCT is not suitable for the analysis of nonstationary signals. Also, computational issues make DCT usually applied not to the whole frame, but to small blocks only, in which the assumption of stationarity approximately holds [6]. Hence, it fails in exploiting long-range correlation and can effectively remove only short-range correlation. The dc component that is encoded stand-alone (e.g., by spatial DPCM) is a fundamental drawback of first-generation transform coders [6]. The new standard JPEG2000 proposed by the Joint Photographic Experts Group [8] was devised to overcome such limitations, thereby leading to substantial benefits. The use of critically decimated decompositions, like Mallat's octave wavelet pyramid [7], is motivated by a twofold requirement: lack of redundancy, the reason for which undecimated decompositions, like the "à trous" wavelet transform, widely used in astrophysical image processing [9], are little suitable for data compression; orthogonality, thanks to which the variance of quantization errors in the transformed domain is preserved when the data is transformed back to the spatial domain.¹ Thus, the mean square error (MSE) can be easily controlled through the step sizes of quantizers. However, quantization errors in the transformed domain, which are likely to be uniformly distributed, at least if the step size is not greater than the RMS of the data, and are upper bounded in modulus by half of the step size, are spread by the inverse transformation and may yield heavy-tailed distributions, whose maximum absolute amplitude cannot be generally known a priori. Therefore, lossy transform-based encoders are unable to control the distortion but in the MSE sense, which means that in the lossy case relevant image features may be locally distorted by an unquantifiable extent.

Let {g(i, j)}, with 0 ≤ g(i, j) ≤ g_fs, g_fs being the full scale, denote an integer-valued N-pixel digital image and {g̃(i, j)} its distorted version, integer valued as well, achieved by compressing {g(i, j)} and decoding the outcome bit stream. All values are intended to be expressed either in an unspecified unit, or simply in digital counts. Widely used distortion measurements are the MSE, or squared L₂ distance between original and distorted image,

MSE = (1/N) Σ_i Σ_j [g(i, j) − g̃(i, j)]²; (1)

the maximum absolute distortion (MAD), or peak error, or L∞ distance between original and distorted image,

MAD = max_{i,j} |g(i, j) − g̃(i, j)|; (2)

and the peak signal-to-noise ratio (PSNR),

PSNR_(dB) = 10 log₁₀ [ g_fs² / (MSE + 1/12) ], (3)

in which the MSE at the denominator is incremented by the variance of the integer roundoff error, to handle the limit lossless case, when MSE = 0. Thus, PSNR will be upper bounded by 10 log₁₀(12 · g_fs²) in the lossless case, to indicate that the signal detected by the sensor has been quantized before being reversibly compressed.

¹ If the transformation is not orthogonal, like the biorthogonal wavelet transform used by JPEG2000, MSE distortions coming from quantized subbands must be multiplied by constant coefficients depending on the filters synthesizing each subband, before being summed together to yield the total distortion.

Noteworthy are those lossy methods that allow to settle "a priori" the maximum reconstruction error, not only on the whole, that is, globally, but also locally, that is, at each pixel location. Control of the maximum value of the absolute error, that is, of MAD, is capable to ensure constant quality throughout the reconstructed image. If the L∞ error is constrained to be not greater than a user-defined value, the current definition of near-lossless compression, established for the medical community [10], applies.

The evaluation of the maximum allowable distortion is an open problem. In astrophysical applications, the data acquired from the instrument, after being preliminarily processed (preprocessed), for example, reduced and corrected for acquisition distortions, and calibrated, is usually postprocessed to extract information that may not be immediately available by visual inspection. Under this perspective, an attractive facility of near-lossless compression methods is that, if the user-defined L∞ error is properly related to the RMS value of the background noise (assumed to be additive and signal-independent), the decompressed image, even though not identical to the original, may be virtually lossless [11]. Originally introduced for remote-sensing data compression, this term indicates not only that the decoded image is visually indistinguishable from the original, but also that possible outcomes of postprocessing (e.g., features extraction, target detection, data modeling, classification, etc.) are substantially unchanged from those calculated from the original data. Thus, the drawback of compression will be a small and predictable increment in the equivalent sensor's noisiness.

To conclude this section, we wish to recall that the introduction of data compression can alleviate bandwidth requirements at the price of a computational effort for encoding (images can be extremely large in size and processing power is generally limited on spaceborne platforms) and decoding, as well as of a possible loss of quality. The goal of this paper is investigating state-of-the-art and advanced compression algorithms from the viewpoint of their potential suitability to preserve the scientific quality of astrophysical imagery. To this purpose, a statistical analysis of the compression-induced distortion, when compression is lossy, will be carried out.

The remainder of this paper is organized as follows. Section 2 briefly reviews the theoretic fundamentals of differential pulse code modulation (DPCM) and state-of-the-art data compression methods. Section 3 describes an advanced DPCM encoder, recently introduced by the authors, whose characteristics of adaptivity make it suitable for astrophysical image compression. Section 4 reports extensive coding results on a large set of astrophysical images in a comparison with such compression standards as JPEG-LS and JPEG2000. Concluding remarks are drawn in Section 5.

2. LOSSLESS/NEAR-LOSSLESS IMAGE COMPRESSION

2.1. Adaptive prediction

Differential pulse code modulation (DPCM) schemes are indeed the sole algorithms suitable for lossless/near-lossless image compression, or more exactly for L∞-constrained compression. DPCM basically consists of a decorrelation followed by entropy coding of the outcome residues, given as differences between true and estimated pixel values. If estimation of the current sample is carried out from past samples, according to the image scan fashion, DPCM is said to be spatially causal and the estimation is a prediction, that is, an extrapolation, driven by the previous samples. Conversely, estimation may be carried out hierarchically, that is, by increasing resolution: a low-resolution coarse image version is interpolated to a finer scale and differences between true and interpolated samples are progressively encoded. In this way DPCM is said to be spatially noncausal, or interpolation-based, and the outcome decoded bit stream resembles a pyramid [12], whose basis is the decompressed image. Both the causal and noncausal DPCM schemes may be L∞-constrained. However, the former is not redundant, the number of residues being identical to that of image pixels, whereas the latter is redundant. Therefore, causal DPCM performs better than noncausal DPCM for medium/high bit rates, that is, close-to-lossless compression. Noncausal DPCM, which has the attractive characteristic of progressive decoding, is preferable for low bit rates, where its performance plots lie in the middle between those of JPEG and of JPEG2000.

Figure 1 outlines the flowcharts of the causal DPCM encoder and decoder, featuring context modeling for entropy coding, which will be described in Section 2.2. For the sake of clarity, notation is one-dimensional. The difference e(n) between the current sample g(n) and its estimation ĝ(n) is quantized by the block labeled with Q to yield the quantized prediction error e_Δ(n), which is sent to the entropy coder (featuring context modeling in the example), which outputs the encoded prediction error ε(n) and the array of data-dependent context thresholds Θ, as side information. At the same time, e_Δ(n) is inversely quantized (Q⁻¹) to the reconstructed prediction error ẽ(n), which is added to the output of the predictor to yield the reconstructed sample g̃(n). The latter is delayed by one sample, for the causality constraint, which states that the predicted value ĝ(n) may not depend on g(n), but only on g(n − 1), g(n − 2), and so on. The quantization noise feedback loop at the encoder allows the L∞ error to be constrained, by forcing prediction at the encoder to be carried out from the same distorted samples that will be available at the decoder, where an identical predictor is placed. In a lossless implementation, ĝ(n) is integer valued, the output of the predictor is rounded to integer as well, the quantizer block is missing together with the feedback loop, and the predictor is straightforwardly fed by the delayed sequence of g(n).

The simplest way to design a predictor is to take a linear or nonlinear combination of the values of pixels lying within a causal neighborhood of the current pixel, that is,


Figure 1: Flowchart of DPCM with quantization noise feedback loop at the encoder, suitable for error-bounded near-lossless compression: (a) encoder; (b) decoder.

surrounding the current pixel and such that they have been previously encountered along the image scan path, thereby representing past pixels. A linear combination, or regression, with fixed coefficients usually provides limited decorrelation. For better performance, the coefficients, whose number represents the order of prediction, may be calculated so as to yield minimum MSE (MMSE) over the whole image. Such coefficients are constant throughout an image, but change from one image to another. The globally MMSE prediction, however, is optimal only for stationary signals. To overcome this drawback, two variations have been proposed: adaptive DPCM (ADPCM) [6], in which the coefficients of the MMSE predictor are continuously recalculated from the incoming new data at each pixel location on a subset of past pixels (the procedure is symmetrical at the decoder, so that the coefficients need not be transmitted); and classified DPCM [13], in which a preliminary training phase is aimed at recognizing some statistical classes of pixels and at calculating an MMSE predictor optimized for each class. Once such predictors are available, the best performing (in the MMSE sense) on a block of pixels may be selected to encode the current block [14]. Alternatively, predictors may be adaptively combined [15], also based on fuzzy-logic concepts [16], to attain an MMSE space-varying prediction. The two strategies of classified prediction will be referred to as adaptive selection/combination of adaptive predictors (ASAP/ACAP).

Eventually, we wish to remind the reader that a forerunner of the ACAP paradigm is the fuzzy 3D DPCM developed by some of the authors for lossless compression of multispectral and hyperspectral remotely sensed images [17]. In this case, the prototype MMSE spatial/spectral linear predictors constituting the dictionary were calculated on clustered data, an idea successfully developed in later works [18].

2.2. Context modeling

A notable feature of all advanced image compression methods [19] is statistical context modeling for entropy coding. The underlying rationale is that prediction errors should be as similar to stationary white noise as possible. As a matter of fact, they are still spatially correlated to a certain extent and especially are nonstationary, which means that they exhibit space-varying variance. The better the prediction, however, the more noise-like the prediction errors will be.

Following a trend established in the literature, first in the medical field [20], then for lossless coding in general [21, 22], and recently for near-lossless coding [23, 24], prediction errors are entropy coded by means of a classified implementation of an entropy coder, generally arithmetic [25] or Golomb-Rice [26]. For this purpose, they are arranged into a user-defined number of statistical classes based on the spatial context, which can be a measure of magnitude or activity of past surrounding pixel values and/or prediction errors. If such classes are statistically discriminated, then the entropy of a context-conditioned model of prediction errors will be lower than that of a stationary memoryless model of the (decorrelated) source [27].

2.3. Review of standards and state-of-the-art methods

Considerable efforts have recently been spent on the development of lossless and near-lossless image compression techniques. The first specific standard has been the lossless version of JPEG [6], which relies on a set of linear predictors with fixed coefficients. A new standard, which also provides near-lossless compression, has been released under the name JPEG-LS [22]. It is based on an adaptive nonlinear prediction, potentially capable of fitting contours, and exploits statistical context modeling of prediction errors followed by Golomb-Rice entropy coding. A similar context-based algorithm named CALIC has also been recently proposed [21]. The simple adaptive predictors used by JPEG-LS and CALIC, however, namely the median adaptive predictor (MAP) and the gradient adjusted predictor (GAP), have been empirically tailored to the average characteristics of gray-scale images. Thorough comparisons with methods following the ASAP and ACAP paradigms [14, 16] have revealed that their performance is limited and still far from the entropy bounds. In fact, the original 2D encoder following the ACAP paradigm [16] achieves lossless compression ratios 5% better than CALIC and 10% better than JPEG-LS, on average. Although the 2D ASAP encoder [14] is slightly less performing than the former, its feature of real-time decoding is highly valuable in application contexts, since an image is usually encoded only once, but decoded many times. Furthermore, the crisp algorithm takes more advantage than the fuzzy one from a low noisiness of the data to compress.

Virtually Lossless Compression of Astrophysical Images 2525
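As a concrete illustration of the causal DPCM loop of Figure 1, the sketch below (not the authors' code; a fixed previous-sample predictor is assumed in place of the adaptive predictors discussed above) shows how the quantization noise feedback loop keeps the per-pixel reconstruction error bounded by δ when the step size is ∆ = 2δ + 1:

```python
import numpy as np

def encode(g, step):
    """Causal DPCM encoder with quantization noise feedback (Figure 1a).
    Hypothetical fixed predictor ghat(n) = gtilde(n - 1); RLPE would use
    adaptive block predictors instead."""
    eq = np.zeros(len(g), dtype=np.int64)          # quantized errors e_Delta(n)
    prev = 0                                       # gtilde(n - 1), delayed reconstruction
    for n, gn in enumerate(g):
        e = gn - prev                              # prediction error e(n)
        eq[n] = int(np.floor(e / step + 0.5))      # quantizer Q (conventional rounding)
        prev = prev + eq[n] * step                 # Q^-1 + predictor output = gtilde(n)
    return eq

def decode(eq, step):
    """Decoder (Figure 1b): identical predictor, so it tracks the encoder exactly."""
    out = np.zeros(len(eq), dtype=np.int64)
    prev = 0
    for n in range(len(eq)):
        prev = prev + eq[n] * step
        out[n] = prev
    return out

rng = np.random.default_rng(0)
g = rng.integers(0, 4096, size=256)                # 12-bit samples, as in the WFPC2 data
delta = 2                                          # target MAD
step = 2 * delta + 1                               # odd step size Delta = 2*delta + 1
rec = decode(encode(g, step), step)
mad = int(np.max(np.abs(g - rec)))                 # never exceeds delta at any pixel
```

Because the encoder predicts from the same reconstructed samples available at the decoder, quantization errors do not accumulate; with step = 1 the scheme degenerates to lossless coding.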

Figure 2: Flowchart of the relaxation-labeled prediction encoder (RLPE). The box marked as “block predictor and quantizer” includes a quantization noise feedback loop.
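The benefit of context conditioning claimed in Section 2.2, and exploited by the context-based arithmetic coder of Figure 2, can be checked on synthetic nonstationary prediction errors. The two-class activity context below is an assumption standing in for the RMSPE-based classes used by the actual coder:

```python
import numpy as np

def entropy_bits(symbols):
    """First-order (memoryless) entropy, in bits/symbol, of a symbol sequence."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(1)
n = 20000
# Nonstationary residuals: low variance on a flat background,
# high variance near "active" regions (edges, stars).
scale = rng.choice([0.6, 6.0], size=n, p=[0.8, 0.2])
errors = np.rint(rng.normal(0.0, scale)).astype(int)

# Context class: here simply the activity level (in a real coder it would be
# derived causally, e.g., by thresholding the RMSPE of past errors).
ctx = (scale > 1.0).astype(int)

h_memoryless = entropy_bits(errors)                       # stationary model
h_contextual = sum((ctx == c).mean() * entropy_bits(errors[ctx == c])
                   for c in (0, 1))                       # context-conditioned model
```

With statistically discriminated classes, the context-conditioned entropy comes out markedly lower than the memoryless entropy, which is what makes classified entropy coding pay off.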

Eventually, the JPEG2000 image coding standard [8] incorporates a lossless mode, based on reversible integer wavelets, and is capable of providing a scalable bit stream that can be decoded from the lossy up to the lossless level. The possibility of defining regions of interest (ROI) is another facility of JPEG2000: for example, compression can be lossless inside ROIs and lossy elsewhere. However, despite its advanced and powerful facilities, JPEG2000 is an L2-constrained encoder and thus not capable of providing near-lossless compression, except for the limit lossless case.

3. RELAXATION-LABELED PREDICTION ENCODER

The DPCM encoder proposed for astrophysical image compression [14] follows the ASAP paradigm, being based on a classified linear-regression prediction, with context-based arithmetic coding of prediction errors. The image is partitioned into blocks, typically 8 × 8, and an MMSE linear predictor is calculated for each block. Given a prefixed number of classes, a clustering algorithm produces an initial guess of as many classified predictors, which are delivered to an iterative labeling procedure that classifies pixel blocks while simultaneously refining the associated predictors. In order to achieve a reduction in bit rate within the constraint of near-lossless compression, prediction errors are quantized with odd-valued step sizes, ∆ = 2δ + 1, with δ denoting the induced L∞-error. Quantized prediction errors are then arranged into activity classes, based on the spatial context, that are entropy coded by means of arithmetic coding. Figure 2 shows the flowchart of the encoder. Besides the encoded prediction errors ε(n), the refined predictors are transmitted along with the label of each block and the set of thresholds defining the context classes for entropy coding.

3.1. Initialization

Patterns of pixel values occurring within the causal neighborhood of each pixel, also known as prediction support, reflect local image features, for example, edges, textures, and shadings. An efficient prediction should be capable of embodying and reflecting such features as much as possible. After preliminarily partitioning the input image into square blocks, for example, 8 × 8, a prediction support of size S (i.e., containing S samples) is set, and the S coefficients of an MMSE linear predictor are calculated for each block by means of a least squares (LS) algorithm. Thus, a large number of predictors, each optimized for a single block, is produced.

The S coefficients of each predictor are arranged into an S-dimensional space. Since the coefficients of any predictor sum to one, all predictors lie on the hyperplane passing through the unit vectors of the coordinate axes. It can be noticed that statistically similar blocks exhibit similar predictors. Thus, the MMSE predictors calculated for each block are clustered on the hyperplane, instead of being spread. A user-provided number M of representative predictors is identified by the fuzzy-C-means (FCM) clustering algorithm [28]. Such dominant predictors are calculated as centroids of as many clusters in the predictor space, according to a Euclidean metric. Thus, an S × M matrix Φ(0) = {φm(0), m = 1, ..., M} containing the coefficients of the M predictors is produced. The superscript (0) highlights that such predictors are start-up values of an iterative refinement procedure.

3.2. Relaxation labeling and predictors refinement

Once M predictors have been found through fuzzy clustering, they are used to initialize an iterative procedure in which image blocks are assigned to M classes and an optimized predictor is obtained for each class.

Step 0. Classify blocks based on their mean square prediction error (MSPE). The label of the predictor minimizing MSPE for a block is assigned to the block itself. This operation has the effect of partitioning the set of blocks into M classes that are best matched by the currently available predictors.

Step 1. Recalculate each of the M predictors from the data belonging to the blocks of each class. The new set of predictors is thus designed so as to minimize MSPE for the current block partition into M classes.

Step 2. Reclassify blocks: the label of the new predictor minimizing MSPE for a block is assigned to the block itself. This operation has the effect of moving some blocks from one class to another, thus repartitioning the set of blocks into M new classes that are best matched by the current predictors.

Step 3. Check convergence; if reached, stop; otherwise, go to Step 1.
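Steps 0-3 can be sketched on synthetic blocks as follows. Plain unconstrained least squares and a random initialization are assumptions for brevity (the paper initializes with FCM centroids and constrains coefficients to sum to one); the total MSPE is non-increasing across iterations by construction:

```python
import numpy as np

rng = np.random.default_rng(2)
S, M, n_blocks, n_px = 4, 3, 60, 64

# Synthetic blocks: design matrix X of causal-support samples and targets y,
# generated by one of three hidden linear predictors plus noise.
hidden = rng.normal(size=(3, S))
blocks = []
for b in range(n_blocks):
    X = rng.normal(size=(n_px, S))
    y = X @ hidden[b % 3] + 0.05 * rng.normal(size=n_px)
    blocks.append((X, y))

def mspe(block, coeffs):
    """Mean square prediction error of one block under a given predictor."""
    X, y = block
    r = y - X @ coeffs
    return float(r @ r) / len(y)

preds = rng.normal(size=(M, S))    # random start (FCM centroids in the paper)
history = []
for _ in range(5):
    # Steps 0/2: label each block with its MSPE-minimizing predictor.
    labels = [int(np.argmin([mspe(bl, a) for a in preds])) for bl in blocks]
    history.append(sum(mspe(bl, preds[l]) for bl, l in zip(blocks, labels)))
    # Step 1: refit each predictor by least squares on its own blocks.
    for m in range(M):
        idx = [i for i, l in enumerate(labels) if l == m]
        if idx:
            X = np.vstack([blocks[i][0] for i in idx])
            y = np.concatenate([blocks[i][1] for i in idx])
            preds[m] = np.linalg.lstsq(X, y, rcond=None)[0]
# Step 3: in practice, stop when the labels no longer change
# (a fixed iteration count is used here for simplicity).
```

Both the relabeling and the refitting step can only decrease the total MSPE, so the procedure converges to a locally optimal partition, exactly as in k-means-style alternation.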

3.3. Blockwise prediction and quantization

Once all blocks have been classified and labeled, together with the optimized predictors, the image is raster scanned and the M refined predictors are activated based on the classes of the crossed blocks. Thus, each pixel value g(n) belonging to one block of the partition that has been labeled with the mth class is predicted by using the mth predictor. Since g(n) is integer, ĝ(n) is rounded to integer as well, and the outcome integer-valued prediction error, e(n) = g(n) − ĝ(n), is uniformly quantized with a step size ∆ as e∆(n) = round[e(n)/∆] and delivered to the context-coding section.

The operation of inverse quantization ẽ(n) = e∆(n) · ∆ introduces an error whose variance is approximately (∆² − 1)/12 (provided that ∆ is lower than the RMS value of e(n)) and whose maximum absolute value is L∞ = ⌊∆/2⌋. Since MSE is a quadratic function of ∆, odd-valued step sizes yield lower L∞ than even sizes do. The step size ∆ is set identical for all blocks, both to minimize L∞ and to avoid blocking artifacts in reconstructed images.

3.4. Context-based arithmetic coding

Prediction errors are classified into a predefined number of statistically homogeneous classes based on the spatial context. A context function is defined and measured on prediction errors lying within a circular neighborhood of the current pixel, possibly larger than the prediction support, as the RMS value of prediction errors (RMSPE). Again, causality of the neighborhood is necessary in order to make the same information available both at the encoder and at the decoder. At the former, the probability density function (PDF) of RMSPE is calculated and partitioned into a number of intervals chosen as equally populated, to yield equiprobable contexts. This choice is motivated by the subsequent use, for the residues belonging to each class, of adaptive arithmetic coding, which benefits from a number of data in each class large enough for training, which happens simultaneously with coding. The residue in each class is split into sign bit and magnitude. The former is strictly random and is coded as it stands; the latter exhibits a reduced variance in each class and thus may be coded with fewer bits than the original residue. From the PDF of the context, L − 1 thresholds Θ = {θl ∈ R, l = 1, ..., L − 1}, which define the decision intervals of each class, are calculated. Θ, as well as Φ, is stored in the file header as overhead.

It is noteworthy that the context-coding procedure introduced by the authors [23] is independent of the particular method used to decorrelate the data. Unlike most of the schemes, for example, CALIC [21], in which context coding is embedded in the decorrelation, it can be applied to any DPCM scheme, either lossless or near-lossless, without adjustments for the near-lossless case [24], as a patch between the decorrelation and entropy coding stages.

4. EXPERIMENTAL RESULTS

4.1. Dataset

All the images used in the following experiments have been acquired by the Wide Field and Planetary Camera 2 (WFPC2) and are available at http://archive.eso.org, courtesy of the European Southern Observatory (ESO).

The WFPC2 is a two-dimensional imaging photometer, whose field of view (FOV) is located at the center of the focal plane of the Hubble Space Telescope (HST) and covers the spectral range between approximately 1150 Å and 10500 Å. The central portion of the f/24 beam coming from the Optical Telescope Assembly (OTA) is intercepted by a steerable pick-off mirror attached to the WFPC2 and is diverted through an open port entry into the instrument. The beam then passes through a shutter and interposable filters. A total of 48 spectral elements and polarizers are contained in an assembly of 12 filter wheels. The light then falls onto a shallow-angle, four-faceted pyramid, located at the aberrated OTA focus. Each face of the pyramid is a concave spherical surface, dividing the OTA image of the sky into four parts. After leaving the pyramid, each quarter of the full field of view is relayed by an optically flat mirror to a Cassegrain relay that forms a second field image on a charge-coupled device (CCD) of 800 × 800 pixels. Each of these four detectors is housed in a cell sealed by an MgF2 window, which is figured to serve as a field flattener. The optics of three of the four cameras —the Wide Field Cameras (WF2, WF3, WF4)— are essentially identical and produce a final focal ratio of f/12.9. The fourth camera, known as the Planetary Camera (PC or PC1), has a focal ratio of f/28.3. The WFPC2 simultaneously images a 150″ × 150″ L-shaped region with a spatial sampling of 0.1″ per pixel, and a smaller 34″ × 34″ square field with 0.046″ per pixel. Figure 3 shows the field of view of WFPC2 projected onto the sky. The four operational configurations of WFPC2 are described in Table 1.

The total system quantum efficiency (WFPC2+HST) ranges from 5% to 13% at visual wavelengths, and drops to ≈ 0.5% in the far UV. Detection of faint targets will be limited either by the sky background (for broad filters) or by noise in the read-out electronics (for narrow and UV filters), with an RMS equivalent to 5 detected photons. Bright targets can cause saturation (more than 53 000 detected photons per pixel), but there are no related safety issues.

A large test set of images acquired by WFPC2 was used for lossless compression experiments. The subjects are Globular Cluster M30 (NGC7099), Irregular Galaxy Small Magellanic Cloud (SMC), and Ring Nebula (NGC6720). For each scene, several observations, differing by spectral filter and exposure time, were considered. All the images downloaded from the archive are raw data that have been neither reduced nor calibrated, have 12 bit dynamic range, and are packed in 16 bit words. Units are digital counts, which are converted into physical measure units once the calibration process is accomplished. A subset of images —one for each scene— on which lossy compression experiments have been carried out is shown in Figure 4. Acquisition parameters are summarized in Table 2 and statistics, including the measured noise RMS value, in Table 3.

4.2. Lossless compression performance comparisons

The methods compared are RLPE with context modeling (CTX) and arithmetic coding (AC), RLPE without CTX and

Figure 3: Field of view (FOV) of the Wide Field and Planetary Camera 2 (WFPC2) projected onto the sky. U2 and U3 axes are defined by the "nominal" Optical Telescope Assembly (OTA) axis, which is near the center of the FOV of WFPC2.

Table 1: Operational configurations of the Wide Field and Planetary Camera 2.

Camera    Pixels      Field of view   Scale           f/ratio
PC        800 × 800   36″ × 36″       0.0455″/pixel   28.3
WF2,3,4   800 × 800   80″ × 80″       0.0966″/pixel   12.9

with AC, RLPE with CCSDS-Rice coding [29] (including its own context model), JPEG-LS, lossless JPEG2000, and plain zero-order prediction with Rice coding (ZOP-Rice), standardized as the baseline space encoder [30], though a more sophisticated predictor is left as an open concern. RLPE uses 8 × 8 blocks, 5 predictors (each with 4 coefficients, with values summing to one) refined with one iteration of relaxation labeling. Context modeling uses nine context classes, that is, eight thresholds, and a context calculated on a circular causal neighborhood of radius three. We wish to point out that the goal of the experimental section is comparing the lossless, near-lossless, and unconstrained lossy compression modalities, rather than providing a comprehensive comparison among compression algorithms, as can be found, for example, in [16]. Most state-of-the-art algorithms, like CALIC [21], are not available for image data having more than 8 bits per pixel, being developed for multimedia images rather than for scientific data.

Bit rates on disk including overhead and entropy coding are reported in Tables 4, 5, and 6 for a wide variety of observations of the three test scenes. A trend steady intra-table and inter-table shows that RLPE yields the lowest bit rates. Benefits stem from arithmetic coding and especially from context modeling. The coupling of RLPE with the CCSDS-Rice context-based entropy coding is slightly penalized with respect to JPEG-LS, which exploits Golomb coding [31], together with a context model optimized to its nonlinear prediction. The baseline CCSDS scheme [30] is somewhat poorer, notwithstanding all predictors are relatively short (one-to-four-pixel neighborhoods), mainly because a one-dimensional predictor cannot adequately remove an intrinsically two-dimensional redundancy. Also, Rice entropy coding appears to be far less powerful than arithmetic coding. The advantage of the former over the latter for space application is that, at the time of its standardization, space-qualified hardware was already available for Rice coding, but not for arithmetic coding. Eventually, lossless JPEG2000, which is not based on DPCM, yields somewhat poor results on Globular Cluster, being superior to ZOP-Rice only; on SMC its average performance is identical to that of RLPE+Rice. However, JPEG2000 outperforms RLPE+Rice on Ring Nebula and closely approaches the performances of JPEG-LS. By watching Figure 4, the explanation of these trends is easily found. JPEG2000 is penalized with respect to advanced DPCM schemes on a dark background sprinkled with stars because of its compact-support oscillating functions, the decrement in performances against DPCM algorithms being directly related to the density of bright spots.

Given the intrinsically multispectral nature of the astrophysical data under concern, joint spectral and spatial decorrelation was investigated, by using the 3D version of RLPE [11]. It was found that the same strategy of adaptive prediction carried out from spectrally adjacent bands, unlike what happens for conventional remote-sensing data, is not rewarding in terms of compression performances, the average bit rate saving being less than one hundredth of a bpp. This is not surprising, since astronomical bands, even if adjacent, are defined in order to select different physical emission mechanisms, with the consequence that images may be somewhat different.

Computationally speaking, ZOP-Rice is obviously the fastest scheme, closely followed by JPEG-LS, RLPE-Rice, plain RLPE without context, and full RLPE (with context and arithmetic coding). Table 7 reports encoding and decoding times for the three main schemes. Unlike the publicly available official versions of JPEG-LS and JPEG2000, the code of RLPE was written in C++, but was not optimized. Unpublished results of experiments specifically carried out on hyperspectral data have demonstrated that coding time might be reduced by, say, 4 ÷ 5 times, by optimizing the algorithm flow and the code, as well as by training off line. As it appears, a notable feature of RLPE is its processing asymmetry: due to training of predictors and block classification, encoding is more onerous than decoding, whose complexity is essentially dictated by context and arithmetic decoding. This feature may be valuable for remote access to archives, since an image is coded only once (when it is placed in the archive), but decoded as many times as it is retrieved by users.

4.3. Near-lossless compression performance comparisons

Two DPCM algorithms having L∞-constrained coding capability will be compared first. Figure 5 shows performances of


Figure 4: Details of size 256×256 taken from the three sample scenes. (a) Globular Cluster M30. (b) Irregular Galaxy SMC. (c) Ring Nebula.

Table 2: Subset of test images used for near-lossless compression experiments.

Name                u5fw0106r              u5wob405r              u531010er
Subject             Globular Cluster M30   Irregular Galaxy SMC   Ring Nebula
Acquisition date    31/05/1999             24/05/1999             16/10/1998
Center wavelength   334.4 nm               801.2 nm               501.2 nm
Bandwidth           37.4 nm                153.9 nm               2.7 nm
Exposure time       200 s                  100 s                  100 s

Table 3: Statistics of the three astrophysical images used for near-lossless compression experiments; all values are expressed as digital counts (squared, for variance), typical of uncalibrated data.

Name        Minimum   Maximum   Mean      Variance   Noise σ
u5fw0106r   304       4095      324.592   792.830    1.04
u5wob405r   304       4095      325.469   1502.920   1.29
u531010er   303       4095      326.403   517.191    0.80

RLPE+CTX+AC and JPEG-LS, carried out in terms of PSNR and MAD, on Globular Cluster and SMC. On both images RLPE attains a steady gain of about one dB over JPEG-LS for bit rates greater than 1 bpp, slightly lower elsewhere. Since both methods are near lossless, errors of decoded values are likely to be uniformly distributed in (−δ, δ), with the quantization step size ∆ = 2δ + 1. Hence, MSE = (∆² − 1)/12, and from (3) the relationship between the MAD (δ) and PSNR will be

PSNR(dB) = 10 log10(12) + 20 log10(g_fs) − 20 log10(2δ + 1).    (4)

If g_fs = 4095, (4) becomes

PSNR(dB) = 83 − 20 log10(2δ + 1),    (5)

which is in accordance with the plots in Figure 5 for bit rates greater than 0.5 bpp, that is, as long as quantization errors are independent of the data that are quantized.

Performance comparisons between RLPE and JPEG2000 have been carried out on the Ring Nebula test image and are shown in Figure 6. Unlike JPEG-LS, JPEG2000 is not L∞-bounded, but L2-bounded, which means that lossless compression is attainable —thanks to short 5/3 wavelet filters [8]— but near-lossless compression is not. The consequence is that MADs larger and larger than those of RLPE are noticed as the bit rate decreases. The scale on the ordinate was shrunk by a factor of thirteen with respect to that of Figure 5, in order to accommodate the large range of MAD in the JPEG2000 plot. Besides being near-lossless, RLPE outperforms JPEG2000 also in PSNR. For rates higher than 1 bpp, the PSNR gain of RLPE over JPEG2000 is about 2 dB. Equivalently, RLPE saves 0.39 bpp in the reversible case, corresponding to 83 dB PSNR. As the bit rate decreases, this gain vanishes and the two plots cross each other at approximately 0.1 bpp. This effect is typical of all DPCM schemes and is due to the quantization noise feedback loop at the encoder.

The previously noticed error trends also reflect the visual quality of the decompressed images. Figure 7 shows Ring Nebula compressed at six different bit rates, including the lossless case, by RLPE (with CTX and AC) and JPEG2000. When the rate is high (0.913 bpp for RLPE and 0.912 for JPEG2000), the visual appearance of the two images is quite similar, notwithstanding the former exhibits a MAD equal to one, the latter to 6, and the difference in PSNR is around 2 dB. Both compressed images are hardly distinguishable from the original (MAD = 0), even though JPEG2000 yields a perceivably smoother result. However, as the bit rate per pixel decreases, the JPEG2000 versions become smoother

Table 4: Bit rates on disk for lossless compression of different observations of the u5fw010 scene (Globular Cluster M30).

Name        RLPE+CTX+AC   RLPE+AC   JPEG-LS   RLPE+Rice   JPEG2000   ZOP-Rice
u5fw0101r   2.56          2.70      2.76      2.93        2.98       3.19
u5fw0102r   2.06          2.09      2.22      2.38        2.42       2.67
u5fw0103r   2.56          2.71      2.76      2.93        2.99       3.19
u5fw0104r   2.63          2.79      2.81      2.98        3.03       3.23
u5fw0105r   2.61          2.76      2.82      2.96        3.01       3.24
u5fw0106r   2.41          2.50      2.63      2.80        2.84       3.04
u5fw0107r   2.55          2.68      2.76      2.93        2.98       3.19
u5fw0108r   2.38          2.47      2.59      2.77        2.82       2.99
u5fw0109r   3.36          3.61      3.49      3.78        2.83       4.00
u5fw010ar   2.00          2.03      2.17      2.33        2.37       2.63
u5fw010br   2.38          2.47      2.59      2.75        2.81       3.00
u5fw010cr   2.38          2.47      2.59      2.77        2.83       3.00
Average     2.49          2.61      2.68      2.86        2.91       3.11

Table 5: Bit rates on disk for lossless compression of observations of u5wob40 scene (SMC).

Name        RLPE+CTX+AC   RLPE+AC   JPEG-LS   RLPE+Rice   JPEG2000   ZOP-Rice
u5wob401r   2.32          2.38      2.51      2.66        2.72       2.93
u5wob402r   3.11          3.27      3.31      3.56        3.52       3.77
u5wob403r   2.09          2.12      2.25      2.42        2.46       2.70
u5wob404r   2.84          2.96      3.10      3.33        3.31       3.52
u5wob405r   2.41          2.48      2.61      2.76        2.82       3.02
u5wob406r   3.12          3.27      3.33      3.55        3.54       3.76
u5wob407r   1.91          1.91      2.04      2.24        2.25       2.53
u5wob408r   2.02          2.03      2.18      2.36        2.39       2.63
u5wob409r   2.77          2.87      3.07      3.28        3.28       3.42
u5wob40ar   2.79          2.88      3.09      3.33        3.30       3.45
u5wob40br   2.80          2.89      3.06      3.30        3.27       3.40
Average     2.56          2.64      2.77      2.98        2.98       3.19

Table 6: Bit rates on disk for lossless compression of observations of u531010 scene (Ring Nebula).

Name        RLPE+CTX+AC   RLPE+AC   JPEG-LS   JPEG2000   RLPE+Rice   ZOP-Rice
u5310109r   2.23          2.27      2.47      2.51       2.64        2.87
u531010am   2.34          2.39      2.63      2.68       2.79        2.99
u531010br   2.36          2.53      2.64      2.67       2.68        2.98
u531010cr   2.35          2.52      2.63      2.67       2.67        2.97
u531010dr   2.35          2.52      2.62      2.65       2.67        2.97
u531010er   2.24          2.36      2.49      2.56       2.57        2.87
u531010fr   2.24          2.36      2.49      2.56       2.56        2.87
u531010gr   2.24          2.36      2.49      2.56       2.57        2.87
Average     2.29          2.41      2.55      2.60       2.64        2.92

Table 7: Computing times (on a 1.8 GHz Pentium PC) of RLPE+CTX+AC, JPEG-LS, and JPEG2000 for an 800 × 3200, 12 bit frame.

Processing   RLPE+CTX+AC   JPEG-LS   JPEG2000
Encoder      25 s          0.05 s    3.6 s
Decoder      1 s           0.03 s    4 s

and smoother, mainly because MAD increases from 6 to 53, since the difference in PSNR vanishes at 0.111 bpp. The grainy appearance of the nebula completely disappears, replaced by an artificially uniform smoothness. At the lowest bit rate (0.111 bpp), JPEG2000 yields a definitely unacceptable result: all fine details have been removed and ringing artifacts appear around stars. Conversely, in the near-lossless RLPE-compressed versions, the grainy appearance of the nebula becomes coarser and coarser as the bit rate decreases. Even at the lowest bit rate (0.111 bpp), although a striping distortion markedly appears in the dark background, the image still has a certain fidelity to the original.
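The MAD-PSNR relationship (5) quoted in Section 4.3 is easy to verify numerically for 12-bit data (g_fs = 4095); the small discrepancy at δ > 0 in the sketch below comes from using the exact MSE (∆² − 1)/12 rather than the approximation ∆²/12:

```python
import math

def psnr_eq5(delta_mad):
    """Eq. (5): PSNR(dB) = 83 - 20*log10(2*delta + 1), valid for g_fs = 4095."""
    return 83.0 - 20.0 * math.log10(2 * delta_mad + 1)

def psnr_uniform_errors(delta_mad, g_fs=4095):
    """PSNR implied by errors uniform over {-delta, ..., +delta}:
    MSE = (Delta^2 - 1)/12 with Delta = 2*delta + 1.  For delta = 0 (lossless),
    the 83 dB figure refers to the 12-bit sensor quantization, MSE = 1/12."""
    step = 2 * delta_mad + 1
    mse = (step ** 2 - 1) / 12.0 if delta_mad > 0 else 1.0 / 12.0
    return 10.0 * math.log10(g_fs ** 2 / mse)
```

For δ = 0 both expressions give about 83 dB, and increasing δ lowers PSNR by 20 log10(2δ + 1) dB, matching the plateau and slope of the PSNR curves in Figures 5 and 6.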


Figure 5: Lossy compression performances of RLPE+CTX+AC and JPEG-LS. Globular Cluster: (a) PSNR versus bit rate, (b) MAD versus bit rate. SMC: (c) PSNR versus bit rate, (d) MAD versus bit rate.

To provide deeper insight into the difference between near-lossless and lossy compression, or better, between L∞-bounded and L2-bounded compression, the amplitude distributions of compression-induced errors have been plotted in Figure 8 for RLPE and JPEG2000 at high and low bit rates. The distribution of errors introduced by RLPE is practically uniform at high rate, slightly decaying at low rates, because quantization errors are no longer independent of the prediction errors that are quantized. In both cases, however, their distribution has no heavy tails. Instead, JPEG2000 exhibits tails more pronounced than those of a Gaussian function. A logarithmic scale on the y-axis is used throughout, for display convenience.

Second-order statistics of the compression-induced errors have been investigated as well. Figure 9 shows the original Ring Nebula and the pixel maps of errors introduced by RLPE and by JPEG2000 at the same bit rate of 0.111 bpp, corresponding to approximately identical 66 dB PSNR. Displayed errors have been linearly stretched and biased to avoid negative values. While the distortion introduced by RLPE is substantially similar to pure noise, especially in the body of the nebula, the error map produced by JPEG2000 contains plenty of fine spatial details (including edges of stars) that have been destroyed by compression. An analysis of the spatial correlation coefficient (CC) of each error map reveals that RLPE yields a CC equal to 0.19 (average between row

65

84 55 82 80 45 78 35 76 MAD 74 25 PSNR (dB) 72 15 70

68 5 66 0 00.51 1.522.52 00.511.522.52 Bit rate (bpp) Bit rate (bpp)

JBEG2000 JBEG2000 RLPE RLPE

(a) (b)

Figure 6: Lossy compression performances of RLPE+CTX+AC and JPEG2000. Ring Nebula: (a) PSNR versus bit rate. (b) MAD versus bit rate. and column directions); conversely, the average CC of the appear as an additional amount of noise, being uncorrelated JPEG2000 error map is 0.28, thereby revealing that what has and having space-invariant first-order statistics such that the been removed by compression is more likely to be a corre- overall probability density function (PDF) of the noise cor- lated signal. rupting the decompressed data, that is, intrinsic noise plus compression-induced noise, closely matches the noise PDF 4.4. Virtually lossless compression of the original data. This requirement is trivially fulfilled The analysis reported has pointed out that quality evaluation if compression is lossless, but may also hold if the differ- of compressed astrophysical images cannot rely on PSNR dis- ence between uncompressed and decompressed data exhibits tortion measurements only. We notice that the wavelet-based a peaked and narrow PDF without tails, as it happens for JPEG2000 algorithm achieves the effect of progressively “de- near-lossless techniques, whenever the user defined MAD noising” the image as the target compression ratio increases. is sufficiently smaller than the standard deviation σn of the This fact is not surprising, since it has been demonstrated background noise. Both MAD and σn are intended to be ex- that suppression of small wavelet coefficients, which hap- pressed either in physical units, for calibrated data, or as dig- pens because of quantization, yields a powerful method for ital counts otherwise. Therefore, noise modeling and estima- image denoising, established also in the field of astrophysi- tion from the uncompressed data becomes a major task to cal image processing [32]. Image denoising may also become accomplish a virtually lossless compression [11]. 
The under- the key to compression of astronomical images [4], when lying assumption is that the dependence of the noise on the the bottleneck of a very low bit rate imposes a reduction in signal is null, or weak. However, signal independence of the image entropy, selectively obtained by denoising the back- noise may not strictly hold for astronomical images, espe- ground only. However, what may appear as “noise” is likely cially for weak signals, dominated by shot noise. This fur- to be informative to an astrophysicist. Therefore, the data ther uncertainty in the noise model may be encompassed by may become little useful once they have been compressed imposing a margin on the relationship between target MAD by means of an otherwise advanced L2-bounded method like and RMS value of background noise. JPEG2000. In the present case, the noise standard deviation σn of On the contrary, near-lossless methods, like JPEG-LS and the three test images, whose statistics are reported in Table 3, RLPE seem to be more suitable than JPEG2000 for locally was measured by means of the scatterplot-based method de- preserving even subtle objects of variable coarseness. The scribed in [34, 35], and found to be σn = 1.04, σn = 1.29, main reason of that is the quantization noise-shaping effect and σn = 0.80, for Globular Cluster, SMC,andRing Neb- achieved by L∞-bounded image encoders, like those based ula, respectively. Near-lossless compression is crucial for Ring on DPCM. Indeed, noise modeling was found to be the key Nebula,asitvisuallyappearsfromFigure 7. In fact, near- to compression of astrophysical images [33]. lossless compression with MAD = δ = 1 (i.e., quantiza- The term virtually lossless compression, which motivates tion step size√ ∆ = 2δ +1 = 3) would yield an RMS dis- the present paper, is now discussed in greater detail. 
It indi- tortion  = 2/3 ≈ 0.82, slightly greater than the noise RMS cates that the distortion introduced by compression should value σn = 0.80, which would have the effect of increasing 2532 EURASIP Journal on Applied Signal Processing
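The arithmetic behind the δ = 1 example above can be sketched as follows. Assuming, as observed earlier for RLPE at high rates, that the errors of an L∞-bounded coder with MAD = δ are roughly uniform over the integers −δ, …, +δ, the induced MSE is δ(δ + 1)/3 (a standard result for uniform quantization error, not stated explicitly in the paper; the helper name is ours):

```python
import math

def induced_rms(delta):
    """RMS of quantization errors assumed uniform over the integers -delta..+delta.

    MSE = sum(k*k for k in -delta..+delta) / (2*delta + 1) = delta*(delta + 1)/3.
    """
    return math.sqrt(delta * (delta + 1) / 3.0)

# Ring Nebula case from the text: MAD = delta = 1, step size 2*delta + 1 = 3
eps = induced_rms(1)       # sqrt(2/3) ~ 0.82
sigma_n = 0.80             # measured background-noise RMS
print(eps > sigma_n)       # True: induced distortion exceeds the intrinsic noise
```

The value matches the √(2/3) ≈ 0.82 figure quoted above, already larger than σn = 0.80 for Ring Nebula.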


Figure 7: Compressed 256 × 256 details taken from Ring Nebula, obtained at the same bit rates (bpp), with the exception of the lossless versions, included for both RLPE and JPEG2000 (JP2K). RLPE: (a) 0.111 bpp, MAD = 5; (b) 0.179 bpp, MAD = 4; (c) 0.207 bpp, MAD = 3; (d) 0.447 bpp, MAD = 2; (e) 0.913 bpp, MAD = 1; (f) 2.24 bpp, MAD = 0. JP2K: (g) 0.111 bpp, MAD = 53; (h) 0.179 bpp, MAD = 32; (i) 0.207 bpp, MAD = 20; (j) 0.447 bpp, MAD = 10; (k) 0.912 bpp, MAD = 6; (l) 2.63 bpp, MAD = 0.

it by a factor greater than √2, after decompression. Equivalently, the intrinsic SNR of the uncompressed image would be decremented by 3 dB after compression/decompression. In this specific case, virtually lossless compression should better coincide with lossless compression. Near-lossless compression of Ring Nebula with MAD equal to one is unable to retain the quality of the data, because the compression-induced MSE is not one order of magnitude lower than σn², as would be recommended for virtually lossless compression. However, when the extremely concentrated error PDF, produced by RLPE when δ = 1 and shown as the first entry in Figure 8, is convolved with the intrinsic noise PDF, assumed to be tailed, the overall PDF will be approximately unchanged in shape, even if doubled in variance. This behavior explains why some of the RLPE-compressed versions of Ring Nebula are more similar to the original than to the corresponding JPEG2000 versions. The reason is that tails in the error PDF may give rise to, or suppress, local "noise" patterns, whose presence, or absence, is unlikely to be found in the uncompressed image.

The rationale of virtually lossless compression can be summarized by the following protocol. Measure the noise

[Figure 8 panels (a)–(d): histograms of Log10(number of occurrences + 1) versus error amplitude, for amplitudes from −50 to 50.]

Figure 8: Reconstruction error distributions for near-lossless/lossy coding of Ring Nebula. RLPE for bit rate of (a) 0.913 bpp, (b) 0.111 bpp. JPEG2000 for bit rate of (c) 0.912 bpp, (d) 0.111 bpp.
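The curves in Figure 8 are, in essence, histograms of the integer error amplitudes plotted as log10(number of occurrences + 1). A small illustrative sketch of how such a tabulation can be produced (the toy images and the function name are ours):

```python
import numpy as np

def log_error_histogram(original, decoded, amp_range=50):
    """Tabulate log10(number of occurrences + 1) per integer error amplitude."""
    err = original.astype(np.int64) - decoded.astype(np.int64)
    amps = np.arange(-amp_range, amp_range + 1)
    counts = np.array([np.count_nonzero(err == a) for a in amps])
    return amps, np.log10(counts + 1)

# Toy near-lossless errors confined to {-1, 0, 1}
rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(128, 128))
dec = np.clip(img + rng.integers(-1, 2, size=img.shape), 0, 255)
amps, log_counts = log_error_histogram(img, dec)
```

The "+ 1" inside the logarithm keeps empty bins at zero on the log scale, which is why the tail-free RLPE curves drop abruptly in the figure.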


Figure 9: 256 × 256 details of Ring Nebula: (a) original; maps of the pixel errors at 0.111 bpp introduced by (b) the RLPE decompressed version and (c) the JPEG2000 decompressed version.
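The spatial correlation coefficient (CC) used earlier to compare the two error maps is, on our reading, a lag-one correlation coefficient averaged between the row and column directions; the paper does not spell out the exact definition, so this sketch is an assumption:

```python
import numpy as np

def lag_one_cc(error_map):
    """Average of row-wise and column-wise lag-one correlation coefficients."""
    e = error_map.astype(np.float64)

    def cc(a, b):
        a = a.ravel() - a.mean()
        b = b.ravel() - b.mean()
        return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

    row_cc = cc(e[:, :-1], e[:, 1:])   # horizontal neighbours
    col_cc = cc(e[:-1, :], e[1:, :])   # vertical neighbours
    return 0.5 * (row_cc + col_cc)

# White-noise-like errors (the RLPE-style case) give CC near zero;
# a correlated residue (the JPEG2000-style case) gives a larger CC.
rng = np.random.default_rng(2)
noise = rng.standard_normal((256, 256))
print(abs(lag_one_cc(noise)) < 0.05)
```

A low CC supports the claim that what a near-lossless coder discards looks like noise, while a higher CC indicates that correlated signal has been removed.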

RMS, σn; if σn < 1, lossless compression is mandatory. Otherwise, if 1 ≤ σn < 3, near-lossless compression with MAD = 1 (hence, ∆ = 3) might be attempted. For 3 ≤ σn < 5, compression with MAD = 1 is recommended, to avoid wasting bits encoding the noise. In the general case, the relationship between MAD and σn, also including a margin of approximately one dB, is

    MAD = max(0, (σn − 1)/2).  (6)

This protocol is substantially in accordance with the results reported by Maris et al. [5]. The main difference is operative. In [5] the step size of the UTQ used to quantize the analog signal is designed in such a way that compression must be lossless thereafter. In the present case, the signal may have been previously quantized based on different requirements; afterwards a check on the noise is made to decide whether lossless compression is really necessary, or near-lossless compression could be used instead without penalty, being de facto virtually lossless. Depending on the application context and the type of data, the relationship (6) may also be relaxed, for example, by imposing that the ratio MSE(noise)/MSE(compression) is greater than, say, 3 dB, instead of the 10–11 dB given by (6).

5. CONCLUDING REMARKS

The key to achieving a compression that preserves the scientific quality of the data, for either astrophysical or remote-sensing applications, is represented by the following twofold recommendation: (1) absence of tails in the PDF of the error between uncompressed and decompressed image, in order to maximize the ratio √MSE/MAD, that is, RMSE/MAD, or equivalently to minimize MAD for a given RMSE; (2) MSE lower by one order of magnitude than the variance of the background noise σn². Near-lossless methods are capable of fulfilling such requirements, provided that the quantization step size ∆ is chosen as an odd integer such that ∆ ≈ σn. If the data is intrinsically little noisy, the protocol may lead to the direct use of lossless compression, that is, ∆ = 1, to obtain what has been denoted as virtually lossless compression.

ACKNOWLEDGMENT

The authors wish to thank the anonymous referees, whose insightful comments and constructive criticisms have greatly improved the clarity of presentation of the concepts expressed in the paper, thereby enlarging its potential scope within the community of astrophysicists.

REFERENCES

[1] F. Murtagh, J.-L. Starck, and M. Louys, "Distributed visual information management in astronomy," IEEE Computing in Science & Engineering, vol. 4, no. 6, pp. 14–23, 2002.
[2] M. Louys, J.-L. Starck, S. Mei, F. Bonnarel, and F. Murtagh, "Astronomical image compression," Astronomy and Astrophysics Supplement Series, vol. 136, no. 3, pp. 579–590, 1999.
[3] R. E. Roger and J. F. Arnold, "Reversible image compression bounded by noise," IEEE Trans. Geosci. Remote Sensing, vol. 32, no. 1, pp. 19–24, 1994.
[4] J.-L. Starck, F. Murtagh, B. Pirenne, and M. Albrecht, "Astronomical image compression based on noise suppression," Publications of the Astronomical Society of the Pacific, vol. 108, pp. 446–455, 1996.
[5] M. Maris, D. Maino, C. Burigana, A. Mennella, M. Bersanelli, and F. Pasian, "The effect of signal digitisation in CMB experiments," Astronomy & Astrophysics, vol. 414, no. 2, pp. 777–794, 2004.
[6] K. R. Rao and J. J. Hwang, Techniques and Standards for Image, Video, and Audio Coding, Prentice-Hall, Englewood Cliffs, NJ, USA, 1996.
[7] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, San Diego, Calif, USA, 1998.
[8] ISO/IEC JTC 1/SC 29/WG 1, ISO/IEC FCD 15444-1: Information technology—JPEG 2000 image coding system: Core coding system [WG 1 N 1646], March 2000.
[9] J.-L. Starck and F. Murtagh, "Astronomical image and signal processing: looking at noise, information and scale," IEEE Signal Processing Mag., vol. 18, no. 2, pp. 30–40, 2001.
[10] K. Chen and T. V. Ramabadran, "Near-lossless compression of medical images through entropy-coded DPCM," IEEE Trans. Med. Imag., vol. 13, no. 3, pp. 538–548, 1994.
[11] B. Aiazzi, L. Alparone, and S. Baronti, "Near-lossless compression of 3-D optical data," IEEE Trans. Geosci. Remote Sensing, vol. 39, no. 11, pp. 2547–2557, 2001.
[12] B. Aiazzi, L. Alparone, S. Baronti, and F. Lotti, "Lossless image compression by quantization feedback in a content-driven enhanced Laplacian pyramid," IEEE Trans. Image Processing, vol. 6, no. 6, pp. 831–843, 1997.
[13] F. Golchin and K. K. Paliwal, "Classified adaptive prediction and entropy coding for lossless coding of images," in Proc. IEEE International Conference on Image Processing (ICIP '97), vol. 3, pp. 110–113, Santa Barbara, Calif, USA, October 1997.
[14] B. Aiazzi, L. Alparone, and S. Baronti, "Near-lossless image compression by relaxation-labelled prediction," Signal Processing, vol. 82, no. 11, pp. 1619–1631, 2002.
[15] G. Dong, H. Ye, and L. W. Cahill, "Adaptive combination of linear predictors for lossless image compression," IEE Proceedings—Science, Measurement and Technology, vol. 147, no. 6, pp. 414–419, 2000.
[16] B. Aiazzi, L. Alparone, and S. Baronti, "Fuzzy logic-based matching pursuits for lossless predictive coding of still images," IEEE Trans. Fuzzy Syst., vol. 10, no. 4, pp. 473–483, 2002.
[17] B. Aiazzi, P. Alba, L. Alparone, and S. Baronti, "Lossless compression of multi/hyper-spectral imagery based on a 3-D fuzzy prediction," IEEE Trans. Geosci. Remote Sensing, vol. 37, no. 5, pp. 2287–2294, 1999.
[18] J. Mielikainen and P. Toivanen, "Clustered DPCM for the lossless compression of hyperspectral images," IEEE Trans. Geosci. Remote Sensing, vol. 41, no. 12, pp. 2943–2946, 2003.
[19] B. Carpentieri, M. J. Weinberger, and G. Seroussi, "Lossless compression of continuous-tone images," Proc. IEEE, vol. 88, no. 11, pp. 1797–1809, 2000.
[20] T. V. Ramabadran and K. Chen, "The use of contextual information in the reversible compression of medical images," IEEE Trans. Med. Imag., vol. 11, no. 2, pp. 185–195, 1992.
[21] X. Wu and N. D. Memon, "Context-based, adaptive, lossless image coding," IEEE Trans. Commun., vol. 45, no. 4, pp. 437–444, 1997.
[22] M. J. Weinberger, G. Seroussi, and G. Sapiro, "The LOCO-I lossless image compression algorithm: principles and standardization into JPEG-LS," IEEE Trans. Image Processing, vol. 9, no. 8, pp. 1309–1324, 2000.
[23] B. Aiazzi, L. Alparone, and S. Baronti, "Context modeling for near-lossless image coding," IEEE Signal Processing Lett., vol. 9, no. 3, pp. 77–80, 2002.
[24] X. Wu and P. Bao, "L∞ constrained high-fidelity image compression via adaptive context modeling," IEEE Trans. Image Processing, vol. 9, no. 4, pp. 536–542, 2000.
[25] I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Communications of the ACM, vol. 30, no. 6, pp. 520–540, 1987.
[26] R. F. Rice and J. R. Plaunt, "Adaptive variable-length coding for efficient compression of spacecraft television data," IEEE Trans. Commun. Technol., vol. 19, no. 6, pp. 889–897, 1971.
[27] M. J. Weinberger, J. J. Rissanen, and R. B. Arps, "Applications of universal context modeling to lossless compression of gray-scale images," IEEE Trans. Image Processing, vol. 5, no. 4, pp. 575–586, 1996.
[28] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, NY, USA, 1981.
[29] Consultative Committee for Space Data Systems, Lossless Data Compression: Report Concerning Space Data Systems Standards (Green Book), CCSDS, Washington, DC, USA, May 1997.
[30] ISO TC 20/SC 13/ICS 49.140, ISO 15887:2000, Space data and information transfer systems—Data systems—Lossless data compression, 2000.
[31] S. W. Golomb, "Run-length encodings," IEEE Trans. Inform. Theory, vol. 12, no. 3, pp. 399–401, 1966.
[32] J.-L. Starck and F. Murtagh, "Image restoration with noise suppression using the wavelet transform," Astronomy and Astrophysics, vol. 288, no. 1, pp. 342–348, 1994.
[33] F. Murtagh, J.-L. Starck, and M. Louys, "Very-high-quality image compression based on noise modeling," International Journal of Imaging Systems and Technology, vol. 9, no. 1, pp. 38–45, 1998.
[34] B. Aiazzi, L. Alparone, A. Barducci, S. Baronti, and I. Pippi, "Information-theoretic assessment of sampled hyperspectral imagers," IEEE Trans. Geosci. Remote Sensing, vol. 39, no. 7, pp. 1447–1458, 2001.
[35] B. Aiazzi, L. Alparone, A. Barducci, S. Baronti, and I. Pippi, "Estimating noise and information for multispectral imagery," Optical Engineering, vol. 41, no. 3, pp. 656–668, 2002.

Cinzia Lastri was born in Florence, Italy, in 1976. She received the Laurea degree in telecommunication engineering from the University of Florence, Florence, Italy, in 2002, with a thesis on "Reversible compression of remote sensing hyperspectral data acquired from satellite." Since 2002, she has been working with the support of CNR at the Istituto di Fisica Applicata "Nello Carrara" (IFAC-CNR) in Florence, in the framework of the ESA contract ESTEC 15662/01/NL/MM "Advanced methods for lossless compression of hyperspectral data." In 2004 she applied for the Ph.D. degree at the University of Florence on the theme "Digital image processing and compression of multispectral and hyperspectral data" for transmission and archiving of remote sensing data. She has coauthored three chapters in international books on data compression and more than ten works published in proceedings of international conferences.

Bruno Aiazzi was born in Borgo San Lorenzo, Florence, Italy, in 1961. He received the Laurea degree in electronic engineering from the University of Florence, Florence, Italy, in 1991. In 1992, he won a fellowship on digital image compression for broadband communications networks, supported by the National Research Council (CNR), Italy. After several years as a Research Fellow and Research Assistant, since 2001 he has been a Researcher with IFAC-CNR (formerly IROE-CNR), Florence, Italy, where he currently participates in research activities concerning image quality definition and measurement, advanced methods for lossless and near-lossless remote sensing data compression, multiresolution image analysis and data fusion, and SAR image analysis and classification. He has been involved in several international research projects with ESA, CNES, and ASI on advanced data compression and image fusion algorithms for environmental applications. He is responsible for SAR image analysis and classification in a project funded by the Italian Space Agency. He has coauthored over 20 papers published in international peer-reviewed journals.

Luciano Alparone obtained the Laurea degree in electronic engineering "summa cum laude" and the Ph.D. degree from the University of Florence, Florence, Italy, in 1985 and 1990, respectively. Since 1992, he has been with the Department of Electronics and Telecommunications of the University of Florence, first as an Assistant Professor, and since 2002 as an Associate Professor of Electrical Communications. In 1989, he was a Research Fellow with the Signal Processing Division at the University of Strathclyde, Glasgow, UK. During spring 2000 and summer 2001, he was a Visiting Professor at the Tampere International Center for Signal Processing (TICSP), Tampere, Finland. His main research interests are lossless and near-lossless compression of remote sensing and medical imagery, multiresolution image analysis and processing, nonlinear filtering, multisensor image fusion, and analysis and processing of hyperspectral and synthetic aperture radar images. He coauthored 50 papers published in international peer-reviewed journals and holds a patent on a procedure for progressive image transmission.

Stefano Baronti was born in Florence, Italy, in 1954. He received the Laurea degree in electronic engineering from the University of Florence, Florence, Italy, in 1980. After a period spent with the Italian Highway Company working on data collection and analysis, he joined the National Research Council of Italy (CNR) in 1985 as a Researcher of the "Nello Carrara" IFAC-CNR (formerly IROE-CNR), Florence, Italy. From 1985 to 1989, he was involved in an ESPRIT Project of the European Union aimed at the development of an automated system for quality control of composite materials through analysis of infrared image sequences. Later, he moved toward remote sensing image processing by participating in, and as the head of, several projects funded by the Italian, French, and European Space Agencies. His research topics are in digital image processing and analysis aimed at computer vision and cultural heritage applications, data compression and image communication (including medical imaging), and optical and microwave remote sensing by synthetic aperture radar. He has coauthored about 40 papers published in international peer-reviewed journals. He is a Member of the IEEE Signal Processing Society and the IEEE Geoscience and Remote Sensing Society's Data Fusion Committee.

EURASIP Journal on Applied Signal Processing 2005:15, 2536–2545
© 2005 Hindawi Publishing Corporation

Astrophysical Information from Objective Prism Digitized Images: Classification with an Artificial Neural Network

Emmanuel Bratsolis

Département Traitement du Signal et des Images, École Nationale Supérieure des Télécommunications, 46 rue Barrault, 75013 Paris, France
Email: [email protected]

Section of Astrophysics, Astronomy and Mechanics, Department of Physics, University of Athens, 15784 Athens, Greece
Email: [email protected]

Received 28 May 2004; Revised 14 December 2004

Stellar spectral classification is not only a tool for labeling individual stars but is also useful in studies of stellar population synthesis. Extracting the physical quantities from the digitized spectral plates involves three main stages: detection, extraction, and classification of spectra. Low-dispersion objective prism images have been used and automated methods have been developed. The detection and extraction problems have been presented in previous works. In this paper, we present a classification method based on an artificial neural network (ANN). We make a brief presentation of the entire automated system and we compare the new classification method with the previously used method of maximum correlation coefficient (MCC). Digitized photographic material has been used here. The method can also be used on CCD spectral images.

Keywords and phrases: objective prism stellar spectra, classification, artificial neural network.

1. INTRODUCTION

Large surveys are concerned with two things. The first is finding unusual objects. Once detected, these unusual objects must always be analyzed individually. The second one is to do statistics with large numbers of objects. In this case, we need an automated classification system.

High-quality film copies of IIIa-J (broad blue-green band) plates, taken with the 1.2 m UK Schmidt Telescope in Australia, have been used. The spectral plates have a dispersion of 2 440 Å/mm at Hγ and a spectral range from 3 200 to 5 400 Å. The photographic material has been digitized at the Royal Observatory of Edinburgh using the SuperCOSMOS machine.

Stellar classification with ANNs as a nonlinear technique has been used by many other researchers in the last decade [1, 2, 3, 4, 5]. These methods were utilized for different databases and different spectral dispersion images. In this work, we use wide-field images from the 1.2 m UK Schmidt Telescope in Australia with an objective prism P1. In this case, we can work directly on the image, making our detection, extraction, classification, and testing of population synthesis. The main contribution here is that there is an automated method, useful to study the spatial distribution of stars (we have the stellar coordinates from the detection method) in groups with the same spectral type (from the classification method). It is useful in astrophysics because we can have a spatial distribution of stellar groups with the same age (grosso modo) and we can study them separately (morphology, mixture of different populations, etc.).

The final aim of this automated method is to study the stellar population synthesis of Magellanic cloud regions. The detection procedure gives the stellar coordinates on the prism plate [6]. Here we test an ANN based on the classical back-propagation learning procedure.

2. IMAGE REDUCTION

Our test image contains, in pixel size, a region of 3 200 (EW) × 3 150 (SN) of the small Magellanic cloud. The scanning pixel size of the SuperCOSMOS measuring machine is 10 µm and the plate scale is 67.11 arcsec/mm. So our image is centered at RA2000 = 1h16m and DEC2000 = −73°20′ and contains a region of 35.8 arcmin (EW) × 35.2 arcmin (SN) of the SMC (Figure 1).

Table 1: Details for the features on objective prism P1. Every spec- trum has a length of 128 pixels. The zero-point (or detection point) corresponds to the pixel number 10.

Feature            λ (Å)          Distance (mm)    Pixel no.
Zero-point         5 400          0.000 ± 0.005    10 ± 1
TiO                5 000          0.100 ± 0.005    20 ± 1
Hβ                 4 861          0.150 ± 0.005    25 ± 1
TiO                4 800          0.160 ± 0.005    26 ± 1
Hγ + G             4 340, 4 300   0.320 ± 0.005    42 ± 1
CaI                4 227          0.340 ± 0.005    44 ± 1
Hδ                 4 101          0.430 ± 0.005    53 ± 1
H + H              3 970          0.500 ± 0.005    60 ± 1
CaII + K           3 936, 3 934   0.520 ± 0.005    62 ± 1
MgI + FeI blend    3 820          0.570 ± 0.005    67 ± 1
FeI + H blend      3 730          0.640 ± 0.005    74 ± 1
FeI blend          3 580          0.740 ± 0.005    84 ± 1

Figure 1: Low-dispersion objective prism image of a region of 35.8 arcmin (EW) × 35.2 arcmin (SN) of the small Magellanic cloud (inner wing). The squares correspond to the positions of the selected spectra for test.
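The "Pixel no." column of Table 1 follows from the scan geometry alone: with a 10 µm scanning pixel and the zero-point placed at pixel 10, a feature at a distance of d mm from the zero-point lands near pixel 10 + d/0.010. A sketch reproducing the table's mapping (the helper name is ours):

```python
def feature_pixel(distance_mm, zero_point_px=10, pixel_um=10.0):
    """Map a distance (mm) along the dispersion direction to a pixel number."""
    return zero_point_px + round(distance_mm * 1000.0 / pixel_um)

# Entries from Table 1
assert feature_pixel(0.000) == 10   # zero-point, 5 400 A
assert feature_pixel(0.150) == 25   # H-beta
assert feature_pixel(0.320) == 42   # H-gamma + G band
assert feature_pixel(0.740) == 84   # FeI blend
```

The ±1 pixel tolerance in the table corresponds directly to the ±0.005 mm (half-pixel) uncertainty of the distance measurements.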

Spectral plates taken with Schmidt-class telescopes contain thousands of spectra. Our initial objective is to detect these spectra and to extract the basic information. The detection algorithm takes as input an image frame, divides the frame in subframes, and applies a signal processing method. The processing of detection (DETSP) is carried out in four sequential stages [6].

(1) Image frame preprocessing. The whole image frame is filtered by a sequence of median and smoothing filters. A grid of subframes is fixed on the filtered image, according to the overlapping mode.

(2) Subframe signal processing. Each one of the fixed subframes is processed by applying the detection algorithm based on a signal processing method. The detected spectral positions are saved in a table (file).

(3) Detection table processing. There are possible double detections of spectra near the edges of neighboring subframes. For this reason, the table of detected spectra is now processed to remove the doubling. It is sorted as well.

(4) Detection fine adjustment. The signal processing approach is used again. Now, as many subframes are fixed as the number of detected spectra. The subframes are narrower and each one includes a particular detected spectrum image. This leads to fine adjustment of the position. The adjusted position table is finally sorted.

One of the advantages of the SuperCOSMOS machine is that it scans the plates with a direction parallel to the longitudinal axis of the spectra. Thus, our spectra are parallel to a coordinate axis. The success of the DETSP procedure is that it detects all the spectra at the same common-wavelength zero-point at 5 400 Å. This zero-point (0.000 mm) corresponds to our pixel scale (1–128) at 10 pixels.

After the spectral detection, a new procedure starts, responsible for the extraction of spectra (EXTSP) in one-dimensional streams containing all the basic information. The spectral length contains 128 pixels. These are the zero-point plus 118 pixels on the right of the zero-point plus 9 pixels on the left of the zero-point. For a better signal-to-noise ratio, the actual extraction of the spectrum is performed by means of a rectangular weighted "slit" sliding on the data. Its width and shape are either fixed or determined by the average fit on the transversal sections of the spectrum.

Our detected zero-point, defined by DETSP at 5 400 Å on the dispersion curve of the objective prism P1, helps us to define the distance measurements for various features. The results are shown in Table 1.

The extracted spectra are stored in a two-dimensional file n × 128, where n is the number of detected spectra. Every row of this file is an independent normalized spectrum with length 128 pixels. The maximum number of spectra used for testing here is N = 426. The low-dispersion objective prism P1 allows us to classify the stellar spectra only in six classes (OB, A, F, G, K, M). Although the number of classes is limited, the method is useful to study the spatial distribution of stars in groups with the same spectral type.

3. CLASSIFICATION BY USE OF ANN

The objective of classification is to identify similarities and differences between objects and to group them. These groups (classes) are motivated by a scientific understanding of the objects. From spectral energy distributions, we take useful information about the intrinsic properties of stars, like the mass, age, and abundances, or those related to these, like the radius, effective temperature, and surface gravity.

An ANN is designed to solve a particular problem by completing two stages: training and verification. During the training stage, a proposed network is provided with a set of examples (input with desired output) of the relationship to be learned, and by implementing specific algorithms,
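The rectangular weighted "slit" extraction described above can be sketched as a weighted collapse of the transversal rows of a spectrum cutout. The Gaussian-like slit profile below is an illustrative assumption, since the paper leaves the exact weights to a fixed choice or an average fit:

```python
import numpy as np

def extract_spectrum(cutout, weights=None):
    """Collapse a (rows x 128) cutout to a 1-D spectrum with a weighted slit.

    Rows of `cutout` run across the dispersion direction; the slit weights
    emphasise the central rows of the stellar spectrum (an assumed profile).
    """
    cutout = np.asarray(cutout, dtype=np.float64)
    if weights is None:
        n = cutout.shape[0]
        weights = np.exp(-0.5 * ((np.arange(n) - (n - 1) / 2) / (n / 4.0)) ** 2)
    weights = weights / weights.sum()
    spectrum = weights @ cutout          # weighted sum over transversal rows
    return spectrum / spectrum.max()     # normalized, as the stored spectra are

# Toy 7-row cutout with strictly positive values
spec = extract_spectrum(np.random.default_rng(3).random((7, 128)) + 1.0)
assert spec.shape == (128,) and spec.max() == 1.0
```

Stacking one such row per detected star reproduces the n × 128 file of normalized spectra described in the text.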

usually iterative in nature, the network becomes able to reproduce these examples. Once the training stage has been completed, the verification stage can begin. During the verification stage, a set of new examples, not contained in the training set, is presented to the network. If the network is unable to generalize to the new set, some redesign steps involving the addition of more examples and/or modifications of the network topology must be carried out, and the two stages are repeated until satisfactory results are achieved. This is supervised learning.

ANNs are connectionist systems consisting of many primitive units (artificial neurons) which work in parallel and are connected via directed links. The general neural unit u_i has M inputs. Each input is weighted with a weight factor w_ij, so that the input information is x_i = Σ_{j=1}^{M} w_ij u_j. The main processing principle of these units is the distribution of activation patterns across the links, similarly to the basic mechanism of a biological neural network. The knowledge is stored in the structure of the links, their topology, and the weights, which are organized by training procedures. The link connecting two units is directed, fixing a source and a target unit. The weight attributed to a link transforms the output of a source unit into an input of a target unit. Depending on the weight, the transmitted signal can take a value ranging from highly activating to highly inhibiting.

The basic function of a unit is to accept inputs from units acting as sources, to activate itself, and to produce one output that is directed to target units. Based on their topology and functionality, the units are arranged in layers. The layers can generally be divided into three types: input, hidden, and output. The input layer consists of units that are directly activated by the input pattern. The output layer is made of the units that produce the output pattern of the network. All the other layers are hidden and directly inaccessible.

Supervised learning proceeds by minimizing a cost (or error) function with respect to all of the network weights. The cost function J of the network is given by

    J = (1/2)‖e‖² = (1/2)‖t − y‖²,  (1)

where t is the desired output vector and y the response vector of the network to the training pattern.

The activation function f of the unit u_i is given by the sigmoid function

    y_i = f(x_i) = 1 / (1 + exp(−Σ_{j=1}^{M} w_ij u_j)).  (2)

A neuron i in layer l has an output y_i^(l) that is given by

    y_i^(l) = f(w_i^(l) · y^(l−1) − b_i^(l)),  (3)

where w_i^(l) is the weight vector of the connections between the neurons of the previous layer l−1 and neuron i in layer l, y^(l−1) is the output vector of the neurons in layer l−1, and b_i^(l) is a bias term for neuron i in layer l [7].

The network training is a nonlinear minimization process in W dimensions, where W is the number of weights in the network. As W is typically large, this can lead to various complications. One of the most important is the problem of local minima. To help avoid local minima, a momentum term is added to the weight update equation.

Weights and biases have been initialized with real random numbers between −1 and 1 and adjusted layer by layer backward, according to the enhanced back-propagation learning rule given by

    Δw_i^(l)(n+1) = −γ ∇J(w_i^(l)) + α Δw_i^(l)(n),  (4)

where γ is the learning rate of the network, α is a momentum parameter, and n is the number of cycles. A delta learning algorithm (δ_i is the local gradient for neuron i) has been used for error minimization [8].

According to the delta rule, the synaptic weights of the network in layer l are

    w_ij^(l)(n+1) = w_ij^(l)(n) + γ δ_i^(l)(n) u_j^(l−1)(n) + α (w_ij^(l)(n) − w_ij^(l)(n−1)),  (5)

and the δ's for the backward computation are

    δ_i^(L)(n) = e_i(n) y_i(n) (1 − y_i(n))  (6)

for neuron i in the output layer L, and

    δ_i^(l)(n) = u_i^(l)(n) (1 − u_i^(l)(n)) Σ_k δ_k^(l+1)(n) w_ki^(l+1)(n)  (7)

for neuron i in hidden layer l.

Every input causes a response of the neurons of the first layer, which in turn causes a response of the neurons of the next layer, and so on, until a response is obtained at the output layer. This response is then compared with the target response, and the error difference is calculated. From the error difference at the output neurons, the algorithm computes the rate at which the error changes as the activity level of the neuron changes. This is the end of the forward pass. The algorithm then steps back one layer before the output layer and recalculates the weights between the last hidden layer and the neurons of the output layer so that the output error is minimized. The algorithm continues calculating the error and computing new weight values, moving layer by layer backward toward the input. When the input is reached and the weights do not change, the algorithm selects the next pair of input-target patterns and repeats the process. Although responses move in a forward direction, weights are calculated by moving backward, hence the name back-propagation. As the patterns are chosen randomly, the complete name of this method is "stochastic back-propagation with momentum."

The learning algorithm can be applied as follows.

(1) Initialize the weights to small random values.
(2) Choose a training example pair of input-target (x, t).
(3) Calculate the outputs y_i^(l) of each neuron i in a layer l, starting with the input layer and proceeding layer by layer toward the output layer.
(4) Compute the δ_i^(l) and the w_ij^(l) for each input of neuron i in a layer l, starting with the output layer and backtracking layer by layer toward the input.
(5) Repeat steps (2)–(4) until the termination criterion is met.
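The training loop described above can be condensed into a few lines of code. The sketch below is a minimal stochastic back-propagation-with-momentum loop on a toy 2-3-1 network with made-up training pairs; the paper's actual 72-32-6 network and spectra are not reproduced here, and biases are added rather than subtracted, a common sign convention equivalent to the one in the text.

```python
import numpy as np

def sigmoid(x):
    # Logistic activation, as in Eq. (2)
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# A tiny 2-3-1 network for illustration (the paper's network is 72-32-6)
sizes = [2, 3, 1]
W = [rng.uniform(-1.0, 1.0, (sizes[l + 1], sizes[l])) for l in range(2)]
b = [rng.uniform(-1.0, 1.0, sizes[l + 1]) for l in range(2)]
dW_prev = [np.zeros_like(w) for w in W]   # previous updates, for the momentum term

gamma, alpha = 0.05, 0.1                  # learning rate and momentum, as in the text

# Illustrative input-target pairs (a simple OR-like task, not the paper's spectra)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [1.0]])

for cycle in range(2000):
    i = rng.integers(len(X))              # "stochastic": pick a random pattern
    x, t = X[i], T[i]
    # Forward pass
    u = sigmoid(W[0] @ x + b[0])          # hidden-layer outputs
    y = sigmoid(W[1] @ u + b[1])          # network response
    # Backward pass: local gradients, Eqs. (6)-(7)
    d_out = (t - y) * y * (1.0 - y)
    d_hid = u * (1.0 - u) * (W[1].T @ d_out)
    # Weight updates with momentum, Eqs. (4)-(5)
    for l, (delta, inp) in enumerate(((d_hid, x), (d_out, u))):
        dW = gamma * np.outer(delta, inp) + alpha * dW_prev[l]
        W[l] += dW
        b[l] += gamma * delta
        dW_prev[l] = dW

out = [float(sigmoid(W[1] @ sigmoid(W[0] @ x + b[0]) + b[1])[0]) for x in X]
print(np.round(out, 2))
```

The momentum term `alpha * dW_prev[l]` reuses the previous update of each weight matrix, which is exactly the role of the α Δw(n) term in Eq. (4).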

Figure 2: Some of the OB spectra used for training of the ANN.

4. CLASSIFICATION BY USE OF MCC

Let D_ij = D_i(λ_j), j = 1, ..., N, be the normalized value for the ith stellar spectrum and let S_kj = S_k(λ_j), j = 1, ..., N, be the normalized value of the kth class standard stellar spectrum, with k = 1, ..., 6. For k = 1, ..., 6, the standard stellar spectra are OB, ..., M. The correlation coefficient for the ith stellar spectrum for the kth class is

    r_ik = Σ_{j=1}^{N} (D_ij − D̄_i)(S_kj − S̄_k) / √[ Σ_{j=1}^{N} (D_ij − D̄_i)² Σ_{j=1}^{N} (S_kj − S̄_k)² ],  (8)

with D̄_i being the mean value (over the j variable) of the ith spectrum, i = 1, ..., 426, and S̄_k the mean value of the kth class standard spectrum.

The correlation coefficient r_ik for the ith spectrum for the class k was calculated with displacements of ±3 pixels, to allow for a possible displacement from the detection algorithm caused by the local background. Of these seven correlation coefficients for every class k, the maximum value was chosen. The final classification was given by the maximum value of the coefficient r_i of all the r_ik coefficients as

    r_i = arg max_k (r_ik), k = 1, ..., 6.  (9)

Only spectra with correlation coefficients r_ik > 0.95 were accepted. Otherwise, the stellar spectra were overlapped or saturated.

Figure 3: Some of the A spectra used for training of the ANN.

5. EXPERIMENTS AND RESULTS

A neural network of three layers, with 72 input units, 32 hidden units, and 6 output units, has been chosen here. The input units are normalized pixel value units, with pixel positions from 11 to 82 corresponding to the central part of the digitized spectra. The output units correspond to the six different classes of low-dispersion stellar spectra (OB, A, F, G, K, M). The O and B stars form one class, OB. The "back-propagation" learning procedure has been used, with a "training" mode in which the network learns to associate inputs and desired outputs which are repeatedly presented to it (supervised learning) and a "verification" mode in which the network simply responds to new patterns according to prior training.

The experimental database consisted of 426 digitized spectra. This allowed us to initialize, update, and train the ANN. No more than 2000 cycles were needed to stabilize the learning, with learning rate γ = 0.05 and momentum parameter α = 0.1. Some of the training spectra are presented in Figures 2, 3, 4, 5, 6, and 7. 85 spectra have been used for training, 85 for verification, and 426 for "test."

The results are presented in Tables 2, 3, 4, 5, 6, and 7. The ANN column has numbers with an integer and a decimal part.
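The MCC rule of Eqs. (8)-(9), best correlation over a ±3 pixel displacement, accepted only above 0.95, can be sketched as follows; the synthetic Gaussian standards and the helper names are illustrative assumptions, not the paper's actual class templates.

```python
import numpy as np

def correlation(d, s):
    """Pearson correlation coefficient of Eq. (8)."""
    d = d - d.mean()
    s = s - s.mean()
    return float(d @ s / np.sqrt((d @ d) * (s @ s)))

def mcc_classify(spectrum, standards, max_shift=3, threshold=0.95):
    """Classify one spectrum against class standards.

    For each class the spectrum is correlated at displacements of
    -max_shift..+max_shift pixels and the best value is kept; the class
    with the largest coefficient wins, provided it exceeds the threshold
    (otherwise the spectrum is rejected as overlapped or saturated and
    None is returned).
    """
    best = []
    for std in standards:
        r = max(
            correlation(np.roll(spectrum, shift), std)
            for shift in range(-max_shift, max_shift + 1)
        )
        best.append(r)
    k = int(np.argmax(best))
    return (k, best[k]) if best[k] > threshold else None

# Toy standards: six Gaussian line profiles on a 100-pixel grid
rng = np.random.default_rng(1)
grid = np.arange(100)
standards = [np.exp(-0.5 * ((grid - 20 - 10 * k) / 5.0) ** 2) for k in range(6)]
# Observed spectrum: class 2 standard, shifted by 2 pixels plus noise
obs = np.roll(standards[2], 2) + 0.01 * rng.normal(size=100)
res = mcc_classify(obs, standards)
print(res)  # class index 2, the standard it was generated from
```

With the toy data above, the ±3 pixel search undoes the 2-pixel displacement, so the matching class is recovered with a coefficient well above the 0.95 acceptance threshold.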
The integer part corresponds to the accepted-by-the-system class, and the decimal part to the percentage gravity of the accepted-by-the-system class. For example, ANN 1.86


Figure 4: Some of the F spectra used for training of the ANN.

Figure 7: Some of the M spectra used for training of the ANN.


Figure 8: Training by using the Stuttgart neural network simulator (72 input units, 32 hidden units, and 6 output units).

means that this spectrum has been classified as 1 (or OB) with gravity 86%. No is the number of the spectrum (from the sample of 426 spectra); Sp is the spectral type (class), which is 1 for OB, 2 for A, 3 for F, 4 for G, 5 for K, and 6 for M; Us is the quality of the spectrum, which denotes 1 for only recognizable, 2 for good, and 3 for very good spectra. Spectra denoted as 10, 20, and 30 are the corresponding recognizable, good, and very good spectra used for the training of the ANN. The ANN has been developed with the freely distributed Stuttgart neural network simulator (Figure 8).

Figure 5: Some of the G spectra used for training of the ANN.

Figure 6: Some of the K spectra used for training of the ANN.

The results for the MCC and ANN methods are presented in Tables 8 and 9. Confusion matrices for the two methods, compared with the human expert (HE) results, are presented in Tables 10 and 11. It is evident that the ANN method is better than the MCC. There is an exception for the extreme classes, OB and M, in which the MCC method gives better results. This means that for some specific cases, like the detection of OB stars, the MCC method gives very good results [9].

To quantify the degree of agreement between different classification methods, we have calculated the mean error

Table 2: OB-type spectra from the sample of 426 selected for test. No is the sample number of spectrum, Sp is the spectral class, Us is the quality of spectrum, and ANN is the artificial neural network classification. For the ANN column, the integer part corresponds to the accepted-by-the-system class and the decimal part to the percentage gravity of the accepted-by-the-system class.

No Sp Us ANN No Sp Us ANN No Sp Us ANN No Sp Us ANN 21 2 1.86 72 1 2 1.74 187 1 2 1.68 284 1 2 1.38 31 1 1.46 74 1 30 1.70 191 1 2 1.20 287 1 2 1.83 51 1 1.74 76 1 1 1.22 192 1 2 1.82 290 1 1 1.88 71 2 1.78 77 1 1 1.28 193 1 1 1.28 293 1 2 1.27 91 2 1.85 78 1 1 1.70 194 1 1 1.49 299 1 2 1.49 10 1 2 1.90 79 1 1 1.22 201 1 2 1.77 302 1 2 1.17 11 1 3 1.73 80 1 1 1.49 202 1 2 1.74 304 1 2 1.67 12 1 1 1.44 81 1 2 1.75 203 1 2 1.87 305 1 2 2.18 14 1 1 1.29 82 1 1 1.41 207 1 2 1.84 306 1 2 2.34 15 1 10 1.76 89 1 10 1.84 208 1 1 1.50 311 1 2 1.65 17 1 2 1.77 93 1 2 1.87 209 1 2 1.87 313 1 2 1.81 21 1 2 1.88 94 1 2 1.85 211 1 2 1.35 314 1 2 1.26 22 1 3 1.57 96 1 2 1.69 213 1 1 1.68 318 1 2 1.84 24 1 1 1.10 99 1 2 1.75 215 1 1 1.64 320 1 2 1.18 33 1 1 1.89 102 1 1 1.28 216 1 2 1.81 323 1 2 1.81 34 1 2 1.84 103 1 2 1.61 217 1 1 1.76 327 1 1 1.47 35 1 3 1.88 104 1 1 2.18 220 1 2 1.38 335 1 2 2.52 36 1 3 1.86 110 1 1 1.23 221 1 30 1.67 337 1 1 2.26 37 1 2 1.72 118 1 20 1.78 223 1 2 3.18 342 1 2 1.89 40 1 2 1.63 120 1 1 3.22 225 1 1 1.42 344 1 2 1.86 43 1 2 1.76 131 1 2 1.72 227 1 2 1.75 347 1 2 1.19 46 1 20 1.87 139 1 1 1.78 228 1 2 1.17 354 1 2 3.16 47 1 1 1.43 141 1 30 1.58 236 1 2 3.25 358 1 1 1.28 49 1 1 1.68 146 1 1 1.59 238 1 1 1.48 359 1 2 1.40 51 1 1 1.85 149 1 1 1.62 240 1 2 1.83 360 1 30 1.62 53 1 10 1.67 154 1 20 1.82 246 1 2 1.61 367 1 2 1.82 56 1 2 1.45 156 1 2 1.71 251 1 2 1.61 368 1 2 1.52 59 1 2 1.46 157 1 1 2.43 252 1 2 1.38 369 1 10 1.83 60 1 10 1.80 158 1 1 1.54 254 1 2 1.87 377 1 2 3.16 62 1 1 1.87 167 1 20 1.69 256 1 2 1.46 378 1 3 1.70 63 1 10 1.73 172 1 2 1.79 260 1 2 1.87 381 1 2 1.71 65 1 10 1.78 174 1 2 1.62 261 1 1 1.57 383 1 2 1.37 67 1 1 1.53 175 1 1 1.86 273 1 2 1.80 385 1 2 1.38 68 1 30 1.89 178 1 1 1.67 280 1 2 1.52 387 1 1 1.33 69 1 2 1.88 180 1 1 1.72 281 1 2 1.86 390 1 2 1.79 70 1 20 1.82 181 1 1 1.42 283 1 2 2.19 401 1 2 1.61

ME_HEMCC between the human expert and the maximum correlation coefficient classification and ME_HEANN between the human expert and the artificial neural network, and the corresponding dispersions σ_HEMCC and σ_HEANN. C^i_HE is the spectrum classified by the human expert, C^i_MCC by the maximum correlation coefficient method, and C^i_ANN by the artificial neural network

Table 3: A-type spectra from the sample of 426 selected for test. No is the sample number of spectrum, Sp is the spectral class, Us is the quality of spectrum, and ANN is the artificial neural network classification. For the ANN column, the integer part corresponds to the accepted-by-the-system class and the decimal part to the percentage gravity of the accepted-by-the-system class.

Table 4: F-type spectra from the sample of 426 selected for test. No is the sample number of spectrum, Sp is the spectral class, Us is the quality of spectrum, and ANN is the artificial neural network classification. For the ANN column, the integer part corresponds to the accepted-by-the-system class and the decimal part to the percentage gravity of the accepted-by-the-system class.

No Sp Us ANN No Sp Us ANN No Sp Us ANN 42302.66 182 2 1 2.40 30 3 1 3.32 . . 13 2 1 3 21 206 2 1 2 52 54 3 20 3.35 . . 19 2 1 3 31 218 2 2 1 29 61 3 20 3.31 25 2 1 2.28 231 2 1 2.21 91 3 20 3.55 31 2 1 2.70 241 2 2 2.77 133 3 1 2.20 38 2 1 3.16 263 2 1 2.50 140 3 2 3.16 42 2 30 2.85 264 2 2 2.78 152 3 1 3.20 44 2 2 2.76 269 2 2 2.49 153 3 20 3.22 45 2 2 2.53 277 2 2 2.57 . 50 2 1 1.10 286 2 1 2.35 183 3 2 3 52 . 71 2 10 2.76 296 2 3 2.11 186 3 1 2 28 73 2 2 2.31 308 2 2 2.68 188 3 1 3.10 83 2 1 3.21 309 2 2 2.79 198 3 1 3.47 98 2 30 2.61 312 2 20 2.85 212 3 10 3.24 107 2 2 2.77 316 2 1 2.37 214 3 10 3.48 109 2 30 2.88 329 2 20 2.59 219 3 1 3.21 . . 111 2 1 2 57 338 2 10 2 67 235 3 1 1.27 . . 113 2 20 2 88 341 2 2 2 85 258 3 2 3.17 119 2 10 2.42 346 2 30 2.84 262 3 2 3.17 122 2 2 2.72 357 2 2 2.48 265 3 10 3.24 125 2 1 2.11 389 2 1 2.49 271 3 1 3.24 129 2 1 2.83 398 2 2 3.20 285 3 10 3.39 138 2 1 1.51 405 2 1 2.11 294 3 2 3.33 159 2 20 2.85 406 2 1 1.14 . 161 2 1 2.56 411 2 20 2.31 298 3 1 2 24 . 164 2 1 2.80 414 2 10 2.29 331 3 10 3 34 168 2 1 3.36 416 2 1 2.82 349 3 1 3.28 169 2 1 2.30 421 2 1 2.59 386 3 20 3.47 177 2 20 2.93 424 2 30 2.80 426 3 2 3.30 method: Tables 12 and 13 give the global statistical properties be- tween different classification methods. 426 Figure 1 shows the region N83-84-85 which belongs to = 1 Ci − Ci = . MEHEMCC HE MCC 0 15, the inner wing of the SMC and is of interest because of its 426 i=1 OB associations and nebulae. This region has been studied   by different astronomers in the past. It is evident that there is  426   σ =  1 Ci − Ci 2 = . HEMCC HE MCC 0 43, a correlation between associations like NGC 456, 460a,b, and 426 i=1 465 with the nebulae of ionized gas. (10) There are groups of stars with age variations of 4–10 Myr 426 = 1 Ci − Ci = . and spatial scales of 30–400 pc. 
There is also an extended re- MEHEANN HE ANN 0 13, 426 i=1 gion containing N83-84-85 with a diameter of more than 7  500 pc and sequential star formation on a scale of 10 yr  426 which seems to be part of a supergiant shell.  1  i i 2 σ =  C − C = 0.37. We focus on this region because it seems to show a feed- HEANN 426 HE ANN i=1 back between OB star formation and the physical properties Astrophysical Information from Objective Prism Images 2543

Table 5: G-type spectra from the sample of 426 selected for test. No is the sample number of spectrum, Sp is the spectral class, Us is the quality of spectrum, and ANN is the artificial neural network classification. For the ANN column, the integer part corresponds to the accepted-by-the-system class and the decimal part to the percentage gravity of the accepted-by-the-system class.

Table 6: K-type spectra from the sample of 426 selected for test. No is the sample number of spectrum, Sp is the spectral class, Us is the quality of spectrum, and ANN is the artificial neural network classification. For the ANN column, the integer part corresponds to the accepted-by-the-system class and the decimal part to the percentage gravity of the accepted-by-the-system class.

No Sp Us ANN No Sp Us ANN
6 4 2 4.61 189 4 1 4.36
8 4 10 4.54 222 4 30 4.65
23 4 1 4.59 244 4 2 4.33
27 4 1 4.51 249 4 1 4.50
57 4 2 4.23 250 4 2 4.41
64 4 2 4.28 253 4 2 4.56
66 4 2 4.65 259 4 2 4.19
86 4 20 4.36 270 4 30 4.57
87 4 30 4.58 278 4 1 4.65
95 4 2 4.54 279 4 2 4.42
100 4 1 4.64 288 4 2 4.51
106 4 30 4.42 291 4 1 3.45
112 4 1 4.61 297 4 20 4.70
115 4 1 4.56 303 4 2 4.36
117 4 2 4.47 315 4 1 4.59
128 4 1 4.40 322 4 2 4.56
137 4 1 4.38 332 4 2 4.39
143 4 20 4.63 352 4 30 4.56
145 4 2 4.57 362 4 20 4.65
150 4 1 5.74 370 4 2 4.37
160 4 10 4.29 388 4 30 4.58
162 4 1 4.60 394 4 2 4.51
170 4 2 4.46 396 4 30 4.69
173 4 20 4.58 403 4 1 4.68
176 4 2 4.66 413 4 2 4.32
185 4 2 4.58 422 4 1 4.59

No Sp Us ANN No Sp Us ANN
1 5 30 5.80 234 5 2 5.27
18 5 30 5.82 237 5 2 5.58
26 5 10 5.83 239 5 2 5.81
32 5 2 5.56 242 5 2 5.44
39 5 2 5.61 255 5 2 5.65
41 5 1 4.23 266 5 3 5.69
58 5 20 5.75 267 5 2 5.85
75 5 20 5.44 268 5 2 4.48
85 5 30 5.63 275 5 2 5.64
88 5 1 4.41 292 5 2 5.68
90 5 1 5.70 300 5 2 5.87
97 5 3 4.59 325 5 2 5.36
124 5 1 5.77 334 5 2 5.85
134 5 2 5.29 336 5 2 5.44
135 5 2 5.60 339 5 2 5.86
136 5 1 5.83 340 5 2 5.79
142 5 20 5.54 355 5 2 5.58
144 5 20 5.87 361 5 2 4.33
147 5 20 5.59 366 5 2 4.37
155 5 2 5.52 371 5 2 4.27
163 5 2 5.73 376 5 2 5.88
165 5 20 5.74 379 5 30 5.68
195 5 2 5.67 380 5 2 5.29
196 5 2 4.60 384 5 30 5.79
197 5 2 4.40 395 5 2 5.73
199 5 2 5.43 397 5 3 5.72
200 5 2 5.42 399 5 2 5.84
204 5 10 5.67 402 5 2 5.74
205 5 2 4.40 410 5 1 5.63
210 5 1 4.53 412 5 1 5.60
224 5 2 5.29 415 5 2 5.72
226 5 20 5.86 417 5 2 4.36

of the interstellar medium. It suggests that star formation and interstellar medium properties probably are self-regulated. Our automated method helped us to show some morphological characteristics of these OB associations and to explain the triggered star formation by possible supernova explosions.
230 5 1 5.68 419 5 2 5.55
233 5 2 5.60 420 5 1 5.62

We have to note that the reddening for the used SMC stellar spectra is considered negligible, and the training of the ANN has been made directly with the normalized measured spectra of our region. This method is used for the moment as a classification tool only at the University of Athens.

Initially, the field of view of CCD images was much smaller than that of photographic plates, but digital detectors have now caught up with the development of CCD mosaic cameras. Our algorithm can also be used on CCD spectral images. It is evident that if the resolution of the digitized plates is the same as the resolution of the CCD spectral images, the method is exactly the same. Otherwise, we have to modify the method to the new resolution.

6. CONCLUSIONS

In this paper, we have described an automated method of classification for digitized low-dispersion objective prism stellar spectra by using an ANN system. This method has

Table 7: M-type spectra from the sample of 426 selected for test. No is the sample number of spectrum, Sp is the spectral class, Us is the quality of spectrum, and ANN is the artificial neural network classification. For the ANN column, the integer part corresponds to the accepted-by-the-system class and the decimal part to the percentage gravity of the accepted-by-the-system class.

No Sp Us ANN No Sp Us ANN
16 6 2 6.33 295 6 2 6.82
20 6 30 6.94 301 6 20 6.94
28 6 30 6.96 307 6 2 6.90
29 6 30 6.60 310 6 2 6.94
48 6 3 6.92 317 6 2 6.70
52 6 20 6.90 319 6 2 6.93
55 6 30 6.93 321 6 2 6.79
84 6 1 6.33 324 6 2 6.45
92 6 10 6.96 326 6 2 6.88
101 6 2 5.31 328 6 2 6.95
105 6 30 6.87 330 6 2 6.93
108 6 30 6.95 333 6 2 6.93
114 6 3 6.97 343 6 2 5.84
116 6 2 6.74 345 6 2 6.93
121 6 1 6.54 348 6 2 6.47
123 6 1 5.48 350 6 2 6.24
126 6 20 6.76 351 6 2 6.91
127 6 3 6.93 353 6 2 6.92
130 6 1 6.71 356 6 2 6.32
132 6 2 5.54 363 6 2 6.87
148 6 10 6.75 364 6 2 5.18
151 6 2 6.87 365 6 2 6.95
166 6 10 6.81 372 6 2 6.91
171 6 20 6.91 373 6 2 6.96
179 6 2 6.75 374 6 2 6.93
184 6 10 6.97 375 6 2 6.78
190 6 2 6.92 382 6 2 6.91

Table 8: Results for the MCC after the test with 426 stellar spectra. Rows give the class offset with respect to the human expert classification; columns are the human expert classes.

Class OB A F G K M
−3 0 0 0 0 0 0
−2 0 0 2 0 0 1
−1 0 8 7 2 9 4
0 141 40 16 45 52 72
+1 1 9 2 5 7 0
+2 2 1 0 0 0 0
+3 0 0 0 0 0 0
Bad class 3 18117165
All used 144 58 27 52 68 77

Table 9: Results for the ANN after the test with 426 stellar spectra. Rows give the class offset with respect to the human expert classification; columns are the human expert classes.

Class OB A F G K M
−3 0 0 0 0 0 0
−2 0 0 1 0 0 0
−1 0 4 3 1 12 9
0 132 48 23 50 56 68
+1 7 6 0 1 0 0
+2 5 0 0 0 0 0
+3 0 0 0 0 0 0
Bad class 12 10 4 2 12 9
All used 144 58 27 52 68 77

Table 10: Confusion matrix for human expert and MCC combination. The results are expressed in percentages.

OB A F G K M
OB 97.92 0.69 1.39 0.00 0.00 0.00
A 13.79 68.97 15.52 1.72 0.00 0.00
F 7.41 25.92 59.26 7.41 0.00 0.00
G 0.00 0.00 3.85 86.54 9.61 0.00
K 0.00 0.00 0.00 13.24 76.47 10.29
M 0.00 0.00 0.00 1.30 5.19 93.51

229 6 2 6.94 391 6 2 5.63
232 6 2 6.95 392 6 2 6.24
243 6 2 6.58 393 6 3 6.96
245 6 2 6.95 400 6 2 6.86
247 6 1 5.50 404 6 10 6.87
248 6 3 5.76 407 6 2 6.54
257 6 2 6.79 408 6 3 6.97
272 6 2 6.37 409 6 2 6.91
274 6 2 6.95 418 6 2 6.66
276 6 2 6.93 423 6 2 6.84
282 6 2 6.92 425 6 2 6.65
289 6 2 5.89

Table 11: Confusion matrix for human expert and ANN combination. The results are expressed in percentages.

OB A F G K M
OB 91.67 4.86 3.47 0.00 0.00 0.00
A 6.90 82.76 10.34 0.00 0.00 0.00
F 3.70 11.11 85.19 0.00 0.00 0.00
G 0.00 0.00 1.92 96.16 1.92 0.00
K 0.00 0.00 0.00 17.65 82.35 0.00
M 0.00 0.00 0.00 0.00 11.69 88.31
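Confusion matrices like Tables 10 and 11 are row-normalized counts: each row holds the human-expert class, each column the automatic class, expressed in percent. A small sketch of the computation, with toy labels rather than the paper's 426-spectra sample:

```python
import numpy as np

def confusion_percent(expert, predicted, n_classes=6):
    """Row-normalized confusion matrix in percent.

    Rows follow the human-expert class (coded 1..n_classes), columns
    the automatic classification, as in Tables 10 and 11.
    """
    m = np.zeros((n_classes, n_classes))
    for t, p in zip(expert, predicted):
        m[t - 1, p - 1] += 1.0          # count one spectrum per (row, col) cell
    return 100.0 * m / m.sum(axis=1, keepdims=True)

# Toy labels with classes coded 1..6 (1 = OB, ..., 6 = M)
expert    = [1, 1, 1, 2, 2, 3, 4, 5, 6]
automatic = [1, 1, 2, 2, 2, 3, 4, 5, 6]
cm = confusion_percent(expert, automatic)
print(np.round(cm, 2))
```

Each row sums to 100%, so the diagonal entry of a row is the recognition rate of that class, which is how the OB and M exceptions in favor of the MCC can be read off directly.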

been compared with the previously used MCC method and gave better results. The automated classification is part of a fully automated method, developed for stellar detection, extraction of basic information, and classification from low-dispersion objective prism images. The detected objects with their coordinates are stored in tables (files). The method is useful because we can study the spatial distribution of stars in groups with the same spectral type.

Emmanuel Bratsolis was born in Greece. He received a B.S. degree in physics from the University of Athens (UA), Greece, an M.S. degree in astrophysics and space technology from the University of Paris VII, France, and an M.S. degree in signal and image processing from École Nationale Supérieure des Télécommunications (ENST), Paris, France. He also received a Ph.D. degree in astrophysics (UA) and a Ph.D. degree in image processing (ENST). He has been a researcher in different projects. His research interests include image and signal processing, remote sensing, and data analysis in astrophysics.

Table 12: Statistical properties for the different classification methods: human expert, maximum correlation coefficient, and artificial neural network.

Spectral-type class OB A F G K M
Class code 1 2 3 4 5 6
HE 144 58 27 52 68 77
MCC 148 48 32 61 62 75
ANN 137 58 35 62 66 68

Table 13: Statistical comparison between the different classification methods: human expert-maximum correlation coefficient and human expert-artificial neural network.

Test Mean error (ME) Dispersion (σ)
HEMCC 0.15 0.43
HEANN 0.13 0.37
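The entries of Table 13 follow directly from Eq. (10). A minimal sketch with toy label vectors (the real 426-label vectors are not reproduced here), reading the mean error as a mean absolute class difference, which is an assumption since the printed equation does not show the absolute value explicitly:

```python
import numpy as np

def agreement_stats(c_ref, c_test):
    """Mean error and dispersion between two class-label vectors, Eq. (10)."""
    diff = np.asarray(c_ref, float) - np.asarray(c_test, float)
    me = float(np.mean(np.abs(diff)))        # mean (absolute) class difference
    sigma = float(np.sqrt(np.mean(diff ** 2)))  # rms dispersion
    return me, sigma

# Toy labels coded 1..6 as in the paper; one disagreement out of eight
he  = [1, 2, 3, 4, 5, 6, 2, 5]
ann = [1, 2, 3, 4, 5, 6, 3, 5]
me, sigma = agreement_stats(he, ann)
print(me, sigma)  # 0.125 and sqrt(0.125) = 0.3535...
```

A single one-class disagreement out of eight labels gives ME = 1/8 and σ = √(1/8), illustrating why σ exceeds ME whenever the few disagreements are concentrated on individual spectra.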

ACKNOWLEDGMENT The author is grateful to I. Bellas-Velidis for fruitful discus- sions.

REFERENCES

[1] T. von Hippel, L. J. Storrie-Lombardi, M. C. Storrie-Lombardi, and M. J. Irwin, "Automated classification of stellar spectra—I. Initial results with artificial neural networks," Monthly Notices of the Royal Astronomical Society, vol. 269, no. 1, pp. 97–104, 1994.
[2] R. K. Gulati, R. Gupta, P. Gothoskar, and S. Khobragade, "Stellar spectral classification using automated schemes," Astrophysical Journal, vol. 426, no. 1, pp. 340–344, 1994.
[3] E. F. Vieira and J. D. Ponz, "Automated classification of IUE low-dispersion spectra. I. Normal stars," Astronomy and Astrophysics Supplement Series, vol. 111, pp. 393–398, 1995.
[4] H. P. Singh, R. K. Gulati, and R. Gupta, "Stellar spectral classification using principal component analysis and artificial neural networks," Monthly Notices of the Royal Astronomical Society, vol. 295, no. 2, pp. 312–318, 1998.
[5] C. A. L. Bailer-Jones, M. Irwin, and T. von Hippel, "Automated classification of stellar spectra—II. Two-dimensional classification with neural networks and principal components analysis," Monthly Notices of the Royal Astronomical Society, vol. 298, no. 2, pp. 361–377, 1998.
[6] E. Bratsolis, I. Bellas-Velidis, E. Kontizas, F. Pasian, A. Dapergolas, and R. Smareglia, "Automatic detection of objective prism stellar spectra," Astronomy and Astrophysics Supplement Series, vol. 133, no. 2, pp. 293–297, 1998.
[7] D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing, vol. 1, MIT Press, Cambridge, Mass, USA, 7th edition, 1988.
[8] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, NY, USA, 1994.
[9] E. Bratsolis, M. Kontizas, and I. Bellas-Velidis, "Triggered star formation in the inner wing of the SMC. Two possible supernova explosions in the N83-84-85 region," Astronomy and Astrophysics, vol. 423, no. 3, pp. 919–924, 2004.

EURASIP Journal on Applied Signal Processing 2005:15, 2546–2558
© 2005 Hindawi Publishing Corporation

Multiband Segmentation of a Spectroscopic Line Data Cube: Application to the HI Data Cube of the Spiral Galaxy NGC 4254

Farid Flitti
Laboratoire des Sciences de l'Image, de l'Informatique et de la Télédétection (LSIIT) UMR 7005, Université Louis Pasteur (Strasbourg 1), Boulevard Sébastien Brant, B.P. 10413, 67412 Illkirch Cedex, France
Email: fl[email protected]

Christophe Collet
Laboratoire des Sciences de l'Image, de l'Informatique et de la Télédétection (LSIIT) UMR 7005, Université Louis Pasteur (Strasbourg 1), Boulevard Sébastien Brant, B.P. 10413, 67412 Illkirch Cedex, France
Email: [email protected]

Bernd Vollmer
Centre de Données Astronomiques de Strasbourg (CDS), UMR 7550, Observatoire Astronomique de Strasbourg, 11 rue de l'Université, 67000 Strasbourg, France
Email: [email protected]

François Bonnarel
Centre de Données Astronomiques de Strasbourg (CDS), UMR 7550, Observatoire Astronomique de Strasbourg, 11 rue de l'Université, 67000 Strasbourg, France
Email: [email protected]

Received 28 May 2004; Revised 26 February 2005

A new method for the multiband segmentation of a spectroscopic line data cube is presented. This method is intended to help astronomers to handle complex spectroscopic line data cubes where the inspection of the channel and moment maps is difficult. Due to the Hughes phenomenon, the number of input images for the segmentation process is limited. Therefore, the spectrum of each pixel is fitted with a mixture of 6 Gaussians with fixed mean values and variances. The maps of the Gaussian weights are the input for a Markovian segmentation algorithm. The final segmentation map contains classes of pixels with similar spectral line profiles. The application of our method to the HI data cube of the Virgo spiral galaxy NGC 4254 shows that kinematically interesting regions can be detected and masked by our method.

Keywords and phrases: spectroscopic data cube reduction, Gaussian mixture model, hierarchical hidden Markov model, multicomponent image, Bayesian segmentation, HI 21 cm line, spiral galaxy NGC 4254.

1. INTRODUCTION

The natural output of line observations with radio interferometers like the VLA, Westerbork, ATCA, or Plateau de Bure are 3D data cubes, with the astronomical coordinates as x- and y-axes and the frequency or velocity channels as third axis. Within these data cubes, each image pixel contains an atomic or molecular line spectrum. Single-dish line observations of comparable sensitivity, on an equidistant grid of sky positions, can also be treated as 3D data cubes (e.g., the Parkes survey). The complexity of the 3D structure contained in these data cubes increases with the sensitivity of the observations. In the radio domain, telescope sensitivities are increasing tremendously with the upgrade of existing instruments and the building of new telescopes (EVLA, ALMA, SKA). This will lead to an enormous increase of available 3D data, which will be more and more complex.

Data cubes from radio line observations are well-suited test cases for new image processing techniques, because these data contain only one single line (e.g., a CO or HI line) which is shifted according to the radial velocity of the observed gas (Doppler effect). The standard method for the study of data cubes is the inspection of the channel maps by eye and the creation of moment maps after clipping the spectra at a level of 3σ, where σ is the rms noise in one channel:


Figure 1: NGC 4254 [1]. (a) HI emission distribution (moment 0). The x-axis corresponds to , the y-axis to , the intensity is given in mJy. (b) HI velocity field (moment 1), the velocities are given in channel maps.

the zero-order moment is the integrated intensity, the first-order moment is the velocity field, and the second-order moment is the distribution of the velocity dispersion. As long as the 3D intensity distribution is not too complex, these maps give a fair impression of the 3D information contained in the cube. However, when the 3D structure becomes complex, the inspection of the velocity channels by eye becomes difficult and important information is lost in the moment maps, because they are produced by integrating the spectra and thus do not reflect the individual line profiles. The method proposed in this paper is an attempt to provide an additional 2D segmentation map, which contains additional information on line profiles. This method aims at helping the astronomer in handling complex 3D data. With the results of our analysis, it is possible to focus the inspection of the channel maps on kinematically interesting regions.

For this purpose, we have chosen a target which is not too complex: NGC 4254, a spiral galaxy located in the Virgo cluster. The HI 21 cm observations were made with the VLA [1]. The central velocity of the data cube is v = 2408 km s−1 at channel number 22 and the velocity resolution is 10 km s−1 per channel. For simplicity, we keep in this paper pixel numbers for the coordinate axes and channel numbers for the velocity axis. The rms noise of one channel map is σ = 0.43 mJy (the jansky, abbreviated Jy, is a unit of radio flux density, the rate of flow of radio waves; 1 Jy = 10^−26 W/m²/Hz). The cube was clipped at an intensity level of 6σ = 2.58 mJy. Figure 1 shows the maps of the first two moments integrated over the whole data cube. The map of the HI emission distribution (moment 0, Figure 1a) shows an inclined gas disk with a prominent one-armed spiral to the west. The velocity field (moment 1, Figure 1b) is that of a rotating disk with perturbations to the north-east and to the north. The distribution of the velocity dispersion (moment 2), which is not shown, is almost uniform with a small maximum in the galaxy center. The subsequent work was done on the same data cube clipped at 3σ.

The segmentation process on such a data cube requires a nontrivial modeling step based on Bayesian inference. Due to the curse of dimensionality (Hughes phenomenon), Bayesian segmentation can only be carried out on reduced data (principal or independent component analysis [2], projection pursuit [3], etc.). We therefore fit all spectra with a mixture of Gaussians (Section 2.1) and select the 6 most representative mean values (channel number of the maximum) and variances (widths) (Section 2.3). In this way the data cube, which had initially 42 channels, is reduced to 6 effective bands. In a second step, the weights of the Gaussians with fixed mean values and variances are determined by fitting again the observed spectra. The segmentation task, which is carried out on the weights, then consists of clustering the pixels into different classes (or labels) according to similar behaviors defined by a chosen criterion (Section 3). In this way, we obtain a segmentation map containing spatially homogeneous classes of pixels with a similar spectrum. We present the application of this method to the data cube of NGC 4254 in Section 4, including a comparison with other algorithms used for 3D image processing (Section 4.2). A summary and our conclusions are given in Section 5.

This method is intended to be additional and complementary to the traditional methods for the study of 3D data cubes, that is, the inspection of the channel maps by eye, the creation of moment maps, and the creation of position-velocity plots. Our method is intended to help astronomers to handle complex spectroscopic line data. The obtained segmentation map together with the moment maps can be used for a first inspection of the cube. In addition, masks of the different classes of the segmentation map can be produced to isolate kinematically interesting regions.
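The three moments just defined can be computed directly from a clipped cube. The sketch below uses a synthetic single-line cube rather than the NGC 4254 data, and the `moment_maps` helper is a hypothetical name introduced here for illustration:

```python
import numpy as np

def moment_maps(cube, velocities, clip=0.0):
    """Moment 0/1/2 maps of a (nv, ny, nx) spectral cube.

    Channels at or below `clip` are zeroed before integration, mimicking
    the sigma-clipping described in the text.
    """
    c = np.where(cube > clip, cube, 0.0)
    m0 = c.sum(axis=0)                                            # integrated intensity
    with np.errstate(invalid="ignore", divide="ignore"):
        m1 = (velocities[:, None, None] * c).sum(axis=0) / m0     # velocity field
        m2 = np.sqrt(((velocities[:, None, None] - m1) ** 2 * c).sum(axis=0) / m0)
    return m0, m1, m2                                             # m2: velocity dispersion

# Toy cube: a Gaussian line at v = 2408 km/s (width 20 km/s) on every pixel,
# sampled on 42 channels of 10 km/s, as in the NGC 4254 cube geometry
v = 2408 + 10.0 * (np.arange(42) - 21)
line = np.exp(-0.5 * ((v - 2408) / 20.0) ** 2)
cube = np.repeat(line[:, None, None], 4, axis=1).repeat(4, axis=2)
m0, m1, m2 = moment_maps(cube, v)
print(round(float(m1[0, 0])))  # → 2408
```

For this symmetric toy line the velocity field recovers the line center and the dispersion map recovers the line width, which is exactly the information the moment maps summarize and, for complex profiles, also lose.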

2. MULTIBAND ASTRONOMICAL IMAGE PROCESSING

When no prior knowledge is available for astronomical images, the problem of the physical meaning of the reduced data remains difficult. Apart from inspection by eye, radio astronomers usually use some parametrizations of the spectrum on each pixel to analyze huge data cubes [4], especially the intensity map (moment 0), the velocity field (moment 1), and the distribution of the velocity dispersion (moment 2). In this paper, a Gaussian mixture with P components models the spectrum on each pixel as an alternative to the classical parametrizations. Our proposed modeling leads to multiband image processing, since only a reduced parameter set will be required to characterize the spectrum at each pixel (3 × P parameters instead of N values, N standing for the number of spectral bands; P ≪ N).

2.1. Gaussian mixture model

We consider a multispectral image with N channels defined on a regular lattice S of size H × W pixels. This image can be viewed as D = H × W spectra y_s, s ∈ S, each one of size N. Each spectrum is modeled by a P-component Gaussian mixture

    y_s(j) = \sum_{k=1}^{P} \frac{a_k(s)}{\sqrt{2\pi\sigma_k^2(s)}} \exp\left(-\frac{1}{2\sigma_k^2(s)} \left(j - \mu_k(s)\right)^2\right),    (1)

where a_k(s) is the weight, σ_k²(s) the variance, and µ_k(s) the mean associated with the kth Gaussian component of the sth spectrum, and j is the number of the frequency channel of the data cube.

2.2. Model parameter estimation

The observed spectra of each pixel are fitted by a mixture of Gaussians using an expectation-maximization (EM) algorithm, which is an iterative method for maximum-likelihood estimation. It has many applications [5, 6], but amongst them, the parameter estimation of the Gaussian density mixture model is probably one of the most widely encountered in the statistical pattern recognition community. It consists of approximating the probability density function (pdf) of an observed data set by a P-component Gaussian density mixture model [5, 6, 7]. Usually, the EM algorithm works on realizations of an unknown pdf. In our case, we assume that a spectrum already represents the pdf. Thus, we adapted the algorithm to fit the spectrum directly. To satisfy the assumptions required by this EM adaptation, each spectrum y_s must be normalized to look like a pdf, that is, y_s(j) ≥ 0 for all j and \sum_{j=1}^{N} y_s(j) = 1. Within the iterative process, the contribution R^{[q]}(k, s, j) of the kth component of the Gaussian mixture is calculated as follows:

    R^{[q]}(k, s, j) = \frac{\dfrac{a_k^{[q]}(s)}{\sqrt{2\pi\sigma_k^{2[q]}(s)}} \exp\left(-\dfrac{1}{2\sigma_k^{2[q]}(s)} \left(j - \mu_k^{[q]}(s)\right)^2\right)}{\sum_{k'=1}^{P} \dfrac{a_{k'}^{[q]}(s)}{\sqrt{2\pi\sigma_{k'}^{2[q]}(s)}} \exp\left(-\dfrac{1}{2\sigma_{k'}^{2[q]}(s)} \left(j - \mu_{k'}^{[q]}(s)\right)^2\right)},    (2)

where q is the number of the iteration step. These contributions are then inserted into the parameter calculation of the next iteration step:

    a_k^{[q+1]}(s) = \frac{\sum_{j=1}^{N} y_s(j)\, R^{[q]}(k, s, j)}{\sum_{j=1}^{N} y_s(j)},

    \mu_k^{[q+1]}(s) = \frac{\sum_{j=1}^{N} j\, y_s(j)\, R^{[q]}(k, s, j)}{\sum_{j=1}^{N} y_s(j)\, R^{[q]}(k, s, j)},    (3)

    \sigma_k^{2[q+1]}(s) = \frac{\sum_{j=1}^{N} \left(j - \mu_k^{[q+1]}(s)\right)^2 y_s(j)\, R^{[q]}(k, s, j)}{\sum_{j=1}^{N} y_s(j)\, R^{[q]}(k, s, j)}.

This approach assumes that the weights a_k(s) are all positive. After convergence, one obtains the parameters a_k(s), σ_k(s), and µ_k(s) to estimate y_s(j) of (1).

2.3. Basis selection

The modeling using (1) requires 3 parameters for each Gaussian component and leads to 3 × P parameters instead of N channels for each pixel. The Markovian classifier allows a maximum number of 9 input images due to the curse of dimensionality (Hughes phenomenon). With P = 6, we already obtain 18 reduced images (three parameter sets {a_k(s), σ_k(s), µ_k(s)}_{k=1,...,6} for each s ∈ S). We therefore select the most representative σ_k(s) and µ_k(s) among all Gaussian mixtures described in Section 2.2, which are then assumed to be constant. For this parameter selection, we used a clustering algorithm (K-means; [8]) on the set of vectors {(µ_k(s), σ_k(s))}_{s∈S}. In this way, only the weights of the Gaussians have to be determined. Equation (1) becomes

    \tilde{y}_s(j) = \sum_{k=1}^{P} \frac{\tilde{a}_k(s)}{\sqrt{2\pi\sigma_k^2}} \exp\left(-\frac{1}{2\sigma_k^2} \left(j - \mu_k\right)^2\right),    (4)

Multiband Segmentation of Spiral Galaxy HI Data Cube 2549
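For concreteness, the adapted EM iteration (2)–(3) for a single normalized spectrum can be sketched in NumPy as follows; the initialization, iteration count, and function name are our own illustrative choices, not the paper's:

```python
import numpy as np

def em_fit_spectrum(y, P=6, n_iter=100, eps=1e-12):
    """Fit a P-component Gaussian mixture to one spectrum y (Eqs. (1)-(3)).

    y is treated as a pdf over the channel index j: y >= 0 and sum(y) == 1.
    Returns weights a, means mu, and variances var, each of length P.
    """
    N = len(y)
    j = np.arange(N, dtype=float)
    # Crude initialization: equal weights, equidistant means, broad variances.
    a = np.full(P, 1.0 / P)
    mu = np.linspace(j[0], j[-1], P)
    var = np.full(P, (N / (2.0 * P)) ** 2)
    for _ in range(n_iter):
        # Eq. (2): responsibility R[q](k, s, j) of component k for channel j.
        g = a[:, None] / np.sqrt(2 * np.pi * var[:, None]) \
            * np.exp(-0.5 * (j[None, :] - mu[:, None]) ** 2 / var[:, None])
        R = g / (g.sum(axis=0, keepdims=True) + eps)          # shape (P, N)
        # Eq. (3): re-estimate weights, means, and variances.
        w = y[None, :] * R                                     # y(j) R(k, j)
        a = w.sum(axis=1) / y.sum()
        mu = (j[None, :] * w).sum(axis=1) / (w.sum(axis=1) + eps)
        var = ((j[None, :] - mu[:, None]) ** 2 * w).sum(axis=1) \
              / (w.sum(axis=1) + eps)
        var = np.maximum(var, 1e-6)  # guard against collapsing components
    return a, mu, var
```

Because the spectrum is normalized, the weights returned by the update (3) sum to one, so the fitted mixture is itself a pdf over the channel index.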

Figure 2: Example of a dependency graph corresponding to a quadtree structure on a 16 × 16 lattice. Black circles represent labels and white circles represent multicomponent observations {z_s = (ã_k(s))_{k=1,...,P}}_{s∈S}. Each node t has a unique parent t⁻ and four “children” t⁺.

where µ_k and σ_k do not depend anymore on the location s. This is equivalent to the projection of the spectra on the subspace generated by the following basis: {(1/\sqrt{2\pi\sigma_k^2}) \exp(-(1/2\sigma_k^2)(j - \mu_k)^2)}_{k=1,...,P}.

We use the Levenberg-Marquardt algorithm [9] to determine efficiently the ã_k(s) with the selected basis. Thus a multicomponent image z with P components of size H × W is obtained to feed the Markovian classifier. The Markovian assumption takes the neighborhoods into account when classifying each pixel, allowing us to regularize the solution, as explained in Section 3.1. Each pixel s is represented by a P-vector z_s = (ã_k(s))_{k=1,...,P}.

3. BAYESIAN CLASSIFIER FED BY REDUCED DATA

The segmentation task consists of clustering the pixels into different classes (or labels) according to similar behaviors defined by a chosen criterion. This leads to a segmentation map where each pixel belongs to a given class. Many approaches exist in the literature [10], based on neural networks, morphological filtering, multiscale decomposition, and statistical analysis. Bayesian statistical theory is a powerful and convenient tool for many segmentation tasks [11]. It allows us to statistically regularize the solution using all available observations. The hidden Markov model (HMM) framework within the Bayesian theory models the spatial dependencies between neighboring pixels and imposes a spatial regularity constraint on the segmentation map in a statistical way [12, 13]. The goal is to obtain a final segmentation map containing spatially homogeneous classes of pixels with a similar spectrum.

From the observed image z to the segmented image x, the algorithm can be decomposed into three main phases [14]. (1) Initialization step: the aim is to provide a first estimation of the parameters (K-means algorithm). (2) Segmentation step: the restoration is then achieved using the maximum a posteriori mode (MPM) segmentation rule. (3) Parameter estimation step: this phase is realized according to the iterated conditional estimation algorithm [15].

Details of the Markovian classifier procedure are beyond the scope of this paper, and we describe here only its main features (the reader may find detailed information about the procedure in [14]).

3.1. Hierarchical Markovian model

Let z be the multicomponent image, where z_s = (ã_k(s))_{k=1,...,P}. For each observation z_s, one associates a hidden state x_s = ω_i, where ω_i belongs to the label set Ω = {ω_i}_{i=1,...,K}. The segmentation map is obtained using a joint probability P(x, z) and a chosen cost function. In hierarchical Markovian modeling, one assumes an in-scale dependence between hidden states (Figure 2) to model the spatial correlation of the observations.

Let the quadtree G = (T, L) be a graph composed of a set T of nodes and a set L of edges. A hidden state is associated with each node, as illustrated in Figure 2. Each node t, apart from the root r, has a unique predecessor, its “parent” t⁻, which leads ultimately to the root. Each node t, apart from the terminal ones, the “leaves,” has four “children” t⁺. The set of nodes T can be divided into “scales,” T = T⁰ ∪ T¹ ∪ ··· ∪ T^R, according to the path length from each node to the root. Thus, T^R = {r}, T^n involves 4^{R−n} sites, and T⁰ is the finest scale, formed by the leaves (T⁰ = S).

We consider a labeling process x (note that, for clarity, x stands for both the random process and its realization) which assigns a class label x_t to each node of G: x = {x^n}_{n=0}^{R} with x^n = {x_t, t ∈ T^n}, where x_t takes its values in the set Ω. The hidden process, that is, the class label x, is supposed to be Markovian in scale. In this way, each label set at level n only depends on the upper levels: P(x^n | x^k, k > n) = P(x^n | x^{n+1}). Moreover, the probabilities of interscale transitions can be factorized in the following way [16, 17]: P(x^n | x^{n+1}) = \prod_{t∈T^n} P(x_t | x_{t⁻}), where


Figure 3: Data cube processing method summary. (a) Adapted EM algorithm to fit all spectra using (1). (b) Selection of the P most relevant Gaussians obtained in (a) using a K-means algorithm. (c) Projection on the basis chosen in (b) using a Levenberg-Marquardt algorithm. (d) The reduced images feed a Markovian classifier leading to the final segmentation map.

t⁻ designates the parent of site s, as illustrated in Figure 2. The likelihood of the observations z being in a state x at the bottom of the quadtree (T⁰ = S) is expressed as the following product (assuming conditional independence): P(z | x) = \prod_{s∈S} P(z_s | x_s), where for all s ∈ S, P(z_s | x_s = ω_i) ≜ P_i(z_s) represents the likelihood of the data z_s. A multidimensional Gaussian pdf model is used to derive the latter expression.

Under these assumptions, the joint distribution P(x, z) can be factorized as follows: P(x, z) = P(x_r) \prod_{t≠r} P(x_t | x_{t⁻}) \prod_{t∈T⁰} P(z_s | x_s) [16].

3.2. Bayesian segmentation

The expression of P(x, z) allows us to calculate exactly and efficiently P(x_t = ω_i | z) for all nodes t ∈ T. The segmentation label map at the bottom of the quadtree is finally given by x_s = arg max_{ω_i∈Ω} P(x_s = ω_i | z). This expression assumes that the model parameters (parameters of the Gaussian pdfs and interscale transition probabilities) are known. Such an estimation is obtained in an unsupervised way using the iterated conditional estimation (ICE) algorithm [18]. In practice, one alternates model parameter estimation and segmentation until convergence.

To summarize, one feeds a Markovian classifier with a multicomponent image {z_s = (ã_k(s))_{k=1,...,P}}_{s∈S} (Figure 3). The output of this classifier is a segmentation map {x_s}_{s∈S} which contains spatially homogeneous classes of pixels with similar spectrum behavior.

4. RESULTS ON THE NGC 4254 CUBE

In order to test our data processing method, explained in Sections 2 and 3 and illustrated in Figure 3, we have applied it to reduce the dimensionality of the NGC 4254 cube, which is composed of N = 42 bands of size H = W = 512 pixels. As explained in Section 2.3, we fixed the number of Gaussian components to P = 6. It turned out that intensities in all velocities corresponding to channel numbers 10 to 35 are equally present in the data cube and that the widths of the corresponding lines do not vary significantly. Therefore the selected variances, which are fixed for the further processing of the data cube, have similar values, σ_k = 1.5, 1.5, 1.6, 1.5, 1.4, 1.2, and the fixed mean values (velocity of the maximum) are almost equidistant, µ_k = 12.3, 16.0, 20.3, 25.0, 29.2, 33.0 (in channel number). In this way, the initial 42 velocity channels are reduced to 6 effective channels. The basis selection algorithm (Section 2.3) ensures that we do not lose important information.

Figure 4 shows the multivariate image (6 images, in inverse video, of the weights of the 6 Gaussians with fixed variances and mean values) obtained with our reduction technique. Due to the fact that the mean values of the basis are approximately equidistant, the multivariate images resemble binned channel maps. These reduced images feed the Markovian classifier, allowing us to isolate regions of similar spectra.

The most adapted number of classes depends on the complexity of the data cube. Selecting an overly small number of classes results in a loss of information; selecting an overly large number leads to classes without physical meaning, which are sometimes made up of dispersed patches on the image. It is beyond the scope of this paper to discuss how to determine in general the most appropriate number of classes. However, the expert may have an expectation for the number of classes of interest. One then may explore different classification solutions with a variable number of classes around this expected value. In our case, we found that 7 classes are sufficient to describe the main features of the NGC 4254 data cube.
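As an illustration of the reduction step just described, the projection of each spectrum onto the fixed 6-Gaussian basis can be sketched as follows. The paper fits the weights with the Levenberg-Marquardt algorithm [9]; since (4) is linear in the weights ã_k(s), a plain linear least-squares solve (used here instead) reaches the same minimizer for this model. Function names are ours:

```python
import numpy as np

# Fixed basis found for NGC 4254 (Section 4): means and standard deviations
# per component, in channel numbers.
MU = np.array([12.3, 16.0, 20.3, 25.0, 29.2, 33.0])
SIGMA = np.array([1.5, 1.5, 1.6, 1.5, 1.4, 1.2])

def gaussian_basis(n_channels, mu=MU, sigma=SIGMA):
    """Basis matrix B of shape (n_channels, P) built from Eq. (4)."""
    j = np.arange(n_channels, dtype=float)[:, None]
    return np.exp(-0.5 * (j - mu) ** 2 / sigma ** 2) / np.sqrt(2 * np.pi * sigma ** 2)

def reduce_cube(cube):
    """Project an (N, H, W) cube onto the basis -> (P, H, W) weight maps."""
    N, H, W = cube.shape
    B = gaussian_basis(N)                      # (N, P)
    Y = cube.reshape(N, H * W)                 # one spectrum per column
    A, *_ = np.linalg.lstsq(B, Y, rcond=None)  # (P, H*W) weight vectors
    return A.reshape(-1, H, W)
```

The P weight maps returned by `reduce_cube` play the role of the 6 effective channels that feed the Markovian classifier.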


Figure 4: Maps of the weights of the 6 Gaussians with fixed mean values and variances. In the case of NGC 4254, this corresponds to a reduction of the original 42 channels to 6 effective channels. The x- and y-axes are the astronomical coordinates.

4.1. Physical interpretation of the segmentation results

The final segmentation map of 7 classes, together with the average observed and model spectra for each class, is shown in Figure 5. In general, the observed spectra are well fitted by a weighted combination of our Gaussian basis functions. The fact that the average observed spectra for all classes are single peaked shows that we actually obtain classes of distinct spectral line profiles with our method. The fine structure of the model line profiles is due to the limited number of Gaussians in the basis. Since the widths of the fine structure of the average model spectra are always smaller than the widths of the average observed spectrum, we consider the fits acceptable. In the end, the important information is the average observed line profile.
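The class-averaged line profiles of Figure 5 amount to a masked average of the observed (or modeled) spectra over each class of the segmentation map; a minimal sketch (function name ours):

```python
import numpy as np

def class_average_spectra(cube, labels, n_classes):
    """Average spectrum per segmentation class.

    cube: (N, H, W) data cube; labels: (H, W) map with values 0..n_classes-1.
    Returns an (n_classes, N) array; rows of empty classes stay zero.
    """
    N = cube.shape[0]
    spectra = cube.reshape(N, -1)          # (N, H*W), one spectrum per column
    flat = labels.ravel()
    out = np.zeros((n_classes, N))
    for c in range(n_classes):
        mask = flat == c
        if mask.any():
            out[c] = spectra[:, mask].mean(axis=1)
    return out
```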


Figure 5: Results of the segmentation of the maps shown in Figure 4. (a) Segmentation map. The x- and y-axes are the astronomical coordinates. (b)–(h) Average spectra for each class (solid: observed spectrum; dashed: model spectrum). The velocity channels are represented on the x-axis, the intensity on the y-axis (in Jy). The peaks of the line profiles are indicated by an arrow.

The comparison between the segmentation map and the velocity field (Figure 1b) shows that the segmentation is mainly done according to velocity, that is, the position of the peak in the 1D spectrum. This is due to the nature of the basis (almost equidistant in velocity; see Section 4). Only the peak of the averaged line profile of class 4 lies within the full width at half maximum (FWHM) of the averaged line profile of class 5. Here the segmentation is also based on the peak value of the line profile, which is much smaller for the line profiles of class 4 than for those of class 5. Thus, for a sufficiently large number of classes (equal to or larger than the number of Gaussians in the basis), the segmentation is also based on the peak intensity of the line profiles.

Two perturbations of the velocity field in the northern part of the galaxy can be identified (Figure 5a): (i) a departure from the symmetric velocity gradient near pixel (70, 50), where the red/yellow region extends into the green region, and (ii) the blue region in the north of the galaxy. Both regions are recognized as distinct classes (1 and 4) in Figure 5.

We conclude that our data processing method is operating successfully. For this relatively simple data cube, the segmentation map (Figure 5) does not contain additional information compared to the velocity field (Figure 1b). However, our segmentation map might be used to produce masks to isolate the kinematically interesting regions in the north of the galaxy (classes 1 and 4). Phookun et al. [1] derived a rotation curve from the data cube, made position-velocity diagrams of the northern region, and subtracted the emission provided from


Figure 6: Principal component analysis (PCA) of NGC 4254. Only the images corresponding to the six largest eigenvalues are kept. (a) Segmentation map. The x- and y-axes are the astronomical coordinates. (b)–(h) Average spectra for each class (solid: observed spectrum; dashed: model spectrum). The velocity channels are represented on the x-axis, the intensity on the y-axis (in Jy).

the rotating gas disk to isolate kinematically perturbed regions. Our approach is not intended to replace or compete with these sophisticated methods.

4.2. Comparison with different approaches

These results can be compared to different multiband image processing methods. The oldest and most popular method is principal component analysis (PCA) [8]. It calculates a linear mapping which maximizes the data scatter in the projection subspace. The transformation matrix can easily be computed by taking the eigenvector decomposition of the data covariance matrix. Keeping only the eigenvectors corresponding to the largest eigenvalues, we project the data cube on this subspace in order to obtain a reduced representation of the data. This technique is an unsupervised one. Since in our case the channels are almost uncorrelated, the PCA approximately ranks the channels according to the energy contained in the signal. Thus, the six largest eigenvalues, corresponding to the six channels of largest power, feed the Markovian classifier. The classes of the final segmentation map (Figure 6) simply reflect the flux contained in only 6 of the original 42 channels. Since line information is contained in 25 of the 42 channels, this method loses the information contained in 19 channels. In addition, the peaks of the line profiles within one class can differ by more than the width of the averaged line profiles. This leads to very broad or even double-peaked averaged line profiles (classes 2, 4, and 6 in Figure 6) without physical meaning.
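The PCA baseline described above can be sketched as follows (an illustrative NumPy implementation, not the code used for Figure 6):

```python
import numpy as np

def pca_reduce(cube, n_keep=6):
    """PCA reduction of an (N, H, W) cube.

    Keep the n_keep eigenvectors of the channel covariance matrix with the
    largest eigenvalues and project the (centered) spectra onto them.
    """
    N, H, W = cube.shape
    X = cube.reshape(N, -1)                    # channels x pixels
    Xc = X - X.mean(axis=1, keepdims=True)     # center each channel
    C = Xc @ Xc.T / (X.shape[1] - 1)           # (N, N) covariance matrix
    eigval, eigvec = np.linalg.eigh(C)         # ascending eigenvalues
    V = eigvec[:, ::-1][:, :n_keep]            # top n_keep eigenvectors
    return (V.T @ Xc).reshape(n_keep, H, W)    # reduced representation
```

By construction, the variance of the projected components decreases with the component index, which is the ranking behavior discussed above.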


Figure 7: NGC 4254 data cube after the addition of a region with an artificial line creating a double-line profile (center at (60, 35)). (a) HI emission distribution (moment 0). The x-axis corresponds to right ascension, the y-axis to declination; the intensity is given in mJy. (b) HI velocity field (moment 1); the velocities are given in channel numbers.
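The moment maps of Figure 7 follow the standard definitions recalled in Section 2 (moment 0: integrated intensity; moment 1: intensity-weighted mean velocity; moment 2: velocity dispersion). A minimal sketch over the channel axis, leaving out the conversion from channel number to physical velocity units (function name ours):

```python
import numpy as np

def moment_maps(cube, eps=1e-12):
    """Moment 0/1/2 maps of an (N, H, W) cube over the channel (velocity) axis."""
    v = np.arange(cube.shape[0], dtype=float)[:, None, None]
    m0 = cube.sum(axis=0)                                  # integrated intensity
    m1 = (v * cube).sum(axis=0) / (m0 + eps)               # mean channel (velocity field)
    m2 = np.sqrt(((v - m1) ** 2 * cube).sum(axis=0) / (m0 + eps))  # dispersion
    return m0, m1, m2
```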

Another technique, based on independent component analysis (ICA) [2], calculates a linear mapping such that the components of the reduced vectors are as independent as possible. ICA finds subspaces in which the new subimages are independent instead of just uncorrelated, as for the PCA. There is a large set of algorithms performing ICA based on tensorial techniques, mutual information minimization, and non-Gaussianity maximization, which are well presented in [2].

The advantage of our approach compared to PCA and ICA is twofold: first, the spectral information is not mixed, and second, the segmentation classes are physically meaningful, that is, their pixels have similar spectra.

Recently, the separation of astrophysical source maps from multichannel observations has become of great interest [19, 20]. A summary of different techniques for source separation can be found in [19]. Nevertheless, in our opinion, these techniques suffer from the lack of physical meaning of the resulting images, since observations are not always the result of mixed independent images.

4.3. Double-line profiles

We have seen that our segmentation method produces a discrete velocity field and masks the regions of asymmetries in the velocity field. The principal result is thus the masking of different regions of interest. All these regions can already be approximately separated by eye on the moment maps (Figure 7). On the other hand, a feature that cannot be detected easily from the moment maps is double-line profiles. This information is lost by the velocity averaging process when the moment maps are produced. In order to investigate if our segmentation method is able to detect regions of double-line profiles, we added an artificial line to the spectra in a circular region north of the galaxy center (with its center position at (60, 35), a radius of 10 pixels, and a center velocity at channel number 28). The intensity of the artificial line is maximum at the center and falls off radially (we used a Gaussian profile with a width of 5 pixels). Figure 7 shows the moment maps of this new data cube. The additional lines produce a local maximum in the HI emission map (Figure 7a) and a pronounced asymmetry in the velocity field (Figure 7b).

The basis for this modified data cube is σ_k = 1.5, 1.5, 1.6, 1.6, 1.5, 1.2 and µ_k = 12.3, 16.0, 20.2, 24.8, 29.0, 33.0. It is almost identical to that of the original data cube. Figure 8 shows the multivariate image (6 images, in inverse video, of the weights of the 6 Gaussians with fixed variances and mean values) obtained with our reduction technique on the modified data cube. The final segmentation map of this modified data cube, together with the average observed and model spectra, is presented in Figure 9. Again, in the inner part of the galaxy, a discrete velocity field is produced. The northern perturbed part of the velocity field is also recognized (class 1).
The region of double-line profiles appears as a new, separate class (class 5). The averaged observed and fitted line profiles clearly show a double peak with a component of class 1 (central channel number 17) and the new, artificially added component (central channel number 28). This double-line profile is clearly distinct from the fine structure of the modeled lines (see also Section 4.1), which is always contained within the width of the observed line profile. Thus, our method is able to detect regions of double-line profiles. The 42 channel maps of the original cube can then be inspected in detail using the mask of the double-line region.

The double-line profile could have been detected by inspecting the channel maps by eye and/or by making multiple position-velocity plots. However, without prior knowledge of the location of the double-line profile, this can be quite long and painful, especially if the cube is more complex than our test cube.
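The artificial-line experiment of Section 4.3 can be reproduced schematically as follows. The geometry (center (60, 35), radius 10 pixels, line centered at channel 28, radial Gaussian falloff of width 5 pixels) follows the text; the amplitude and the spectral line width are illustrative values of our own:

```python
import numpy as np

def add_artificial_line(cube, center=(60, 35), radius=10, channel=28,
                        amplitude=1e-3, line_width=2.0, falloff=5.0):
    """Add a Gaussian spectral line inside a circular region of an (N, H, W) cube.

    The line peaks at `channel`; its intensity is maximal at `center` (x, y)
    and falls off radially with a Gaussian profile of width `falloff` pixels.
    `amplitude` and `line_width` are illustrative, not taken from the paper.
    """
    N, H, W = cube.shape
    yy, xx = np.mgrid[0:H, 0:W]
    r2 = (xx - center[0]) ** 2 + (yy - center[1]) ** 2
    spatial = np.where(r2 <= radius ** 2,
                       amplitude * np.exp(-0.5 * r2 / falloff ** 2), 0.0)
    j = np.arange(N, dtype=float)
    spectral = np.exp(-0.5 * (j - channel) ** 2 / line_width ** 2)
    return cube + spectral[:, None, None] * spatial[None, :, :]
```

Applied to spectra that already carry a line at another velocity, this produces the double-line profiles discussed above.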
Despite the loss of spectral resolution (the cube is reduced from 42 channels to 6 effective channels), our method is still able to detect and mask


Figure 8: Maps of the weights of the 6 Gaussians with fixed mean values and variances. A region of artificial line profiles creating a region of double-line profiles has been added (center at (60, 35)), which is clearly visible in map (a). The x- and y-axes are the astronomical coordinates.

multiple-line profiles. We think that this is very useful for an astronomer who has to handle complex 3D data cubes. The further investigation is simplified by the knowledge and possible masking of regions of multiple-line profiles, which might be carried out with more sophisticated methods applied to the original data cube.

5. SUMMARY AND CONCLUSION

A new method for the segmentation of multiband images of astronomical radio data cubes is presented. The observed spectra are first fitted by a weighted combination of Gaussian components. The parameters of the Gaussians are the input of a Markovian segmentation algorithm. Due to the curse of dimensionality (Hughes phenomenon), the number of input parameters is limited to 9. In our approach, we chose to fix the number of Gaussians at 6 and to set their mean values and variances to their 6 most representative values. The weights of these 6 Gaussians with fixed mean value and variance are determined by again fitting the observed spectra. In this way, the original number of channels of the data cube is reduced to 6 effective channels. A Markovian image segmentation is then done on the 6 maps of the weights of the Gaussians. The final result is a segmentation map where regions of similar spectral line profiles are assembled into different classes. The number of classes has to be determined by the user. The optimum number of classes depends on the complexity of the data cube.


Figure 9: Results of the segmentation of the maps shown in Figure 8. (a) Segmentation map. The x- and y-axes are the astronomical coordinates. (b)–(h) Average spectra for each class (solid: observed spectrum; dashed: model spectrum). The velocity channels are represented on the x-axis, the intensity on the y-axis (in Jy). The peaks of the line profiles are indicated by an arrow.

This procedure is applied to the HI 21 cm line data cube of the Virgo cluster spiral galaxy NGC 4254 [1]. Due to the intensity structure in the data cube, that is, the nature of the object, the 6 most representative mean values are almost equidistant and the 6 most representative variances are almost constant. Thus, the multivariate image of the weights of the 6 Gaussians corresponds to maps of 6 effective channels. For the segmentation, it turned out that the main features of the data cube are visible in the final segmentation map with 7 different classes. Since the final segmentation map contains regions of similar spectral line profiles, it resembles at first sight a binned velocity field. However, different classes contain lines of different maximum intensity. Increasing the number of classes leads to new classes of about the same central velocity with different maximum intensities. We optimized the number of classes by comparing the segmentation map to the channel and moment maps. We plan to investigate if there is a way to determine the optimum number of classes in a more objective way.

In a second approach, we added a region with an artificial line to the data cube, creating a region with a double-line profile. This region is clearly detected by our segmentation algorithm as a new class with a double line. This makes us confident that our method can give useful information which is complementary to the traditional moment maps.

The proposed method aims at helping astronomers to [14] J.-N.Provost,C.Collet,P.Rostaing,P.Perez,´ and P.Bouthemy, handle complex data cubes where an inspection of the chan- “Hierarchical Markovian segmentation of multispectral im- nelmapsbyeyeisadifficult task. Once the region of interest ages for the reconstruction of water depth maps,” Computer is identified by our method, a mask can be easily produced to Vision and Image Understanding, vol. 93, no. 2, pp. 155–174, 2004. inspect only this region in the channel maps [15] W. Pieczynski, “Statistical image segmentation,” Machine The two free parameters of our method are the number Graphics and Vision, vol. 1, no. 1-2, pp. 261–268, 1992. of Gaussians (with a maximum number of 9) and the num- [16] J.-M. Laferte,´ P. Perez,´ and F. Heitz, “Discrete Markov image ber of classes. We plan to investigate the optimization of these modeling and inference on the quadtree,” IEEE Trans. Image parameters in an objective way by applying our method to Processing, vol. 9, no. 3, pp. 390–404, 2000. different radio data cubes of various complexity. [17] M. Luettgen, Image processing with multiscale stochastic mod- els, Ph.D. thesis, MIT Laboratory of Information and Decision Systems, Cambridge, Mass, USA, May 1993. ACKNOWLEDGMENTS [18] W. Pieczynski, “Champs de Markov caches´ et estimation con- ditionnelle iterative,”´ Traitement du signal,vol.11,no.2, We would like to thank the anonymous referees for helping pp. 141–153, 1994. us to improve this paper substantially. This research has ben- [19] E. Salerno, A. Tonazzini, E. E. Kuruoglu, L. Bedini, D. Her- efited from financial support from the French Ministere` de la ranz, and C. Baccigalupi, “Source separation techniques ap- Recherche (ACI-GRID IDHA (2001–2004)). plied to astrophysical maps,” in Proc. 8th International Confer- ence on Knowledge-Based Intelligent Information and Engineer- ing Systems (KES ’04),R.J.Howlett,M.Gh.Negoita,andL.C. REFERENCES Jain, Eds., vol. 
Farid Flitti received the State Engineer degree in 1996 and the M.S. degree in 1999, both in electrical engineering, from the École Nationale Polytechnique d'Alger, Algiers, and the DEA degree in signal processing from the Université Paris-Sud in 2001. He is currently pursuing the Ph.D. degree in image processing at the Université Louis Pasteur (Laboratoire des Sciences de l'Image, de l'Informatique et de la Télédétection, UMR-CNRS 7005), Strasbourg, France. He is interested in signal and image processing, mainly in statistical models, multispectral image reduction, image segmentation, and fusion.

Christophe Collet was born in 1966 in France. He graduated from the Université Paris-Sud, Orsay, in 1989. He received the M.S. degree in signal processing, and received a Ph.D. degree in image processing from the University of Toulon in 1992. He spent 8 years at the French Naval Academy and was the Chairman of the Laboratory GTS "Groupe de Traitement du Signal" from 1994 to 2000, where he developed hierarchical Markovian approaches for sonar image analysis. Since 2001, he has held a Full Professor position at Strasbourg University (LSIIT, UMR-CNRS 7005). His major research interests include multi-image segmentation and classification with hierarchical approaches (wavelet decomposition, multigrid optimization, multiscale modeling), Bayesian inference, Markovian approaches for pattern recognition, and Bayesian networks, with a particular focus on astronomy (hyperspectral) and medical (multimodal) images.

Bernd Vollmer received his Ph.D. degree in 2000 at Paris Observatory. Between 2000 and 2003, he was a Postdoc at the Max-Planck-Institut für Radioastronomie, Bonn, Germany. Since 2005, he has been a Staff Astronomer at the Centre de Données Astronomiques de Strasbourg (CDS), France. He has been a theorist and numerical modeler, as well as an observer. He has done numerous N-body models of the interaction of cluster spiral galaxies with their environment, including both gravitational interactions and ram pressure stripping. He compares simulations with observed HI gas distributions and velocity fields of Virgo cluster galaxies. He also works on the gas dynamics in the Galactic Center.

François Bonnarel studied astronomy at Marseilles Observatory, where he obtained a Ph.D. degree in 1986. After a postdoc at the Max-Planck-Institut für Astronomie in Heidelberg (FRG), he obtained a position as Computer Engineer at Strasbourg Observatory in 1987, where he has been working since. His main interests and activities deal with astronomical databases, image databases, astronomical image management, image compression, image processing for astronomy, the virtual observatory, and image services interoperability. He is the main author or coauthor of more than seventy publications in astronomy.

EURASIP Journal on Applied Signal Processing 2005:15, 2559–2572
© 2005 Hindawi Publishing Corporation

Adaptive DFT-Based Interferometer Fringe Tracking

Edward Wilson Intellization, 454 Barkentine Lane, Redwood Shores, CA 94065, USA Email: [email protected]

Ettore Pedretti Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138, USA Astronomy Department, University of Michigan, 914 Dennison Building, Ann Arbor, MI 48109, USA Email: [email protected] Jesse Bregman NASA Ames Research Center, Mail Stop 269-1, Moffett Field, CA 94035, USA Email: [email protected]

Robert W. Mah NASA Ames Research Center, Mail Stop 269-1, Moffett Field, CA 94035, USA Email: [email protected]

Wesley A. Traub Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138, USA Email: [email protected]

Received 1 June 2004; Revised 29 October 2004

An automatic interferometer fringe tracking system has been developed, implemented, and tested at the Infrared Optical Telescope Array (IOTA) Observatory at Mount Hopkins, Arizona. The system can minimize the optical path differences (OPDs) for all three baselines of the Michelson stellar interferometer at IOTA. Based on sliding window discrete Fourier transform (DFT) calculations that were optimized for computational efficiency and robustness to atmospheric disturbances, the algorithm has also been tested extensively on offline data. Implemented in ANSI C on the 266 MHz PowerPC processor running the VxWorks real-time operating system, the algorithm runs in approximately 2.0 milliseconds per scan (including all three interferograms), using the science camera and piezo scanners to measure and correct the OPDs. The adaptive DFT-based tracking algorithm should be applicable to other systems where there is a need to detect or track a signal with an approximately constant-frequency carrier pulse. One example of such an application might be in the field of thin-film measurement by ellipsometry, using a broadband light source and a Fourier-transform spectrometer to detect the resulting fringe patterns.

Keywords and phrases: fringe tracking, DFT, interferometry, IOTA, real time.

1. INTRODUCTION

The infrared-optical telescope array (IOTA), shown in Figure 1, is a 3-aperture, long-baseline Michelson stellar interferometer located on Mount Hopkins near Tucson, Arizona. Three 45 cm collectors can be located along a 15 m by 35 m L-shaped array, supplying visible and near-IR light to pupil-plane beam combiners. The operational details and scientific accomplishments of IOTA have been well documented in [5, 6] and at http://cfa-www.harvard.edu/cfa/oir/IOTA.

This paper reports on the development of an algorithm designed and used to simultaneously minimize the optical path differences (OPDs) for the three baselines (A-B, A-C, and B-C) provided by IOTA's three apertures.

[Figure 1: Infrared-optical telescope array (IOTA).]

1.1. Fringe tracking goals

Details of the relevant interferometric derivations are covered thoroughly in other references such as [1, 2]. From a signal processing perspective, it is important to know that the governing physics of stellar pupil-plane interferometry result in an ideal signal that looks like that shown in Figure 2.

[Figure 2: Idealized interferogram, sinc-function envelopes, and center shown. (In this paper, the [·] in the figure axis labels indicates that the variables are unitless.)]

The idealized fringe packet function is a sinc function multiplied by a sinusoid, and can be represented with (1), where y is the mean-subtracted intensity,¹ x is the sample number, and A, B, C, D, E are parameters defining, respectively, the amplitude, sinc-function width, sinc-function center, sinusoid (fringe) frequency, and sinusoid phase shift:

    y = A sinc[B(x + C)] sin[D(x + E)].    (1)

Many equivalent variants on this functional form are of course possible (e.g., substituting cos(dx − e) for sin(D(x + E))); this one was chosen to facilitate the gradient-based optimization procedure described in [3, 4]. The sinusoid in this function comes from the interferometric combination of light from two apertures (a telescope pair), and the sinc function enters due to the Fourier transform of the instrument's spectral response (which is uniform over a fixed range). A is related to fringe visibility (or contrast) and varies from object to object. Visibility is the most important measured quantity because it is related to the object brightness distribution. B depends on the filter used and on the composition and temperature of the object measured. C reflects the optical path difference (OPD). D, the fringe frequency, is related to the central wavelength of the light passing through the filter and to the length of the scan. Although E is not an independent variable in theory, in practice, dispersion, noise, atmospheric disturbances, and the object itself cause it to vary from this idealized case. The relative shift between the fringes obtained from three baselines, or "phase closure," enables partial retrieval of this information and, consequently, the possibility of obtaining high-resolution images of distant astronomical sources.

The center of the fringe packet, or interferogram, corresponds to the point at which the OPD between each of the two collectors and the source (star) is zero. The path lengths are adjusted with slow- and fast-moving mirrors, driven to account for slow (and well calculable) effects such as earth rotation, as well as fast effects such as atmospheric turbulence. Ideally, with perfect compensation and no atmospheric distortion, the fringe packet would be fixed in the center of the scan. The interferogram represents the samples taken as a piezo-driven mirror is driven through a stroke of typical length 25 microns over a period of typically 333 milliseconds. If the center of this mirror scan stroke is not sufficiently close to the true OPD zero point, the fringe packet will be lost from view, and no science data will be available.

In practice, an idealized sinc function such as this is not seen. The band edges are often obscured by noise. However, the fringe tracking algorithm was designed to work on actual data, so it is very robust to significant deviations from this idealized form.

Thus the goal of an interferometer fringe tracker is to analyze incoming interferograms and provide on-line adjustments to the piezo-scanning delay-line mirror to keep the fringe packet centered within the scan window. It should be as follows.

(1) Robust to noise and anomalies in the data: absolute accuracy is not as important as keeping the fringe within the scan window.

(2) Require few if any manual adjustments: autonomous adaptability is needed to cover widely varying seeing conditions and object intensities.

(3) Computationally efficient: a minimal amount of computation time is available due to the limited resources and need for fast scanning, typically 3 Hz.

Maximum accuracy is less important than robustness, since as long as the fringe packet is in the scan window, it can be analyzed in postprocessing. As fringe tracking accuracy increases, however, it becomes possible to reduce the stroke length of the piezo scanner, thus increasing the overall rate of data collection. There is a secondary benefit in reducing stroke length in that, for a constant scan velocity, a shorter stroke will mean less time between scans, which reduces the time during which the atmosphere may have changed, reducing the average size of the fringe packet random motion.

¹For the IOTA detector, raw data from each channel is divided by the mean across the full scan. Then the complementary channels are subtracted. Normalization by the sum of the complementary channels is performed for data analysis, but not for fringe tracking. However, fringe tracking algorithm performance is independent of the specifics of this mean-subtraction and normalization process.
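A quick way to see the shape of (1) is to synthesize it. The following is an illustrative Python sketch (function name and parameter values are mine, chosen for display, and the unnormalized convention sinc(u) = sin(u)/u is assumed; none of this is taken from the IOTA code):

```python
import math

def fringe_packet(n, A, B, C, D, E):
    """Idealized interferogram of (1): y = A sinc[B(x + C)] sin[D(x + E)].

    Assumes the unnormalized convention sinc(u) = sin(u)/u.  x is the
    sample index, C offsets the packet center, and D is the fringe
    frequency in radians per sample.
    """
    def sinc(u):
        return 1.0 if u == 0.0 else math.sin(u) / u

    return [A * sinc(B * (x + C)) * math.sin(D * (x + E)) for x in range(n)]

# Illustrative 256-point scan: packet centered at sample 128 (C = -128)
# and about 6 samples per fringe (D = 2*pi/6); values are for display
# only, not measured IOTA parameters.
y = fringe_packet(256, A=1.0, B=0.05, C=-128.0, D=2 * math.pi / 6, E=0.0)
```

With these values the sinc envelope peaks at sample 128 and the fringes beat under it, which is the shape sketched in Figure 2.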

[Figure 3: Typical sequence of scans ((a) scan no. 12, confidence = 4.6; (b) scan no. 13, confidence = 6.1; (c) scan no. 14, confidence = 2.2; (d) scan no. 15, confidence = 7.0; and (e) scan no. 16, confidence = 6.4).]

Figure 3 shows a typical sequence of scans from IOTA. This data was taken on April 20, 2004, on the B-C telescope pair, targeting star HD126035, with the fringe tracker turned off. These are 5 consecutive 256-point scans out of the 200-scan data set, taken at a scan rate of 3 Hz, with a 25-micron scan length. The IONIC beam combiner works in H band, which translates to a wavelength of 1.65 microns. The H band filter used has a bandwidth of 0.35 µm. Depending on the scan length and number of samples per scan, sampling can range from about 4-10 samples per fringe.

Although the idealized form of the sinc-sinusoid from (1) can be seen, there is significant background noise, variability of the sinusoid (fringe) frequency from scan to scan and within each scan, and the packet center can be seen to move randomly between one scan and the next. Although these interferograms appear to have relatively consistent quality (with the exception of scan #14, which has drifted almost completely out of the window), it is not uncommon to have significant changes in quality (noise level, jump size, fringe clarity) from one scan to the next.

Shown along with the raw data, the solid vertical lines indicate the identified fringe packet centers that could have been used by the fringe tracking software to recenter the piezo scan, had fringe tracking been turned on. The relative confidence in the fringe-center identification is indicated in the title of each subplot and will be discussed later.

Significant sources of noise include atmospheric turbulence, vibration, photon noise, and detector noise. The goal of the fringe tracking system is to perform coherencing (versus cophasing) by controlling the OPD to allow the interferogram to be captured in the presence of bad seeing conditions and fainter objects. The controller works by identifying the fringe-center locations on all 3 interferograms following each scan and then adjusting the centers of travel of the piezo-driven scanning mirrors, attempting to keep the fringe packets centered in all 3 scan windows. The computing and actuation aspects of the control system are described by Pedretti and Traub [2, 6]; the present article details the fringe tracking algorithm and aspects of its software implementation.

Due to the noise sources present and the lack of a sufficiently representative simulation, the fringe tracking algorithms presented here were developed through extensive testing on actual data sets from IOTA, dating back to 1997 (as opposed to working with simulated data).

1.2. Related research

Observations performed with long-baseline ground-based optical/infrared interferometers are strongly affected by the turbulent atmosphere. Turbulence can reduce the visibility of fringes in many ways, as described in [7] for pupil-plane (or coaxial) beam combination and in [8] for image-plane beam combination. Turbulence randomly modulates the phases of the fringes, which can then become unusable for image reconstruction. Using three or more telescopes enables reduction of this atmospheric phase contamination. This is done through the closure-phase technique pioneered in radio astronomy by [9] and recently applied to long-baseline optical interferometry [10], allowing the first image of an astronomical source (Capella) to be obtained by an optical interferometer. More recently, optical and infrared interferometry has been able to provide information on the morphology of stellar sources [11] and extragalactic sources [12].

The necessary condition for obtaining meaningful closure-phases is that the three fringe packets must all be present in the same temporal interval. This is achieved by keeping the optical path difference (OPD) to a minimum.

Fringe tracking has been used in interferometry since the very beginning of the field, when Michelson and Pease [13] used a prism for dispersing and acquiring fringes visually at the 20-foot interferometer. Labeyrie [14] used the same system and demonstrated fringe acquisition on a two-telescope interferometer. Several systems have been proposed since then for correcting the optical path [15, 16]. GDT (also called dispersed fringe tracking when applied to image-plane interferometry) has been routinely used at several interferometric facilities [17, 18, 19].

When IOTA relied on a single baseline, the fringes were usually kept inside the scan interval manually by the observers. The installation of the third telescope at IOTA required an increase in the level of automation in the instrument, because manual tracking is not practical with three baselines to adjust. In particular, the requirement to measure closure-phases necessitated a system capable of keeping the fringe packets in the center of the scan using the existing hardware dedicated to acquiring science data. Fringes must be acquired in the same coherence time in order to measure a closure-phase. A coherencing algorithm is very useful to localize the position of a fringe packet and correct the OPD in order to compensate metrology errors and atmospherically induced fluctuations in the optical path. This maximizes the superposition of the fringe packets and the signal-to-noise ratio (SNR) of the closure-phase signal when this is averaged in the complex plane, as shown in [20].

The remainder of this section summarizes fringe tracking developments at IOTA. Wilson developed a method, summarized in this paper, that used the envelope of the interferogram to identify the packet center, and a gradient-based optimization method for refinement of this estimate [3]. Although fast and robust, it did not make use of the fringe frequency, leading to the present research, which makes this improvement. This was an offline study using IOTA data taken in 1997.

Morel and others in the IOTA team worked to implement the core aspect of Wilson's 1999 algorithm on the IOTA scanning hardware [21]. The fringe-center identification aspect of the system was found to be very robust and accurate even with very noisy signals, but the slow response of the control communications and actuation hardware made the overall control system ineffective. The control computing, communications, and actuation hardware was subsequently upgraded to permit further implementation efforts [6].

Pedretti developed a fringe tracking algorithm taking a completely different approach, based on double Fourier interferometry (DFI) [2, 22]. This method calculates the group delay of fringes dispersed with DFI, which is used to obtain the wavelength-dependent phase from the fringe packet. This method has also been implemented at IOTA on the current hardware, and is used there regularly. A performance comparison of the different approaches at IOTA is presently underway.

Thureau developed a fringe envelope tracking algorithm at COAST, which was subsequently implemented for testing at IOTA [23].

Gradient-based optimization, motion prediction, and other offline analyses are discussed in [4]. As compared to that publication, the present article uses data from 2004 and focuses on the adaptive DFT-based tracking algorithm, whereas [4] focuses on the IOTA implementation issues, envelope-based tracking, and offline gradient-based optimization of all packet parameters.

1.3. Approach

Guided by a background in signal processing and system identification (ID), the original approach taken towards fringe tracking was to fit the parameters in (1) to the data on each scan, with the fringe center then contained in C. A nonlinear, gradient-based optimization was developed to perform this, with extensive testing and tuning on representative IOTA data sets from 1997. This nonlinear optimization required a reasonably close initial estimate for C, which was provided by processing the fringe packet envelope. As it turned out, the accuracy of this initial estimate was generally within a sample or two (out of 256 points in a scan, typically) of the result following the full nonlinear ID. Given implementation constraints and the existence of other more significant error sources, it was decided that this initial estimate processing could serve as the online fringe ID algorithm. This was tested online in 1999 and 2000 [21].

In 2002, following the instrument control hardware upgrades and in preparation for a second implementation attempt, the algorithm was updated. The original envelope-based algorithm basically drew an envelope around the data and found the hump, thereby completely ignoring the fringe frequency, D. As can be seen in the example data given previously, the fringe frequency is visible in the fringe packet, and is relatively obscured by noise outside the center (due to the smaller envelope). The improvement looks for intensity amplitude at the fringe frequency, rather than at all frequencies (as the envelope-based ID did). See, for example, scan #14 in Figure 3. In that case, the envelope would not be a clear signal, but focusing on the expected fringe frequency leads to an accurate identification even with very little of the fringe packet in the window. This is accomplished with an efficiently implemented sliding window discrete Fourier transform (DFT). This updated algorithm was implemented in February 2002 at IOTA, with testing on simulated fringes through the instrument, and later on-the-sky testing with all 3 apertures in May 2002. Being more physically based, the change was made with the expectation that it would be more robust for future data and algorithm changes.

2. DFT-BASED TRACKING

The algorithm is summarized in Section 2.1, and then the individual steps are outlined in subsequent sections.

2.1. DFT-based tracking algorithm summary

(1) A window (nominally of a length containing two fringe periods, but it can be set to any integer) is passed over the data, where a single-frequency discrete Fourier transform (DFT) is calculated to try to detect the expected fringe frequency (this frequency is adaptively updated, by changing the window size, after each scan). The DFT is calculated 5 times for each scan, using window sizes of nominal plus [−4, −2, 0, +2, +4]. For example, if the nominal size is a 17-sample window covering 2 cycles, then the DFT is calculated for 13-, 15-, 17-, 19-, and 21-point windows. The number of points in the window is odd, so the center lands on a point. The magnitudes of these DFT results (i.e., the root sum of squares of the real and imaginary parts is taken, although that may not be essential) are used to determine the nominal window size for the next scan.

(2) Each of the 5 DFT results is smoothed using a rectangular averaging filter.

(3) The point-by-point maximum of the 5 smoothed DFT results, referred to as the composite DFT magnitude, is taken for further processing. Steps (1), (2), and (3) make this result more robust to intra-packet fringe frequency variations than a single-frequency DFT scan would be. The frequency corresponding to the largest DFT magnitude is chosen as the nominal frequency for the following scan, providing adaptive response to changing interferogram properties and eliminating the need to initially set this carefully.

(4) A fringe-packet-finding template is convolved with the composite DFT magnitude, providing a peak when the composite DFT magnitude matches the template shape. For computational efficiency, a rectangular template is used in place of a sinc-shaped template.

(5) The sample corresponding to the maximum value of the previous step is used as the identified fringe-packet center.

(6) A confidence metric is calculated based on the relative magnitudes of the composite DFT magnitude near the ID'ed center and the background.

(7) The previous steps are performed on all aperture pairs (3 in the case of IOTA), and the ID results and corresponding confidence metrics are combined to determine the scan centers for the next scan (to begin within a couple of milliseconds).

2.2. Example data

The algorithm steps are presented in detail using actual data, as shown in the following figures. They were generated using data collected from the BC-fringe (from apertures B and C) of the 16th scan of the IOTA-25 dataset on April 20, 2004; targeting star HD126035; RA (J2000): 14.390278; Dec (J2000): −11.713889. This interferogram is also shown in Figure 3. Figure 4 shows the normalized raw data (the complementary pair, B-C), as well as the result of the center identification that came after all steps were completed.

[Figure 4: Normalized raw data (raw scan data and identified center).]

[Figure 5: DFT sliding window (raw scan data, the DFT windows for i = 163 and i = 162, and the identified center).]

2.3. DFT calculation

Figure 5 shows how the DFT window (in this case a 17-point window nominally containing two fringe wavelengths) is passed over the raw data. The purpose of the DFT is to locate areas in the scan where the expected fringe frequency is present. A few things are done to greatly improve the efficiency of the DFT calculation; note it is not calculated as an FFT. This DFT calculates the magnitude of the signal in only one frequency bin, the one nominally corresponding to the fringe frequency. Also, a rectangular window is used, which enables very fast computation as the window is passed over the data. Calculating each new data point requires adding a term for the incoming sample and subtracting a term for the leaving sample. So, for example, calculating the DFT for the fringe frequency centered at sample #163 in the figure uses the DFT result for sample #162, then adds a term for point #171 (= 163 + (17 − 1)/2, the new point in the sliding window), and subtracts a term for point #154 (= 162 − (17 − 1)/2, the point dropping out of the sliding window).

While it is efficient, this implementation of the DFT calculation is also exact. By doing this calculation only for the frequency component of interest, this is faster than an FFT. FFT compute time is proportional to N log2 N, where N is the window length. A full-spectrum DFT would be N², whereas this single-frequency DFT compute time is proportional to N. But a second and far more significant level of optimization is achieved by using a rectangular window and sliding it across the scan, one sample at a time, as follows.

(1) For implementation in C, the real and imaginary parts are calculated separately. So sine and cosine functions corresponding to the expected fringe frequency (as defined by the window size and bin number) are calculated as, for example,

    cos(2π(b − 1)k/N), sin(2π(b − 1)k/N),    (2)

where b is the bin number (e.g., 3 for a window nominally containing two wavelengths), N is the window length, and k ranges from 0 to 255 for a 256-point scan. To further improve run-time performance, these sine and cosine operations could be performed once at startup only, storing the 256-long vectors corresponding to each potential window size to be considered (e.g., 7, 9, 11, ..., 31).

(2) The scan intensity vector, y, is multiplied point by point with these sine and cosine vectors (once per scan), resulting in 256-long vectors, Yreal and Yimag, so

    Yreal(k) = y(k) cos(2π(b − 1)k/N),
    Yimag(k) = y(k) sin(2π(b − 1)k/N).    (3)

(3) The real and imaginary parts of the DFT result for only the first sample in the scan (e.g., Xreal(N2), where N2 = (N − 1)/2) are calculated by summing the above-calculated real and imaginary vectors over a window of length N,² for example,

    Xreal(N2) = Σ_{i = −N2}^{N2} Yreal(N2 + i),
    Ximag(N2) = Σ_{i = −N2}^{N2} Yimag(N2 + i).    (4)

(4) Then, to calculate the real and imaginary DFT results at successive samples throughout the middle of the scan, the calculation is just one add and one subtract for the real and imaginary parts, corresponding to points entering and leaving the window. In particular, this is possible because a rectangular window is used. This computational optimization will change the phase of the result (and therefore the real and imaginary parts), but the magnitude is still exact. The phase imparted by this step may easily and exactly be removed if phase information were important. So

    Xreal(k) = Xreal(k − 1) + Yreal(k + N2) − Yreal(k − N2 − 1),    (5)

and similarly for Ximag.

(5) Since only the magnitude (versus the phase) is used, the real and imaginary results are combined accordingly (the square root could probably be omitted to improve speed if needed). Division by the window length also occurs at this point, to enable meaningful comparison between different DFT results.

(6) All computations are done using floats (versus doubles), since this provides sufficient accuracy.

(7) The DFT calculation does not depend at all on whether the window or scan size is a power of 2 (as the FFT does to some extent, although the non-power-of-two inefficiency is very slight for some FFT implementations such as FFTW [24]).

The DFT window size is chosen in this case to nominally contain exactly two fringe wavelengths. The algorithm requires the window size to be an odd integer whose selection is discussed later. Because of the way the DFT-calculation algorithm has been implemented, window sizes of 3, 4, 5, and so forth wavelengths could be calculated without changing the computation time. However, for larger windows, there is a possibility of the frequency changing during the window, which would distort the DFT calculation (i.e., when the coherence time is less than the time corresponding to the window size). Two wavelengths appears to be a good compromise between accuracy on clean scans (with higher coherence time, more accuracy would be possible with a larger window, but on clean scans tracking accuracy is not difficult) and noisy scans (if the coherence time is much below two wavelengths, there are probably no fringes to be seen).

For calculation of the DFT over the full spectrum, the first component ("bin") would correspond to the average value, the second component would correspond to a full wavelength extending across the full window, and the third component would correspond to two full wavelengths. Since the window size was chosen to cover two full fringe wavelengths, we calculate only the third component of the full-spectrum DFT. The real and imaginary parts are computed and then combined to produce the magnitude. The phase information is not used. This result is then smoothed using a sliding window having the same width as the DFT window (two wavelengths in this case), with the mean over the window producing the result shown in Figure 6. This step is computationally efficient, and reduces the variability in the DFT results.

Even though it is calculated very differently, the result is very similar to that resulting from the envelope-finding calculations described in [3]. While the envelope-finding calculations provided excellent results, this DFT calculation is more physically based and is expected to be more robust for noisy signals.

²The details of handling the beginning and ending of the scan are not covered here, but of course must be addressed.

[Figure 6: Smoothed DFT magnitude, using, in this case, a 17-point sliding window (DFT result and identified center).]

[Figure 7: Smoothed DFT results for 13-, 15-, 17-, 19-, and 21-point windows.]
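The smoothing that produces the curves of Figures 6 and 7 is a plain moving average with the same width as the DFT window. A minimal sketch (the function name is mine; near the scan ends the window is simply truncated, which is an illustrative choice, since the paper leaves edge handling unspecified):

```python
def smooth_rect(v, N):
    """Rectangular (moving-average) smoothing filter of odd width N.

    Near the ends of v the window is truncated to the available
    samples; the paper does not specify its edge handling, so that
    choice is illustrative only.
    """
    n2 = (N - 1) // 2
    out = []
    for i in range(len(v)):
        lo = max(0, i - n2)
        hi = min(len(v), i + n2 + 1)
        out.append(sum(v[lo:hi]) / (hi - lo))
    return out
```

Like the sliding DFT itself, this filter could also be updated incrementally (one add and one subtract per sample) if the division by a varying edge count were handled separately.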

The DFT can be thought of as a sampled version (it exists only at the bin frequencies) of the discrete-time Fourier transform (DTFT) of the continuous signal convolved with the DTFT of the window function (in the simplest case, as we have here, a uniform window of finite length). With an infinitely wide window, the DTFT of the window would be an impulse, so the convolution would not distort the DTFT of the signal. With a finite window, two effects occur.

(1) Reduced resolution: unlike an impulse, the mainlobe of the window function has some finite width. Convolving this with the signal DTFT may make it impossible to resolve two frequency components.

(2) Leakage: the component at one frequency leaks into that at another due to the spectral smearing.

If the goal is to measure the DTFT of the signal, then ideally one would like a window whose DTFT has a thin mainlobe and small sidelobes, but usually there is a tradeoff between the two. A rectangular window has a relatively narrow mainlobe, while Hanning and similar windows have a wider mainlobe (worse) but smaller sidelobes (better). Whatever the shape of the window, the mainlobe gets narrower as the number of points in the DFT increases. For nonstationary signals such as ours, at some point one does not want to increase the window size further, because the frequency content is changing. In this case, the length and shape of the window are chosen so that the Fourier transform of the window is narrow in frequency compared with changes in the FT of the signal [25].

However, in this application, the main concerns are not with resolution or leakage, since our target, the fringe frequency, changes from scan to scan and within a scan. Reduced resolution actually helps insulate the result from these variations. By adapting to these fringe frequency changes, the DFT result is made useful for fringe tracking.

2.4. Combination of DFT results at multiple frequencies

Figure 7 summarizes the results of the smoothed DFT calculations using five different window sizes. Since the fringe frequency is not known exactly, and may change, the algorithm adapts to use the best window size. For every scan, the smoothed DFT calculations discussed above run five times: once for the window size used on the previous scan (e.g., 17 samples); once each for 2 and 4 samples larger; and once each for 2 and 4 samples smaller. The window size (N, shown in the plot legend as the number of samples) corresponding to the maximum smoothed DFT value is chosen and carried through to the next scan as the nominal window size. To prevent N from increasing or decreasing too rapidly during periods of low signal-to-noise ratio, only one step up or down is permitted per scan. Also, to prevent unnecessary changes, the nominal N is changed only if the maximum exceeds that of the nominal by 5%. So in the example shown, where the highest DFT result occurs for N = 13 at sample #157, it was within that threshold, so the nominal N carried forward was 17.

The "composite DFT magnitude" is formed by taking a point-by-point maximum over the 5 smoothed DFT results. This is the vector that is passed on for further processing. As can be seen from the 5 individual curves, the composite DFT magnitude is more representative of the fringe packet than any single smoothed DFT result would be. This is due to the intra-packet variations in fringe frequency, caused primarily by atmospheric distortion. Since it combines 5 different frequencies, the composite DFT magnitude does not have a direct physical meaning, although it efficiently captures the essential information needed for this fringe tracking application. The choice of 5 frequencies, versus some other number, depends on the level of fringe frequency distortion present in the signal, and may be tuned accordingly.
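The window-size adaptation and composite just described can be sketched as below. This is an illustrative reading of Section 2.4, not the authors' code: the smoothed-DFT computation is abstracted into a caller-supplied function `smoothed_dft(n)`, and the function and variable names are invented here.

```python
import numpy as np

def adapt_window_and_composite(smoothed_dft, n_nominal):
    """One scan of the Section 2.4 logic (illustrative sketch).

    smoothed_dft(n) -> 1D array of smoothed-DFT magnitudes for an n-point
    window.  Returns the composite DFT magnitude and the nominal window
    size to carry forward to the next scan."""
    sizes = [n_nominal - 4, n_nominal - 2, n_nominal, n_nominal + 2, n_nominal + 4]
    results = {n: smoothed_dft(n) for n in sizes}

    # Composite: point-by-point maximum over the five smoothed DFT results.
    composite = np.max(np.stack([results[n] for n in sizes]), axis=0)

    # Candidate window size: the one containing the global maximum.
    best = max(sizes, key=lambda n: results[n].max())

    # Hysteresis: change the nominal only if the best exceeds the nominal's
    # own maximum by more than 5%, and then move at most one step (2 samples).
    if results[best].max() > 1.05 * results[n_nominal].max():
        n_next = n_nominal + 2 * np.sign(best - n_nominal)
    else:
        n_next = n_nominal
    return composite, int(n_next)
```

With a nominal N of 17 and the strongest response at N = 21, the nominal carried forward is 19: one step toward the best size, as the one-step-per-scan rule requires.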

Figure 8: Composite DFT magnitude prior to convolution with a rectangular packet-finding template.

Figure 9: Composite DFT magnitude after convolution with a rectangular packet-finding window.

2.5. Convolution with packet-finding template

Figure 8 illustrates how a template is convolved with the composite DFT magnitude. The template, shown here at packet center, is very simple, composed of ones and zeros, leading to very efficient computation: additions and subtractions at the template transitions only, as the template is passed over the composite DFT magnitude—as opposed to an arbitrary convolution, which would require multiplications across the full template, repeated when centered at each point in the scan window.

The template is very loosely modeled after what the ideal DFT result should look like (abs(sinc)-like). The computation is efficient, since after the initial computation for the first sample, each additional sample calculation involves only one add and one subtract (no multiplies or divides)—corresponding to the two vertical edges of the template. The +1 region is chosen to correspond approximately to the width at half the composite DFT magnitude. The software implementation allows this width to be set easily, and an extension to make it adaptive should be feasible, although performance appears to be very robust to this number. For example, a value of 20 for the half-width (meaning the template spans 41 samples) was used to successfully track simulated fringes at IOTA with the scan set to both 30 and 15 microns, even though the 15-micron scan had a fringe packet twice as wide as that of the 30-micron scan.

Figure 9 shows the result of convolution with the rectangular template. The index corresponding to the maximum value of this result is used as the packet center estimate. This index corresponds to the point where the correlation between the composite DFT magnitude and the template is maximized.

2.6. Confidence metric calculation

Once the fringe packet center has been identified, a decision must be made as to the result's level of validity and the degree to which it should be used to update the scan-center position. In cases where the fringe packet disappears momentarily, it may be better to do nothing (keep scanning in the same location) than to chase the noise.

This calculation is shown graphically in Figure 10. The concept is that the DFT calculation near the identified center should have a measurably higher value than the DFT calculation on the background noise. The mean of the DFT in windows spanning 20% of the scan width is calculated at the left edge of the scan, at the right edge, and at the identified fringe packet center—as shown by the blue rectangles in the figure. The ratio of the mean DFT at the identified center to the smaller of the two edge measurements (minus one) is taken as the confidence metric. The reason to take both edges and then use the minimum is that this gives a valid background measurement even if the fringe packet falls at one edge of the scan.

Figure 10: Confidence metric calculation (mean(DFT) = 0.54 at the identified center, 0.074 and 0.101 at the edges; DFT ratio = center/min(left, right) − 1 = 0.54/min(0.074, 0.1) − 1 = 6.2).
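The packet-finding correlation of Section 2.5 and the confidence metric of Section 2.6 can be sketched together as follows. This is an illustrative reconstruction, not the flight code: the running-sum form mirrors the one-add/one-subtract cost argument, but the treatment of samples outside the scan (taken as zero here) is an assumption the paper does not specify.

```python
import numpy as np

def rect_template_correlate(y, half_width):
    """Correlate y with a ones-and-zeros rectangular template spanning
    2*half_width + 1 samples, as a running sum: after the first window,
    each step adds the sample entering the window and subtracts the one
    leaving it (the template's two vertical edges)."""
    n = len(y)
    out = np.empty(n)
    s = y[: half_width + 1].sum()          # window centered at index 0
    out[0] = s
    for c in range(1, n):
        if c + half_width < n:             # sample entering the window
            s += y[c + half_width]
        if c - half_width - 1 >= 0:        # sample leaving the window
            s -= y[c - half_width - 1]
        out[c] = s
    return out

def confidence_metric(dft_mag, center):
    """Section 2.6 metric: mean DFT over 20%-wide windows at the left
    edge, the right edge, and the identified center; confidence is the
    center mean divided by the smaller edge mean, minus one."""
    n = len(dft_mag)
    w = max(1, n // 5)                     # 20% of the scan width
    left = dft_mag[:w].mean()
    right = dft_mag[-w:].mean()
    lo = min(max(center - w // 2, 0), n - w)
    mid = dft_mag[lo : lo + w].mean()
    return mid / min(left, right) - 1.0
```

Usage follows the text: `center = int(np.argmax(rect_template_correlate(comp, 20)))` with the paper's half-width of 20, then `confidence_metric(comp, center)`.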

One approach to using this confidence metric would be discrete: set a confidence threshold and either use or ignore the result based on that comparison. The threshold value will depend on the level of tracking accuracy desired, the relative scan width, and other factors. It is a balance between the cost of accepting a wrong estimate and that of ignoring a valid one—these costs vary depending on the application. Additional complications with this method arise when considering multiaperture interferometry, where changing the scan center on a single delay line affects two interferograms.

2.7. Results on scan with lower SNR

While the preceding figures have illustrated the algorithm's functional steps, it is also useful to see how the algorithm performs on data with a lower SNR, as shown in Figure 11 (this scan is also shown in Figure 3). To reduce clutter, some titles and axis labels are omitted—see the preceding figures for clarification. The DFT analysis clearly and effectively detects the fringe frequency in this noisy signal. The confidence metric calculation provides a meaningful comparison between the detected fringe packet and the background noise.

Figure 11: Result for scans with lower SNR ((a) scan no. 14, BC fringe, 20 April 2004; (b) DFT sliding window; (c) DFT result, 17-point window; (d) DFT results; (e) composite DFT results, template; (f) convolution of template with composite DFT; and (g) confidence metric).

2.8. Optional operations

Nonrectangular template

If the approximate width of the sinc function is known, a template of that shape could be used in place of the rectangular template of Section 2.5. A simpler version that is far more computationally efficient than a sinc-shaped template, and only slightly less efficient than the rectangular template, would have 3 rectangular regions, as shown in Figure 12. This was used as part of the implementation in February 2002, but was later changed (to the rectangular template) since distortion due to edge effects was found to occur more frequently than expected, due to the narrower scan width. However, it should still be considered a viable option.

Figure 12: Nonrectangular template.

In summary, switching to such a nonrectangular template would produce a negligible increase in computation time (both are extremely efficient) and provide slightly better identification accuracy for fringe packets away from the ends of the scan, but possibly significantly worse identification performance for fringe packets at or near the edge of the scan window.

Symmetry weighting

The interferogram should ideally be symmetric about its center. The algorithm originally contained a step that would calculate the symmetry and weight the DFT (or envelope) result accordingly. Unlike virtually all other signal processing steps, the symmetry calculation could not be implemented with exceptional efficiency, and ended up taking about double the processing time of all other operations combined. Problems were also encountered with edge effects; distortion would result when calculating the symmetry with the fringe partially off the scan window. The symmetry calculation was useful for gaining slightly more accuracy when the fringe was centered, but the combined issues of compute time and edge distortion led to its removal.

The sequence in Figure 13 shows how the symmetry-checking step was applied. The top plot shows the output of the convolution of the packet-finding template with the composite DFT magnitude. The middle plot shows the symmetry calculation (smaller values indicate greater symmetry); this metric is the mean absolute value of the difference between the points on the left side of a sliding window (of approximately the fringe packet half-width) and their reflected counterparts on the right side of the window. The bottom plot combines the two, dividing the top result by the middle. The index corresponding to the maximum value of this result is used as the packet center estimate. This sequence illustrates the sharpening possible with the symmetry calculation.

2.9. Potential algorithm improvements

There are some areas where known improvements in accuracy could be made at the cost of code complexity and development effort.

(i) A priori expected position. Using the expected fringe motion, and knowing whatever delay line motion has occurred, there will be an expected fringe center before the scan is analyzed. For example, if zero motion is assumed, and the control drives exactly to the previous center, then the expected fringe center would be in the middle of the scan. Some benefit should be gained from factoring this information into the tracking algorithm, perhaps by multiplying the composite DFT magnitude by a weighting function (perhaps exponentially decaying away from the center, with the decay rate based on the observed volatility of the motion and the confidence in the prior estimate). This would generally have little additional effect, but when faced with a choice between two weak peaks, it would choose the one closer to the expected location.

(ii) Prediction. Add prediction to the a priori estimate above, as well as to the fringe tracker output; that is, the present outputs are the identified position of the current scan and the associated confidence, whereas what are really wanted are the expected position of the next interferogram and the confidence of that expectation. Although the improvement is expected to be slight, the linear proof of concept [4] shows measurable predictability of the fringe motion, even with a 3 Hz scan frequency (it would be more predictable with faster scanning). The linear tests also indicate that the jump variance may be somewhat predictable—this could be incorporated into the exponential weighting function.

(iii) Check jump size. A simple check could be added to detect a large jump with low confidence—in which case the search space could be narrowed to a smaller region about the expected center. This is a simpler version of the previous two algorithm improvements, and may not be relevant if those are implemented.

(iv) Edge effect handling. Edge effects are not handled very accurately in the present algorithm, since they are considered less important (i.e., when scanning near the edge, it is important to take a good jump towards the middle, but hitting it exactly is not).

(v) Adapt nominal interferogram width. The expected width of a fringe packet is presently a fixed parameter, and the algorithm is fairly robust to adjustments of this setting. However, if adapted, the algorithm could be even more robust. If the full ABCDE identification is performed, the B parameter (spread of the sinc function) could be used to update this adaptively.

(vi) Startup. Since the algorithm is adaptive (the size of the DFT window is adapted from scan to scan) and the amount of adaptation permitted in a single scan is purposely limited, the performance on the first couple of scans in a new data set is sometimes not as good as it could be. If needed, allowing some extra adaptation steps on the first scan of a new data set could address this.
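The expected-position weighting suggested in improvement (i) above could be sketched as follows. This is hypothetical code for a proposed (not implemented) feature: the function name and the single `decay` parameter are inventions for illustration, standing in for a decay rate tuned from the observed fringe-motion volatility and prior confidence.

```python
import numpy as np

def apriori_weight(dft_mag, expected_center, decay):
    """Weight the composite DFT magnitude toward an expected position by
    multiplying it by an exponential decaying away from that position.
    'decay' is an assumed tuning parameter in samples."""
    idx = np.arange(len(dft_mag))
    w = np.exp(-np.abs(idx - expected_center) / decay)
    return dft_mag * w
```

As the text anticipates, with two equally weak peaks the weighted maximum falls on the peak nearer the expected location, while a single strong peak is essentially unaffected.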

Figure 13: Symmetry-weighted calculation ((a) template convolution result; (b) symmetry factor; (c) symmetry-weighted result).

3. IMPLEMENTATION AT IOTA

Since the tracker needs to run on a real-time processor (VxWorks operating system on a Motorola PowerPC 604 processor on an MVME-2431 card), after the initial development and prototyping in Matlab, the algorithm was converted (manually) to ANSI C. As the algorithm evolved during this implementation and testing process, the Matlab and C versions were continually updated to maintain the same variable names, function names, and structure to the extent possible. The two versions produce results that are identical to the limit of floating point precision.

3.1. Testing

Initial testing was performed during February 2002 on the IOTA system, tracking fringes generated by a light source. Tracking performance was very good, even with temporary

loss of fringe data (e.g., caused by banging the table)—in these cases, the system correctly decided that confidence was low and did not try to track until the fringe packet reformed. Also, the system performed very well with the scan travel set at both 15 and 30 microns, with no manual adjustment of parameters. The fringe packet appears twice as wide for the 15-micron scan, further indicating the robustness of the algorithm. Unfortunately, due to poor weather conditions, we were unable to test it on the sky.

3.2. Speed

The computation-time results, summarized in Table 1, were obtained by running the function 10 000 or 100 000 times and measuring the total elapsed time, both on the real-time processor and on a PC used for testing. The algorithm generated an ideal fringe packet (the time to compute this is included) and then identified the center. Running the full algorithm presented in Section 2 took 0.67 millisecond per cycle on the PowerPC. Including all three interferograms for each scan, the total compute time is 2.0 milliseconds. As noted previously, there are aspects of the algorithm that could have been further optimized (e.g., precalculation of the sinusoidal vectors) if this compute time had been excessive. As implemented, the speed of the algorithm was sufficiently fast that compute time was negligible compared to the scanning period [22].

Table 1: Computation-time summary for IOTA implementation.

Time (%) | Time (ms) | Algorithm step(s)
75%      | 0.50      | 4 extra DFT calculations for window size adaptation
19%      | 0.13      | Required DFT calculation
6%       | 0.04      | Everything else (template, confidence, etc.)
100%     | 0.67      | Total time per interferogram per scan

3.3. Performance

The performance of the algorithm was tested extensively on offline data, although with the limitation that the true fringe packet center is not known. It was then tested briefly in an online implementation, producing good, stable fringe tracking. Ettore Pedretti, in an attempt to quantitatively evaluate the online performance of this and two other algorithms [2, 23], developed an experimental procedure to do so, as described in [22]. The basic result was that this adaptive DFT-based algorithm and one developed by Pedretti were both found to perform well on both moderate and low photon flux targets, as measured by the closure-phase error. The third algorithm compared did not perform well in the low photon flux case, but worked well on moderately faint stars. Details on the testing are presented in [22].

4. CONCLUSIONS

An algorithm to perform online interferogram center identification has been developed, implemented, and tested at IOTA. It works very well on the data sets tested so far, including 1997 data, data generated by a light source using the IOTA configuration as of February 2002, and actual on-the-sky fringes from 2002 and 2004. The adaptive nature of the DFT-based algorithm virtually eliminates the need to set any target-dependent parameters, and provides robust, accurate tracking in the presence of significant atmospheric distortion.

Initial online implementation of this algorithm at IOTA was completed in May 2002, using all three telescopes. The efficient algorithm design resulted in a compute time for all three interferograms of 2.0 milliseconds, when implemented in ANSI C on the 266 MHz PowerPC real-time processor. Fringe tracking was considered successful, and compared favorably with alternate fringe tracking approaches in a series of online experiments in May–June 2004 at IOTA.

The fringe-tracking algorithm described here may also have uses in fields outside astronomical interferometry, for example, in thin-film ellipsometry, where a white light source and Fourier transform spectrometer can be used to measure interference fringes formed by reflection from thin films and substrates.

ACKNOWLEDGMENTS

The algorithm development work presented here was funded through NASA Ames Research Center Director's Discretionary Fund Awards. The authors wish to thank the staff and other researchers at IOTA for their invaluable contributions to the research facility. The IOTA is operated by the Smithsonian Astrophysical Observatory, a member of the Harvard-Smithsonian Center for Astrophysics. The authors also wish to thank Dr. Bradley J. Betts, Dr. Jeff Scargle, and the anonymous reviewers for their careful reading and comments on the paper.

REFERENCES

[1] R. Millan-Gabet, Investigation of Herbig Ae/Be stars in the near-infrared with a long baseline interferometer, Ph.D. thesis, University of Massachusetts, Amherst, Mass, USA, 1999.
[2] E. Pedretti, Systèmes d'Imagerie Interférométriques (Imaging Interferometric Systems), Ph.D. thesis, Université de Provence, Aix-Marseille I, Observatoire de Haute-Provence, France, 2003.
[3] E. Wilson and R. W. Mah, "Online fringe tracking and prediction at IOTA," in Proc. 18th Congress of the International Commission for Optics, vol. 3749 of Proceedings of SPIE, pp. 691–692, San Francisco, Calif, USA, August 1999.
[4] E. Wilson, E. Pedretti, J. Bregman, R. W. Mah, and W. A. Traub, "Adaptive DFT-based fringe tracking and prediction at IOTA," in Astronomical Telescopes and Instrumentation Symposium: New Frontiers in Stellar Interferometry, vol. 5491 of Proceedings of SPIE, pp. 1507–1518, Glasgow, Scotland, UK, June 2004.

[5] W. A. Traub, N. P. Carleton, J. Bregman, et al., "The third telescope project at the IOTA interferometer," in Interferometry in Optical Astronomy, vol. 4006 of Proceedings of SPIE, pp. 715–722, March 2000.
[6] W. A. Traub, A. Ahearn, N. P. Carleton, et al., "New beam-combination techniques at IOTA," in Interferometry for Optical Astronomy II, vol. 4838 of Proceedings of SPIE, pp. 45–52, Waikoloa, Hawaii, USA, August 2003.
[7] I. L. Porro, W. A. Traub, and N. P. Carleton, "Effect of telescope alignment on a stellar interferometer," Applied Optics, vol. 38, no. 28, pp. 6055–6067, 1999.
[8] N. D. Thureau, Contribution à l'interférométrie optique à longue base en mode multi-tavelures, Ph.D. thesis, Faculté des sciences, Université de Nice-Sophia Antipolis, France, 2001.
[9] R. C. Jennison, "A phase sensitive interferometer technique for the measurement of the Fourier transforms of spatial brightness distributions of small angular extent," Monthly Notices of the Royal Astronomical Society, vol. 118, no. 3, pp. 276–284, 1958.
[10] J. E. Baldwin, M. G. Beckett, R. C. Boysen, et al., "The first images from an optical aperture synthesis array: mapping of Capella with COAST at two epochs," Astronomy & Astrophysics, vol. 306, pp. L13–L16, February 1996.
[11] G. T. van Belle, D. R. Ciardi, R. R. Thompson, R. L. Akeson, and E. A. Lada, "Altair's oblateness and rotation velocity from long-baseline interferometry," Astrophysical Journal, vol. 559, no. 2, part 1, pp. 1155–1164, 2001.
[12] M. Swain, G. Vasisht, R. Akeson, et al., "Interferometer observations of subparsec-scale infrared emission in the nucleus of NGC 4151," Astrophysical Journal, vol. 596, no. 2, part 2, pp. L163–L166, 2003.
[13] A. A. Michelson and F. G. Pease, "Measurement of the diameter of alpha Orionis with the interferometer," Astrophysical Journal, vol. 53, pp. 249–259, May 1921.
[14] A. Labeyrie, "Interference fringes obtained on VEGA with two optical telescopes," Astrophysical Journal, vol. 196, part 2, pp. L71–L75, March 1975.
[15] F. Vakili and L. Koechlin, "Aperture synthesis in space—Computer fringe blocking," in New Technologies for Astronomy, vol. 1130 of Proceedings of SPIE, Paris, France, April 1989, (A90-37976 16-89).
[16] Y. Rabbia, S. Menardi, J. Gay, et al., "Prototype for the European Southern Observatory VLTI fringe sensor," in Amplitude and Intensity Spatial Interferometry II, J. B. Breckinridge, Ed., vol. 2200 of Proceedings of SPIE, pp. 204–215, Kona, Hawaii, USA, March 1994.
[17] S. Robbe, B. Sorrente, F. Cassaing, et al., "Active phase stabilization at the I2T: implementation of the ASSI table," in Amplitude and Intensity Spatial Interferometry II, J. B. Breckinridge, Ed., vol. 2200 of Proceedings of SPIE, pp. 222–230, Kona, Hawaii, USA, March 1994.
[18] P. R. Lawson, "Group-delay tracking in optical stellar interferometry with the fast Fourier transform," Journal of the Optical Society of America A, vol. 12, no. 2, pp. 366–374, 1995.
[19] L. Koechlin, P. R. Lawson, D. Mourard, et al., "Dispersed fringe tracking with the multi-r0 apertures of the Grand Interféromètre à 2 Télescopes," Applied Optics, vol. 35, no. 16, pp. 3002–3009, 1996.
[20] D. F. Buscher, Getting the most out of COAST, Ph.D. thesis, Cambridge University, Cambridge, UK, 1988.
[21] S. Morel, W. A. Traub, J. Bregman, et al., "Fringe-tracking experiments at the IOTA interferometer," in Interferometry in Optical Astronomy, vol. 4006 of Proceedings of SPIE, pp. 506–513, Munich, Germany, March 2000.
[22] E. Pedretti, N. D. Thureau, E. Wilson, et al., "Fringe tracking at the IOTA interferometer," in Astronomical Telescopes and Instrumentation Symposium: New Frontiers in Stellar Interferometry, vol. 5491 of Proceedings of SPIE, pp. 540–550, Glasgow, Scotland, UK, June 2004.
[23] N. D. Thureau, R. C. Boysen, D. F. Buscher, et al., "Fringe envelope tracking at COAST," in Interferometry for Optical Astronomy II, vol. 4838 of Proceedings of SPIE, pp. 956–963, Waikoloa, Hawaii, USA, August 2003.
[24] M. Frigo and S. G. Johnson, "FFTW: an adaptive software architecture for the FFT," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '98), vol. 3, pp. 1381–1384, Seattle, Wash, USA, May 1998.
[25] A. V. Oppenheim and R. Schafer, Discrete-Time Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, USA, 1989.

Edward Wilson is the President of Intellization, a consulting business applying and extending intelligent systems technologies for optimization in the aerospace and metals industries since 1995. He attended MIT from 1983 to 1987, receiving S.B. degrees in mechanical engineering and physics and an S.M. degree in mechanical engineering. He received his Ph.D. degree in mechanical engineering from Stanford University in 1995, conducting his doctoral research in the Aerospace Robotics Laboratory, and receiving a Ph.D. minor in electrical engineering. Dr. Wilson has worked at Hughes Aircraft Company; at the US Air Force Advanced Electronics Technology Center; as a Professor on a US Navy aircraft carrier (CV-64); as Research Director at Neural Applications Corporation; and as a Visiting Scholar and Lecturer in the Stanford Aero-Astro Department, teaching a course on the modeling and analysis of dynamical systems. Areas of interest and expertise include fault detection and isolation, process optimization, identification, estimation, signal processing, and other applications of advanced data analysis technologies.

Ettore Pedretti earned a Ph.D. degree in physics from the Université de Provence in 2003. Part of his Ph.D. work was done in France on the topic of pupil densification, and part was done at the Harvard-Smithsonian Center for Astrophysics on instrumentation and observations at the CfA's Infrared Optical Telescope Array in Arizona. He is currently a NASA Michelson Postdoctoral Fellow in the Astronomy Department, the University of Michigan, Ann Arbor. His research interests are in high angular resolution interferometry. He is actively involved in instrument development at the IOTA and CHARA interferometers.

Jesse Bregman is the Deputy Chief of the Astrophysics Branch, NASA Ames Research Center. He received his B.S. degree in physics from SUNY, Stony Brook, in 1971, and his Ph.D. degree in astronomy from the University of California, Santa Cruz, in 1976. He has been at NASA Ames since 1976, concentrating on infrared spectroscopy and imaging. He has helped develop several infrared spectrometers and imagers that were used on ground-based and airborne telescopes. Research interests include determining the molecular content of stars, comets, planetary nebulae, and supernovae. His main research thrust has been and continues to be understanding the nature and physics of the large interstellar molecules known as polycyclic aromatic hydrocarbons, of which benzene and naphthalene are the smallest members.

Robert W. Mah (Ph.D., applied mechanics, Stanford University) has been the Group Lead in the Smart Systems Research Lab (SSRL), the Computational Sciences Division (code IC), NASA Ames Research Center, since 1993. He has supervised development of a wide range of successful intelligent system solutions including several aerospace and medical applications.

Wesley A. Traub earned a Ph.D. degree in physics at the University of Wisconsin in 1968, and has been on the staff of the Smithsonian Astrophysical Observatory, a member of the Harvard-Smithsonian Center for Astrophysics, since that time. He has published many papers in the areas of high spectral resolution studies of the stratosphere, planetary atmospheres, and the interstellar medium. He also has a strong interest in high angular resolution interferometry from the ground, using the IOTA interferometer and the Keck nulling interferometer, with publications on stellar diameters and dust disks around stars. In addition, he has published papers on detecting and characterizing extrasolar planets with coronagraphs and interferometers. In mid-2005, he will join the staff of NASA-JPL as Project Scientist for the Terrestrial Planet Finder Coronagraph.

EURASIP Journal on Applied Signal Processing 2005:15, 2573–2584 © 2005 S. Zharkov et al.

Technique for Automated Recognition of Sunspots on Full-Disk Solar Images

S. Zharkov Department of Cybernetics, University of Bradford, Bradford, West Yorkshire BD7 1DP, UK Email: [email protected]

V. Zharkova Department of Cybernetics, University of Bradford, Bradford, West Yorkshire BD7 1DP, UK Email: [email protected]

S. Ipson Department of Cybernetics, University of Bradford, Bradford, West Yorkshire BD7 1DP, UK Email: [email protected]

A. Benkhalil Department of Cybernetics, University of Bradford, Bradford, West Yorkshire BD7 1DP, UK Email: [email protected]

Received 31 May 2004; Revised 22 February 2005

A new robust technique is presented for the automated identification of sunspots on full-disk white-light (WL) solar images obtained from the SOHO/MDI instrument and on Ca II K1 line images from the Meudon Observatory. Edge-detection methods are applied to find sunspot candidates, followed by local thresholding using statistical properties of the region around the sunspots. Possible initial oversegmentation of images is remedied with a median filter. The features are smoothed by using morphological closing operations and filled by applying a watershed transform, followed by a dilation operator to define regions of interest containing sunspots. A number of physical and geometrical parameters of detected sunspot features are extracted and stored in a relational database, along with umbra-penumbra information in the form of pixel run-length data within a bounding rectangle. The detection results reveal very good agreement with the manual synoptic maps and a very high correlation (96%) with those produced manually by the NOAA Observatory, USA.

Keywords and phrases: digital solar image, sunspots, local threshold, edge-detection, morphological operators, sunspot area time series.
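The detection chain summarized in the abstract can be loosely sketched with standard SciPy image-processing tools. This is a toy approximation, not the authors' implementation: the parameter names and values are invented, the local-threshold rule is a guess at "statistical properties of the region around sunspots," and the watershed filling step is omitted.

```python
import numpy as np
from scipy import ndimage as ndi

def detect_sunspot_candidates(img, edge_thresh=0.2, local_sigma=2.0):
    """Toy sketch of the pipeline: edge detection -> local thresholding ->
    median filter (to remedy oversegmentation) -> morphological closing ->
    dilation to define regions of interest -> connected-component labels."""
    # Edge detection: Sobel gradient magnitude.
    gx = ndi.sobel(img, axis=0)
    gy = ndi.sobel(img, axis=1)
    edges = np.hypot(gx, gy)
    cand = edges > edge_thresh * edges.max()

    # Local threshold: keep pixels notably darker than the mean of their
    # surrounding region (an assumed stand-in for the paper's statistics).
    local_mean = ndi.uniform_filter(img, size=15)
    dark = img < local_mean - local_sigma * img.std()

    mask = cand | dark
    mask = ndi.median_filter(mask.astype(np.uint8), size=3).astype(bool)
    mask = ndi.binary_closing(mask)     # smooth feature boundaries
    mask = ndi.binary_dilation(mask)    # grow regions of interest
    labels, n = ndi.label(mask)
    return labels, n
```

On a synthetic "disk" with a single dark square spot, this yields at least one labeled region covering the spot.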

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

Sunspot identification and characterisation, including location, lifetime, contrast, and so forth, are required for a quantitative study of the solar cycle. Sunspot studies also play an essential part in the modelling of the total solar irradiance during the solar cycle. As a component of solar active regions, sunspots and their behaviour are also used in the study of active region evolution and in the forecast of solar flare activity (Steinegger et al. [1]).

Manual sunspot catalogues in different formats are produced at various locations all over the world, such as the Meudon Observatory, France, the Locarno Solar Observatory, Switzerland, the Mount Wilson Observatory, USA, and many others. The Zurich relative sunspot numbers (or, since 1981, the sunspot index data (SIDC)), compiled from these manual catalogues, are used as a primary indicator of solar activity (Hoyt and Schatten [2, 3] and Temmer et al. [4]).

With the substantial increase in the size of solar image data archives, the automated detection and verification of various features of interest is becoming increasingly important for, among other applications, data mining and the reliable forecast of solar activity and space weather. This imposes stringent requirements on the accuracy of the automated

cessive brightness bins of the histogram [10], was applied to determine the umbral areas of sunspots observed in high- resolution images capturing a fragment of the solar disk [10, 11]. A method using sunspot contrast and contiguity, and based on a region growing technique was developed and described by Preminger et al. in [12]. There is also a Bayesian technique for active region and sunspot detection and labelling developed by Turmon et al. [13] that is rather computationally expensive. Moreover, the Figure 1: A segment of a high-resolution SOHO/MDI image show- methodismoreorientedtowardfaculaedetectionanddoes ing sunspots with dark umbras and lighter penumbras on a gray not detect sunspot umbras. Although the algorithm per- quiet Sun background. forms well when trained appropriately, the training process itself can be rather difficulttoarrangeonimageswithdiffer- ent background variations corresponding to varying observ- feature detection and verification procedures in comparison ing conditions. with the existing manual ones in order to create a fully auto- Another approach to sunspot area measurements utilis- mated solar feature catalogue. ing edge-detection and boundary gradient intensity was sug- A sunspot is a dark cooler part of the Sun’s surface. It is gested for high-resolution observations of individual sunspot cooler than the surrounding atmosphere because of the pres- groups, and/or non-full-disk segments by Gyori˝ [14]. The ence of a strong magnetic field that inhibits the transport of method is very accurate when applied to data with suffi- heat via convective motion in the Sun. The magnetic field ciently high resolution. However, in its current form, this is formed below the Sun’s surface, and extends out into the method is not suitable for the automated sunspot detection solar corona. Sunspots are best observed in the visible con- on full-disk images of the low and moderate resolutions that tinuous spectrum also known as “white light” (WL). 
Larger are available in most archives. Therefore, all the existing tech- sunspots can also be observed in Ca II K1 absorption line niques described above in their original form are not suitable images as well as in Hα and Ca II K3 absorption line im- for the automatic detection and identification of sunspots on ages. Sunspots generally consist of two parts: a darker, often full-disk images, since their performance depends on the im- circular central umbra, and a lighter outer penumbra (see ages with high resolution [14] and/or quality [12] that can- Figure 1). not be guaranteed for full-disk images. In white-light and Ca II K1 line digital images sunspots In the current paper a new hybrid technique for auto- can be characterised by the following two properties: they matic identification of sunspots on full-disk images using are considerably darker than the surrounding photosphere edge-detection is proposed, which is significantly improved and they have well-defined borders, that is, the change in in- by using image standardisation and enhancement proce- tensity from the quiet photosphere near the sunspot to the dures. The techniques presented are used for the detection sunspot itself occurs over a distance no more than 2-4 arc- of sunspots on white-light and Ca II K1 line full-disk im- seconds (or 1-2 pixels for the data used in this study). Most ages, extracting sunspot sizes, locations, umbra and penum- existing techniques for sunspot detection rely on these prop- bra areas and intensities with high accuracy restricted only by erties by using thresholding and/or edge-detection opera- pixel resolution. The techniques can provide fast automated tors. data processing online from ground-based and space-based The first thresholding methods for the extraction of instruments. 
The techniques applied for image preprocessing sunspot areas used an a priori estimated intensity thresh- and sunspot detection on white-light and Ca II KI images are old on white-light full-disk solar images [5, 6, 7]. Sunspots described in Section 2, the verification of detected features were defined as features with intensity 15% [6]or8.5% is presented in Section 3, and the conclusions are drawn in [7, 8] below the quiet Sun background and sunspot areas Section 4. were estimated by simply counting all pixels below these val- ues. Similar methods were applied to high-resolution im- ages of solar disk regions containing a sunspot or a group of 2. THE TECHNIQUE FOR SUNSPOT DETECTION sunspots using constant intensity boundaries for the umbra- 2.1. Observations and preprocessing techniques penumbra and the penumbra-photosphere transitions at 59% and 85% of the photospheric intensity, respectively 2.1.1. Observations and their synchronisation [8, 9]. The following two sets of solar full-disk images, provided For digital solar images the thresholding methods were in the flexible image transport system (FITS) file format improved by using image histograms to help determine the (http://fits.gsfc.nasa.gov/), were used for this study: the first threshold levels. Steinegger et al. [1] used the so-called dif- was supplied by the Meudon Observatory, and the second ference histogram method to determine the intensity bound- was obtained from the MDI instrument aboard the SOHO ary between the penumbra and the photosphere that was de- satellite. Both sets cover the time period spanning April 1– fined for each individual spot. Another method, based on 30, 2002, and July 1–31, 2002, while the SOHO/MDI data the cumulative sum of sunspot areas contained in the suc- were processed for the 8-year period from 1996–2003. Automated Recognition of Sunspots 2575
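For orientation, the fixed global-threshold approach of the early methods reviewed in the introduction (features 8.5–15% below the quiet Sun level, umbra/penumbra boundaries at 59% and 85% of the photospheric intensity) can be sketched as follows. This is a minimal numpy sketch, not the authors' code; the function name and default threshold fractions are illustrative:

```python
import numpy as np

def global_threshold_areas(img, i_qsun, spot_frac=0.915,
                           umbra_frac=0.59, penumbra_frac=0.85):
    """Estimate sunspot, umbral, and penumbral pixel areas with fixed
    global thresholds given as fractions of the quiet Sun intensity.
    `img` is a limb-darkening-corrected ("flat") full-disk image in
    which pixels outside the solar disk are set to zero."""
    disk = img > 0                                   # on-disk pixels only
    spot = disk & (img < spot_frac * i_qsun)         # 8.5% below quiet Sun
    umbra = disk & (img < umbra_frac * i_qsun)       # umbra-penumbra boundary
    penumbra = disk & ~umbra & (img < penumbra_frac * i_qsun)
    return int(spot.sum()), int(umbra.sum()), int(penumbra.sum())
```

Counting pixels below such fixed fractions is exactly the area estimate of [6, 7, 8, 9]; its sensitivity to image quality is what motivates the local-threshold technique developed below.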

The Ca II K1 line spectroheliograms from Meudon provide images of the solar photosphere in the blue wing of the Ca II K 3934 Å line, or K1 line, taken at a given time TCa of solar rotation. These data are acquired once a day on film by performing scans of the solar disk using an entrance slit. The film image is then digitised, providing a pixel size of about 2.3 arcseconds. The other set of data used, from the SOHO/MDI instrument, provides almost continuous observations of the Sun in the white-light continuum in the vicinity of the Ni I 6768 Å line, with a pixel size of about 2 arcseconds, taken at the time Twl. Intensities of all pixels outside the solar disk in the SOHO/MDI WL images are set to zero.

These images were, in addition, complemented by magnetic field measurements from the line-of-sight (LOS) magnetograms captured by the same SOHO/MDI instrument at the moment TM, keeping the data consistent with the WL images. A magnetogram is an image obtained by an instrument which can detect the strength and location of magnetic fields from the Zeeman polarisation of the radiation in this field. In the magnetogram shown in Figure 2, gray areas indicate low-magnetic-field regions, while black and white areas indicate regions where there are strong negative and positive magnetic fields, respectively.

Figure 2: A sample magnetogram fragment from SOHO/MDI. The darkest areas are regions of negative magnetic polarity (directed towards the centre of the Sun) and the white areas are regions of positive magnetic polarity (directed towards the observer). The gray areas indicate regions of weak magnetic field.

For the determination of the magnetic field inside detected sunspot areas, the white-light images and magnetograms were synchronised as follows. We rotate a solar magnetogram image to the time, Twl, and point of view corresponding to a WL image, using standard IDL Solarsoft libraries, to allow pixel-by-pixel comparison of both images. Using the sunspot detection results defined on WL images as masks applied to the corresponding synchronised magnetograms allows us to extract pixel values from these areas in magnetic field units calibrated by the SOHO/MDI team (Scherrer et al. [15]).

2.1.2. Preprocessing technique

The images from both data sets were preprocessed, with FITS file header information checked and amended where necessary, using the techniques described by Zharkova et al. [16]. These techniques include limb fitting; removal of geometrical distortion; and centre position and size standardisation.

The limb fitting method has three stages: (1) computing an initial approximation of the disk centre and radius; (2) using edge-detection to provide candidate points for fitting an ellipse, using information from the initial estimate; (3) fitting an ellipse to the candidate limb points using a least squares approach to iteratively remove outlying points. The procedure starts by making an initial estimate of the solar centre and radius from image data thresholded at an intensity obtained from an analysis of the image histogram; it then smoothes the result by using a Gaussian smoothing kernel of size 5 × 5, as recommended by the MDI team (Scherrer et al. [15]), as the first stage of applying the Canny edge-detection routine (Canny [17]) to the original 12-bit data. Candidate edge points for the limb are selected using a radial histogram method based on the initial centre estimate, and the chosen points are fitted to a quadratic function by minimising the algebraic distance using singular value decomposition. The five parameters of the ellipse fitting the limb are extracted from the quadratic function. These parameters are used to define an affine transformation that converts the image shape into a circle. Transformed images are generated using bilinear interpolation.

Often solar images require intensity renormalisation because of radial limb-darkening [18], caused by the projection of radiation from the spherical atmosphere onto a flat solar image, which increases the radiation's optical depth towards the limb and results in pixel darkening. Renormalisation is achieved by fitting a background function to a set of radial sample points having median radial intensities. The median filtering of the radial intensity starts by transforming a standardised solar disk onto a rectangular image using a Cartesian-to-polar coordinate transformation. The median value of each row is then used to replace all the intensities in that row. The median transformation is a very effective way of removing artefacts often present in images taken from ground-based observatories. However, nonradial illumination effects in an image, such as stripes and lines caused by dust present at the spectral slit, may cause larger-than-sunspot-length variations of the background intensity along a row of fixed radius; the median of the row is then no longer an appropriate background estimate and would require a sophisticated segmentation procedure [16]. Such a segmentation procedure is not implemented yet, so images are automatically disregarded by the software if such nonradial variations are too severe. By removing the limb-darkening, one obtains a "flat," sometimes called "contrast," image [12] of the solar photosphere, using the procedure described in the first paragraph of Section 2.1.2 (see Zharkova et al. [16]).

In order to compare the image quality in both data sets, we have used basic statistical moments of the digital image data values, taken from the image headers (SOHO/MDI) [15] or generated from the images directly (Meudon). These include the mean, the variance (not plotted here), the skewness (the lack of symmetry of pixel values towards the central pixel), and the kurtosis (a measure of whether the data are peaked or flat relative to a normal distribution), which were calculated for the full-disk pixel data x_j (j = 0, ..., N − 1) as follows:

    mean = x̄ = (1/N) Σ_{j=0}^{N−1} x_j,

    variance = (1/(N − 1)) Σ_{j=0}^{N−1} (x_j − x̄)²,

    skewness = (1/N) Σ_{j=0}^{N−1} [(x_j − x̄)/√variance]³,                (1)

    kurtosis = (1/N) Σ_{j=0}^{N−1} [(x_j − x̄)/√variance]⁴ − 3.

The results of the comparison between the Ca II K1 and MDI WL data sets for the period February–May 2002 are presented in Figure 3, where (a), (b), and (c) are from the MDI data and (d), (e), and (f) are deduced from the Meudon data. Figures 3a and 3d present the mean, Figures 3b and 3e the skewness, and Figures 3c and 3f the kurtosis calculated for every daily image.

The general quality of Meudon images is highly dependent on atmospheric conditions at the time of the observation. A number of instrumental artefacts which are difficult to eliminate, such as dust lines, are often present in these images, making an image unsuitable for automated detection. Together, atmospheric conditions and instrumental artefacts produce the variations shown in Figures 3d, 3e, and 3f. The SOHO satellite data are not subject to clouds, as demonstrated in Figures 3a, 3b, and 3c, though there are dropouts from time to time due to spacecraft problems.

Figure 3: Full-disk data statistics presented for the SOHO/MDI continuum (a) mean, (b) skewness, and (c) kurtosis data sets and for the Meudon Observatory Ca II K1 line (d) mean, (e) skewness, and (f) kurtosis data sets covering February–May 2002. Peaks and dips in the SOHO/MDI plots correspond to defective data (images) that were automatically rejected by our software (the x-axis refers to the date of observation and the y-axis to arbitrary intensity units).

Hence, the preprocessing for SOHO images consisted of limb-darkening removal only, while for Meudon images it also included noise filtering with median and/or Gaussian filters. The preprocessed images thus contain quiet Sun pixels with darker and, possibly (for the images in the Ca II K1 line), brighter features superimposed, and are suitable for sunspot detection.

For a full-disk solar image free of limb-darkening, the quiet Sun intensity value is established from an image histogram as the intensity with the highest pixel count (see, e.g., Figures 4a and 4b). Thus, in a manner similar to [1], by analysing the histogram of the flat image, an average quiet Sun intensity, IQSun, can be determined.
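The moments in formula (1) translate directly into code; a small numpy sketch (the helper name is illustrative), taking the full-disk pixel values as an array:

```python
import numpy as np

def image_moments(x):
    """Mean, variance, skewness, and kurtosis of pixel values x_j as in
    formula (1): variance with the 1/(N - 1) normalisation; skewness and
    kurtosis as standardised third and fourth moments, with 3 subtracted
    from the kurtosis so that a normal distribution scores zero."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    mean = x.sum() / n
    variance = ((x - mean) ** 2).sum() / (n - 1)
    z = (x - mean) / np.sqrt(variance)
    skewness = (z ** 3).sum() / n
    kurtosis = (z ** 4).sum() / n - 3.0
    return mean, variance, skewness, kurtosis
```

A symmetric pixel distribution gives a skewness near zero, which is why dust lines and clouds, which distort the distribution, show up as peaks and dips in plots like Figure 3.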

2.2. Description of the technique

2.2.1. Automatic detection on the SOHO/MDI white-light images

The technique developed for the SOHO/MDI data relies on the good quality of the images, evident from Figures 3a, 3b, and 3c. This allows a number of parameters, including threshold values as percentages of the quiet Sun intensity, to be set constant for the whole data set. Since sunspots are characterised by a strong magnetic field, the synchronised magnetogram data are then used for sunspot verification by checking the magnetic flux at the identified feature location.

Basic (binary) morphological operators such as dilation, closing, and watershed [19, 20] are used in our detection code. Binary morphological dilation, also known as Minkowski addition, is defined as

    A ⊕ B = {x : (B̂)_x ∩ A ≠ ∅} = ∪_{x∈B} A_x,                (2)

where A is the signal or image being operated on and B is called the "structuring element." This equation simply means that B is moved over A and the intersection of B, reflected and translated, with A is found.
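Definitions (2) and (3) translate directly into code. Below is a numpy-only sketch over boolean images; the helper names, the cross-shaped structuring element, and the use of wrap-around `np.roll` (harmless for features away from the border) are implementation choices of this sketch, not of the paper:

```python
import numpy as np

# A structuring element given as a list of (row, col) offsets;
# this one is a 3 x 3 cross containing the origin.
CROSS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

def dilate(a, b=CROSS):
    """Minkowski addition (2): the union of the translates A_x, x in B."""
    out = np.zeros_like(a, dtype=bool)
    for dy, dx in b:
        out |= np.roll(np.roll(a, dy, axis=0), dx, axis=1)
    return out

def erode(a, b=CROSS):
    """Minkowski subtraction (3): the points x such that B translated
    by x is contained in A."""
    out = np.ones_like(a, dtype=bool)
    for dy, dx in b:
        out &= np.roll(np.roll(a, -dy, axis=0), -dx, axis=1)
    return out

def close_(a, b=CROSS):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(a, b), b)
```

Closing a single isolated pixel returns it unchanged, while a gap smaller than the structuring element between two features is filled; this is the behaviour the detection code relies on to repair broken sunspot boundaries.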


Dilation using disk structuring elements corresponds to the isotropic swelling or expansion algorithms common in binary image processing.

Binary morphological erosion, also known as Minkowski subtraction, is defined as

    A ⊖ B = {x : (B)_x ⊆ A} = ∩_{x∈B} A_{−x}.                (3)

This equation simply means that the erosion of A by B is the set of points x such that B translated by x is contained in A. When the structuring element contains the origin, erosion can be seen as a shrinking of the original image.

Morphological closing is defined as dilation followed by erosion, and it is an idempotent operator. Closing an image with a disk structuring element eliminates small holes, fills gaps on the contours, and fuses narrow breaks and long, thin gulfs.

The morphological watershed operator segments images into watershed regions and their boundaries. Considering the gray-scale image as a surface, each local minimum can be thought of as the point to which water falling on the surrounding region drains. The boundaries of the watersheds lie on the tops of the ridges. This operator labels each watershed region with a unique index and sets the boundaries to zero. We apply the watershed operator provided in the IDL library by Research Systems Inc. to a binary image, where it floods enclosed boundaries and is thus used as a filling algorithm. For a detailed discussion of mathematical morphology see the references within the text and numerous books on digital imaging.

Figure 4: Flat image histograms for the (a) Meudon Observatory Ca II K1 line and (b) SOHO/MDI white-light images taken on April 2, 2002 and April 1, 2002, respectively.

The detection code is applied to a "flattened" full-disk SOHO/MDI continuum image, ∆ (Figure 5a), with estimated quiet Sun intensity, IQSun (Figure 4b), image size, solar disk centre pixel coordinates, disk radius, date of observation, and resolution (in arcseconds per pixel). Because of the Sun's rotation around its axis, a SOHO/MDI magnetogram, M, taken at the time TM, is synchronised to the continuum image time TWL via a spatial displacement of the pixels to the position they had at the time TWL, in order to obtain the same point of view as that of the continuum.

The technique presented in the current paper uses edge detection with a threshold applied to the gradient image. This technique is significantly less sensitive to noise than a global threshold, since it uses the background intensity in the vicinity of a sunspot. We consider sunspots as connected features characterised by strong edges, intensity lower than the surrounding quiet Sun, and strong magnetic field. Sunspot properties vary over the solar disk, so a two-stage procedure is adopted. First, sunspot candidate regions are defined. Second, these are analysed on the basis of their local properties to determine sunspot umbra and penumbra regions. This is followed by verification using magnetic information. A detailed description of the procedure is provided in the pseudocode presented in Algorithm 1.

Sunspot candidate regions are determined by combining two approaches: edge-detection and low-intensity-region detection (steps 1–3). First, we obtain a gradient gray-level image, ∆p, from the original preprocessed image, ∆ (Figure 5a), by applying Gaussian smoothing with a sliding window (5 × 5) followed by the Sobel gradient operator (step 1). Then (step 2) we locate strong edges via iterative thresholding of the gradient image, starting from an initial threshold, T0, whose value is not critical but should be small. The threshold is applied, followed by a 5 × 5 median filter, and the number of connected components Nc and the ratio of the number of edge pixels to the total number of disk pixels R are determined. If the ratio is too large or the number of components is greater than 250, the threshold is incremented.

The number 250 is based on the recorded maximum number of sunspots available, which is around 170 (the number varies depending on the observer). Since at this stage of the detection we are dealing with noise and the possibility of several features joined into a single candidate region, this limit is increased to 250 to ensure that no sunspots are excluded. If the threshold is too close to zero, the noise and fine structures present in the original flat image will contribute many low-gradient-value pixels, resulting in just a few very large connected regions.

Figure 5: A sample of the sunspot detection technique applied to the WL disk SOHO/MDI image from April 19, 2002 (presented for a cropped fragment for better resolution): (a) a part of the original image; (b) the detected edges; (c) candidate map; (d) the regions of interest after filtering as masks on the original; (e) the detection results before magnetogram verification, false identification is circled; and (f) the final detection results.
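The iterative threshold selection of step 2 can be sketched as follows (numpy-only; the 5 × 5 median filter is omitted for brevity, and a simple flood-fill component counter stands in for IDL's labelling routine — all names here are illustrative):

```python
import numpy as np

def count_components(mask):
    """Number of 4-connected components in a boolean mask (flood fill)."""
    seen = np.zeros_like(mask, dtype=bool)
    n = 0
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        n += 1
        stack = [(y, x)]
        seen[y, x] = True
        while stack:
            cy, cx = stack.pop()
            for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                           (cy, cx - 1), (cy, cx + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not seen[ny, nx]):
                    seen[ny, nx] = True
                    stack.append((ny, nx))
    return n

def strong_edges(grad, n_disk_pixels, t0=1.0, dt=1.0, nc_max=250, r_max=0.7):
    """Raise the threshold on the gradient image until the edge map has
    at most nc_max connected components and an edge-to-disk pixel ratio
    of at most r_max (cf. step 2 of Algorithm 1)."""
    t = t0
    while True:
        edges = grad > t
        if (count_components(edges) <= nc_max
                and edges.sum() / n_disk_pixels <= r_max):
            return edges, t
        t += dt
```

The loop always terminates: once the threshold exceeds the largest gradient value, the edge map is empty and both conditions hold trivially.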

Imposing an upper limit of 0.7 on the ratio of the number of edge pixels to disk pixels excludes this degenerate case of a near-zero threshold. A lower increment value in the iterative thresholding loop ensures better accuracy, at the cost of computational time. The increment value can also be set as a function of the intermediate values of Nc and R. The resulting binary image (Figure 5b) contains complete and incomplete sunspot boundaries, as well as noise and other features such as the solar limb.

Similarly, the original flat image is iteratively thresholded to define dark regions (step 3). The resulting binary image contains fragments of sunspot regions and noise. The two binary images are combined using the logical OR operator (Figure 5c). The combined image contains feature boundaries and blobs corresponding to the areas of high gradient and/or low intensity, as well as the limb edge. After removing the limb boundary, a 7 × 7 morphological closure operator [19, 20] is applied to close incomplete boundaries. Closed boundaries are then filled by applying a filling algorithm, based on the IDL watershed function [21], to the binary image. A 7 × 7 dilation operator is then applied to define the regions of interest which possibly contain sunspots (step 4, Figure 5d, regions masked in dark gray).

In the second stage of detection, these regions are uniquely labeled using a blob colouring algorithm [22] (step 5) and individually analysed (steps 6-7, Figure 5e). Penumbra and umbra boundaries are determined by thresholding at the values Tu and Ts, which are functions of the region's statistical properties and the quiet Sun intensity, defined in step 7. Practically, the formulae for determining Tu and Ts (step 7), including the quiet Sun intensity coefficients 0.91, 0.93, 0.6, and 0.55, were determined by applying the algorithm to a training set of about 200 SOHO/MDI WL images. Since smaller regions of interest (step 7(i)) carry less statistical intensity information, a lower Ts value reduces the probability of false identification. In order to apply this sunspot detection technique to data sets from other instruments, the values of the constants appearing in the formulae for Tu and Ts should be determined for each data set. As mentioned in the introduction, other authors [6, 7, 8, 9] apply global thresholds at the different values of 0.85 or 0.925 for Ts and 0.59 for Tu.

In the final stage of the algorithm (steps 8-9), we verify the resulting sunspot candidates by determining the maximum magnetic field within the candidate region (Figure 5f). This information is extracted from the synchronised magnetogram M, as described in step 9: a sunspot candidate is verified as a sunspot if this magnetic field is higher than the magnetic field threshold. The latter is chosen to be equal to 100 Gauss, which is appropriate for the smallest sunspots, or pores [18], and is a factor of 5 higher than the noise in the magnetic field measurements by the MDI instrument [15].

The method works particularly well for larger features. It also detects a number of smaller features (under 5 pixels) for which there is often not enough information in the continuum image to decide whether the detected feature corresponds to a true feature or an artefact (Figure 5e). This detection is verified with great accuracy by using the magnetic field information extracted from the synchronised magnetograms (Figure 5f). By comparing Figures 5e and 5f we can see that the false detection (marked by a circle) between the two top and bottom sunspot groups has been remedied. Detected feature parameters are extracted and stored in ASCII format in the final step 10.

(1) Apply Gaussian smoothing with a sliding window 5 × 5 followed by the Sobel operator to a copy of ∆.
(2) Using the initial threshold value, T0, threshold the edge map and apply the median 5 × 5 filter to the result. Count the number of connected components, Nc, and the ratio of the number of edge pixels to the total number of disk pixels, R (feature candidates, Figures 5b and 6b). If Nc is greater than 250 or R is larger than 0.7, increase T0 by a set value (1 or larger, depending on Nc and R) and repeat step (2) from the beginning.
(3) Similarly, iteratively threshold the original flat image to define fewer than 100 dark regions. Combine (using the OR operator) the two binary images into one binary feature candidate map (Figure 5c).
(4) Remove the edge corresponding to the limb from the candidate map and fill the possible gaps in the feature outlines using IDL's morphological closure and watershed operators (Figures 5d and 6c).
(5) Use blob colouring to define a region of interest, Fi, as a set of pixels representing a connected component on the resulting binary image, B̄∆.
(6) Create an empty sunspot candidate map, B∆, a byte mask which will contain the detection results, with pixels belonging to an umbra marked as 2 and to a penumbra as 1.
(7) For every Fi extract a cropped image containing Fi and define Ts and Tu:
    (i) if |Fi| ≤ 5 pixels, assign the thresholds: for penumbra Ts = 0.91 IQSun; for umbra Tu = 0.6 IQSun;
    (ii) if |Fi| > 5 pixels, assign the thresholds: for penumbra Ts = max{0.93 IQSun; (F̄i − 0.5 ∆Fi)}; for umbra Tu = max{0.55 IQSun; (F̄i − ∆Fi)}, where F̄i is the mean intensity and ∆Fi the standard deviation of Fi.
(8) Threshold the cropped image at these values to define the candidate umbral and penumbral pixels and insert the results back into B∆ (Figures 5e and 6d). Use blob colouring to define a candidate sunspot, Si, as a set of pixels representing a connected component in B∆.
(9) To verify the detection results, cross-check B∆ with the synchronised magnetogram, M, as follows: for every sunspot candidate Si of B∆,
    (i) Bmax(Si) = max(M(p) | p ∈ Si),
    (ii) Bmin(Si) = min(M(p) | p ∈ Si);
    if max(abs(Bmax(Si)), abs(Bmin(Si))) < 100, then disregard Si as noise.
(10) For each Si extract and store the following parameters: gravity centre coordinates (Carrington and projective), area, diameter, umbra size, number of umbras detected, maximum/minimum/mean photometric intensity (as related to the flattened image), maximum/minimum magnetic flux, total magnetic flux, and total umbral flux.

Algorithm 1: The pseudocode describing the sunspot detection algorithm in SOHO/MDI white-light images.
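Steps 7 and 9 of Algorithm 1 can be sketched as follows (a numpy sketch with illustrative names; `region` holds the flat-image intensities of one region of interest Fi, and `b_values` the synchronised magnetogram pixels of one candidate):

```python
import numpy as np

def local_thresholds(region, i_qsun):
    """Step 7: penumbra (Ts) and umbra (Tu) thresholds for one region."""
    region = np.asarray(region, dtype=float)
    if region.size <= 5:                       # too few pixels for statistics
        return 0.91 * i_qsun, 0.6 * i_qsun
    mean = region.mean()
    std = region.std(ddof=1)
    ts = max(0.93 * i_qsun, mean - 0.5 * std)  # penumbra threshold
    tu = max(0.55 * i_qsun, mean - std)        # umbra threshold
    return ts, tu

def passes_magnetic_check(b_values, b_min=100.0):
    """Step 9: keep a candidate only if the strongest absolute
    line-of-sight field inside it reaches the 100 Gauss threshold."""
    b = np.asarray(b_values, dtype=float)
    return max(abs(b.max()), abs(b.min())) >= b_min
```

The max{...} form guards the statistical estimate with the global fraction of IQSun, so a region with a nearly flat intensity profile still receives a sensible threshold.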


Figure 6: Sunspot detection on the Ca II K1 line full-disk image obtained from the Meudon Observatory on April 1, 2002: (a) the original image cleaned; (b) the detected edges; (c) the regions found (dilated); (d) the final detection results superimposed on the original image; and (e) the extract of a single sunspot group from (d).

2.2.2. Technique modifications for Ca II K1 images

The technique described above can also be applied to Ca II K1 images with the following modifications. First, these images contain substantial noise and distortions owing to instrumental and atmospheric conditions, so their quality is much lower than that of the SOHO/MDI white-light data (see Figures 3d, 3e, and 3f). Hence, the thresholds in step 7 (i.e., item 7 in the pseudocode) have to be set lower, that is, Ts = max{0.91 IQSun; (F̄i − 0.25 ΞFi)} for the penumbra and Tu = max{0.55 IQSun; (F̄i − ΞFi)} for the umbra, where ΞFi is the mean absolute deviation of the region of interest Fi.

Examples of sunspot detection with this technique applied to a Ca II K1 image taken on 2/04/02 are presented in Figure 6. First, the full-disk solar image is preprocessed (Figure 6a) by correcting, if necessary, the shape of the disk to a circular one (via automated limb ellipse-fitting) and by removing the limb-darkening, as described in Section 2.1.2. Then Sobel edge-detection (similar results were also achieved with the morphological gradient operation [19], defined as the result of subtracting an eroded version of the original image from a dilated version of it) is applied to the preprocessed image, followed by thresholding in order to detect strong edges (Figure 6b).
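The morphological gradient mentioned above (dilated image minus eroded image) has a compact numpy form; this sketch uses a 3 × 3 structuring element realised as a local maximum minus a local minimum (an illustrative helper, not the paper's IDL code):

```python
import numpy as np

def morphological_gradient(img):
    """Dilation minus erosion with a 3 x 3 structuring element,
    computed as a sliding local maximum minus local minimum."""
    img = img.astype(float)
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")            # replicate the border
    windows = np.stack([p[i:i + h, j:j + w]
                        for i in range(3) for j in range(3)])
    return windows.max(axis=0) - windows.min(axis=0)
```

Like the Sobel operator, this responds strongly at intensity steps, which is why the two gave similar edge maps on the Ca II K1 data.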

This oversegments the image; a 5 × 5 median filter and an 8 × 8 morphological closing filter [19, 20] are then applied to remove noise, to smooth the edges, and to fill in small holes in the edges. After removing the limb edge, the watershed transform [21] is applied to the thresholded binary image in order to fill in closed sunspot boundaries. Regions of interest are defined, similarly to the WL case, via morphological closing and dilation [19, 20]. Candidate sunspot features are then detected by local thresholding using the threshold values given in the previous paragraph. The candidate features' statistical properties, such as size, principal component coefficients and eigenvalues, intensity mean, and mean absolute deviation, are then used to aid the removal of false identifications such as the artefacts and lines often present in the Meudon Observatory images (Figures 6d and 6e).

It can be seen that on the Ca II K1 image shown in Figure 6 the technique performs as well as on the white-light data (Figure 5). However, in many other Ca images the technique can still produce a relatively large number of false identifications for smaller features under 10 pixels, where there is not enough statistical information to differentiate between the noise and sunspots. This raises the problem of verification of the detected features, which is discussed below.

3. VERIFICATION AND ACCURACY

There are two possible means of verification. The first option assumes the existence of a tested, well-established source of the target data that is used for a straightforward comparison with the automated detection results. In our case, such data would be the records (sunspot drawings) produced by a trained observer. However, the number (and geometry) of visible/detectable sunspots depends on the time (date) of observation (sunspot lifetime can be less than an hour), the location of the observer, the wavelength, and the resolution (in the case of digital imaging). Therefore, this method works best when the input data for both detection methods are the same. Otherwise, a number of differences can appear naturally when comparing the two methods.

The second option is to compare two different data sets describing the same sunspots from images taken on the same dates by different instruments, extracting from each data set a carefully chosen invariant parameter (or set of parameters), such as sunspot area, and looking at its correlation. For our technique, both verification methods were applied and the outcome is presented below.

3.1. Verification with drawings and synoptic maps

The verification of the automated sunspot detection results started by comparison with the sunspot database produced manually at the Meudon Observatory and published as synoptic maps in ASCII format. The comparison is shown in Table 1. The two cases presented in Table 1 correspond to two ways of accepting/rejecting detected features. In general, by considering feature size, shape (i.e., principal components), mean intensity, variance, quiet-Sun intensity, and proximity to other features, one can decide whether the result is likely to be a true feature. In Table 1, case 1, we have included features with sizes over 5 pixels, mean intensities less than the quiet Sun's, mean absolute deviations exceeding 20 (which is about 5% of the quiet-Sun intensity), and principal component ratios less than 2.1. In case 2, we include practically all detected candidate features by setting the deviation and principal component ratio thresholds to 0.05.

The differences between the manual and automatic methods are expressed by calculating the false acceptance rate (FAR) (where we detect a feature and they do not) and the false rejection rate (FRR) (where they detect a feature and we do not). FAR and FRR were calculated for the available observations for the two classifier settings described in the previous paragraph. The FAR is lowest for the classifier case 1 and does not exceed 8.8% of the total sunspot number detected on a day. By contrast, the FRR is lowest for the classifier case 2 and does not exceed 15.2% of the total sunspot number.

The error rates in Table 1 are the consequences of several factors related to the image quality. First, different seeing conditions can adversely influence automated recognition results; for example, a cloud can obstruct a significant portion of the disk, greatly reducing the quality of that segment of the image and making it difficult to detect the finer details of that part of the solar photosphere. Second, some small (less than 5–8 pixels) dust lines and image artefacts can be virtually indistinguishable from smaller sunspots, leading to false identifications.

Also, in order to interpret the data presented in Table 1, the following points have to be clarified. Sunspot counting methods differ between observatories. For example, a single large sunspot with one umbra is counted as a single feature at Meudon, but can be counted as 3 or more sunspots (depending on the sunspot group configuration) at the Locarno Observatory. Similarly, there are differences between the Meudon approach and our approach. For example, a large sunspot with several umbras is counted as one feature by us, but can be counted as several features by the Meudon observer. Furthermore, the interpretation of sunspot data at Meudon is influenced by the knowledge of earlier data and can sometimes be revised in the light of subsequent observations. Hence, for instance, on April 2, 2002, there were 20 sunspots detected at the Meudon Observatory. The automated detection applied to the same image yielded 18 sunspots corresponding to 17 of the Meudon sunspots, with one of the Meudon sunspots detected as two. Thus, in this case FAR is zero and FRR is 3.

Currently, our automated detection approach is based on extracting all the available information from a single observation and storing this information digitally. Further analysis and classification of the archive data is in progress that will allow us to produce sunspot numbers identical to the existing spot-counting techniques.

For the verification of sunspot detection on the SOHO/MDI images, which have better quality (see Figures 3a, 3b, and 3c), less noise, and better time coverage (4 per day), we used the daily sunspot drawings produced manually since 1965 at the Locarno Observatory, Switzerland.
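The FAR/FRR bookkeeping used in this comparison can be illustrated with a small sketch. The position-matching rule and the `tol` tolerance below are illustrative assumptions, not the paper's algorithm:

```python
# Hypothetical sketch of FAR/FRR counting: automated detections and
# manual records are matched by proximity of their centres; unmatched
# automatic detections count towards FAR, unmatched manual ones to FRR.
def far_frr(auto, manual, tol=5.0):
    """auto/manual: lists of (x, y) feature centres in pixels."""
    unmatched_manual = list(manual)
    far = 0
    for ax, ay in auto:
        hit = next((m for m in unmatched_manual
                    if (m[0] - ax) ** 2 + (m[1] - ay) ** 2 <= tol ** 2), None)
        if hit is None:
            far += 1          # we detect a feature, the observer does not
        else:
            unmatched_manual.remove(hit)
    return far, len(unmatched_manual)   # (FAR, FRR)

auto = [(10, 10), (40, 42), (80, 15)]   # made-up detections
manual = [(11, 9), (40, 40)]            # made-up manual records
print(far_frr(auto, manual))            # (1, 0)
```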

Table 1: The accuracy of sunspot detection and classification for Ca II K1 line observations in comparison with the manual set obtained from Meudon Observatory (see the text for description of defined cases 1 and 2).

Date      | Spots (Meudon) | Spots (case 1) | FAR (case 1) | FRR (case 1) | Spots (case 2) | FAR (case 2) | FRR (case 2)
01-Apr-02 | 16 | 17 | 2 | 1 | 19 | 4  | 1
02-Apr-02 | 20 | 18 | 0 | 3 | 18 | 0  | 3
03-Apr-02 | 14 | 13 | 0 | 3 | 24 | 10 | 2
04-Apr-02 | 13 | 15 | 2 | 2 | 20 | 6  | 1
05-Apr-02 | 16 | 18 | 0 | 2 | 18 | 1  | 2
06-Apr-02 | 10 | 10 | 0 | 5 | 15 | 5  | 5
07-Apr-02 | 11 | 9  | 1 | 5 | 13 | 4  | 4
08-Apr-02 | 14 | 17 | 3 | 2 | 22 | 7  | 0
09-Apr-02 | 16 | 17 | 0 | 2 | 17 | 0  | 2
10-Apr-02 | 12 | 12 | 0 | 4 | 14 | 1  | 3
11-Apr-02 | 12 | 9  | 0 | 7 | 10 | 1  | 7
12-Apr-02 | 18 | 20 | 2 | 0 | 21 | 3  | 0
14-Apr-02 | 20 | 23 | 2 | 2 | 34 | 13 | 2
15-Apr-02 | 13 | 16 | 1 | 4 | 18 | 2  | 3
16-Apr-02 | 10 | 13 | 3 | 1 | 19 | 9  | 1
17-Apr-02 | 11 | 11 | 1 | 1 | 13 | 2  | 0
18-Apr-02 | 12 | 11 | 1 | 1 | 12 | 1  | 0
19-Apr-02 | 11 | 14 | 0 | 0 | 15 | 1  | 0
20-Apr-02 | 13 | 10 | 0 | 2 | 11 | 1  | 2
21-Apr-02 | 9  | 8  | 1 | 1 | 15 | 7  | 0
22-Apr-02 | 12 | 13 | 1 | 0 | 15 | 3  | 0
23-Apr-02 | 14 | 13 | 0 | 1 | 15 | 1  | 0
24-Apr-02 | 18 | 15 | 0 | 3 | 17 | 0  | 1
25-Apr-02 | 17 | 13 | 0 | 3 | 17 | 2  | 1
27-Apr-02 | 9  | 7  | 0 | 1 | 9  | 2  | 1
28-Apr-02 | 9  | 10 | 1 | 0 | 11 | 2  | 0
29-Apr-02 | 8  | 12 | 5 | 0 | 20 | 13 | 0
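As a quick numeric cross-check of Table 1 (this aggregation is ours for illustration, not a figure from the paper), the case-1 columns can be totalled:

```python
# Totals over the 27 observation days in Table 1, case 1; each tuple is
# (Meudon count, detected count, FAR, FRR) copied from the table.
rows = [
    (16, 17, 2, 1), (20, 18, 0, 3), (14, 13, 0, 3), (13, 15, 2, 2),
    (16, 18, 0, 2), (10, 10, 0, 5), (11, 9, 1, 5), (14, 17, 3, 2),
    (16, 17, 0, 2), (12, 12, 0, 4), (12, 9, 0, 7), (18, 20, 2, 0),
    (20, 23, 2, 2), (13, 16, 1, 4), (10, 13, 3, 1), (11, 11, 1, 1),
    (12, 11, 1, 1), (11, 14, 0, 0), (13, 10, 0, 2), (9, 8, 1, 1),
    (12, 13, 1, 0), (14, 13, 0, 1), (18, 15, 0, 3), (17, 13, 0, 3),
    (9, 7, 0, 1), (9, 10, 1, 0), (8, 12, 5, 0),
]
meudon = sum(r[0] for r in rows)   # total Meudon sunspots over the month
far = sum(r[2] for r in rows)      # total false acceptances, case 1
frr = sum(r[3] for r in rows)      # total false rejections, case 1
print(meudon, far, frr)
```

Note that the 8.8% and 15.2% bounds quoted in the text are per-day rates, which this monthly aggregation does not reproduce.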

The Locarno manual drawings are produced in accordance with the technique developed by Waldmeier [23], and the results are stored in the solar index data catalogue (SIDC) at the Royal Belgian Observatory, Brussels [24]. While the Locarno observations are subject to seeing conditions, this is counterbalanced by the higher resolution of live observations (about 1 arcsecond). We have compared the results of our automated detection in white-light images with the available drawings in Locarno for June–July 2002, as well as for January–February 2004, along with a number of random dates in 2002 and 2003 (about 100 daily drawings with up to 100 sunspots per drawing). The comparison has shown a good agreement (∼95%–98%). The automated method detects all sunspots visually observable in the SOHO/MDI WL observations. The discrepancies are found at the level of sunspot pores (smaller structures), and can be explained by the time difference between the observations and by the lower resolution of the SOHO/MDI images.

3.2. Verification with the NOAA data set

Comparison of the temporal variations of the daily sunspot areas extracted from the EGSO solar feature catalogue in 2003, presented in Figure 7b, with those available as ASCII files obtained from the drawings of about 365 daily images obtained in 2003 at the US Air Force/NOAA (taken from the National Observatory for Astronomy and Astrophysics National Geophysical Data Centre, US [25]) revealed a correlation coefficient of 96% (Figure 7a). This is a very high accuracy of detection that ensures a good quality of the extracted parameters within the resolution limits defined by a particular instrument.

Further verification of sunspot detection in WL images can be obtained by comparing sunspot area statistics with a solar activity index such as sunspot numbers. Sunspot numbers are generated manually from sunspot drawings [24] and are related to the number of sunspot groups. The first attempt to compare the sunspot area statistics detected by us with the sunspot numbers revealed a correlation of up to 86% (see Zharkov and Zharkova [26]). For an accurate comparison, classification of sunspots into groups is required. Manually, this is done using sunspot magnetic field polarity tags and the property that neighbouring sunspots with opposite magnetic field polarities are paired into groups. Implementation of the automated classification of sunspots into groups is the scope of a future paper.

Automated Recognition of Sunspots 2583
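The area-correlation check described in this subsection can be reproduced in miniature. The two series below are synthetic stand-ins for the catalogue and drawing archives, so only the method, not the 96% figure, is illustrated:

```python
import numpy as np

# Made-up daily sunspot-area series standing in for the EGSO catalogue
# and the USAF/NOAA drawings; real data would be loaded from the archives.
days = np.arange(30)
egso_area = 60 + 40 * np.sin(days / 5.0)
noaa_area = egso_area + np.random.default_rng(42).normal(0.0, 5.0, 30)

# Pearson correlation coefficient between the two daily-area series.
r = np.corrcoef(egso_area, noaa_area)[0, 1]
print(f"correlation coefficient: {100 * r:.0f}%")
```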

Figure 7: A comparison of temporal variations of daily sunspot areas extracted in 2003 from (a) the USAF/NOAA archive, US, and (b) the solar feature catalogue with the presented technique. (Panel (a) plots the total sunspot areas in millionths of the solar hemisphere, and panel (b) the total sunspot areas in square degrees, against the date of observation.)

4. THE CONCLUSIONS

New improved techniques are presented for automated sunspot detection on full-disk solar images obtained in the Ca II K1 line from the Meudon Observatory (∼300 images) and in WL from the MDI instrument aboard the SOHO satellite (10 082 images).

The technique applies automated image-cleaning procedures for the elimination of limb darkening and noncircular image shape. Edge-detection and global thresholding methods are used to produce an initial image segmentation. The resulting oversegmentation is remedied using a median filter followed by morphological closing operations to close boundaries. A watershed region-filling operation and 7 × 7 morphological closing and dilation operators are used to define the regions of interest possibly containing sunspots. Each region is then examined separately, and the values of the thresholds used to define sunspot umbra and penumbra boundaries are determined as a function of the full-disk quiet-Sun intensity value and statistical properties of the region (such as mean intensity, standard deviation, and absolute mean deviation). The sunspots detected in WL are verified using the SOHO/MDI magnetogram data.

The detection results for the selected months in 2002–2003 show a good agreement with the Meudon manual synoptic maps and a very good agreement with the Locarno manual drawings (95%–98%). The temporal variations of sunspot areas detected in 2003 with the presented technique from white-light images revealed a very high correlation (96%) with those produced manually at the National Observatory for Astronomy and Astrophysics (NOAA), US.

A number of physical and geometrical parameters of sunspot features are extracted and stored in the relational database, along with run-length encoded umbra and penumbra regions within the bounding rectangle for each sunspot. The database is accessible via web services and the website http://solar.inf.brad.ac.uk.

In order to significantly reduce the errors contained in the acceptance and rejection rate coefficients (FARs and FRRs) and to validate the detection with the existing activity index [25], the sunspot candidate classification into groups has to be implemented. This can be done by examining the adjacent observations and their magnetic polarity and values; that is the scope of a forthcoming paper.

ACKNOWLEDGMENT

The research has been done for the European Grid of Solar Observations (EGSO) funded by the European Commission within the IST fifth framework, Grant IST-2001-32409.

REFERENCES

[1] M. Steinegger, M. Vázquez, J. A. Bonet, and P. N. Brandt, "On the energy balance of solar active regions," Astrophysical Journal, vol. 461, pp. 478–498, April 1996.
[2] D. V. Hoyt and K. H. Schatten, "Group sunspot numbers: a new solar activity reconstruction," Solar Physics, vol. 179, no. 1, pp. 189–219, 1998.
[3] D. V. Hoyt and K. H. Schatten, "Group sunspot numbers: a new solar activity reconstruction," Solar Physics, vol. 181, no. 2, pp. 491–512, 1998.
[4] M. Temmer, A. Veronig, and A. Hanslmeier, "Hemispheric sunspot numbers Rn and Rs: catalogue and N-S asymmetry analysis," Astronomy & Astrophysics, vol. 390, pp. 707–715, August 2002.
[5] G. A. Chapman and G. Groisman, "A digital analysis of sunspot areas," Solar Physics, vol. 91, pp. 45–50, March 1984.
[6] G. A. Chapman, A. M. Cookson, and J. J. Dobias, "Observations of changes in the bolometric contrast of sunspots," Astrophysical Journal, vol. 432, no. 1, pp. 403–408, 1994.
[7] G. A. Chapman, A. M. Cookson, and D. V. Hoyt, "Solar irradiance from Nimbus-7 compared with ground-based photometry," Solar Physics, vol. 149, no. 2, pp. 249–255, 1994.
[8] M. Steinegger, P. N. Brandt, J. M. Pap, and W. Schmidt, "Sunspot photometry and the total solar irradiance deficit measured in 1980 by ACRIM," Astrophysics and Space Science, vol. 170, no. 1-2, pp. 127–133, 1990.
[9] P. N. Brandt, W. Schmidt, and M. Steinegger, "On the umbra-penumbra area ratio of sunspots," Solar Physics, vol. 129, pp. 191–194, September 1990.
[10] T. Pettauer and P. N. Brandt, "On novel methods to determine areas of sunspots from photoheliograms," Solar Physics, vol. 175, no. 1, pp. 197–203, 1997.
[11] M. Steinegger, J. A. Bonet, and M. Vázquez, "Simulation of seeing influences on the photometric determination of sunspot areas," Solar Physics, vol. 171, no. 2, pp. 303–330, 1997.
[12] D. G. Preminger, S. R. Walton, and G. A. Chapman, "Solar feature identification using contrasts and contiguity," Solar Physics, vol. 202, no. 1, pp. 53–62, 2001.
[13] M. Turmon, J. M. Pap, and S. Mukhtar, "Statistical pattern recognition for labeling solar active regions: application to SOHO/MDI imagery," Astrophysical Journal, vol. 568, no. 1, part 1, pp. 396–407, 2002.
[14] L. Győri, "Automation of area measurement of sunspots," Solar Physics, vol. 180, no. 1-2, pp. 109–130, 1998.
[15] P. H. Scherrer, R. S. Bogart, R. I. Bush, et al., "The solar oscillations investigation-Michelson Doppler imager," Solar Physics, vol. 162, no. 1/2, pp. 129–188, 1995.
[16] V. V. Zharkova, S. S. Ipson, S. I. Zharkov, A. Benkhalil, J. Aboudarham, and R. D. Bentley, "A full-disk image standardisation of the synoptic solar observations at the Meudon Observatory," Solar Physics, vol. 214, no. 1, pp. 89–105, 2003.
[17] J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Machine Intell., vol. 8, no. 6, pp. 679–698, 1986.
[18] C. W. Allen, Astrophysical Quantities, Athlone Press, London, UK, 1973.
[19] J. Serra, Image Analysis and Mathematical Morphology, vol. 1, Academic Press, London, UK, 1988.
[20] G. Matheron, Random Sets and Integral Geometry, John Wiley & Sons, New York, NY, USA, 1975.
[21] P. T. Jackway, "Gradient watersheds in morphological scale-space," IEEE Trans. Image Processing, vol. 5, no. 6, pp. 913–921, 1996.
[22] A. Rosenfeld and J. L. Pfaltz, "Sequential operations in digital picture processing," Journal of the Association for Computing Machinery, vol. 13, no. 4, pp. 471–494, 1966.
[23] M. Waldmeier, The Sunspot Activity in the Years 1610–1960, Sunspot Tables, Schulthess Publisher, Zurich, Switzerland, 1961.
[24] Solar Index Data Catalogue, Belgian Royal Observatory, http://sidc.oma.be/products/.
[25] NOAA National Geophysical Data Center, http://www.ngdc.noaa.gov/stp/SOLAR/ftpsunspotregions.html.
[26] S. I. Zharkov and V. V. Zharkova, "Statistical properties of sunspots and their magnetic field detected from full disk SOHO/MDI images in 1996–2003," Adv. Space Res., in press.

S. Ipson graduated in 1969 with a First Class with Honors degree in applied physics from the University of Bradford, following periods spent at the Rutherford Appleton Laboratory, BP Research Laboratory, Sunbury, and the AEA, Winfrith. After carrying out theoretical and experimental work at AERE, Harwell, and the Universities of Bradford, Oxford, and Heidelberg, he was awarded a Ph.D. degree in theoretical nuclear physics in 1975 from the University of Bradford. He is currently a Senior Lecturer in the Department of Cybernetics, University of Bradford, and his present research interests are digital image restoration, pattern recognition, and 3D reconstruction from multiple images.

A. Benkhalil received a B.Eng. degree in computer engineering from The Higher Institute of Electronics, Libya, in 1988, an M.S. degree in real-time electronic systems from the University of Bradford, UK, in 1996, and a Ph.D. degree from the same university in 2001 on a project entitled "An FPGA-based real-time autonomous video surveillance system." He is currently a full-time postdoctoral Research Assistant for the European Grid of Solar Observations (EGSO), University of Bradford. His current research interests include image processing, computer vision, and real-time systems design.

S. Zharkov obtained his B.A. (Hons) degree in mathematics from Cambridge University in 1996 and his Ph.D. degree from Glasgow University in 2000. He is a postdoctoral Research Assistant at Bradford University and a key developer of the solar feature catalogues for the European Grid of Solar Observations (EGSO) Project. His present interests include digital image processing, solar dynamo theory, and differential geometry.

V. Zharkova obtained an M.S. degree (first class with distinction) in applied mathematics from Kiev State University in 1975, a Ph.D. degree in astrophysics from the Main Astronomical Observatory, National Academy of Sciences of Ukraine, in 1984, and an Advanced Certificate in computing sciences from Kiev State University in 1989. She worked in the Physics and Applied Mathematics Department, Kiev State University, and in the Physics and Astronomy Department, Glasgow University, UK. Currently she is a Professor in applied mathematics at the Department of Cybernetics, Bradford University, UK, and leads the Solar Imaging Research Group. Her current research interests are solar feature recognition from solar images obtained from various space- and ground-based observations, flare-induced wave recognition, energy release and transport in solar flares, and prediction of solar activity.

EURASIP Journal on Applied Signal Processing 2005:15, 2585–2594
© 2005 Hindawi Publishing Corporation

On-board Data Processing to Lower Bandwidth Requirements on an Infrared Astronomy Satellite: Case of Herschel-PACS Camera

Ahmed Nabil Belbachir
Pattern Recognition and Image Processing Group, Vienna University of Technology, Favoritenstrasse 9/1832, 1040 Vienna, Austria
Email: [email protected]

Horst Bischof
Institute for Computer Graphics and Vision, Technical University of Graz, Inffeldgasse 16/II, 8010 Graz, Austria
Email: [email protected]

Roland Ottensamer
Institut für Astronomie, Universität Wien, Türkenschanzstraße 17, 1180 Vienna, Austria
Email: [email protected]

Franz Kerschbaum
Institut für Astronomie, Universität Wien, Türkenschanzstraße 17, 1180 Vienna, Austria
Email: [email protected]

Christian Reimers
Institut für Astronomie, Universität Wien, Türkenschanzstraße 17, 1180 Vienna, Austria
Email: [email protected]

Received 31 May 2004; Revised 1 March 2005

This paper presents a new data compression concept, "on-board processing," for infrared astronomy, where space observatories have limited processing resources. The proposed approach has been developed and tested for the PACS camera of the European Space Agency (ESA) mission Herschel. Using lossy and lossless compression, the presented method offers a high compression ratio with a minimal loss of potentially useful scientific data. It also provides a higher signal-to-noise ratio than standard compression techniques. Furthermore, the proposed approach has a low algorithmic complexity, such that it is implementable on the resource-limited hardware. The various modules of the data compression concept are discussed in detail.

Keywords and phrases: Herschel, PACS, on-board processing, infrared astronomy, compression.

1. INTRODUCTION

Infrared (IR) astronomy requires dedicated data compression for economical storage and transmission of the large data volumes involved, given the limited budget and resources available for space missions [1, 2]. In fact, this is most demanding for space observatories, where images are generated in different domains with higher resolution and therefore larger dimensions. This leads to an important increase in data volume and bit rate. Furthermore, telemetry capabilities did not follow the same performance increase. Therefore, compression becomes a requirement for communication systems in charge of storage and/or transmission of the data.

Infrared detectors consist, as a rule, of fewer pixels than those for the visual range, but the design of multisensor instruments leads to even higher data volumes. If multiple detectors are operated in parallel to support multispectral or even hyperspectral imaging, then the data volumes multiply. Furthermore, small spacecraft are usually used for deep space missions. They are characterized by being restricted to a low budget and consequently a low data rate. Therefore, although many applications exist which generate or manipulate astronomical data [3, 4, 5, 6], transmitting image information still faces a bottleneck, and this constraint has stimulated advances in compression techniques for astronomy [7, 8]. However, the proposed techniques are often ad hoc


Figure 1: Example of an infrared image. (a) Raw image at 1500-second integration time. (b) One interesting object in the image, "Galaxy SBS-0335-05210." (c) Resulting image after denoising (noise from the electronics).

and sometimes not appropriate for infrared data. For example, in [8], the listed methods involve filtering of information which is not considered to be of use, by means of object recognition methods that face the background estimation problem in order to guarantee not to destroy information. Furthermore, this lessens the interpretability of the results and limits the extension of the method to nonimage data structures.

Indeed, thermal infrared (mid- and far-infrared) imaging is a measure of heat. To capture this energy, complex instrumentation is used, such that the detectors are cooled down to a few kelvins. Therefore, a composite signal (source + background) will result. The source is considered as the object heat (observed target). The background is the environment heat, whose amplitude is usually far higher than that of the observed target. To capture the infrared image of the desired target, one has to integrate several images over time (usually hours, depending on the wavelength); this is called the integration time. Therefore, the infrared image acquired at time "t" has no object structure, which makes the compression task challenging: it has to ensure that the relevant signal (observed source) is not lost during compression. Furthermore, infrared image acquisition is susceptible to heavy particles (glitches) that might, on the one side, disturb the signal accuracy by changing the electronic characteristics (e.g., detector responsivity), and, on the other side, increase the signal entropy and thus decrease the compression efficiency.

Figure 1 illustrates a typical infrared image, as observed by the telescope GEMINI [9]. Figure 1a shows the raw image at 1500-second integration time. One interesting object (galaxy) in the image after postprocessing can be found in Figure 1b. Figure 1c shows the relevant image for the astronomy expert after removing the noise and the stripping artifact due to the instrument electronics. The challenge of data compression is to preserve as much information from the image as possible such that the relevant image structure (e.g., Figure 1b) can be reconstructed.

Indeed, no real study has been performed for IR astronomical data compression apart from the use of wavelet-based compression techniques [7, 8]. Generally, IR data are collected on-board an observatory (satellite) and can overload downlink bandwidth and on-board memory resources rapidly. Therefore, a significant research effort has to be invested in analyzing the performance of the compression algorithms in terms of results quality and complexity. Such an analysis forms the basis for optimizing the algorithms, and also for determining whether a given algorithm is appropriate for the application at hand. Basically, data compression is a matter of modeling: the more information can be derived from the data, the less information has to be transmitted. This paper is concerned with recognizing the best-suited technique for improving the efficiency of bandwidth-limited transmission channels in the case of IR space astronomy. We propose a new concept, "on-board processing," which addresses both aspects of data quality and complexity [10]. The photodetector array camera and spectrometer (PACS) [11] is one of the three instruments operating on board the Herschel space observatory (HSO) [12], foreseen to be launched in early 2007. Our task in the framework of the PACS consortium concerns the reduction of the data collected on-board to fit the available telemetry. This task is of special importance because of the extreme compression ratio (up to 40!) dictated by the combination of a high raw data rate with the relatively low telemetry rate available for an L2 space mission. An on-board processing scheme combining data reduction with lossless compression algorithms for a high compression ratio is presented in this paper.

This paper is structured as follows. In Section 2, the challenges of a compression method are presented. We present the problem statement and the characteristics of the astronomical data from PACS in Section 3. In Section 4, descriptions of the proposed approach, on-board processing, and its modules are given. Experimental results of the application of this reduction concept are given in Section 5. We conclude with a short summary.

2. COMPRESSION CHALLENGES

The major concern of a compression method is to recognize and remove all redundancy in order to reduce the data traffic over the transmission channel. The performance of a data compression method for infrared astronomy can be evaluated using the following relevant parameters:

(i) the compression ratio versus the reconstruction error;
(ii) the complexity of the method.
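These two criteria can be made concrete with a toy evaluation harness (an illustration, not the paper's pipeline): a stand-in lossy step drops the two least-significant bits, a generic lossless coder (here zlib) follows, and both the compression ratio and the RMSE reconstruction error are reported:

```python
import zlib
import numpy as np

# Toy evaluation of a compressor against the two criteria above.
raw = np.arange(4096, dtype=np.uint16) // 4   # smooth, ramp-like 16-bit signal

lossy = raw & ~np.uint16(3)                   # drop 2 non-significant bits
packed = zlib.compress(lossy.tobytes())       # lossless stage

ratio = raw.nbytes / len(packed)              # criterion (i): compression ratio
rmse = float(np.sqrt(np.mean((raw.astype(float) - lossy) ** 2)))  # vs. error
print(f"compression ratio {ratio:.1f}, RMSE {rmse:.2f}")
```

On noise-dominated detector data the ratio would collapse towards 1, which is exactly why the paper argues that pure lossless coding cannot reach the required factor of 40.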

The first criterion (compression ratio) points out the capability of the method to find and remove the redundancy in the data. The more redundancy is removed, the better the compression ratio achieved. The reconstruction error defines the quality of the data after reconstruction. The results-quality criteria which can be retained for estimating the merits and performance of a compression method, in the case of astronomy, are the visual aspect, the signal-to-noise ratio, the detection of real and faint objects, the object morphology, and photometry. Although the above criteria are very important in designing a compression method, the complexity of the algorithm is of bigger importance because it defines the feasibility of the method. As the algorithm has to run on-board a space observatory computer, the implementation of the method has to be part of its design. In this paper, we treat the case of PACS, where photoconductors and bolometers are used as detectors. We focus in this paper on the photoconductor case. When such detectors are receiving infrared photons from an astronomical or internal source, the current I at their output is proportional to the number of photons falling on the detector. As this signal generally has very low amplitude [1], signal preamplification and sampling are required. Since the output voltage should stay within a quite limited range, a voltage reset pulse is applied in addition to a sample pulse after sampling a number of desired voltages.

An illustration of 1-dimensional signals for selected pixels from the Herschel-PACS camera [11] is given in Figure 2. All plots represent the detector output voltage in bit values (y-axis) against time (x-axis). We can see different signal behaviors that depend on the detector setting and responsivity.

3. HERSCHEL-PACS CHARACTERISTICS

HSO will be equipped with a 3.5 m Cassegrain telescope and house three instruments inside its superfluid helium cryostat, covering the spectral range between 55 and 670 µm. The three instruments are built by different European consortia with international cooperation, as listed in Table 1.

PACS will conduct dual-band photometry and imaging spectroscopy in the 55–210 micron spectral range. The instrument consists of two 25 × 18 Ge:Ga photoconductor arrays for spectroscopy, read out at 256 Hz, and two bolometer arrays with 32 × 16 and 64 × 32 pixels for photometry, read out at a frequency of 40 Hz. In both modes, a high raw data flow of up to 4 Mbit/s is generated. This is far above the nominal telemetry downlink bandwidth, which is restricted to 120 kbps due to the L2 orbit of the spacecraft at about four times the moon's distance.

When the photoconductors are receiving IR photons from an astronomical or internal source, the voltage V at their output will increase as a function of time. The incoming photons excite charge carriers into the conduction or valence band. The voltage increase is proportional to the current through the detector, which is in turn proportional to the number of photons falling on the detector. In the case of the PACS instrument, the cold readout electronics (CRE) preamplifies and samples the photocurrents generated in the detector. There will typically be 8 to 1024 samples on each ramp.

An ideal bolometer, by definition, is a device that detects all the radiation falling on it. A typical detector consists of a small chip of the doped material supported by very thin wires, which act as electrical conductors for the measurement of its resistance and at the same time connect it to a heat sink with a certain thermal resistance, which has to be chosen in advance according to the background level of radiation expected to strike it. The doping level of the material is chosen to provide an optimum sensitivity of resistance to temperature around its operating temperature, which is 0.3 K. The analog signal is buffered and amplified at the 300 K stage, then oversampled at the multiplexer stage while being converted to a digital signal.

The main challenge is the high data rate of the instrument. The raw data stream in spectroscopy consists of 2 × 26 × 18 channels, so a total of 936 channels. With a readout rate of 256 Hz, we get a sampling rate of 239616 samples/s. Conversion of this analog data stream by means of a 16-bit ADC yields a maximum raw data rate of 3744 Kbits/s. The data stream in photometry consists of 10 × 16 × 16 channels, so a total of 2560 channels. With a maximum readout rate of 40 Hz, we get a raw data stream of 1600 Kbps after ADC oversampling.

For science data transmission, different modes are foreseen. In PACS prime mode, the maximum data rate is limited to 120 Kbps, while it is limited to 60 Kbps when PACS and SPIRE share the downlink bandwidth in parallel mode. In the burst mode, the maximum data rate is up to 300 Kbps. Hence, a minimum compression ratio of 40 is required in prime mode. (In what follows we will only consider the prime mode, on which PACS will operate for most of the mission; furthermore, compression requirements are less demanding in the other modes.) In addition to that, the detectors are continuously exposed to high-energy cosmic particles inducing a disturbance (glitches) of the readout voltage, which decreases the signal-to-noise ratio and hence the data accuracy level. In the sequel, we assume the characteristics of the detector and the signals to be as follows: signal-to-noise ratio ≈ 600–11000, glitch rate = 10 s/pixel, glitch tails < 0.5 second, detector output = 16 bits, significant bits = 14 bits.

The maximum possible compression rate we could obtain by lossless compression (i.e., such that the original measurements can be recovered) can be computed as follows. A compression ratio of 16/14 is obtained by eliminating nonsignificant bits via spatial and temporal redundancy reduction. An additional compression factor of 4 is obtained by calculating the slope of the sub-ramp, which has to be given at least with the accuracy of the S/N; therefore, 16 bits for the slope are sufficient. A further lossless compression of the signal is not possible because it basically contains the noise of the telescope, which is, by definition, incompressible. This noise cannot be eliminated because we would lose the


Figure 2: Example of different 1D infrared signals for eight selected PACS photoconductor detectors: (a) active detector 1, (b) active detector 3, (c) blind detector 17, (d) blind detector 18, (e) active detector 18, (f) active detector 5, (g) blind detector 1, and (h) blind detector 2.
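The data-rate figures quoted in Section 3 follow from simple arithmetic (taking 1 Kbit = 1024 bits, which matches the quoted numbers):

```python
# Reproducing the Herschel-PACS data-rate arithmetic from Section 3.
spectro_channels = 2 * 26 * 18          # 936 readout channels
samples_per_s = spectro_channels * 256  # 256 Hz readout rate
raw_kbps = samples_per_s * 16 / 1024    # 16-bit ADC -> 3744 Kbits/s

photo_channels = 10 * 16 * 16           # 2560 channels in photometry
photo_kbps = photo_channels * 40 * 16 / 1024  # 40 Hz readout -> 1600 Kbps

lossless_factor = 16 / 14 * 4           # bit trimming x sub-ramp slope coding
print(samples_per_s, raw_kbps, photo_kbps, round(lossless_factor, 2))
```

The lossless bound of about 4.57 falls far short of the required factor of 40, which is the quantitative motivation for the lossy on-board processing stages.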

Table 1: The scientific payload of the Herschel space observatory.

Instrument                                        | PI location                     | Spectral range
PACS                                              | MPE Garching, GER               | 55–210 µm
Heterodyne instrument for the far infrared (HIFI) | SRON Groningen, NL              | 480–1910 GHz
Spectral photometer imaging receiver (SPIRE)      | University of Wales/Cardiff, GB | 200–670 µm

astronomical signal. Therefore, we can achieve a lossless compression factor of 4.57. Since lossless compression is impossible at such a rate, we have to perform on-board processing. In the next section we describe our compression concept in detail.

4. DATA COMPRESSION CONCEPT

This section reviews the basic concept for PACS on-board processing to achieve the desired downlink data rates. Figure 3 presents the different software modules. We will consider the case of photoconductors, which is challenging

[Flowchart: raw data (spectroscopy: 3744 + 256 kbps; photometry: 1600 + 160 kbps) → header compression; data separation, detector selection; raw data selection → preliminary processing → glitch detection; ramp fitting/averaging → integration → lossless compression → telemetry (120 kbps).]

Figure 3: A schematic diagram of the on-board processing concept. (Dark grey indicates the modules in which the raw data are lossy compressed.)

in terms of compression rate; the bolometer case can be treated similarly. First, the data packets received from the focal plane unit (FPU) are grouped into sets of reset-interval measurements (useful time). Each such set is called a ramp; it contains the measurement samples taken during one reset interval.

The compression concept can be coarsely divided into four modules.

(1) Header compression. Header or control data represent the observation configuration, the detector settings, and the compression parameters. They are set by the ground engineers responsible for running the planned observation. The control data are transmitted to PACS within the daily telecommunication period, executed by the detector and mechanics controllers according to the prescribed commanding sequence, and routed back to the ground engineers as header information of the science observation data. This header is generated at the science readout rate. The goal of this module is to compress the control data (header) losslessly as far as possible, so that the limited bandwidth can be fully exploited for the science data. In what follows, this module is not described further.

(2) Integration. The integration part of the approach performs the on-board data reduction. The basic idea is that, in order to achieve the high compression ratio, we have to integrate several samples on-board. Since a ramp may be affected by glitches, we have to ensure that we do not integrate over such ramps. This is done in the glitch detection module.

(3) Lossless compression. The lossless compression part of the approach consists of the temporal and spatial redundancy reduction and the entropy coder.

(4) Raw data selection. This module is responsible for transmitting selected detector data without compressing them. Its main purpose is to allow the performance of the compression software to be checked on the ground. In what follows we will not describe this module further.

In the following subsections we describe the individual modules in detail, especially the integration and glitch detection parts.

4.1. Detector selection

This module performs the data selection according to the detector tables. The detector selection tables consist of model sets stored on-board, depending on the object to be observed or on the detector status. For instance, pixels that represent the object of interest are selected, and data from the others are discarded. Furthermore, bad pixels (e.g., dead or saturated pixels) may be deselected and the data from those pixels discarded.

4.2. Preliminary processing

This module transforms the received signal into the appropriate form (e.g., linearization of the ramps). It is also used to reduce the noise (pick-up and crosstalk noise) in the data, exploiting the characteristics of the infrared detectors: blind pixels (not exposed to the light) serve as a reference for the correlated pick-up noise, and a correlation matrix between the blind pixels on the reference lines and the actual pixels is used to remove the correlated noise.

4.3. Ramp fitting

Ramp fitting is one of the crucial steps of the proposed on-board processing concept. In this paper we consider only linear ramps. An extension to nonlinear ramps can easily be made when an analytic model of the ramp is available; alternatively, the fitting can be performed over small parts of a ramp (sub-ramps), typically 4 samples, so that nonlinear ramps are also covered, provided the nonlinearity is negligible over 4 samples. The ramps are fitted to the sensor readings in order to obtain the flux. We consider the samples belonging to a ramp given

by a vector x = [x1, ..., xn]^T. A linear ramp is given by

x = st + o + η,    (1)

where s is the unknown slope, t is the vector of known sampling instants, o is the unknown offset, and η is a vector of random variables, each assumed to be distributed as N(0, σ), characterizing the noise process. In order to obtain the parameters of interest, this equation has to be solved in a robust manner. We have the following options.

Least squares solution

The least squares solution can easily be calculated in analytic form and is optimal with respect to the Gaussian noise process. However, this solution is not appropriate in the case of outliers (i.e., glitches); therefore, the glitch detection module has to ensure that outliers are removed before the fitting. Figure 4 shows an example where least squares fitting is performed successively over every 4 samples of a ramp (4-sample sub-ramps). The results of the ramp fitting are the slopes and offsets of the sub-ramps; in addition, every sample on the ramp is flagged as an outlier or not, and each non-outlier sample carries a residual value. This is the input to the glitch detection module.

Figure 4: Least squares fitting of sub-ramps (slopes 1–8, voltage versus time) for a 32-sample ramp, over every 4 samples of the ramp.

4.4. Glitch detection

Since ramp fitting/averaging and on-board integration may be performed, we have to ensure that we do not integrate over invalid sensor readings (i.e., glitches). The detection of such events is performed in the glitch detection module, at the individual sample level ("intrinsic deglitching"), at the ramp level ("extrinsic deglitching"), and by considering subsequent ramps.

Intrinsic deglitching

This uses the residual and offset information calculated by the ramp-fitting module. All ramps/sub-ramps in which an outlier has been detected are discarded.

Extrinsic deglitching

In this case we have to take into account the difference in slope between two subsequent sub-ramps. If two subsequent slopes differ by more than 2σ, we have an indication of a glitch. All ramps affected by glitches are discarded. Since we might have only four points per sub-ramp, it does not make sense to retain those parts of an affected sub-ramp that are themselves free of glitches.

Another critical issue is the detection of glitch tails, since the behavior of the detector might change for some time after it has been hit by a glitch. At the moment the concept foresees discarding all samples within a fixed time interval once a glitch has been detected; in the future, we will also investigate methods to detect the tails automatically.

4.5. Integration

The integration module performs the on-board integration of the sensor readings in order to achieve the desired compression ratio. This is the lossy compression part of the software. Special care must be taken to guarantee integration over the right readings (synchronized with the positions of the chopper) and not to integrate over ramps affected by glitches. Thus, the integration process first determines whether to discard all data of an integration block if there is a lack of confidence in at least some of the samples. Then the slope data of a number of successive ramps within the same chopper position are added, if they are free of glitches.

4.6. Lossless compression

The previous modules represent the lossy (i.e., reduction) part of the PACS data reduction/compression system. The following modules constitute the lossless (i.e., compression) part. To reach the required high compression rate, several compression steps are applied.

4.6.1. Preprocessing

After the integration we have a sequence of arrays that we call frames (i.e., A^t, where A^t ∈ R^(16×25) is an array of integrated slopes at time t). Since temporally and spatially adjacent measurements will be similar, we can exploit this fact for further data reduction.

Temporal redundancy reduction

We calculate Δ^(t+1) = A^t − A^(t+1), ..., Δ^(t+n) = A^t − A^(t+n). If subsequent frames are similar, then |Δ^(t+i)| ≪ |A^(t+i)| for 1 ≤ i ≤ n, and we gain in compression ratio by encoding A^t and the Δ^(t+i), 1 ≤ i ≤ n.

Spatial redundancy reduction

After the temporal redundancy reduction, spatially neighboring values in Δ^(t+i) should be similar (in the ideal case they are zero); therefore, we can gain additional compression by encoding the differences of neighboring pixels.
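To make the ramp-fitting and extrinsic-deglitching steps of Sections 4.3 and 4.4 concrete, here is a minimal pure-Python sketch (the function names and the example numbers are ours for illustration, not the flight software): a closed-form least squares fit over 4-sample sub-ramps, followed by the 2σ test on subsequent slopes.

```python
def fit_subramp(t, x):
    """Closed-form least squares fit of a linear (sub-)ramp x = s*t + o, eq. (1).
    Returns the slope s, the offset o, and the per-sample residuals."""
    n = len(t)
    st, sx = sum(t), sum(x)
    stt = sum(ti * ti for ti in t)
    stx = sum(ti * xi for ti, xi in zip(t, x))
    s = (n * stx - st * sx) / (n * stt - st * st)
    o = (sx - s * st) / n
    residuals = [xi - (s * ti + o) for ti, xi in zip(t, x)]
    return s, o, residuals

def extrinsic_deglitch(slopes, sigma):
    """Flag a glitch wherever two subsequent sub-ramp slopes differ by more than 2*sigma."""
    return [abs(b - a) > 2.0 * sigma for a, b in zip(slopes, slopes[1:])]

# An 8-sample linear ramp (slope 2, offset 5) split into two 4-sample sub-ramps:
ramp = [2.0 * t + 5.0 for t in range(8)]
sub1 = fit_subramp([0, 1, 2, 3], ramp[:4])
sub2 = fit_subramp([4, 5, 6, 7], ramp[4:])
print(round(sub1[0], 3), round(sub2[0], 3))        # 2.0 2.0: consistent slopes
print(extrinsic_deglitch([sub1[0], sub2[0]], 0.1)) # [False]: no glitch indicated
print(extrinsic_deglitch([2.0, 9.0], 0.1))         # [True]: a slope jump flags a glitch
```

A robust on-board variant would additionally flag individual samples whose residuals exceed a threshold (the intrinsic deglitching of Section 4.4) before accepting a sub-ramp slope.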

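The temporal and spatial redundancy reduction of Section 4.6.1 amounts to two rounds of differencing before entropy coding. A toy sketch of our own (using a 2×3 "frame" in place of the real 16×25 arrays of integrated slopes):

```python
def temporal_deltas(frames):
    """Delta^(t+i) = A^t - A^(t+i): difference each later frame
    against the reference frame A^t (frames are lists of rows)."""
    ref = frames[0]
    deltas = [[[a - b for a, b in zip(row_ref, row_f)]
               for row_ref, row_f in zip(ref, f)]
              for f in frames[1:]]
    return ref, deltas

def spatial_diff(frame):
    """Encode each row as its first value followed by the
    differences of neighboring pixels."""
    return [[row[0]] + [row[j] - row[j - 1] for j in range(1, len(row))]
            for row in frame]

# Two nearly identical frames of integrated slopes:
a_t  = [[100, 101, 103], [100, 102, 104]]
a_t1 = [[100, 101, 104], [100, 102, 104]]
ref, deltas = temporal_deltas([a_t, a_t1])
print(deltas[0])          # [[0, 0, -1], [0, 0, 0]]: small residuals, cheap to encode
print(spatial_diff(ref))  # [[100, 1, 2], [100, 2, 2]]: neighbor differences
```

The mostly small (often zero) values produced this way are exactly what makes the subsequent entropy coding effective.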
Table 2: Potential loss of scientific data for different glitch detection rates.

Number of ramps   No glitch detection   50%      90%     99%
 2                 9.75%                 4.94%   0.99%   0.1%
 4                18.55%                 9.63%   1.99%   0.19%
 8                33.66%                18.33%   3.93%   0.39%
14                51.23%                29.84%   6.77%   0.67%
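The entries of Table 2 follow from the independence model of equation (2), p_loss = 1 − [1 − p_glitch(1 − p_det)]^n, with p_glitch = 1/20; a quick numerical check:

```python
def p_loss(n_ramps, p_det, p_glitch=1 / 20):
    """Potential loss of scientific data, equation (2): an integration block is
    lost if any of its n ramps is hit by a glitch (probability p_glitch)
    that goes undetected (probability 1 - p_det)."""
    return 1.0 - (1.0 - p_glitch * (1.0 - p_det)) ** n_ramps

# Reproduce a few entries of Table 2:
print(f"{p_loss(2, 0.00):.2%}")   # 9.75%  (2 ramps, no glitch detection)
print(f"{p_loss(8, 0.90):.2%}")   # 3.93%  (8 ramps, 90% detection)
print(f"{p_loss(14, 0.50):.2%}")  # 29.84% (14 ramps, 50% detection)
```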

4.6.2. Entropy coding

The redundancy reduction outlined above should have reduced the magnitude of the pixel values as much as possible. This makes it possible to make assumptions about the distribution of the data, which is a prerequisite for efficient lossless compression. Generally, astronomical images have a uniform background when the same target is observed for a long duration; therefore, the related data packets will contain many identical sample values, and the redundancy reduction is well suited to optimizing the data packet size. A combination of the RZIP [13] and arithmetic encoding [14] algorithms is applied for a further gain in compression ratio. The RZIP algorithm was developed especially for the entropy coding of PACS data; see [13] for further details. Since arithmetic encoding is a standard algorithm, we do not describe it further.

Figure 5: Resulting NGC 1808 image after (a) JPEG 2000 compression (EBCOT method) for a compression ratio of 6 and (b) on-board processing for CR = 15.

5. EVALUATION OF THE ON-BOARD PROCESSING CONCEPT

The on-board processing concept is evaluated in this section, on a theoretical basis and on an NGC 1808 IR image from the Infrared Space Observatory (ISO) mission.

5.1. Data reduction evaluation

We first consider how many ramps have to be integrated in order to achieve the desired compression ratio of 40. As we have explained in Section 3, with lossless compression we can achieve only a compression ratio of 4.57. The additional compression factor of 7.8 has to be gained by integrating ramps or by fitting bigger sub-ramp lengths (e.g., 8 samples per sub-ramp). Therefore, we have to integrate over 8 ramps.²

² In fact, integration over 7 ramps should be sufficient, because, due to the decrease in signal-to-noise ratio, we could gain the rest by temporal and spatial redundancy reduction.

The next thing to consider is the potential loss of scientific data. Of course, the glitch detection will not be 100% correct. We can quantify the potential loss of scientifically valid data by the glitch detection rate and the number of ramps that are integrated. Assuming a glitch rate of one every 10 s per pixel, with a glitch tail of 0.5 s, we get a probability of p_glitch = 1/20 that a ramp is affected by a glitch. Then we can calculate the potential loss of scientific data p_loss by

p_loss = 1 − [1 − p_glitch(1 − p_det)]^n,    (2)

where n is the number of integrated ramps and p_det is the glitch detection efficiency. Table 2 lists the potential data loss for various numbers of integrated ramps and different glitch detection rates. A glitch detection rate of more than 95% seems feasible; therefore, the potential data loss will be around 1%–3%. In fact, it will be lower, because in the above calculations we have assumed for simplicity that a glitch and its tail are independent events, which is not true: if the glitch is detected, we have also detected its tail. In addition, we have assumed that when we do not detect a glitch, all integrated measurements are lost; in fact, if we miss a small glitch and integrate over it, this just decreases the signal-to-noise ratio. Another effect we have not considered is the false alarm rate, that is, discarding a ramp even though it is not affected by a glitch. This also leads to a loss of scientific data, but it can be estimated directly and has no effect on the other data. From these considerations, one can see that the desired compression ratio can be achieved with a minimal loss of scientifically valuable data.

5.2. Quantitative results

For performance evaluation, the proposed on-board processing approach is compared with the JPEG 2000 [15], ZIP [16], and RAR [17] methods, in terms of compression ratio (CR), processing time, and memory usage, on NGC 1808 raw infrared images (1032 frames) from the ISO mission. Figure 5a depicts the resulting image after JPEG 2000 compression of the individual raw images for CR = 6, while Figure 5b shows the on-board processing result for CR = 15. This JPEG 2000 implementation [15] uses the EBCOT algorithm (embedded block coding with optimized truncation) [18] for the quantization of the wavelet coefficients and a binary arithmetic coder as the back-end entropy codec. The white vertical line represents column 24 with dead pixels, detectors that were lost during the mission. The quality loss is

Table 3: Comparison of compression performance on NGC 1808 raw IR images (1032 frames) for JPEG 2000, ZIP, RAR, and the proposed approach.

Method                CR      Time (ms)   Memory usage (kbytes)
ZIP                   1.39    24285       37240
RAR                   1.40    34500       18912
JPEG 2000             6.38    158600      4128
On-board processing   15.44   120         1900

observed compared with the image resulting from our proposed approach (Figure 5b), and is due to the quantization performed by the EBCOT algorithm. For a better display of the error, both reconstructed images are plotted as 1D signals in Figure 6: the x-axis represents the pixel indexes (1–1024), while the y-axis depicts the pixel values of both reconstructed images (JPEG 2000 and our approach). The figure shows the approximation error due to the EBCOT quantization.

Figure 6: Illustration of the JPEG 2000 compressed image ("J2K image", dashed-dotted line) and the image resulting from on-board processing ("OBP image", solid line) as a 1D plot of values versus detectors.

Compression results for the methods listed above are reported in Table 3. All methods were run on a 450 MHz Pentium PC with Windows NT 4 to ensure an identical comparison platform, although the on-board processing is intended for embedded hardware (DSPs) for space applications. Note that the on-board processing achieves the highest compression ratio and the fastest processing time for the reduction of the NGC 1808 raw images compared with the generic compression methods. Our approach exploits the characteristics of the IR astronomy signal and the limited resources in order to better fit the compression needs to the available resources.

6. CONCLUSION

In this paper, a novel on-board data processing concept is proposed, dedicated to the Herschel-PACS mission of the European Space Agency (ESA). We have described the key modules, such as ramp fitting and glitch detection, in detail. Our concept combines lossy and lossless compression; the presented method offers a high compression ratio with a minimal loss of potentially useful scientific data. It also provides a higher signal-to-noise ratio than standard compression techniques. In [19] we illustrate the feasibility of the method presented in this paper on data from the Infrared Space Observatory (ISO).

ACKNOWLEDGMENT

This work is supported by a grant from the Federal Ministry of Science, Innovation and Transport and the Austrian Space Agency.

REFERENCES

[1] I. S. Glass, Handbook of Infrared Astronomy, Cambridge University Press, Cambridge, UK, 1999.
[2] A. N. Belbachir, H. Bischof, and F. Kerschbaum, "A data compression concept for space application," in Proc. IEEE Digital Signal Processing Workshop (DSP-SPE '00), Hunt, Tex, USA, October 2000.
[3] R. Buccigrossi and E. Simoncelli, "Image compression via joint statistical characterization in the wavelet domain," IEEE Trans. Image Processing, vol. 8, no. 12, pp. 1688–1701, 1999.
[4] W. Press, "Wavelet-based compression software for FITS images," in Astronomical Data Analysis Software and Systems I, D. M. Worrall, C. Biemesderfer, and J. Barnes, Eds., vol. 25 of A.S.P. Conf. Ser., pp. 3–16, Tucson, Ariz, USA, 1992.
[5] J. Sanchez and M. P. Canton, Space Image Processing, CRC Press, Boca Raton, Fla, USA, 1999.
[6] R. L. White, M. Postman, and M. G. Lattanzi, in Digitized Optical Sky Surveys, H. T. MacGillivray and E. B. Thomson, Eds., Kluwer Academic, Amsterdam, The Netherlands, 1992.
[7] M. Datcu and G. Schwarz, "Advanced image compression: specific topics for space applications," in Proc. 6th International Workshop on Digital Signal Processing Techniques for Space Applications (DSP '98), Noordwijk, The Netherlands, September 1998.
[8] M. Louys, J. L. Starck, S. Mei, F. Bonnarel, and F. Murtagh, "Astronomical image compression," Astronomy and Astrophysics Supplement Series, vol. 136, no. 3, pp. 579–590, 1999.
[9] J. L. Starck, D. L. Donoho, and E. J. Candès, "Astronomical image representation by the curvelet transform," Astronomy & Astrophysics, vol. 398, no. 2, pp. 785–800, 2003.
[10] A. N. Belbachir and H. Bischof, "On-board data compression: noise and complexity related aspects," Tech. Rep. 75, PRIP, Vienna University of Technology, Vienna, Austria, 2003.
[11] A. Poglitsch, C. Waelkens, O. H. Bauer, et al., "The photodetector array camera and spectrometer (PACS) for the Herschel

space observatory," in Bulletin of the American Astronomical Society, vol. 36, AAS 204th Meeting, Denver, Colo, USA, 2004.
[12] G. L. Pilbratt, "The Herschel mission, scientific objectives, and this meeting," in The Promise of the Herschel Space Observatory, vol. 460 of ESA-SP, pp. 13–20, Toledo, Spain, July 2001.
[13] R. Ottensamer, A. N. Belbachir, H. Bischof, et al., "Herschel/PACS onboard reduction/compression software implementation," in International Symposium on Astronomical Telescopes, vol. 5487 of Proceedings of SPIE, Glasgow, Scotland, UK, June 2004.
[14] G. Held, Data and Image Compression, John Wiley & Sons, New York, NY, USA, 1996.
[15] D. Taubman, E. Ordentlich, M. Weinberger, and G. Seroussi, "Embedded block coding in JPEG 2000," Signal Processing: Image Communication, vol. 17, no. 1, pp. 49–72, 2002.
[16] http://www.winzip.com.
[17] http://www.rarlab.com.
[18] D. Taubman, "High performance scalable image compression with EBCOT," IEEE Trans. Image Processing, vol. 9, no. 7, pp. 1158–1170, 2000.
[19] C. Reimers, A. N. Belbachir, H. Bischof, et al., "A feasibility study of on-board data compression for infrared cameras of space observatories," in Proc. 17th International Conference on Pattern Recognition (ICPR '04), vol. 1, pp. 524–527, Cambridge, UK, August 2004.

Ahmed Nabil Belbachir received the Electronic Engineering degree in 1996 and the M.S. degree in signal processing in 2000 from the University of Science and Technology of Oran (USTO), Algeria. In March 2005, he was awarded the Ph.D. degree in computer science from the Vienna University of Technology. Currently, he is a Research Fellow at the Pattern Recognition and Image Processing Group at the Vienna University of Technology, and he is involved in the ESA-Herschel Project, where he is responsible for the data reduction and image compression for the PACS instrument. He has developed the on-board reduction/compression software for the IR photo-detector camera PACS. His research interests are digital filter design, data compression, signal/image processing, and real-time systems, where he has published more than 20 scientific publications. He is a Member of the IAPR TC13 (Technical Committee for Pattern Recognition in Astronomy and Astrophysics), AAPR (Austrian Association for Pattern Recognition), OeGAA (Austrian Association for Astronomy and Astrophysics), and EURASIP. He is also a reviewer for IEEE Signal/Image Processing, Elsevier Digital Signal Processing, and the Journal of Intelligent and Fuzzy Systems.

Horst Bischof received his M.S. and Ph.D. degrees in computer science from the Vienna University of Technology in 1990 and 1993, respectively. In 1998, he got his Habilitation (venia docendi) for applied computer science. Currently, he is a Professor at the Institute for Computer Graphics and Vision at the Technical University of Graz, Austria. He is also a key researcher at the recently founded K+ Competence Center "Advanced Computer Vision", where he is responsible for research projects in the area of classification. His research interests include object recognition, visual learning, medical computer vision, neural networks, and adaptive methods for computer vision, where he has published more than 210 scientific papers. He was Cochairman of international conferences (ICANN, DAGM) and Local Organizer for ICPR '96. He is the Program Cochair of ECCV 2006. Currently, he is an Associate Editor for the Pattern Recognition Journal, the Computer and Informatics Journal, and the Journal of Universal Computer Science. He is currently the Vice-Chair of the Austrian Association for Pattern Recognition. He received an award from the Pattern Recognition Journal in 2002, where the paper "Multiple eigenspaces" was selected as the most original manuscript.

Roland Ottensamer received his M.S. degree in 2004 from the Institute for Astronomy in Vienna, where he holds a research position. During the last four years, he was highly involved in the development of the data compression/decompression software for the IR photo-detector camera PACS and has presented his work at the major conferences for instrumentation in astronomy. He is currently working on the development of the interactive analysis framework for Herschel. His research interests are astronomy and computer science, with special emphasis on data mining and processing. He is a Founder Member of OeGAA.

Franz Kerschbaum received his Ph.D. degree with distinction from the University of Vienna in 1993. Between 1997 and 2000, Kerschbaum received an APART Grant of the Austrian Academy of Sciences. In 2000, he got his Habilitation in observational astrophysics. Since 2001, he has been an Associate Professor at the Institute for Astronomy at the University of Vienna. He is a referee, evaluator, or expert for the Sixth European Union Framework Programme, the European Space Agency, Deutsche Forschungsgemeinschaft, the Italian Space Agency, Instituto de Astrofísica de Canarias, Katholieke Universiteit Leuven, Friedrich-Schiller-Universität Jena, A&A, A&A Letters, and MNRAS. His research interests include late stages of stellar evolution, astromineralogy, instrumentation, space experiments, and history of astronomy, where he has published more than 140 papers in scientific journals and 25 articles in popular media, and he has been invited to several lectures. He is the cofounder and a Board Member of the Österreichische Gesellschaft für Astronomie und Astrophysik, a Member of the International Astronomical Union, the European Astronomical Society, the Astronomische Gesellschaft, Pro Scientia, and Forum Sankt Stephan, as well as a Member of ESA's Astronomy Working Group from 1999 to 2001, being also a Deputy Head in the year 2001.

Christian Reimers is an astrophysicist with special interest in theoretical and computational modelling of radiation hydrodynamic problems and the shaping of planetary nebulae. After graduating from the Technical High School for Communications Engineering in Rankweil, Austria, he entered the studies of astronomy and physics at the University of Vienna, Austria. In 1999, he obtained the M.S. degree in astronomy at the Institute of Astronomy (IfA) of the University of Vienna, which was awarded the prize of appreciation of the Austrian Federal Ministry for Education, Science and Culture. While pursuing his Ph.D. at the IfA, he joined the Herschel/PACS Team to work on the development of the compression software and related documentation for the ESA cornerstone mission Herschel from 2001 to 2003. Since 2001, he has been a Tutor for "Introduction to Astronomy: Part II" at the IfA. He is a Member of OeGAA (Austrian Association for Astronomy and Astrophysics). His ambition is to finish his Ph.D. in 2005.