Estimating the Redshift Distribution of Photometric Galaxy Samples II
Total Page:16
File Type:pdf, Size:1020Kb
Mon. Not. R. Astron. Soc. 000, 000–000 (0000) Printed 19 June 2018 (MN LATEX style file v2.2) Estimating the Redshift Distribution of Photometric Galaxy Samples II. Applications and Tests of a New Method Carlos E. Cunha1,2,3⋆, Marcos Lima2,4,5, Hiroaki Oyaizu1,2, Joshua Frieman1,2,6, Huan Lin6 1Department of Astronomy and Astrophysics, University of Chicago, Chicago, IL 60637 2Kavli Institute for Cosmological Physics, University of Chicago, Chicago, IL 60637 3Department of Physics, University of Michigan, 450 Church St., Ann Arbor, MI 48109 4Department of Physics, University of Chicago, Chicago, IL 60637 5Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, PA 19104 6Center for Particle Astrophysics, Fermi National Accelerator Laboratory, Batavia, IL 60510 19 June 2018 ABSTRACT In Lima et al. (2008) we presented a new method for estimating the redshift distribution, N(z), of a photometric galaxy sample, using photometric observables and weighted sampling from a spectroscopic subsample of the data. In this paper, we extend this method and explore various applications of it, using both simulations of and real data from the SDSS. In addition to estimating the redshift distribution for an entire sample, the weighting method enables accurate estimates of the red- shift probability distribution, p(z), for each galaxy in a photometric sample. Use of p(z) in cosmological analyses can substantially reduce biases associated with tradi- tional photometric redshifts, in which a single redshift estimate is associated with each galaxy. The weighting procedure also naturally indicates which galaxies in the photometric sample are expected to have accurate redshift estimates, namely those that lie in regions of photometric-observable space that are well sampled by the spec- troscopic subsample. In addition to providing a method that has some advantages over standard photo-z estimates, the weights method can also be used in conjunction with photo-z estimates, e.g., by providing improved estimation of N(z) via deconvolution of N(zphot) and improved estimates of photo-z scatter and bias. We present a publicly available p(z) catalog for ∼ 78 million SDSS DR7 galaxies. arXiv:0810.2991v4 [astro-ph] 16 Mar 2010 Key words: distance scale – galaxies: distances and redshifts – galaxies: statistics – large scale structure of Universe 1 INTRODUCTION abling approximate distance-redshift measurements as well as study of the growth of density perturbations. The power Optical and near-infrared wide-area surveys planned for the of these surveys to constrain cosmological parameters will be next decade will increase the size of photometric galaxy sam- limited in part by the accuracy with which the galaxy red- ples by an order of magnitude, delivering measurements of shift distributions can be determined (Huterer et al. 2004, billions of galaxies. Much of the utility of these samples for 2006; Zhan & Knox 2006; Zhan 2006; Ma et al. 2006; Lima astronomical and cosmological studies will rest on knowledge & Hu 2007). of the redshift distributions of the galaxies they contain. For example, surveys aimed at probing dark energy via clusters, Photometric redshifts—approximate estimates of weak lensing, and baryon acoustic oscillations (BAO) will galaxy redshifts based on their broad-band photometric rely on the ability to coarsely bin galaxies by redshift, en- observables, e.g., magnitudes or colors—offer one set of techniques for approaching this problem. However, photo-z estimators are typically biased to some degree, and they ⋆ [email protected] can suffer from catastrophic failures in certain regimes. c 0000 RAS 2 Cunha et al. These problems motivate the development of potentially weighting technique on its own, independently of ‘tradi- more robust methods. tional’ photo-z estimates. The accuracy of the weighting In Lima et al. (2008) we presented a new, empirical method in directly reconstructing N(z) is affected by pho- technique aimed not at estimating individual galaxy red- tometric errors and by sparse or incomplete coverage by the shifts but instead at estimating the redshift distribution, training set of the space of photometric observables spanned N(z), for an entire photometric galaxy sample or suitably by the photometric data. We develop and test a bootstrap selected subsample. The method is based upon matching the technique to estimate random errors in the weighted N(z) distributions of photometric observables (e.g., magnitudes, estimate and present a technique for detecting systematic colors, etc) of a spectroscopic subsample to those of the pho- errors in it as well. We also discuss the effects of training-set tometric sample. The method assigns weights to galaxies in non-representativeness on the N(z) estimate. Perhaps most the spectroscopic subsample (hereafter denoted the training importantly, we show that the weighting procedure can be set, in analogy with machine-learning methods of photo-z used to estimate not only the redshift distribution for the estimation), such that the weighted distributions of observ- (entire) photometric sample, N(z), but also a redshift prob- ables for these galaxies match those of the photometric sam- ability distribution, p(z), for each galaxy in the photometric ple. The weight for each training-set galaxy is computed by sample. Such a distribution contains much more information comparing the local “density” of training-set galaxies in the than a discrete photo-z estimate, zphot. Use of p(z) instead multi-dimensional space of photometric observables to the of zphot in cosmological analyses can potentially greatly re- density of the photometric sample in the same region. We es- duce the biases arising from photo-z’s. timate the densities using a nearest-neighbor approach that The paper is organized as follows. In §2 we review and ensures that the density estimates are both local and sta- extend the weighting method for estimating the redshift dis- ble in sparsely occupied regions of the space. The use of the tribution and the redshift probability distribution, focus- nearest neighbors ensures optimal binning of the data, which ing in particular on sources and estimates of errors in the minimizes the requisite size of the spectroscopic subsample. method. In §3 we describe the actual and simulated SDSS After the training-set galaxy weights are derived, we sum galaxy catalogs that we use to test the weighting method and them in redshift bins to estimate the redshift distribution its alternatives. We demonstrate how the weighting method for the photometric sample. improves upon photometric-redshift estimates in the mock As Lima et al. (2008) show, this weighting method pro- catalog in §4, and we demonstrate its effectiveness in esti- vides a precise and nearly unbiased estimate of the underly- mating N(z), in comparison with photo-z-based methods, in ing redshift distribution for a photometric sample without §5. We apply the new methods to the real SDSS DR6 in §6. recourse to photo-z estimates for individual galaxies. More- We present our conclusions in §7 and include some technical over, the spectroscopic training set does not have to be rep- details of the analysis in the Appendices. resentative of the photometric sample, in its distributions of magnitudes, colors, or redshift, for the method to work. (By contrast, the performance of training-set-based photo-z 2 THE WEIGHTING METHOD estimators generally degrades as the training set becomes In this section, we briefly review and extend the weight- less representative of the photometric sample.) The only re- quirement is that the spectroscopic training set covers, even ing method introduced in Lima et al. (2008). We define the weight, w, of a galaxy in the spectroscopic training set as sparsely, the range of photometric observables spanned by the normalized ratio of the density of galaxies in the photo- the photometric sample. The weighting method can be ap- plied to different combinations of photometric observables metric sample to the density of training-set galaxies around the given galaxy. These densities are calculated in a local that correlate with redshift—here, we confine our analysis neighborhood in the space of photometric observables, e.g., to magnitudes and colors. multi-band magnitudes. More formally, given a training-set In this paper we present additional applications of the galaxy, we define its weight by weighting method, test its performance on simulated data 1 ρ sets, and show results of those applications using data from w ≡ P , (1) the SDSS. The applications of the weighting method nat- NP,tot ρT urally fall into two categories, those that enhance photo-z where NP,tot is the total number of galaxies in the photo- estimators and those that (potentially) replace photo-z es- metric sample, and ρP and ρT are the local number densities timation. In the first category, we show that the weighting in the space of observables for the photometric and training method can be used to improve estimates of the scatter and sets, bias of training-set-based photo-z estimates as functions of NP,T (true) spectroscopic redshift. Knowledge of such errors are ρP,T ≡ , (2) very important, since uncertainties in photo-z bias and scat- VP,T ter are nuisance parameters that significantly degrade the where NP(T) is the number of photometric (training) set power of cosmological probes (eg. Huterer et al. 2004; Ma galaxies within volume VP(T). et al. 2006; Lima & Hu 2007). We also show that the weights We adopt a nearest-neighbor approach to estimating the can be used to obtain improved estimates of the error distri- density of galaxies in magnitude space, because it enables bution of the photo-z’s, P (zphot|zspec), and thereby improve control of statistical errors (shot noise) while also ensuring the deconvolution procedure used to infer the underlying adequate “locality” of the volume in magnitude space. We th redshift distribution, N(z), from the distribution of photo- define the distance dαβ in magnitude space between the α z’s (Padmanabhan et al.