Radio Interferometric Gain Calibration As a Complex Optimization Problem 3 from Which Definitions of the Complex Jacobian and Complex Laurent Et Al
Total Page:16
File Type:pdf, Size:1020Kb
A Mon. Not. R. Astron. Soc. 000, 1–18 (2014) Printed 26 February 2015 (MN LTEX style file v2.2) Radio interferometric gain calibration as a complex optimization problem O.M. Smirnov12⋆, C. Tasse31 1Department of Physics and Electronics, Rhodes University, PO Box 94, Grahamstown, 6140 South Africa 2SKA South Africa, 3rd Floor, The Park, Park Road, Pinelands, 7405 South Africa 3GEPI, Observatoire de Paris, CNRS, Université Paris Diderot, 5 place Jules Janssen, 92190 Meudon, France Accepted 2015 February 24. Received 2015 February 13; in original form 2014 October 30 ABSTRACT Recent developments in optimization theory have extended some traditional al- gorithms for least-squares optimization of real-valued functions (Gauss-Newton, Levenberg-Marquardt, etc.) into the domain of complex functions of a complex vari- able. This employs a formalism called the Wirtinger derivative, and derives a full- complex Jacobian counterpart to the conventional real Jacobian. We apply these de- velopments to the problem of radio interferometric gain calibration, and show how the general complex Jacobian formalism, when combined with conventional optimization approaches, yields a whole new family of calibration algorithms, including those for the polarized and direction-dependent gain regime. We further extend the Wirtinger calculus to an operator-based matrix calculus for describing the polarized calibration regime. Using approximate matrix inversion results in computationally efficient im- plementations; we show that some recently proposed calibration algorithms such as StefCal and peeling can be understood as special cases of this, and place them in the context of the general formalism. Finally, we present an implementation and some applied results of CohJones, another specialized direction-dependent calibration al- gorithm derived from the formalism. Key words: Instrumentation: interferometers, Methods: analytical, Methods: nu- merical, Techniques: interferometric INTRODUCTION and DD gains terms (the latter in the sense of being solved independently per direction). In radio interferometry, gain calibration consists of solving arXiv:1502.06974v1 [astro-ph.IM] 24 Feb 2015 for the unknown complex antenna gains, using a known Gain calibration is a non-linear least squares (NLLS) (prior, or iteratively constructed) model of the sky. Tra- problem, since the noise on observed visibilities is al- ditional (second generation, or 2GC) calibration employs most always Gaussian (though other treatments have been an instrumental model with a single direction-independent proposed by Kazemi & Yatawatta 2013). Traditional ap- (DI) gain term (which can be a scalar complex gain, or proaches to NLLS problems involve various gradient-based 2 × 2 complex-valued Jones matrix) per antenna, per some techniques (for an overview, see Madsen et al. 2004), such time/frequency interval. Third-generation (3GC) calibration as Gauss-Newton (GN) and Levenberg-Marquardt (LM). also addresses direction-dependent (DD) effects, which can These have been restricted to functions of real variables, be represented by independently solvable DD gain terms, since complex differentiation can be defined in only a very or by some parameterized instrumental model (e.g. pri- restricted sense (in particular, ∂z/∂z¯ does not exist in the mary beams, pointing offsets, ionospheric screens). Dif- usual definition). Gains in radio interferometry are complex ferent approaches to this have been proposed and imple- variables: the traditional way out of this conundrum has mented, mostly in the framework of the radio interferome- been to recast the complex NLLS problem as a real prob- try measurement equation (RIME, see Hamaker et al. 1996); lem by treating the real and imaginary parts of the gains as Smirnov (2011a,b,c) provides a recent overview. In this work independent real variables. we will restrict ourselves specifically to calibration of the DI Recent developments in optimization theory (Kreutz-Delgado 2009; Laurent et al. 2012) have shown that using a formalism called the Wirtinger complex derivative (Wirtinger 1927) allows for a mathematically ⋆ E-mail: [email protected] robust definition of a complex gradient operator. This leads 2 O.M. Smirnov & C. Tasse to the construction of a complex Jacobian J, which in Table 1. Notation and frequently used symbols turn allows for traditional NLLS algorithms to be directly applied to the complex variable case. We summarize these x scalar value x developments and introduce basic notation in Sect. 1. x¯ complex conjugate In Sect. 2, we follow on from Tasse (2014) to apply this x vector x theory to the RIME, and derive complex Jacobians for X matrix X X X (unpolarized) DI and DD gain calibration. vector of 2 × 2 matrices =[Xi] (Sect. 4) R space of real numbers C space of complex numbers In principle, the use of Wirtinger calculus and complex I identity matrix Jacobians ultimately results in the same system of LS equa- diag x diagonal matrix formed from x tions as the real/imaginary approach. It does offer two im- ||· ||F Frobenius norm portant advantages: (i) equations with complex variables are (·)T transpose more compact, and are more natural to derive and analyze (·)H Hermitian transpose than their real/imaginary counterparts, and (ii) the struc- ⊗ outer product a.k.a. Kronecker product ¯ ture of the complex Jacobian can yield new and valuable x¯, X element-by-element complex conjugate of x, X x X insights into the problem. This is graphically illustrated in x˘, X˘ augmented vectors x˘ = , X˘ = i x¯ XH Fig. 1 (in fact, this figure may be considered the central i X upper half of matrix X insight of this paper). Methods such as GN and LM hinge U X , X left, right half of matrix X around a large matrix – J H J – with dimensions correspond- L R XUL upper left quadrant of matrix X ing to the number of free parameters; construction and/or Y Y Y order of operations is XU = XU = (XU) , inversion of this matrix is often the dominant algorithmic Y Y or X U = (X )U cost. If J H J can be treated as (perhaps approximately) d, v, r, g, m data, model, residuals, gains, sky coherency sparse, these costs can be reduced, often drastically. Figure 1 (·)k value associated with iteration k H shows the structure of an example J J matrix for a DD gain (·)p,k value associated with antenna p, iteration k calibration problem. The left column row shows versions of (·)(d) value associated with direction d J H J constructed via the real/imaginary approach, for four W matrix of weights different orderings of the solvable parameters. None of the Jk, J k∗ partial, conjugate partial Jacobian at iteration k J full complex Jacobian orderings yield a matrix that is particularly sparse or easily H, H˜ J H J and its approximation invertible. The right column shows a complex J H J for the vec X vectorization operator same orderings. Panel (f) reveals sparsity that is not appar- RA right-multiply by A operator (Sect. 4) ent in the real/imaginary approach. This sparsity forms the LA left-multiply by A operator (Sect. 4) basis of a new fast DD calibration algorithm discussed later i δj Kronecker delta symbol in the paper. A B matrix blocks "C D# In Sect. 3, we show that different algorithms may be de- ց, ր, ↓ repeated matrix block rived by combining different sparse approximations to J H J with conventional GN and LM methods. In particular, we show that StefCal, a fast DI calibration algorithm recently 1 WIRTINGER CALCULUS & COMPLEX proposed by Salvini & Wijnholds (2014a), can be straight- LEAST-SQUARES forwardly derived from a diagonal approximation to a com- The traditional approach to optimizing a function of n com- plex J H J. We show that the complex Jacobian approach plex variables f(z), z ∈ Cn is to treat the real and imaginary naturally extends to the DD case, and that other sparse parts z = x + iy independently, turning f into a function approximations yield a whole family of DD calibration algo- of 2n real variables f(x, y), and the problem into an opti- rithms with different scaling properties. One such algorithm, mization over R2n. CohJones (Tasse 2014), has been implemented and success- Kreutz-Delgado (2009) and Laurent et al. (2012) pro- fully applied to simulated LOFAR data: this is discussed in pose an alternative approach to the problem based on Sect. 6. Wirtinger (1927) calculus. The central idea of Wirtinger cal- culus is to treat z and z¯ as independent variables, and op- In Sect. 4 we extend this approach to the fully polar- timize f(z, z¯) using the Wirtinger derivatives ized case, by developing a Wirtinger-like operator calculus ∂ 1 ∂ ∂ ∂ 1 ∂ ∂ in which the polarization problem can be formulated suc- = − i , = + i , (1.1) ∂z 2 ∂x ∂y ∂z¯ 2 ∂x ∂y cinctly. This naturally yields fully polarized counterparts to the calibration algorithms defined previously. In Sect. 5, we where z = x + iy. It is easy to see that discuss other algorithmic variations, and make connections ∂z¯ ∂z to older DD calibration techniques such as peeling (Noordam = = 0, (1.2) 2004). ∂z ∂z¯ i.e. that z¯ (z) is treated as constant when taking the deriva- tive with respect to z (z¯). From this it is straightforward to While the scope of this work is restricted to LS solutions to the DI and DD gain calibration problem, the potential ap- define the complex gradient operator plicability of complex optimization to radio interferometry ∂ ∂ ∂ ∂ ∂ ∂ ∂ = , = ,..., , ,..., , (1.3) is perhaps broader. We will return to this in the conclusions. ∂C z ∂z ∂z¯ ∂z ∂z ∂z¯ ∂z¯ 1 n 1 n Radio interferometric gain calibration as a complex optimization problem 3 from which definitions of the complex Jacobian and complex Laurent et al. (2012) show that Eqs. 1.8 and 1.9 yield Hessians naturally follow.