Imaging with Information Field Theory
Philipp Arras November 28, 2017
Max-Planck Institute for Astrophysics, Garching, Germany
Technical University of Munich

Introduction

Radio Aperture Synthesis
[Figure 1: Dirty image (color scale: flux, ×10^11). Figure 2: Reconstructed sky model (color scale: intensity).]
Supernova remnant 3C391, data from CASA tutorial on NRAO website, VLA, 128 MHz bandwidth, two spectral windows at 4.6 and 7.5 GHz. Reconstructed with RESOLVE + noise estimation, only Stokes I.

Inverse Problems
True sky model: Comes with an infinite number of degrees of freedom. → Fields (e.g. $\rho : \mathbb{R}^3 \to \mathbb{R}_{\geq 0}$)
Data: Always a finite number of degrees of freedom. → Data arrays (e.g. vis.shape = (4132094,))

Inverse Problems (even worse)
[Figure 3: UV coverage (→ d), axes u and v. Figure 4: Reconstructed image (→ s).]
$$\mathcal{P}(s|d) = \frac{\mathcal{P}(d|s)\,\mathcal{P}(s)}{\mathcal{P}(d)}$$

Bayesian Inference

Definitions: s := physical signal, d := data
Product rule of probabilities, aka Bayes' theorem:
$$\mathcal{P}(s|d) = \frac{\mathcal{P}(d|s)\,\mathcal{P}(s)}{\mathcal{P}(d)}$$
• Likelihood $\mathcal{P}(d|s)$: Probability to obtain a data set given the signal is known.
• Prior $\mathcal{P}(s)$: Available information on the signal, prior to the measurement.
• Posterior $\mathcal{P}(s|d)$: Probability for a signal realization given the measured data.
• Evidence $\mathcal{P}(d)$: For inferences on s, this normalization factor can be ignored in most cases.
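
To make Bayes' theorem concrete, here is a minimal numerical sketch. It assumes a single scalar signal s, a Gaussian prior, and a measurement d = s + n with Gaussian noise. None of these choices or names come from the slides; they are only for illustration.

```python
import numpy as np

def posterior_on_grid(s_grid, d, prior_sigma=1.0, noise_sigma=0.5):
    """Evaluate P(s|d) on a regular grid of candidate signal values."""
    ds = s_grid[1] - s_grid[0]
    prior = np.exp(-0.5 * (s_grid / prior_sigma) ** 2)              # P(s), unnormalized
    likelihood = np.exp(-0.5 * ((d - s_grid) / noise_sigma) ** 2)   # P(d|s), unnormalized
    joint = likelihood * prior
    evidence = joint.sum() * ds   # P(d) up to the dropped constants: pure normalization
    return joint / evidence

s_grid = np.linspace(-5.0, 5.0, 2001)
ds = s_grid[1] - s_grid[0]
posterior = posterior_on_grid(s_grid, d=1.0)
posterior_mean = (s_grid * posterior).sum() * ds
```

Constant prefactors of prior and likelihood cancel in the normalization, which is why the evidence can be ignored when only statements about s are needed.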

Probability Distributions Over All Possible Images
[Plot: $\mathcal{P}(s|d)$ (axis scale ·10^{-51}) over $N_\mathrm{pix} \cdot N_\mathrm{values}$ dimensions.]
This is a lot! Theoretically, infinite-dimensional. Practically, e.g. $N_\mathrm{pix} = 256^2$, $N_\mathrm{values} = 10^5$ → 655 Mio. dimensions.
Need to extract information about $\mathcal{P}(s|d)$, where s is a field.

Information Field Theory

• Information Field Theory := Information theory with fields.
• Provides a dictionary between statistical mechanics & field theory and Bayesian inference.
• Define $\mathcal{H}(s, d) := -\log \mathcal{P}(s, d)$. Then:
$$\mathcal{P}(s|d) = \frac{\mathcal{P}(s, d)}{\mathcal{P}(d)} = \frac{e^{-\mathcal{H}(s,d)}}{\int \mathcal{D}s\, e^{-\mathcal{H}(s,d)}}.$$
• Theory on continuous spaces, actual calculations discretized.
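
The following sketch illustrates the discretized form of these statements under strong simplifications: a "field" with only two pixels, a correlated Gaussian prior, and a noisy measurement of pixel 0 only. The covariance, noise level, and all names are assumptions made for this example. The path integral over s becomes an ordinary sum over a grid.

```python
import numpy as np

S = np.array([[1.0, 0.8], [0.8, 1.0]])   # assumed prior covariance of the two pixels
S_inv = np.linalg.inv(S)
noise_var = 0.25
d = 1.0                                   # only pixel 0 is observed

def hamiltonian(s0, s1):
    """H(s, d) = -log P(d|s) - log P(s), with s-independent constants dropped."""
    likelihood = 0.5 * (d - s0) ** 2 / noise_var
    prior = 0.5 * (S_inv[0, 0] * s0**2 + 2 * S_inv[0, 1] * s0 * s1
                   + S_inv[1, 1] * s1**2)
    return likelihood + prior

grid = np.linspace(-6.0, 6.0, 601)
s0, s1 = np.meshgrid(grid, grid, indexing="ij")
cell = (grid[1] - grid[0]) ** 2

weights = np.exp(-hamiltonian(s0, s1))
Z = weights.sum() * cell                  # discretized version of the integral over s
posterior = weights / Z                   # P(s|d) = exp(-H) / Z
mean = np.array([(s0 * posterior).sum(), (s1 * posterior).sum()]) * cell
```

The posterior mean of the unobserved pixel 1 is non-zero because the prior correlates it with the observed pixel 0. Brute-force gridding like this is only feasible for a handful of pixels, which is why approximate inference is needed for real images.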

Wiener filter demo
→ See Jupyter Notebook on Google Drive.
IFT actually works.
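
The notebook itself is not reproduced here. As a stand-in, the following is a minimal Wiener filter sketch in plain NumPy. It assumes a linear data model d = R s + n with a Gaussian signal prior of covariance S and Gaussian noise of covariance N. The covariance shape, the mask, and all variable names are assumptions for this illustration and are not taken from the demo.

```python
import numpy as np

rng = np.random.default_rng(42)
npix = 64
x = np.arange(npix)

# Assumed prior covariance S with exponentially decaying correlations.
S = np.exp(-np.abs(x[:, None] - x[None, :]) / 8.0)
# Assumed response R: only every second pixel is observed.
R = np.eye(npix)[::2]
noise_var = 0.1**2
N = noise_var * np.eye(R.shape[0])

# Draw a mock signal from the prior and simulate data d = R s + n.
s = np.linalg.cholesky(S) @ rng.standard_normal(npix)
d = R @ s + np.sqrt(noise_var) * rng.standard_normal(R.shape[0])

# Wiener filter in its data-space form:
#   m = S R^T (R S R^T + N)^-1 d,   D = S - S R^T (R S R^T + N)^-1 R S
gain = S @ R.T @ np.linalg.inv(R @ S @ R.T + N)
m = gain @ d                         # posterior mean (reconstructed map)
D = S - gain @ R @ S                 # posterior covariance
uncertainty = np.sqrt(np.diag(D))    # pixel-wise uncertainty map
```

Unobserved pixels are filled in through the prior correlations, and their entries in `uncertainty` are larger than those of the observed pixels.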

Imaging the Radio Sky: RESOLVE

Challenges to be addressed
• Extended sources (exclude point sources for now).
• Uncertainty maps.
• Low signal-to-noise performance.
• Reproducibility.

RESOLVE (without noise estimation)
[Diagram: inputs and output of the inference algorithm.]
• Data: U-V coverage (axes u and v).
• Prior: power spectrum, smoothness prior, flat Inv-Gamma prior.
• Data model: $d = R e^{s} + n$
• Inference algorithm
• Posterior: power spectrum (correlation power vs. harmonic mode).

Information Hamiltonian (without noise estimation)
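$$
\begin{aligned}
\mathcal{H}(d, s, \tau) &= -\log \mathcal{P}(d, s, \tau) \\
&= \underbrace{\tfrac{1}{2} (d - R e^{s})^\dagger N^{-1} (d - R e^{s}) + \tfrac{1}{2} \log |N|}_{\text{Likelihood}} \\
&\quad + \underbrace{\tfrac{1}{2} s^\dagger S^{-1} s + \tfrac{1}{2} \log |S|}_{\text{Prior / regularization}} \\
&\quad + \underbrace{(\alpha - 1)^\dagger \tau + q^\dagger e^{-\tau} + \tfrac{1}{2} \tau^\dagger T \tau}_{\text{hyper-prior}}
\end{aligned}
$$
with τ being the power spectrum of the signal covariance S (correlation structure):
$$S = \sum_k e^{\tau_k} S_k$$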
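
As a rough sketch of how the likelihood and prior terms of this Hamiltonian can be evaluated numerically for the data model d = R e^s + n, consider the function below. It makes several assumptions: all arrays are real (actual visibilities are complex, and † is then the conjugate transpose), N is diagonal, τ is held fixed so that S is a given matrix, and the hyper-prior terms are omitted. The function name and signature are hypothetical.

```python
import numpy as np

def information_hamiltonian(s, d, R, N_diag, S):
    """Likelihood + prior part of H(d, s, tau) for fixed tau (hyper-prior omitted)."""
    residual = d - R @ np.exp(s)
    # 1/2 (d - R e^s)^T N^-1 (d - R e^s) + 1/2 log|N| for diagonal N
    likelihood = 0.5 * residual @ (residual / N_diag) + 0.5 * np.sum(np.log(N_diag))
    # 1/2 s^T S^-1 s + 1/2 log|S|
    _, logdet_S = np.linalg.slogdet(S)
    prior = 0.5 * s @ np.linalg.solve(S, s) + 0.5 * logdet_S
    return likelihood + prior

# Toy usage: three pixels, identity response, unit noise and prior variance.
H_perfect_fit = information_hamiltonian(
    np.zeros(3), np.ones(3), np.eye(3), np.ones(3), np.eye(3))
H_unit_residual = information_hamiltonian(
    np.zeros(3), 2.0 * np.ones(3), np.eye(3), np.ones(3), np.eye(3))
```

In the first call the model e^s = 1 reproduces the data exactly, so only the (vanishing) prior and log-determinant terms remain. In the second call each pixel has a residual of one noise standard deviation.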

Inference Algorithm (without noise estimation)
• Approximate posterior:
$$\mathcal{P}(s|d) = \mathcal{G}(s - m, D) \cdot \delta(\tau - \tau^*)$$
• Solve for m and τ* (with NIFTy's¹ help): d → map m, power spectrum τ*.

¹ https://gitlab.mpcdf.mpg.de/ift/nifty
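
The sketch below illustrates only one half of this scheme and does not use NIFTy. For a fixed power spectrum (i.e. a fixed S), it finds the map m by minimizing the likelihood-plus-prior energy of d = R e^s + n with damped Gauss-Newton steps, and takes the inverse Gauss-Newton curvature at the minimum as the covariance D of the Gaussian approximation. The optimizer choice, real-valued arrays, scalar noise variance, and all names are assumptions of this example. The estimation of τ* is left out.

```python
import numpy as np

def energy(s, d, R, noise_var, S_inv):
    """Likelihood plus prior energy for d = R e^s + n (constants dropped)."""
    residual = d - R @ np.exp(s)
    return 0.5 * residual @ residual / noise_var + 0.5 * s @ S_inv @ s

def map_with_curvature(d, R, noise_var, S, n_iter=100):
    """Return the minimizer m of the energy and a Gauss-Newton covariance D."""
    S_inv = np.linalg.inv(S)
    m = np.zeros(S.shape[0])
    for _ in range(n_iter):
        J = R * np.exp(m)  # Jacobian of R e^s at s = m (columns scaled by e^m)
        grad = -J.T @ (d - R @ np.exp(m)) / noise_var + S_inv @ m
        step = np.linalg.solve(J.T @ J / noise_var + S_inv, grad)
        t = 1.0            # backtracking: halve the step until the energy does not rise
        e0 = energy(m, d, R, noise_var, S_inv)
        while energy(m - t * step, d, R, noise_var, S_inv) > e0 and t > 1e-8:
            t *= 0.5
        m = m - t * step
    J = R * np.exp(m)
    D = np.linalg.inv(J.T @ J / noise_var + S_inv)
    return m, D

# Toy usage: five pixels, identity response, noise-free data.
s_true = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
R = np.eye(5)
d = R @ np.exp(s_true)
m, D = map_with_curvature(d, R, noise_var=0.01, S=np.eye(5))
```

The recovered m is pulled slightly towards zero relative to `s_true` by the prior, most visibly for faint pixels where the data constrain s least.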

3C391 again.
[Figures: dirty image (color scale: flux, ×10^11); reconstructed sky (color scale: intensity); relative error map; power spectrum (correlation power vs. harmonic mode).]

Imaging the Gamma-Ray Sky: D3PO

D3PO
• D3PO: Denoise, Deconvolve and Decompose Photon Observations.
• Assumptions: Like RESOLVE but with Poisson statistics and add point sources.
• Paper and codes: www.mpa-garching.mpg.de/ift/d3po.
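
To illustrate what "Poisson statistics plus point sources" changes in the likelihood, here is a small sketch of a Poisson information energy. It assumes that the expected photon rate is the response applied to the sum of a diffuse component e^s and a point-like component e^u. The symbol u and all function and parameter names are assumptions of this example, not taken from the slides or from the D3PO code.

```python
import numpy as np
from math import lgamma

def poisson_energy(counts, diffuse_log, point_log, response):
    """-log P(d | s, u) for Poisson counts with expected rate R (e^s + e^u)."""
    rate = response @ (np.exp(diffuse_log) + np.exp(point_log))
    log_factorial = np.array([lgamma(k + 1.0) for k in counts])
    # -log of prod_i rate_i^d_i exp(-rate_i) / d_i!
    return float(np.sum(rate - counts * np.log(rate) + log_factorial))

# Toy usage: one pixel, identity response, diffuse and point flux of 1 each.
energy_two_counts = poisson_energy(
    counts=np.array([2.0]),
    diffuse_log=np.array([0.0]),
    point_log=np.array([0.0]),
    response=np.eye(1),
)
```

Compared with the Gaussian likelihood used for RESOLVE, the quadratic residual term is replaced by the Poisson term, while the priors on the log-fluxes keep the rate positive by construction.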

D3PO in action: 6.5 years all sky data
[Image sequence:] Data... Log-data... Denoised... Deconvolved... Decomposed.
Selig, Vacca, Oppermann, Enßlin (2015)

Wrap-up

Summary
• Information Field Theory is good for:
  • Incomplete data.
  • Little data but much knowledge about measurement device.
  • Calculating uncertainty maps.
  • Estimating power spectra and noise level.
• Reconstruction algorithms rely on:
  • Continuity.
  • Correlation structures (power spectra).

The Future
• Automate imaging with a Bayesian framework that can estimate:
  • all polarizations,
  • a full spectral 3D model,
  • uncertainties,
  mostly without direct user input. Assumptions and approximations are written down as statistical models → no subjective input at runtime. → Reproducibility.
• Combine imaging and calibration into one single estimation and inference framework. → More information.
[Diagram with the boxes: Data, Calibration, Calibrated data, Cleaning, Image, Bayesian imaging.]

The Future
[Diagram: instruments, IFT algorithms, and resulting products, with status legend.]
Instruments: Fermi, COMPTEL, Integral, RXTE, Chandra, ROSAT, XMM, eRosita, ..., Planck, LOFAR, VLA, MeerKAT, SKA, Effelsberg-SRT
Algorithms:
• D3PO: 2D event based imaging, 1D time-line reconstructions
• Tomography: 3D galactic, 3D lensing
• DSC: 1D dynamical system classifier
• DFI: 2D dynamical field inference
• RESOLVE: 2D aperture synthesis
• UBIK: Universal Bayesian Imaging toolKit
Physical components: 4D radio-gamma, 3D radio-gamma, 3D radio, 3D gamma, 2D radio
Science results
Legend: Envisioned, In preparation, Published

The end.

?

References
• Information Field Theory in a nutshell: 1301.2556.
• IFT lecture notes
• Papers on RESOLVE: 1311.5282, 1605.04317, 1711.02955.
• D3PO: 1410.4562.
• Website IFT group.
• NIFTy's git repo: https://gitlab.mpcdf.mpg.de/ift/NIFTy. Includes example scripts. Licensed under GPLv3. If you use it, feel free to submit bug reports!

Information Field Theory: Idea
• Redefine
$$\mathcal{H}(s, d) = -\log \mathcal{P}(s, d)$$
• Then, Bayes' theorem reads:
$$\mathcal{P}(s|d) = \frac{\mathcal{P}(s, d)}{\mathcal{P}(d)} = \frac{e^{-\mathcal{H}(s,d)}}{\int \mathcal{D}s\, e^{-\mathcal{H}(s,d)}}.$$
• Have rewritten Bayesian probability theory in statistical mechanics language.
• Can compute arbitrary correlation functions:
$$\left\langle s(x_1)\, s(x_2) \right\rangle_{\mathcal{P}(s|d)} = \int \mathcal{D}s\; s(x_1)\, s(x_2)\, \mathcal{P}(s|d).$$
• Compute MAP-solution ≡ minimize $\mathcal{H}(s, d)$ wrt. s.
• Compute expectation value of posterior ≡ minimize Gibbs energy.
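
As a small illustration of the correlation-function integral, the sketch below estimates the two-point function by averaging over posterior samples. It assumes the posterior is already available as a Gaussian with mean m and covariance D on two pixels; the numbers and names are made up for this example. For a Gaussian the exact answer is known, which gives a check on the sampling estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed Gaussian posterior on two pixels x1, x2.
m = np.array([1.0, -0.5])
D = np.array([[0.5, 0.2],
              [0.2, 0.3]])

# Draw posterior samples and average s(x_i) s(x_j) over them.
n_samples = 200_000
samples = m + rng.standard_normal((n_samples, 2)) @ np.linalg.cholesky(D).T
two_point_estimate = samples.T @ samples / n_samples

# Exact result for a Gaussian: <s_i s_j> = m_i m_j + D_ij.
two_point_exact = np.outer(m, m) + D
```

The same sample average works for any functional of s, which is what makes posterior samples a convenient substitute for the integral over all field configurations.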