Imaging with Information Theory

Philipp Arras, November 28, 2017

Max Planck Institute for Astrophysics, Garching, Germany
Technical University of Munich

Introduction

Radio Aperture Synthesis

Figure 1: Dirty image (flux). Figure 2: Reconstructed sky model (intensity).

Supernova remnant 3C391; VLA data from the CASA tutorial on the NRAO website, 128 MHz bandwidth, two spectral windows at 4.6 and 7.5 GHz. Reconstructed with RESOLVE plus noise estimation, Stokes I only.

Inverse Problems

True sky model: fields (e.g. $\rho: \mathbb{R}^3 \to \mathbb{R}_{\geq 0}$), which come with an infinite number of degrees of freedom.
→
Data: arrays (e.g. vis.shape = (4132094,)), which always have a finite number of degrees of freedom.

Inverse Problems (even worse)

Figure 3: UV coverage (→ d), axes u and v. Figure 4: Reconstructed image (→ s).

We need $\mathcal{P}(s|d)$:

$$\mathcal{P}(s|d) = \frac{\mathcal{P}(d|s)\,\mathcal{P}(s)}{\mathcal{P}(d)}$$

Bayesian inference

Definitions: s := physical signal, d := data.

Bayes' theorem (follows from the product rule of probabilities):

$$\mathcal{P}(s|d) = \frac{\mathcal{P}(d|s)\,\mathcal{P}(s)}{\mathcal{P}(d)}$$

• Likelihood $\mathcal{P}(d|s)$: probability of obtaining a data set given that the signal is known.
• Prior $\mathcal{P}(s)$: information available on the signal prior to the measurement.
• Posterior $\mathcal{P}(s|d)$: probability of a signal realization given the measured data.
• Evidence $\mathcal{P}(d)$: normalization factor; for inferences on s it can be ignored in most cases.
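To make the four terms concrete, here is a minimal sketch on a discretized one-dimensional toy problem. The Gaussian prior, the single noisy datum and all numbers are assumptions chosen for illustration; for this conjugate case the posterior mean is known in closed form, so the grid result can be checked against it.

```python
import numpy as np

# Assumed toy problem: scalar signal s with Gaussian prior G(s, sigma_s^2)
# and a single datum d = s + n with Gaussian noise n ~ G(n, sigma_n^2).
sigma_s, sigma_n, d = 2.0, 1.0, 1.5

s = np.linspace(-10.0, 10.0, 20001)  # discretized signal values
ds = s[1] - s[0]


def gauss(x, var):
    """Zero-mean normal density with variance `var`."""
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2.0 * np.pi * var)


prior = gauss(s, sigma_s**2)           # P(s)
likelihood = gauss(d - s, sigma_n**2)  # P(d|s)
joint = likelihood * prior             # P(d|s) P(s) = P(d, s)
evidence = joint.sum() * ds            # P(d): integral of P(d, s) over s
posterior = joint / evidence           # P(s|d), Bayes' theorem

# The evidence only rescales: the most probable s is the same with or without it.
s_map = s[np.argmax(joint)]

post_mean = (s * posterior).sum() * ds
# Conjugate Gaussian case: closed-form posterior mean for comparison.
analytic_mean = sigma_s**2 / (sigma_s**2 + sigma_n**2) * d
print(post_mean, analytic_mean)  # both approximately 1.2
```

Dividing by the evidence changes neither the location of the maximum nor the shape of the posterior, which is why it can usually be ignored when inferring s.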

Probability Distributions Over All Possible Images

Plot: $\mathcal{P}(s|d)$ over the $N_\mathrm{pix} \cdot N_\mathrm{values}$ dimensions.

This is a lot! Theoretically, the space is infinite-dimensional. Practically, e.g. $N_\mathrm{pix} = 256^2$ and $N_\mathrm{values} = 10^5$ give $N_\mathrm{pix} \cdot N_\mathrm{values} \approx 6.6 \times 10^9$ dimensions.

→ We need to extract information about $\mathcal{P}(s|d)$, where s is a field.
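The size estimate is plain arithmetic; a short check, using the example values from the slide, also shows how much larger the number of distinct discretized images is.

```python
import math

n_pix = 256**2      # a 256 x 256 image
n_values = 10**5    # distinguishable brightness values per pixel (slide's example)

# The count quoted on the slide: N_pix * N_values.
n_dims = n_pix * n_values
print(f"{n_dims:.3e}")  # 6.554e+09

# The number of distinct discretized images, N_values ** N_pix, is far too
# large to print, so only its base-10 logarithm is reported.
log10_n_images = n_pix * math.log10(n_values)
print(log10_n_images)
```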

Information Field Theory

• Information Field Theory := information theory with fields.
• Provides a dictionary between statistical mechanics / field theory and Bayesian inference.
• Define $\mathcal{H}(s, d) := -\log \mathcal{P}(s, d)$. Then:

$$\mathcal{P}(s|d) = \frac{\mathcal{P}(s, d)}{\mathcal{P}(d)} = \frac{e^{-\mathcal{H}(s,d)}}{\int \mathcal{D}s\, e^{-\mathcal{H}(s,d)}}.$$

Information Field Theory

• Theory on continuous spaces; actual calculations are discretized.
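As an illustration of "continuous theory, discretized calculation", the sketch below evaluates a continuous power spectrum on the harmonic modes of a periodic 1D grid and draws samples of the corresponding stationary Gaussian field with FFTs. The grid size and the function `power_spectrum` are assumptions for illustration, and this is plain NumPy, not NIFTy's API.

```python
import numpy as np


def power_spectrum(k, k0=4.0):
    """Hypothetical smooth power spectrum, finite at k = 0."""
    return 1.0 / (1.0 + (k / k0) ** 2) ** 2


n = 64                                     # pixels of the discretized 1D field
k = np.abs(np.fft.fftfreq(n, d=1.0 / n))   # integer harmonic modes |k|
p_k = power_spectrum(k)                    # continuous spectrum sampled on the grid

rng = np.random.default_rng(42)


def draw_sample():
    """Draw s ~ G(s, S) where S is diagonal in Fourier space with eigenvalues p_k."""
    xi = rng.standard_normal(n)                     # white noise, unit covariance
    s = np.fft.ifft(np.sqrt(p_k) * np.fft.fft(xi))  # colour the noise
    return s.real  # imaginary part vanishes up to rounding (p_k is even in k)


samples = np.array([draw_sample() for _ in range(4000)])

# With NumPy's FFT normalization, E|fft(s)_k|^2 / n = p_k.
estimated = np.mean(np.abs(np.fft.fft(samples, axis=1)) ** 2, axis=0) / n
print(np.max(np.abs(estimated / p_k - 1.0)))  # small: sampling noise only
```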

Wiener filter demo

See the Jupyter notebook on Google Drive.

→ IFT actually works.
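The notebook itself is not reproduced here. As a stand-in, a minimal NumPy sketch of the Wiener filter for a linear data model d = R s + n with Gaussian signal and noise: the posterior is Gaussian, G(s − m, D), with D = (S⁻¹ + R† N⁻¹ R)⁻¹, information source j = R† N⁻¹ d and mean m = D j. The power spectrum, the pixel-masking response R and the noise level are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_data = 64, 40                  # pixels, data points
noise_var = 0.01                    # noise covariance N = noise_var * identity

# Assumed prior: stationary signal covariance S, diagonal in Fourier space.
k = np.abs(np.fft.fftfreq(n, d=1.0 / n))
p_k = 1.0 / (1.0 + (k / 4.0) ** 2) ** 2
eye = np.eye(n)
S = np.fft.ifft(p_k[:, None] * np.fft.fft(eye, axis=0), axis=0).real
S = 0.5 * (S + S.T)                 # remove rounding asymmetry

# Hypothetical response R: read out n_data randomly chosen pixels.
pixels = np.sort(rng.choice(n, size=n_data, replace=False))
R = eye[pixels]                     # shape (n_data, n)
N = noise_var * np.eye(n_data)

# Simulate d = R s + n with s ~ G(s, S) and n ~ G(n, N).
s_true = np.fft.ifft(np.sqrt(p_k) * np.fft.fft(rng.standard_normal(n))).real
d = R @ s_true + rng.normal(0.0, np.sqrt(noise_var), size=n_data)

# Wiener filter: information source j, posterior covariance D, posterior mean m.
N_inv = np.linalg.inv(N)
j = R.T @ N_inv @ d
D = np.linalg.inv(np.linalg.inv(S) + R.T @ N_inv @ R)
m = D @ j
uncertainty = np.sqrt(np.diag(D))   # 1-sigma uncertainty map
```

Explicit matrices keep the sketch short; for realistic image sizes one would instead apply S, R and N as implicit operators and solve for m with conjugate gradients.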

Imaging the Radio Sky: RESOLVE

Challenges to be addressed:
• Extended sources (exclude point sources for now).
• Uncertainty maps.
• Low signal-to-noise performance.
• Reproducibility.

RESOLVE (without noise estimation)

• Data: UV coverage (figure, axes u and v).
• Data model: $d = R\,e^{s} + n$, with response R and noise n.
• Prior: power spectrum with smoothness prior and flat inverse-Gamma prior.
• Inference algorithm → posterior power spectrum (figure: power vs. harmonic mode).

Information Hamiltonian (without noise estimation)

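$$\begin{aligned}
\mathcal{H}(d, s, \tau) &= -\log \mathcal{P}(d, s, \tau) \\
&= \underbrace{\tfrac{1}{2}\,(d - R e^{s})^\dagger N^{-1} (d - R e^{s}) + \tfrac{1}{2}\log|N|}_{\text{likelihood}} \\
&\quad + \underbrace{\tfrac{1}{2}\, s^\dagger S^{-1} s + \tfrac{1}{2}\log|S|}_{\text{prior / regularization}} \\
&\quad + \underbrace{(\alpha - 1)^\dagger \tau + q^\dagger e^{-\tau} + \tfrac{1}{2}\,\tau^\dagger T \tau}_{\text{hyper-prior}}
\end{aligned}$$

with $\tau$ the logarithmic power spectrum, which parametrizes the correlation structure of the signal covariance $S$ (where $S_k$ projects onto the harmonic modes of spectral band $k$):

$$S = \sum_k e^{\tau_k} S_k$$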
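A minimal sketch of how this Hamiltonian can be evaluated on a periodic 1D grid, assuming a real-valued toy response R, white noise N = σ² 𝟙, bands indexed by |k|, and a second-difference smoothness operator T. All of these, and the values of α and q, are hypothetical choices for illustration, not RESOLVE's implementation. Because S is diagonal in Fourier space, s† S⁻¹ s and log|S| reduce to sums over Fourier modes.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_data = 32, 20

# Harmonic bands: S_k projects onto all Fourier modes with |k| equal to band k.
band = np.abs(np.fft.fftfreq(n, d=1.0 / n)).astype(int)
n_bands = band.max() + 1

# Hypothetical model ingredients (illustrative values, not RESOLVE's defaults).
R = rng.normal(size=(n_data, n)) / np.sqrt(n)  # real-valued linear response
noise_var = 0.1                                 # N = noise_var * identity
alpha = np.ones(n_bands)                        # alpha = 1 and small q: nearly flat
q = np.full(n_bands, 1e-3)                      # inverse-Gamma hyper-prior
second_diff = np.diff(np.eye(n_bands), n=2, axis=0)
T = second_diff.T @ second_diff / 0.5**2        # penalizes curvature of tau


def hamiltonian(s, tau, d):
    """H(d, s, tau) as on the slide, dropping only constant 2*pi factors."""
    # Likelihood: 1/2 (d - R e^s)^T N^-1 (d - R e^s) + 1/2 log|N|
    r = d - R @ np.exp(s)
    h_like = 0.5 * r @ r / noise_var + 0.5 * n_data * np.log(noise_var)
    # Prior: S = sum_k e^{tau_k} S_k is diagonal in Fourier space, so
    # s^T S^-1 s = sum |fft(s)|^2 e^{-tau} / n and log|S| = sum of tau over modes.
    s_hat = np.fft.fft(s)
    h_prior = 0.5 * np.sum(np.abs(s_hat) ** 2 * np.exp(-tau[band])) / n
    h_prior += 0.5 * np.sum(tau[band])
    # Hyper-prior on the log power spectrum tau
    h_hyper = (alpha - 1.0) @ tau + q @ np.exp(-tau) + 0.5 * tau @ T @ tau
    return h_like + h_prior + h_hyper


# Usage: evaluate H for synthetic data and a flat initial guess.
d = R @ np.exp(rng.normal(size=n)) + rng.normal(0.0, np.sqrt(noise_var), n_data)
print(hamiltonian(np.zeros(n), np.zeros(n_bands), d))
```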

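Inference Algorithm (without noise estimation)

• Approximate posterior, with $\mathcal{G}(s - m, D)$ a Gaussian with mean m and covariance D:

$$\mathcal{P}(s, \tau|d) \approx \mathcal{G}(s - m, D)\,\delta(\tau - \tau^*)$$

• Solve for m and τ* (with the help of NIFTy¹): d → map m, power spectrum τ*.

¹ https://gitlab.mpcdf.mpg.de/ift/nifty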
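One ingredient of such a scheme, sketched under assumptions: for a fixed power spectrum τ*, find the map m by minimizing the Hamiltonian with respect to s, using its analytic gradient and SciPy's L-BFGS-B. The spectrum, the random response R and the noise level are made up, and this is not RESOLVE's actual algorithm, which also updates τ* and approximates D (for example by the inverse curvature at m).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, n_data = 32, 24
noise_var = 0.01

# Assumption: tau* is already known; here p_k = e^{tau*} is a made-up spectrum.
k = np.abs(np.fft.fftfreq(n, d=1.0 / n))
p_k = 1.0 / (1.0 + (k / 3.0) ** 2) ** 2
R = rng.normal(size=(n_data, n)) / np.sqrt(n)  # hypothetical linear response


def apply_s_inv(x):
    """Apply S^-1 for a covariance that is diagonal in Fourier space."""
    return np.fft.ifft(np.fft.fft(x) / p_k).real


# Synthetic log-normal data d = R e^s + n.
s_true = np.fft.ifft(np.sqrt(p_k) * np.fft.fft(rng.standard_normal(n))).real
d = R @ np.exp(s_true) + rng.normal(0.0, np.sqrt(noise_var), n_data)


def energy_and_grad(s):
    """H(s | d, tau*) up to s-independent terms, and its gradient."""
    flux = np.exp(s)
    r = d - R @ flux
    s_inv_s = apply_s_inv(s)
    energy = 0.5 * r @ r / noise_var + 0.5 * s @ s_inv_s
    # d/ds of the likelihood term: -e^s * (R^T N^-1 r); prior term: S^-1 s
    grad = -flux * (R.T @ r) / noise_var + s_inv_s
    return energy, grad


result = minimize(energy_and_grad, x0=np.zeros(n), jac=True, method="L-BFGS-B")
m_map = result.x  # MAP estimate of s for fixed tau*
```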


3C391 again.

Figures: dirty image (flux), reconstructed sky (intensity), posterior power spectrum (power vs. harmonic mode), and relative error map.

Imaging the Gamma-Ray Sky: D3PO

• D3PO: Denoise, Deconvolve and Decompose Photon Observations.
• Assumptions: like RESOLVE, but with Poisson statistics and an additional point-source component.
• Paper and code: www.mpa-garching.mpg.de/ift/d3po.
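Switching from Gaussian to Poisson statistics changes the likelihood part of the Hamiltonian. A minimal sketch, assuming a photon flux made of a diffuse part e^s plus a point-like part e^u and a random non-negative response R (both assumptions, loosely following the idea of a diffuse-plus-point-source decomposition), computes −log P(counts | s, u) and cross-checks it against SciPy.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import poisson

rng = np.random.default_rng(3)
n_pix, n_data = 50, 30
# Hypothetical non-negative response (exposure times instrument response).
R = rng.uniform(0.0, 1.0, size=(n_data, n_pix))


def poisson_hamiltonian(s, u, counts):
    """-log P(counts | s, u) for counts ~ Poisson(R (e^s + e^u))."""
    flux = np.exp(s) + np.exp(u)   # diffuse part e^s plus point-like part e^u
    lam = R @ flux                 # expected photon counts per data bin
    return np.sum(lam - counts * np.log(lam) + gammaln(counts + 1.0))


s = rng.normal(-1.0, 0.5, n_pix)   # log of the diffuse flux
u = np.full(n_pix, -8.0)           # log of the point-source flux: mostly empty...
u[[7, 31]] = 1.0                   # ...except for two bright pixels
counts = rng.poisson(R @ (np.exp(s) + np.exp(u)))

# Cross-check against SciPy's Poisson log-pmf.
reference = -poisson.logpmf(counts, R @ (np.exp(s) + np.exp(u))).sum()
print(np.isclose(poisson_hamiltonian(s, u, counts), reference))  # True
```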

D3PO in action: 6.5 years of all-sky data

Data... Log-data... Denoised... Deconvolved... Decomposed.

Selig, Vacca, Oppermann, Enßlin (2015)

D3PO in action

Wrap-up

Summary

Information Field Theory is good for:
• Incomplete data.
• Little data but much knowledge about the measurement device.
• Calculating uncertainty maps.
• Estimating power spectra and noise level.

Reconstruction algorithms rely on:
• Continuity.
• Correlation structures (power spectra).

The Future

• Automate imaging with a Bayesian framework that can estimate
  • all polarizations,
  • a full spectral 3D model,
  • uncertainties,
  mostly without direct user input. Assumptions and approximations are written down as statistical models → no subjective input at runtime → reproducibility.
• Combine imaging and calibration into one single estimation and inference framework → more information.

Diagram: conventional imaging (data, calibration, calibrated data, cleaning, image) versus Bayesian imaging.

The Future

Diagram: IFT-based imaging tools and the instruments they target, each marked as envisioned, in preparation, published, or with science results.
• Tools: D3PO (2D event-based imaging), RESOLVE (2D aperture synthesis), 1D time-line reconstructions, Tomography (3D galactic, 3D lensing), DSC (1D dynamical system classifier), DFI (2D dynamical field inference), and UBIK (Universal Bayesian Imaging toolKit) with the physical components 2D radio, 3D radio, 3D gamma, 3D radio-gamma and 4D radio-gamma.
• Instruments: Fermi, COMPTEL, Integral, RXTE, Chandra, ROSAT, XMM, eRosita, Planck, VLA, LOFAR, MeerKAT, SKA, Effelsberg-SRT, ...

The end.

References

• Information Field Theory in a nutshell: arXiv:1301.2556.
• IFT lecture notes.
• Papers on RESOLVE: arXiv:1311.5282, arXiv:1605.04317, arXiv:1711.02955.
• D3PO: arXiv:1410.4562.
• Website of the IFT group.
• NIFTy's git repository: https://gitlab.mpcdf.mpg.de/ift/NIFTy
  • Includes example scripts.
  • Licensed under GPLv3.
  • If you use it, feel free to submit bug reports!

Information Field Theory: Idea

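• Redefine $\mathcal{H}(s, d) = -\log \mathcal{P}(s, d)$.
• Then Bayes' theorem reads:

$$\mathcal{P}(s|d) = \frac{\mathcal{P}(s, d)}{\mathcal{P}(d)} = \frac{e^{-\mathcal{H}(s,d)}}{\int \mathcal{D}s\, e^{-\mathcal{H}(s,d)}}.$$

• We have rewritten the theory in the language of statistical mechanics.
• We can compute arbitrary correlation functions:

$$\langle s(x_1)\, s(x_2) \rangle_{\mathcal{P}(s|d)} = \int \mathcal{D}s\; s(x_1)\, s(x_2)\, \mathcal{P}(s|d).$$

• Compute the MAP solution ≡ minimize $\mathcal{H}(s, d)$ with respect to s.
• Compute the expectation value of the posterior ≡ minimize the Gibbs energy.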
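Correlation functions can also be estimated by averaging over posterior samples. A minimal sketch, assuming a Gaussian posterior G(s − m, D) with a made-up mean and an exponential covariance kernel (for example, what a Wiener filter might return): the Monte Carlo average of s(x₁) s(x₂) is compared with the exact Gaussian result m(x₁) m(x₂) + D(x₁, x₂). For a Gaussian posterior the MAP solution and the posterior mean coincide; for non-Gaussian posteriors, such as the log-normal RESOLVE model, they generally differ.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 16
x = np.arange(n)

# Assumed Gaussian posterior G(s - m, D), e.g. the output of a Wiener filter.
m = np.sin(2.0 * np.pi * x / n)
D = 0.1 * np.exp(-np.abs(x[:, None] - x[None, :]) / 2.0)  # exponential kernel

# Draw samples s ~ P(s|d) and average s(x1) s(x2) over them.
samples = rng.multivariate_normal(m, D, size=20000)
i1, i2 = 3, 5
mc_estimate = np.mean(samples[:, i1] * samples[:, i2])

# For a Gaussian the two-point function is known exactly: m m^T + D.
exact = m[i1] * m[i2] + D[i1, i2]
print(mc_estimate, exact)
```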