The Role of CPU-Based Correlators in

Walter Brisken

National Radio Astronomy Observatory

12 February 2020

CPU correlation

∗ I prefer not to over-use the term “software correlator”
  ◦ GPU correlators are also software correlators
  ◦ All modern correlators contain significant amounts of software
  ◦ E.g., the VLBA hardware correlator had more lines of code than DiFX at the time the VLBA switched!
∗ CPU correlation is distinguished by its relative portability

The Distributed FX (DiFX) correlator

∗ Was half of Adam Deller’s PhD dissertation project
∗ Since then, developed and maintained by a global team
∗ Full-featured F-X architecture VLBI correlator
∗ Now in operational use for:
  ◦ Very Long Baseline Array (VLBA)
  ◦ VLITE (see Tracy Clarke’s talk)
  ◦ MPIfR Bonn
  ◦ US Naval Observatory (USNO)
  ◦ Korean VLBI Network (KVN)
  ◦ Australian Long Baseline Array (LBA)
  ◦ Event Horizon Telescope (EHT)
∗ More info at: https://www.atnf.csiro.au/vlbi/dokuwiki/doku.php/difx/start
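As a rough illustration of the F-X architecture, the sketch below channelizes two stations' sampled voltages with an FFT (the F step) and then cross-multiplies and time-averages the spectra (the X step). It is a minimal NumPy sketch, not DiFX code: it assumes real-valued samples that are already delay-compensated, and it skips fringe rotation, fractional-sample correction, windowing, and data-format decoding, all of which a real VLBI correlator must handle. The function name `fx_correlate` is hypothetical.

```python
import numpy as np

def fx_correlate(x1, x2, nfft):
    """Hypothetical minimal FX correlator for one baseline.

    x1, x2: real-valued voltage samples from two stations, assumed to be
    already aligned in time (no delay or fringe-rate correction is done here).
    nfft: samples per FFT; the output has nfft // 2 + 1 spectral channels.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    nseg = min(len(x1), len(x2)) // nfft
    if nseg == 0:
        raise ValueError("need at least nfft samples per station")
    acc = np.zeros(nfft // 2 + 1, dtype=complex)
    for k in range(nseg):
        seg = slice(k * nfft, (k + 1) * nfft)
        s1 = np.fft.rfft(x1[seg])  # F step: channelize station 1
        s2 = np.fft.rfft(x2[seg])  # F step: channelize station 2
        acc += s1 * np.conj(s2)    # X step: cross-multiply and accumulate
    return acc / nseg              # time-averaged cross-power spectrum
```

Correlating a noise stream with itself returns a real, non-negative autocorrelation spectrum; for two different stations, the phase of each channel carries the residual delay between them.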

DiFX features

∗ Continent-scale baselines and space VLBI
∗ Geodetic VLBI: pulse cal, model accountability
∗ Spectral resolution to 2 Hz
∗ Time resolution to milliseconds
∗ Pulsar gating and binning
∗ Massive multi-phase-center capability (hundreds of phase centers)
∗ Correlation of mismatched bands (EHT is an extreme case)
∗ Polarization conversion (mixed X-Y and R-L)
∗ Floating-point input and output (approx. 20 ENOB)
∗ Parallelized on time axis: very flexible
∗ Accepts many data formats, real/complex, upper/lower sideband, interoperably
∗ Operates in real time (e.g., VLITE) or offline
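Polarization conversion can be illustrated as a change of basis. The sketch below converts complex voltage spectra from a linear (X, Y) basis to a circular (R, L) basis. The sign convention used, R = (X - iY)/sqrt(2) and L = (X + iY)/sqrt(2), is an assumption: handedness conventions differ between instruments and references, and a real conversion must also account for feed leakage and gain calibration, which this sketch ignores. The function name is hypothetical, and this is not a description of how DiFX implements the feature.

```python
import numpy as np

def linear_to_circular(x, y):
    """Hypothetical basis change from linear (X, Y) to circular (R, L).

    ASSUMED convention: R = (X - iY)/sqrt(2), L = (X + iY)/sqrt(2).
    Check the convention of the actual system before relying on it.
    Feed leakage and gain calibration are ignored.
    """
    x = np.asarray(x, dtype=complex)
    y = np.asarray(y, dtype=complex)
    r = (x - 1j * y) / np.sqrt(2)
    l = (x + 1j * y) / np.sqrt(2)
    return r, l
```

Because the transformation is unitary, total power is preserved: |R|^2 + |L|^2 = |X|^2 + |Y|^2. Under the assumed convention, an input with Y = iX maps entirely into R.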

DiFX: technologies employed

∗ C++ and C for heavy lifting code
∗ Python for peripheral code / scripts
∗ OpenMPI: cluster-level parallelization
∗ SMP threading: core-level parallelization
∗ Intel IPP: instruction-level parallelization
∗ Multicast XML: monitor and control system
∗ Uses standard VLBI data formats: Mark5B, VDIF, VEX, FITS, MarkIV
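Parallelizing on the time axis works because cross-spectra from different stretches of time can be computed independently and summed afterwards. The sketch below splits the samples into equal chunks, correlates the chunks concurrently, and combines the partial sums. Python threads stand in for the MPI processes and cores that DiFX actually uses; the function names and the chunking scheme are hypothetical illustrations, not DiFX's work distribution.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def cross_spectrum_sum(x1, x2, nfft):
    """F and X steps for one time chunk. Returns the SUM of the cross-spectra
    over whole nfft-sample segments, plus the number of segments used."""
    nseg = min(len(x1), len(x2)) // nfft
    s1 = np.fft.rfft(x1[: nseg * nfft].reshape(nseg, nfft), axis=1)  # F step
    s2 = np.fft.rfft(x2[: nseg * nfft].reshape(nseg, nfft), axis=1)  # F step
    return (s1 * np.conj(s2)).sum(axis=0), nseg                       # X step

def correlate_time_parallel(x1, x2, nfft, nchunks, workers=4):
    """Hypothetical time-axis parallelization: split the samples into equal
    chunks of whole FFT segments, process the chunks concurrently, and
    combine the partial sums. Threads stand in for MPI processes / cores."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    chunk = (min(len(x1), len(x2)) // (nfft * nchunks)) * nfft  # samples per chunk
    if chunk == 0:
        raise ValueError("not enough samples for the requested chunks")

    def work(k):
        sl = slice(k * chunk, (k + 1) * chunk)
        return cross_spectrum_sum(x1[sl], x2[sl], nfft)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(work, range(nchunks)))
    total = sum(spec for spec, _ in partials)  # combination order does not matter
    nseg = sum(count for _, count in partials)
    return total / nseg
```

Since each chunk's partial sum is independent, chunks can be handed to any number of workers and merged in any order; the result matches a serial correlation of the same (trimmed) samples up to floating-point rounding.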

Platforms supporting DiFX

[Images: Raspberry Pi; laptop (Linux or Apple); MAGNUS supercomputer, Perth, AU]

∗ Don’t underestimate the value of running a correlator on a laptop!

VLBA DiFX Correlator

∗ Upgraded late 2019
∗ 20 processing servers, 24 CPU cores each
∗ 40 Gbps InfiniBand backplane
∗ Can process 10 antennas at 1 GHz bandwidth
∗ Power consumption: 7 kW
∗ Rack space: about 1 m
∗ Cost: $100k
∗ Easy today
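Dividing the VLBA system totals by its 20 servers gives rough per-server figures. These are simple divisions of the numbers on this slide; the power and rack-height figures are consistent with the extrapolations on the EVLA, ALMA2030, and ngVLA slides, while the cost figures there (e.g., $6M for 1000 servers) imply closer to $6k per server.

```latex
\begin{aligned}
N_{\text{cores}} &= 20 \times 24 = 480 \\
P_{\text{server}} &\approx 7\,\text{kW} / 20 = 350\,\text{W} \\
C_{\text{server}} &\approx \$100\text{k} / 20 = \$5\text{k} \\
h_{\text{server}} &\approx 1\,\text{m} / 20 = 5\,\text{cm}
\end{aligned}
```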

Some things learned

∗ Be careful about longevity and portability of third-party libraries
  ◦ So far DiFX has been mostly lucky
  ◦ pgplot and some Fortran code are getting quite annoying, though...
∗ DiFX is thermally limited, not I/O limited or instruction limited
  ◦ Cache-hit optimization has been done
  ◦ Code can only be sped up significantly by reducing mathematical operations
  ◦ Mind the Thermal Design Power (TDP) when choosing CPUs
  ◦ Benchmarking and optimization can yield very puzzling results
∗ Performance is driven mostly by the number of antennas and the bandwidth; spectral and time resolution matter only in extreme cases
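The claim that performance depends mainly on antenna count and bandwidth, and only weakly on spectral resolution, can be checked with a rough operation count. The sketch below is an assumed textbook-style model, not a DiFX benchmark: it treats the data as complex baseband (B complex samples per second per antenna, single polarization), charges about 5·n·log2(n) floating-point operations per n-point FFT and about 8 per complex multiply-accumulate, and ignores delay and fringe corrections, cache effects, and I/O. The function name is hypothetical.

```python
import math

def fx_flops_per_second(n_ant, bandwidth_hz, n_chan):
    """Rough FX operation count per second of data (assumed model).

    F step: each antenna performs one n_chan-point complex FFT per n_chan
    samples, ~5 * n_chan * log2(n_chan) flops, i.e. 5 * log2(n_chan) per sample.
    X step: each baseline (autocorrelations included) performs one complex
    multiply-accumulate (~8 flops) per channel per spectrum, i.e. 8 per sample.
    """
    samples_per_s = bandwidth_hz  # complex sampling: B samples/s for bandwidth B
    f_step = n_ant * samples_per_s * 5 * math.log2(n_chan)
    n_products = n_ant * (n_ant + 1) // 2  # cross- plus auto-correlations
    x_step = n_products * samples_per_s * 8
    return f_step, x_step
```

In this model, for 10 antennas at 1 GHz, going from 1024 to 1,048,576 channels only doubles the F-step cost and leaves the X step unchanged, whereas doubling the number of antennas roughly quadruples the X step.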

CPU correlation for EVLA?

∗ Performance requirement: 27 antennas at 8 GHz
∗ Would require 1000 servers (2019 technology)
∗ Power consumption: 350 kW
∗ Rack space: about 50 m (about 25 racks)
∗ Cost: $6M
∗ Failure rate: approx. 1 server per month
∗ Not crazy now; easy in 15 years?
∗ Would have been a 1 km tall rack when EVLA was inaugurated
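The EVLA, ALMA2030, and ngVLA extrapolations look consistent with scaling the VLBA system (10 antennas, 1 GHz, 20 servers) by the number of baselines, roughly N squared, and linearly by bandwidth. The sketch below applies that scaling; the quadratic model and the per-server power and rack-height figures (350 W and 5 cm, divided out of the VLBA slide totals) are assumptions, and the author's exact method is not stated.

```python
# Reference point: the VLBA DiFX system described on the earlier slide.
REF_ANTENNAS = 10
REF_BANDWIDTH_GHZ = 1.0
REF_SERVERS = 20
# Per-server figures derived from that slide's totals (assumption):
WATTS_PER_SERVER = 7000 / REF_SERVERS   # 7 kW total -> 350 W per server
METRES_PER_SERVER = 1.0 / REF_SERVERS   # ~1 m of rack -> 5 cm per server

def extrapolate(n_ant, bandwidth_ghz):
    """Scale the reference system by (N / N0)^2 * (B / B0).

    Assumes the baseline-dependent X step dominates, so cost grows with
    the square of the antenna count and linearly with bandwidth.
    """
    servers = (REF_SERVERS
               * (n_ant / REF_ANTENNAS) ** 2
               * (bandwidth_ghz / REF_BANDWIDTH_GHZ))
    return {
        "servers": servers,
        "power_kw": servers * WATTS_PER_SERVER / 1000,
        "rack_m": servers * METRES_PER_SERVER,
    }

for name, n, bw in [("EVLA", 27, 8), ("ALMA2030", 70, 20), ("ngVLA", 260, 20)]:
    est = extrapolate(n, bw)
    print(f"{name}: ~{est['servers']:,.0f} servers, "
          f"~{est['power_kw']:,.0f} kW, ~{est['rack_m']:,.0f} m of rack")
```

The resulting server counts (about 1,170, 19,600, and 270,000) land within about 20% of the slides' rounded figures of 1000, 20000, and 300000.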

CPU correlation for ALMA2030?

∗ Performance requirement: 70 antennas at 20 GHz
∗ Would require 20000 servers (2019 technology)
∗ Power consumption: 7 MW
∗ Rack space: about 1 km
∗ Cost: $120M
∗ Failure rate: approx. 1 server per day
∗ Unlikely to be practical
∗ Where to put such a correlator? It would fit nicely inside the Jeddah Kingdom Tower, Saudi Arabia

CPU correlation for ngVLA?

∗ Performance requirement: 260 antennas at 20 GHz
∗ Would require 300000 servers (2019 technology)
∗ Power consumption: 100 MW
∗ Rack space: about 15 km (could host SOFIA at top)
∗ Cost: $2B
∗ Failure rate: approx. 1 server per hour
∗ Keep dreaming!
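The failure-rate bullets can be cross-checked under a simple assumption: server failures are independent and occur at a constant per-server rate, calibrated here from the EVLA slide's figure of about one failure per month among 1000 servers (a per-server mean time between failures of roughly 83 years). The helper below is hypothetical and only does that arithmetic.

```python
HOURS_PER_MONTH = 730  # ~8766 hours per year / 12

def failures_per_hour(n_servers, ref_servers=1000, ref_failures_per_month=1.0):
    """Expected failures per hour for n_servers, assuming independent failures
    at a constant per-server rate calibrated from the reference figure."""
    rate_per_server_hour = ref_failures_per_month / (ref_servers * HOURS_PER_MONTH)
    return n_servers * rate_per_server_hour

for n in (20_000, 300_000):
    per_hour = failures_per_hour(n)
    print(f"{n:,} servers: ~{per_hour * 24:.1f} failures/day, "
          f"~{per_hour:.2f} failures/hour")
```

This gives about 0.7 failures per day for 20000 servers and about 0.4 per hour for 300000, the same order of magnitude as the approximate rates on the ALMA2030 and ngVLA slides.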

Why am I still talking?

∗ Clearly the ALMA2030 correlator won’t be a CPU correlator
∗ There are many reasons to include CPU correlators in the ALMA2030 landscape...
  ◦ A reference for correctness and digital efficiency comparison
  ◦ Testbed for new features
  ◦ Antenna / array diagnostics
  ◦ Replacement for ACA correlator?

Proposed: a two-correlator solution

∗ Correlator 1: a new GPU, FPGA, or ASIC real-time correlator
  ◦ Use for 99% of science cases
∗ Correlator 2: a CPU-based offline correlator
  ◦ Use for perhaps 1 hour of observing per week
  ◦ Pair with a multi-petabyte baseband data storage array
  ◦ Use for special processing: planetary radar, pulsars, FRBs, ...
  ◦ Use for development (correlator, calibration, ...)
  ◦ Use for a wide variety of tests
  ◦ Allow inclusion of non-ALMA antennas in correlation (e.g., EHT)
  ◦ Reduce worst-case specifications for correlator 1
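To give a sense of scale for the baseband storage array, the sketch below estimates the raw data volume of one hour of ALMA2030-like observing (70 antennas at 20 GHz). It assumes real-valued Nyquist sampling (2 samples per second per hertz of bandwidth), 2-bit samples, and that 20 GHz is the total bandwidth per antenna summed over polarizations; other sample widths or bandwidth definitions change the result proportionally. The function name is hypothetical.

```python
def baseband_bytes(n_ant, bandwidth_hz, hours, bits_per_sample=2):
    """Raw baseband volume in bytes, assuming real-valued Nyquist sampling
    (2 * bandwidth samples/s per antenna) at a fixed sample width."""
    bits_per_second = n_ant * 2 * bandwidth_hz * bits_per_sample
    return bits_per_second * hours * 3600 / 8

volume = baseband_bytes(70, 20e9, hours=1)
print(f"~{volume / 1e15:.2f} PB per hour of observing")
```

Under these assumptions, one hour of observing produces about 2.5 PB of baseband data, which is consistent with the slide's call for a multi-petabyte storage array.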

Conclusions

∗ CPU-based correlators offer considerable flexibility with low development overhead
∗ They won’t displace GPU, FPGA, or ASIC correlators for large-N correlation at high bandwidth
∗ They remain highly relevant for I/O-restricted cases (e.g., VLBI)
∗ They remain useful in secondary roles alongside heavy-lifting correlators
∗ Moving to CPU-based correlation shifts support from electronics to software and could have sociological effects
