Design Patterns and Software Techniques for Large-Scale, Open and Reproducible Data Reduction
Total Page:16
File Type:pdf, Size:1020Kb
DESIGN PATTERNS AND SOFTWARE TECHNIQUES FOR LARGE-SCALE, OPEN AND REPRODUCIBLE DATA REDUCTION by Gijs Jan Molenaar A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy Supervised by Prof. Oleg Smirnov 2021 Contents Abstract v Declaration of Authorship vii Acknowledgements viii Preface x Publications xi Source Code xii 1 History of data reduction pipelines 1 1.1 Introduction . .2 1.2 1930s . .2 1.3 1940s . .2 1.4 1950s . .5 1.5 1960s . .5 1.6 1970s . .7 Zero-generation calibration to first-generation calibration . .8 Astronomical Image Processing System . 10 CANDID . 10 CLEAN . 11 1.7 1980s . 11 Self-calibration and the second generation of calibration algorithms . 13 GIPSY . 13 DWARF..................................... 14 IRAF ...................................... 15 Miriad . 15 1.8 Meanwhile in South Africa . 15 1.9 1990s . 17 AIPS++ . 17 NEWSTAR . 18 The Measurement Set version 1 . 18 The Measurement equation . 19 DIFMAP . 19 1.10 2000s . 19 The Measurement Set version 2 . 19 i CONTENTS ii MeqTrees, third-generation calibration algorithms . 19 Obit....................................... 19 CASA...................................... 20 Casacore . 20 1.11 2010s . 21 Cuisine . 21 Docker...................................... 21 The ALMA pipeline . 21 Common Workflow Language . 22 DDFacet and killMS . 22 Kliko....................................... 22 Stimela . 23 CARACal (formerly known as MeerKATHI) . 23 Default pre-processing pipeline . 23 The LOFAR two-metre survey pipeline . 24 1.12 Overview of events . 25 1.13 Discussion . 27 2 Fundamentals 29 2.1 Electromagnetic radiation . 30 2.2 Interferometry . 33 Aperture synthesis . 34 The (u; v; w) coordinate system . 36 The radio interferometer measurement equation . 37 2.3 Image reconstruction . 40 2.4 Calibration . 42 Reference calibration . 42 Self-calibration . 44 3 KERN 46 3.1 Introduction . 47 3.2 The target platform . 47 3.3 Other packaging methods . 48 Anaconda . 48 Python and pip . 48 Collaboration with Debian . 49 3.4 Usage . 50 3.5 Notable packages . 50 Casacore . 50 Casacore data . 51 MeqTrees . 51 CASA...................................... 51 AIPS....................................... 52 LOFAR ..................................... 52 Pulsar software . 53 Unversioned packages . 53 3.6 Containerisation . 53 Docker...................................... 53 CONTENTS iii Singularity . 53 3.7 Project structure . 54 The release cycle . 54 Technical structure . 54 3.8 Recommended usage . 55 3.9 Usage numbers . 55 3.10 Conclusions . 56 4 Kliko 57 4.1 Introduction . 58 Software in science . 58 Software containerisation with Docker . 58 4.2 The Kliko specification . 60 The Kliko image . 60 Expected run-time behaviour . 60 Flavours of Kliko images . 61 The /kliko.yml schema . 61 The /parameters.json file . 62 4.3 Running Kliko containers . 63 Running a container manually . 63 Inside the Kliko container . 64 Kliko-run . 64 4.4 Chaining containers . 65 4.5 Example of usage of Kliko . 66 VerMeerKAT . 66 RODRIGUES . 67 4.6 Software availability . 70 4.7 Discussions and prospects . 70 Limitations . 70 Future work . 71 4.8 Conclusions . 71 5 CWL and Buis 73 5.1 Introduction . 74 5.2 The CommonWL standard . 74 CommandLineTool class file . 75 Job file . 75 Workflow class file . 75 Runners . 76 5.3 Buis – the web-based frontend for CommonWL runners . 76 Functional design . 76 Technical design . 78 Usage . 79 5.4 Use case example: a 1GC pipeline . 80 5.5 Discussion . 83 CONTENTS iv 6 Vacuum Cleaner 85 6.1 Introduction . 86.