KERN Party the 3Rd Packages
Total Page:16
File Type:pdf, Size:1020Kb
KERN Gijs Molenaara,b, Oleg Smirnovb,a aSKA Sounth Africa, 3rd Floor, The Park, Park Road, Pinelands, 7405, South Africa bDepartment of Physics & Electronics, Rhodes University, PO Box 94, Grahamstown, 6140, South Africa Abstract KERN is a bi-annually released set of radio astronomical software packages. It should contain most of the standard tools that a radio astronomer needs to work with radio telescope data. The goal of KERN is to save time and prevent frustration in setting up of scientific pipelines, and to assist in achieving scientific reproducibility. Keywords: software, packaging, radio astronomy, reproducible science, containerisation 1. Introduction intended to be used for radio astronomical data re- duction. The installation of scientific software for use in astron- omy can be notoriously challenging. The radio astronomy The name KERN means ‘core’ in Dutch and Afrikaans. community has a limited number of dedicated software en- gineers and often lacks the human resources to dedicate to 2. The target platform industrial-quality software development and maintenance. It is not uncommon that poorly written and badly main- A quick look around various astronomy institutes and tained software packages of high complexity are used by universities shows that GNU/Linux and OS X are the most scientists around the world, as these provide some set of used personal computing platforms. On the server side algorithmic features not available elsewhere. it is without question GNU/Linux. Compared to OS X, KERN has been created to facilitate scientists and sys- GNU/Linux is an open source and a freely available plat- tem administrators. KERN is the name of the project to form, which is also a clear advantage. These facts com- structure and automate the packaging of scientific soft- bined result in the choice for Linux as the KERN platform. ware. The main deliverable is the KERN suite, a bi- However, further consideration was required before se- annually released set of 3rd party open source scientific lection of the most suitable platform. There are various software packages. flavours of GNU/Linux, with different design philosophies The primary goals of KERN are: to make it easier to in- and varying packaging formats. The most popular dis- stall the scientific software, to supply a consistent working tributions can be split into two groups, RPM (Red Hat environment to a scientist and to improve interoperability Package Manager) package and Debian package based dis- and interchangeability. tributions. There is no major advantage or disadvantage Due to human resource limitations, we target KERN to either package format. Although there are diverse local to one operating system and distribution. This is unfor- trends, it is our experience that in the South African radio tunate, but recent development and adaptation of con- astronomy community, the majority of frequently utilised tainerisation technology make it easier to deploy packaged platforms are Debian based, specifically Ubuntu LTS. This software on most platforms. Limiting us to only one plat- distribution also appears to have popularity worldwide. arXiv:1710.09145v1 [astro-ph.IM] 25 Oct 2017 form enables us to focus on performing the packaging only Therefore, it was the most logical choice as KERNs target once, and to do this well. The choice of this one platform platform. is then based on install base (desktop, server) and ease of use for user and developer (package creator). The intended audience of this paper is threefold: 3. Other packaging methods • the user, who is interested in using the software bun- In this section we discuss other packaging systems avail- dled with KERN, able to us, and motivate our choice not to use these. • the developer, who wants his radio astronomy soft- 3.1. Anaconda ware available to a wider range of users and A packaging effort named Anaconda is currently gain- • the system administrator who is setting up systems ing popularity. Anaconda is a cross platform set of scien- tific software, with a focus on Python. It supports GNU/Linux, Preprint submitted to Astronomy and Computing, elsevier October 26, 2017 Windows and OS X. OS X is also often used as a desktop casacore depends on the casacore package and both pack- environment in radio astronomy. Supporting OS X would ages need to have a matching ABI. If the version of casacore be advantageous for many end users. differs between compile time and run time, the library does We have performed experiments with packaging pack- not work. Pip does not have any control over reinforcing ages for Anaconda. Users have reported that Anaconda the shared library version. This would imply that we need is easy to operate; however, at the time of writing, the to create, upload and maintain wheels for every casacore packaging procedure is cumbersome for the developer. In released. Moreover, with every casacore release we would effect, developers cannot generate the same high quality, also need to create a wheel for every OS and supported seamlessly installable packages, as is achievable with na- profile. The exponential growth of this cartesian product tive Linux packaging methods. Additionally, Anaconda quickly becomes cumbersome for the package maintainers. lacks an equivalent to Debians Lintian, a packaging tool These limitations combined with pip’s inability to han- that dissects a Debian package and tries to find bugs and dle non-Python libraries, make pip an ill suited candidate violations of the Debian policies. for our packaging effort. Nevertheless, this does not mean Various software packages in KERN are not created pip and KERN cannot be combined. with OS X support in mind thus requiring various modi- The approach we adopt is that we prepackage impure fications to the source code. Also, compilation procedures python libraries and bundle them with KERN. These Python can vary greatly across platforms, doubling the packaging packages are then precompiled against a set of specific li- and maintenance effort if we would support OS X as a brary versions. This guarantees the ABIs always match platform. up. Users can then augment the system python installa- In addition, Linux distributions come with a large set of tion, or a Python virtual environment, with packages from prepackaged software, which eliminates the need to pack- the packaging index using pip. age many dependencies. Using Anaconda would necessi- Although KERN supports both Python 2 and Python tate packaging up many dependencies ourselves. 3, most Python packages in KERN only support Python The limitations of Anaconda led to the preference of version 2. For now, the only package that supports Python Debian- over Anaconda packages. 3 is python-casacore. The KERN Python 3 package for casacore is named python3-casacore. 3.2. Python and pip The Python programming language has become the 3.3. Collaboration with Debian most widely used language in astronomy [2]. Python comes Ubuntu is directly based on Debian and thus is similar with a package manager called pip. Pip assists in down- to Debian. Nonetheless, due to version differences in the loading and installing Python packages from the Python bundled libraries in each distribution, KERN packages are package index (PyPi). Another useful tool for setting up unlikely to run on Debian. Fortunately the packaging pro- Python environments is called virtualenv. Virtualenv cedure is identical for Debian and Ubuntu, which makes enables a user to set up one or more isolated Python envi- creating true Debian packages a matter of a recompilation ronments without system administrator rights. The com- of the source package. bination of these two tools enables the end user to set up In contrast, the build system, dependency management various custom environments with specific versions of de- and library management for RPM is completely different. pendencies. For pure Python projects, pip and virtualenv Porting our packages to RPM is non-trivial, and main- are cross platform and independent of the host operating taining support for RPM based distributions would imply system package manager. However, pip is less suited for doubling the required effort. impure Python projects. Some Python libraries depend We have established a collaboration with Debian de- on non-Python run time libraries and/or non-Python de- velopers, and some packages from KERN (e.g. casacore velopment headers compile time, making them “impure”. and aoflagger) have been incorporated into Debian di- A recent improvement to the Python Packaging system is rectly. These packages have been uploaded to the Debian the introduction of wheels. Wheels are pre-compiled bi- archive, and changes are synchronised between KERN and nary Python packages. These do not require compilation the Debian archive. and will work if the packaged library does not have unusual Not all KERN packages are suitable for uploading to requirements. An example of a independent binary wheel Debian. Packages with a small user base or packages that is Numpy. Numpy only depends on Python and a small are fragile and receive continuous fixes (as opposed to for- set of system libraries. The Application Binary Interface mal release) are not well suited to this distribution model, (ABI) differs across host platforms and python versions, since it can take some time before a package ends up in a requiring a wheel for every platform and python combina- Debian release. For the more stable packages in KERN, tion. These are supplied on PyPi and the correct version we expect a continuation of this effort, with more packages is automatically selected by pip on installation. ending up in Debian in the future. Problems arise for example, with the python-casacore package, which has more unusual dependencies. Python- 2 4. Usage the casacore data package is not required if one is not cal- culating e.g. coordinate conversions, but in practice many To use the packages of KERN, one needs to add the components of casacore will fail or give warnings if the KERN remote repository to the system.