
Packaging and Deployment

Alessandra Forti, MCnet, 6 September 2018

Development cycle

● Packaging: preparing the software for distribution and deployment.

● Deployment: installing, configuring, updating and uninstalling the software.

(Cycle diagram: development → building → testing → packaging → distribution → deployment.)

Packaging and deployment are part of the development cycle: how you produce, build and test your software also affects how you package and deploy it.

Why packaging

● You write a python script to solve a problem. Initially it is small and you keep on editing the file.

● Then you realise that previous versions worked better but you cannot go back because you are still editing the file directly, so you start versioning the file names.

● Testing and basic version control

● A colleague starts using it, and not only do you need to keep track of the versions, you also need to make sure she uses a stable version.

● Basic distribution

Why packaging

● You start to add features and modularise the code, moving some of it elsewhere.

● The one file is now several files and you start versioning the directories instead.

● You have introduced internal dependencies

● To give it to your friend you need to tar the directory

● Archive file

● You discover some cool Python feature and add it with some more exotic import.

● External dependencies on non-standard packages have now been introduced

Why packaging

● The code becomes popular with other people, but they can only run it on machines with a different version of Python, and you discover the dependencies are not there.

● Your code has portability issues

● Now you not only need to make sure that when the software is installed all the dependencies are installed too, but also that your software is portable.

● Adding installation notes and/or an install script to the tar file makes it a package (see the sketch below)

● At some point you will want it to be part of an ecosystem and will move to its packaging system.
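A minimal sketch of this do-it-yourself packaging stage, assuming a project directory mytool/ with a bundled install.sh (both names are hypothetical):

  tar czf mytool-1.2.tar.gz mytool/                  # the versioned archive you hand around
  # on the recipient's machine:
  tar xzf mytool-1.2.tar.gz
  cd mytool && ./install.sh --prefix=$HOME/mytool    # the bundled script checks dependencies and installs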

Package Managers

● Before package managers were introduced people installed from source tarballs

● Tarballs were distributed via FTP servers

● Lists of updates distributed via email

● For each package the administrator had to

● Download the tar file and unpack it

● Run the configuration scripts to see if all the dependencies were there

● Generate the makefile and tune it to the system

● Build the package and finally install it

● ./configure --prefix=..... ; make; make install
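As an illustration, the manual workflow looked roughly like this (the package name and URL are placeholders):

  wget http://ftp.example.org/pub/foo-1.0.tar.gz    # fetch the source tarball
  tar xzf foo-1.0.tar.gz && cd foo-1.0              # unpack it
  ./configure --prefix=/opt/foo                     # check dependencies, generate the makefile
  make                                              # build
  make install                                      # install under the chosen prefix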

Package Managers

● Then packages like rpm and deb were invented

● They still didn't manage the dependencies but they distributed pre-compiled binaries.

● Browsable web servers replaced the FTP servers

● https://www.rpmfind.net/linux/RPM/ByDate.html

● You still had to look for your rpm, download it, look at the dependencies, download those too and install them (see the sketch below).

● Software repositories and package managers were eventually introduced to simplify the deployment of software even further.
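A sketch of that manual dependency chasing, with hypothetical package names:

  rpm -ivh foo-1.0-1.el7.x86_64.rpm        # fails, reporting the missing dependencies
  rpm -qpR foo-1.0-1.el7.x86_64.rpm        # list what the package requires
  rpm -ivh libbar-2.3-1.el7.x86_64.rpm foo-1.0-1.el7.x86_64.rpm   # download and install them together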

Package Managers

● The basic functionality of a package manager is to allow you to (see the yum sketch after this list):

● Find packages in a software repository

● Extract package archives and execute the instructions to install.

● Update, remove, upgrade, downgrade, list packages, list updates, maintain a history

● According to the repositories' rules

● Ensure the integrity and authenticity of a package by verifying its signature and checksum

● Manage dependencies to ensure a package is installed with all the packages it requires

● Build custom packages
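On a RH-compatible system, for example, these operations map onto familiar yum commands (the package name is a placeholder):

  yum search foo          # find the package in the configured repositories
  yum install foo         # resolve dependencies, verify signatures and install
  yum update foo          # update to the latest version available in the repositories
  yum remove foo          # uninstall
  yum history             # list the transaction history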

What is a package

● Some package formats

● OS: rpms, deb, pkg

● Python: eggs or wheels

● Ruby: gems

● The variety of packages that one can find depends very much on the variety of package managers

● There are dozens of ecosystems, each coming with its own package formats and package managers

● https://libraries.io lists 36 package managers and ~3.4 million packages

What is a package

● At the core a package is essentially an archive containing the software, the metadata and/or the executables to install the software.

● Package formats may differ in the language and the syntax but they have many things in common (see the rpm queries after this list):

● A well defined directory structure

● A description of the package

● A method to build it

● A list of dependencies

● A licence file
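With an rpm, for instance, most of these elements can be inspected directly (the file name is a placeholder):

  rpm -qpi mytool-1.0-1.el7.x86_64.rpm          # description, licence and other metadata
  rpm -qpl mytool-1.0-1.el7.x86_64.rpm          # the files and directory structure it installs
  rpm -qpR mytool-1.0-1.el7.x86_64.rpm          # the list of dependencies
  rpm -qp --scripts mytool-1.0-1.el7.x86_64.rpm # the install/uninstall scriptlets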

What is a package

(Slide shows example packages: an rpm, HTCondor, Puppet.)

Software repositories

● Package managers wouldn't be enough without the advent of software repositories

● A Linux installation comes with its OS distribution's software repositories already configured.

● Nothing stops you from adding, removing or replacing repositories.

● In system administration this is standard practice

● Replace the default OS repositories with local mirrors

● Add repositories where extra software is kept

● Add local repositories for custom-built packages

● Add test repositories that are not enabled on the production system but get routinely used on the testbed (see the repo-file sketch below)
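On a yum-based system, for example, adding an extra repository is just a matter of dropping a .repo file into /etc/yum.repos.d/ (the repository name and URL are placeholders):

  # /etc/yum.repos.d/local-extras.repo
  [local-extras]
  name=Locally built packages
  baseurl=http://repo.example.org/extras/el7/
  enabled=0
  gpgcheck=0

  # enable it only for the installations that need it, e.g. on the testbed
  yum --enablerepo=local-extras install mytool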

Local testing

● Testing a non-standard package: singularity
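A hedged sketch of what such a local test might look like, assuming singularity is available from EPEL or a local test repository (both the repository and the test command are illustrative):

  yum --enablerepo=epel install singularity                     # install from the extra repository
  singularity --version                                         # basic sanity check
  singularity exec docker://centos:7 cat /etc/redhat-release    # try running something in a container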

Dependencies

“Dependency hell” is a colloquial term for the frustration of some software users who have installed software packages which have dependencies on specific versions of other software packages.

● The more dependencies you add the more difficult it will be to deploy your package everywhere.

● Particularly if the software depends on shared libraries or packages of which some other installed software requires a specific version.

Dependencies

● This is a real-life runtime example that happened to me just two days ago

● On CentOS7 the package depended on obsolete software

● On an SL6 machine it depended on Python 2.7, which is not the default.

● Typical developer answer:

● “If it doesn't work out of the box on SL6, install Python 2.7 and reconfigure your environment to use it.”

● This is not a good example of packaging and distribution

● Even if I could find Python 2.7 on the distributed file system we use on the grid.

The LHC experiments problem

Roles

● Outside in the world:
  ● Developers
  ● DevOps
  ● System administrators
  ● User

● In a HEP experiment:
  ● Installer
  ● Builder
  ● Release-Manager
  ● Core-developer
  ● Physicist-programmer
  ● User: end user, physics group, central production, distributed ops, experiment, system admins

Grid environment

● We call our distributed computing the grid

● ~150 official sites

● 663k processors

● 334 PB storage

● 4 LHC experiments

● Dozens of smaller experiments

● A variety of non-pledged resources

● ATLAS@HOME

● To solve the portability problem all sites had to install the same software:
  ● All sites on a RH-compatible distribution
  ● All sites install the required middleware and OS packages
  ● All sites .....

ATLAS release in numbers

● 61966 unique files

● 2176 packages

● 316 external packages

● LCG release common to all experiments

● A lot of the dependencies are actually under the ATLAS tree

● Size

● 35GB with debugging on

● 3.5GB without

● We have dozens of these releases

Experiment software

● In HEP, at least in ATLAS, the releases are the distributed packages

● Until recently they were monolithic

● The whole release was compiled and distributed

● In fact each release used to be compiled at each site and installed on NFS servers.

● In the past few years there has been a big effort to modularise the building process (read: disentangle the internal dependencies) and produce smaller, more targeted releases.

● This required a complete change of tools: svn → git, cmtcontrol → cmake

Distribution nowadays

● CVMFS is an HTTP-based, read-only distributed file system that uses a hierarchy of Squid caches to make files available.

● An installation done once centrally automatically appears on every computer with a CVMFS client installed.

● The node downloads and caches only the requested files, reducing the amount of software distributed at any one time (see the sketch after this list).

● Singularity production containers are distributed like this

● Integration with container hubs (Docker and GitLab) for user images
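From a worker node the repository simply looks like a read-only POSIX tree; a sketch, reusing the example repository name from the relocation slide (paths are illustrative):

  ls /cvmfs/experiment.space                 # files are fetched and cached on first access
  cvmfs_config probe                         # check that the configured repositories are reachable
  cvmfs_config stat experiment.space         # show cache and network statistics for one repository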

Distribution nowadays

● CVMFS doesn't work well for user software

● Requires access to the central stratum 0 repository

● It has a lag of a few hours between installation and accessibility

● It still requires complicated configuration tools to set up the software

● User code tarballs are still uploaded and compiled to run

● It is definitely not suited to short development cycles

● Users, and even experienced developers that need to test on the grid, also resort to uploading their tarballs somewhere.

● More recently this has been replaced by the use of containers downloaded from a hub.

● SKA users, for example, solved their development and testing problem with containers.

Portability

● Several dependencies have been added to the CVMFS repositories

● Either directly to the experiment spaces

● Or in shared spaces

● We literally have distributions of dependencies, called LCG

● But the distributed software is still only compiled for RedHat-compatible Linux distributions

● Resources are progressively more varied and we cannot afford to impose a single deployment model anymore

● End users also want to run on different platforms

Relocation

● Software is relocatable when the top directory it is installed in can be changed.

● Relocation after installation is what we are after here

● Different types of installation have to be supported (see the sketch after this list):

● Installing in CVMFS /cvmfs/experiment.space

● Local installations /usr/local

● Unprivileged user relocation /home/$USER
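As an illustration of the contrast, an rpm can only be relocated after the fact if it was built as relocatable, whereas a source build fixes the prefix at configure time (package names are placeholders):

  rpm -qpi foo-1.0-1.el7.x86_64.rpm | grep Relocations                 # the prefixes the package allows, if any
  rpm -ivh --prefix=/cvmfs/experiment.space foo-1.0-1.el7.x86_64.rpm   # install under a different top directory
  # with a source build the prefix is chosen at configure time, not after installation:
  ./configure --prefix=$HOME/foo && make && make install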

Reproducibility

● CVMFS doesn't help with reproducibility

● Even with the same OS everywhere sites may have different package versions.

● Experiments are all trying to implement what is called software preservation

● The ability to reproduce a user analysis in 10 years' time exactly as it ran on the day it was produced.

● Some package managers try to guarantee this but most don't care.

● You cannot avoid security updates, for example, so the bare-metal system will always be prone to changes.

Packaging for HEP

● The HSF (HEP Software Foundation) is looking into packaging of HEP software to address all these problems

● Nix: a functional package manager that works on all Linux distros and macOS, supports versioning and relocation, and has excellent reproducibility.

● LCGMake: developed at CERN, used for the dependencies we distribute via CVMFS.

● Containers: actively looked at in the two major experiments.

What is a container?

● Ship in a bottle (Star Trek TNG)

● Professor Moriarty, a holodeck character, takes control of the ship from within the simulation and demands to be freed and brought into the real world....

● A container is similar: it gives the user a lot of control within the container, but in the reality of the host machine it is just another program.

What is a container

● It is a lightweight virtualization technology that allows applications to be “packed” with all their dependencies and run wherever they want.

● Chroot and pip virtualenv

● Standardized, portable, self-contained, reproducible software environments

● There is no installation process

● There is no package management to do to run the application on the host machine

● The OS of the host machine and what is installed on it become irrelevant to the application.

● Docker images are built in layers and changes can be tagged with a version (see the sketch below)
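A minimal illustration of the “no installation” point, assuming Docker is available on the host (the image names are public Docker Hub images):

  docker run --rm centos:7 cat /etc/redhat-release          # a CentOS 7 userland runs on whatever the host OS is
  docker run --rm python:3.6 python -c 'print("hello")'     # the application ships with its own Python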

Containers use cases

● Grid / any data centre

● Installation of an OS different from SL/RHEL/CentOS

● OS upgrades don't need coordination with experiments anymore

● Minimal installation on the nodes if sites prefer

● Allows experiments to run tests with specific software or setups

● Offers another approach to software distribution

● Simplifies running on HPC resources

● Payload isolation

Containers use cases

● User

● Continuous integration, release testing

● Analysis preservation and reinterpretation

● Reproducible, interactive development environments for personal development / software tutorials

● What a user runs on their laptop should run on the grid

● Or at the very least it should make it much easier to know why it failed

● Example: the problems I described with the application not working on two OS versions because of dependencies wouldn't have happened.

Which container

● There are several types of containers on the market.

● In the HEP environment the main choices are two: Docker and Singularity.

● Docker is the most developed and is the users' and analysis groups' choice, but it cannot be deployed everywhere.

● Singularity is a really lightweight, easy-to-install executable that can run as a non-root process. Very easy to deploy.

● In addition Singularity images can be easily generated using Docker images.
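For example, with a recent Singularity version a Docker image can be converted and run directly (the image and file names are illustrative; the exact syntax varies between Singularity versions):

  singularity build analysisbase.simg docker://atlas/analysisbase   # convert the Docker image into a Singularity one
  singularity exec analysisbase.simg cat /etc/os-release            # run a command inside it as an ordinary user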

Images

● Of course to do all this someone has to build the images.

● While the users can run wild if they want to

● I haven't gone as wild as some, but here is a list of my tests

Images (read packaging)

● For the experiment software the consensus is to exploit the layering feature and build a consistent hierarchy of images that can be supported by the experiment for different use cases.

(Diagram: layered image hierarchy — an OS base image maintained by CERN IT, an ATLAS base image adding HEP_OSlibs, images adding releases or middleware, and user images adding user code, for use cases such as HPC, Tier3 and end users.)

Images (read packaging)

● To build images (pretty much like packages) you need to tell Docker what to do.

● This is done with a Dockerfile

● Each line in a Dockerfile is called a layer.

● So in my example the first thing I'm saying is to build my image on top of the atlas/analysisbase one

Dockerfile:

  FROM atlas/analysisbase
  USER root
  RUN mkdir /data
  RUN install some extra package
  COPY ./HelloWorld.py /
  USER atlas

Dockerhub (read software repository)

● Once I have built my image I can upload it to a registry

● https://hub.docker.com/

● From the registry my image is accessible from any machine with network connectivity and I can run it (see the sketch below)
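A minimal sketch of that workflow (the image name and tag are placeholders):

  docker build -t myuser/myanalysis:1.0 .    # build the image from the Dockerfile in the current directory
  docker push myuser/myanalysis:1.0          # upload it to the registry (after docker login)
  docker run --rm myuser/myanalysis:1.0      # on any other machine: pull the image and run it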

If you are interested in containers: An introduction to Docker, Sebastian Binet

But as developers, please do watch this (more than once): How to make a package manager cry
