Strategy for distributing Python software

Marc Paterno
25 March 2021 | SCD Projects meeting

Why we’re here

The HEPCloud project has settled on standard Python tools for building the HEPCloud Decision Engine (DE), and would like to distribute the DE using standard Python tools as well. They want to be sure they are doing it the right way: legally (involving the Technology Transfer office), technically (my particular interest), and politically (they want the support of the CS and SCD). We think the tools and techniques used for the DE can and should be recommended for use by other projects. Management support for a specific strategy will help it be applied more widely.

Components of the strategy

setuptools
Python Package Index (PyPI)
conda
spack
Keeping up

setuptools

Build using setuptools, the standard Python tool for building source and built distributions (sdist and bdist). HEPCloud has a candidate best practice document at https://github.com/HEPCloud/decisionengine/wiki/Setuptools. This approach is suitable for building Python projects, including ones that contain compiled components. It is not meant for compiled libraries that have nothing to do with Python.
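As a rough illustration of the kind of project setuptools builds, the sketch below writes a minimal project skeleton to a scratch directory. The project name example_pkg, its version, and the layout are hypothetical placeholders; they are not taken from the Decision Engine or from the HEPCloud wiki page.

```python
from pathlib import Path
import tempfile

# Contents of a minimal setup.py. The name, version, and description are
# hypothetical; a real project would also declare dependencies, a license,
# and similar metadata.
SETUP_PY = '''\
from setuptools import find_packages, setup

setup(
    name="example_pkg",
    version="0.1.0",
    description="Hypothetical example project",
    packages=find_packages(),
    python_requires=">=3.6",
)
'''


def write_skeleton(root):
    """Create a minimal setuptools project under root; return the files written."""
    root = Path(root)
    package_dir = root / "example_pkg"
    package_dir.mkdir(parents=True, exist_ok=True)
    files = {
        root / "setup.py": SETUP_PY,
        package_dir / "__init__.py": '__version__ = "0.1.0"\n',
    }
    for path, text in files.items():
        path.write_text(text)
    # Report paths relative to the project root, with forward slashes.
    return sorted(p.relative_to(root).as_posix() for p in files)


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    print(write_skeleton(tmp))
```

From such a project directory, a command like python setup.py sdist bdist_wheel (with the wheel package installed) produces a source distribution and a wheel under dist/.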

PyPI

The Python Package Index (PyPI) is a repository of software for the Python programming language. . . . Package authors use PyPI to distribute their software.

When you install something with pip install . . . , you are using PyPI.

setuptools (used appropriately) creates packages suitable for distribution on PyPI.
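To make the link between installing and PyPI concrete: by default, pip looks a project up under its normalized name in PyPI's "simple" index. The sketch below applies the name normalization rule from PEP 503 and builds the corresponding index URL. The function names are illustrative, and pip can be configured to use other indexes.

```python
import re

PYPI_SIMPLE_INDEX = "https://pypi.org/simple"


def normalize_name(name):
    """Normalize a project name as described in PEP 503."""
    # Runs of "-", "_" and "." are equivalent and collapse to a single "-".
    return re.sub(r"[-_.]+", "-", name).lower()


def simple_index_url(name):
    """URL of the index page listing the downloadable files for a project."""
    return f"{PYPI_SIMPLE_INDEX}/{normalize_name(name)}/"


print(simple_index_url("Azure_KeyVault.Keys"))
# prints https://pypi.org/simple/azure-keyvault-keys/
```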

PyPI packages have an author and one or more maintainers, but not owners.

Sometimes the author is a group of some sort (e.g. “NumPy Developers”).
Maintainers are mostly individuals (even for numpy), but can be an organization (e.g. “microsoft” and “azure-sdk” for azure-keyvault-keys: https://pypi.org/project/azure-keyvault-keys).

PyPI Terms of use

Full terms of use at https://pypi.org/policy/terms-of-use/. Some important bits:

I retain all right, title, and interest in the Content (to the same extent I possessed such right, title and interest prior to uploading the Content), but by uploading I grant or warrant (as further set forth below) that the PSF is free to disseminate the Content, in the form provided to the PSF. Specifically, that means:

If I upload Content covered by a royalty-free license included with such Content, giving the PSF the right to copy and redistribute such Content unmodified on PyPI as I have uploaded it, with no further action required by the PSF (an “Included License”), I represent and warrant that the uploaded Content meets all the requirements necessary for free redistribution by the PSF and any mirroring facility, public or private, under the Included License.

I asked Aaron Sauers to look at the terms of use, and he has said they look OK. He reminded me that getting the necessary reviews before going public with software will take a couple of weeks.

conda

Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language.

conda is excellent for managing complete environments that rely upon compiled code as well as Python. conda-forge (https://conda-forge.org) is a critical part of producing well-curated (and reliable, on multiple platforms) package distributions. Packages built properly with setuptools and distributed on PyPI are very easy to package for conda.
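As a sketch of why a PyPI release is straightforward to repackage, the function below renders a minimal conda recipe (meta.yaml) for a pure-Python package whose sdist is on PyPI. The layout follows the pattern commonly seen in conda-forge recipes. The package name, version, and checksum are placeholders, the sdist file name is assumed to match the project name, and a real recipe would also need license information, run-time dependencies, and tests.

```python
# Template for a minimal recipe. Doubled braces survive str.format as single
# braces, so "{{{{ PYTHON }}}}" is rendered as the Jinja expression "{{ PYTHON }}".
RECIPE_TEMPLATE = """\
package:
  name: {name}
  version: "{version}"

source:
  url: https://pypi.io/packages/source/{first}/{name}/{name}-{version}.tar.gz
  sha256: {sha256}

build:
  noarch: python
  number: 0
  script: "{{{{ PYTHON }}}} -m pip install . -vv"

requirements:
  host:
    - python
    - pip
    - setuptools
  run:
    - python
"""


def render_recipe(name, version, sha256):
    """Render meta.yaml text for a pure-Python package released on PyPI."""
    return RECIPE_TEMPLATE.format(
        name=name, first=name[0], version=version, sha256=sha256
    )


# Hypothetical package; the checksum is a placeholder, not a real digest.
print(render_recipe("example-pkg", "0.1.0", "<sha256 of the sdist>"))
```

The recipe does little more than point at the sdist and ask pip to install it, which is why a project that already builds cleanly with setuptools needs so little extra work.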

spack

Spack is a package management tool designed to support multiple versions and configurations of software on a wide variety of platforms and environments. It was designed for large supercomputing centers, where many users and application teams share common installations of software on clusters with exotic architectures, using libraries that do not have a standard ABI. Spack is non-destructive: installing a new version does not break existing installations, so many configurations can coexist on the same system.

Packages built properly with setuptools and distributed on PyPI are very easy to package for spack.
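A Spack package for a project released on PyPI is typically only a few lines long. The sketch below renders the text of a hypothetical package.py. PythonPackage, the pypi attribute, version and depends_on are Spack's; the package name, version, checksum, and homepage are placeholders. The import lines at the top of a real package file are left out because they have differed between Spack releases, and the class-name helper is a simplification of Spack's own naming rule.

```python
PACKAGE_TEMPLATE = '''\
# Import lines omitted here: they differ between Spack releases.


class {class_name}(PythonPackage):
    """{summary}"""

    homepage = "{homepage}"
    pypi = "{pypi_name}/{pypi_name}-{version}.tar.gz"

    version("{version}", sha256="{sha256}")

    depends_on("python@3.6:", type=("build", "run"))
    depends_on("py-setuptools", type="build")
'''


def spack_class_name(spack_name):
    """Simplified naming rule: py-example-pkg -> PyExamplePkg."""
    return "".join(part.capitalize() for part in spack_name.split("-"))


def render_package(pypi_name, version, sha256, summary, homepage):
    """Render package.py text for a Python project released on PyPI."""
    return PACKAGE_TEMPLATE.format(
        class_name=spack_class_name("py-" + pypi_name),
        pypi_name=pypi_name,
        version=version,
        sha256=sha256,
        summary=summary,
        homepage=homepage,
    )


# Hypothetical project; the checksum and homepage are placeholders.
print(
    render_package(
        "example-pkg",
        "0.1.0",
        "<sha256 of the sdist>",
        "Hypothetical example project.",
        "https://example.org/example-pkg",
    )
)
```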

Keeping up

Python best practice changes over time. We should follow changes in best practice as they arise. This includes moving to a successor to setuptools, when a successor arises. This is already being discussed in the PyPA (Python Packaging Authority). See https://www.pypa.io/en/latest/roadmap/.

Some items that seem to get common mention:
twine (https://github.com/pypa/twine) for uploading to PyPI
flit (https://flit.readthedocs.io/en/latest) for pure Python packages
poetry (https://github.com/python-poetry/poetry) for dependency management
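As an example of how one of these tools fits in, a release with twine usually amounts to checking the built distributions and then uploading them. The sketch below only assembles the command lines and does not run them. The check and upload subcommands and the --repository option are twine's; the function name and the choice of TestPyPI (testpypi) as the default target are illustrative assumptions.

```python
from pathlib import Path
import tempfile


def twine_commands(dist_dir, repository="testpypi"):
    """Return the twine command lines for checking and uploading dist_dir."""
    # Expand the file list here: no shell is involved when these lists are
    # passed to subprocess.run, so a "dist/*" wildcard would not be expanded.
    files = sorted(str(p) for p in Path(dist_dir).iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"no distributions found in {dist_dir}")
    return [
        ["twine", "check", *files],
        ["twine", "upload", "--repository", repository, *files],
    ]


# Demonstration with empty stand-in files named like a hypothetical release.
with tempfile.TemporaryDirectory() as tmp:
    for filename in ("example_pkg-0.1.0.tar.gz", "example_pkg-0.1.0-py3-none-any.whl"):
        (Path(tmp) / filename).touch()
    for command in twine_commands(tmp):
        print(" ".join(command))
```

Each returned list can be handed to subprocess.run(command, check=True) once the upload target and credentials have been settled. Uploading to TestPyPI first is a common way to rehearse a release before publishing to PyPI itself.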

Informal survey of projects currently on PyPI [1]

Core projects
  cigcert: Get an X.509 certificate with SAML ECP and store proxies : Duncan MacLeod (not FNAL); author Dave Dykstra.
  pyfnalsnow: SNOW JSON API access : Tim Skirvin

Experiment-related
  muonic: Software to work with QNet DAQ cards : Several maintainers, none found in Fermilab telephone directory.

[1] Search conducted with https://pypi.org/search/?q=fnal+fermilab.

Informal survey of projects currently on PyPI (continued)

Research projects
  awkwardql: SQL-like language for awkward arrays : Lindsey Gray
  coffea: Tools for doing Collider HEP style analysis with columnar operations : Lindsey Gray
  fnal-column-analysis-tools: Tools for doing Collider HEP style analysis with columnar operations at Fermilab : Lindsey Gray
  hepaccelerate: Fast kernels for analyzing jagged columnar data common in HEP : Joosep Pata (CERN)

False positives
  openPMD-api: C++ & Python API for Scientific I/O with openPMD : Found because Jim Amundson was the contributor of a compilation fix.
  sepaxml: Python SEPA XML implementations : Found because “FNAL” appears in example data, where it appears to be an abbreviation of “final”.
