EPython: Extending Python to the Future

Background

Python has had enormous success in part due to its ease of extensibility and interoperability with other systems. In particular, this has led to huge ecosystems of code that heavily leverage the existing C-API for Python. One such ecosystem is the NumFOCUS ecosystem, which includes NumPy, Pandas, SciPy, Scikit-Learn, Numba, Dask, and Jupyter. This ecosystem has propelled Python to become the de facto language for data science and machine learning.

The downside of this success is that these extensions' dependency on the C-API has made the ecosystem an anchor that holds back further development of the C-Python run-time, as well as other emerging run-times of Python such as PyPy, , and RustPython.

Many of the core Python developers and the steering committee members recognize this problem and are eager to see something done about it. Something can be done, but it will take a concerted technical and social effort lasting at least 5 years.

Abstract

We will create an embedded domain-specific language (DSL) that uses the Python language itself, along with the typing module and specialized objects as necessary, so that nearly all existing extensions can be written in this DSL. We call this language EPython. We will then port several key libraries, such as NumPy, SciPy, and Pandas, to use this DSL. We will also provide a run-time so these extensions can work on PyPy, C-Python, Jython, and RustPython.
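To make the "embedded DSL" idea concrete, the sketch below shows one way ordinary Python plus a specialized object could mark extension code: a decorator records typed functions in a registry that a build tool could later compile, while returning each function unchanged so it still runs on any interpreter. EPython's actual syntax is not specified in this proposal; the `export` decorator and the `EXPORTS` registry are hypothetical illustrations, not an existing API.

```python
from typing import Callable, Dict, TypeVar

F = TypeVar("F", bound=Callable[..., object])

# Hypothetical registry that a compiler or build tool could walk to find
# functions intended to become native extension code.
EXPORTS: Dict[str, Callable[..., object]] = {}


def export(func: F) -> F:
    """Hypothetical marker: record func for compilation, return it unchanged."""
    EXPORTS[func.__qualname__] = func
    return func


@export
def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


# Because export returns the original function, any interpreter
# (C-Python, PyPy, ...) can call it directly with no compilation step.
print(clip(5.0, 0.0, 1.0))
print(sorted(EXPORTS))
```

The design choice being illustrated is that the DSL is valid Python: the annotations and the marker add information for a compiler without changing what the code means when it is simply interpreted.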

Timeline

The initial MVP will be completed within 6 months of project kick-off in early 2020, and a version of NumPy will then be ported over the following 12 months. An initial run-time for PyPy and Jython will be completed in the next 6 months. The remaining 3.5 years of the funded project will be spent improving the language by porting existing modules to use it.

Technical Approach

Over the past 15 years, Cython has emerged as a domain-specific language (DSL) that is formally a superset of Python with additional non-Python syntax. Many extensions are written in it: Pandas, scikit-learn, and many other popular libraries are written in Cython.

While popular, Cython predates Python 3 type annotations and the rise of the typing module and mypy. It uses non-Python syntax, and its code base is heavily oriented toward producing C code and interfacing with the C-Python run-time and its C-API. Refactoring Cython itself to produce EPython would not be straightforward. However, we can use Cython as an example and a guide to what is needed to support extensions, and as a target for the C-Python run-time.

Mypy is a static type-checking engine that also comes with a limited compiler, mypyc, which compiles type-annotated Python to native code. It will serve primarily as an example and reference. The mypy team is also involved with Python's typing module, which will be important for defining high-level and low-level types in EPython.
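As a sketch of how the typing module could carry low-level type information, the example below uses `typing.Annotated` (PEP 593) to attach a machine-type tag to ordinary `int` and `float` annotations, then reads those tags back with `get_type_hints(..., include_extras=True)`, the kind of step a compiler front end might perform. The aliases `i32` and `f64`, the tag strings, and the helper `lowered_signature` are hypothetical; they are not EPython's defined types. Requires Python 3.9 or later.

```python
from typing import Annotated, Callable, get_args, get_origin, get_type_hints

# Hypothetical low-level types. Type checkers and the interpreter treat
# them as plain int/float; the extra argument is metadata for a compiler.
i32 = Annotated[int, "int32"]
f64 = Annotated[float, "float64"]


def scale(x: f64, factor: i32) -> f64:
    return x * factor


def lowered_signature(func: Callable[..., object]) -> dict[str, str]:
    """Map each parameter (and 'return') to its low-level type tag."""
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    hints = get_type_hints(func, include_extras=True)
    result: dict[str, str] = {}
    for name, hint in hints.items():
        if get_origin(hint) is Annotated:
            # get_args returns (base_type, *metadata); take the first tag.
            result[name] = get_args(hint)[1]
        else:
            # Unannotated-metadata hints fall back to the class name.
            result[name] = hint.__name__
    return result


print(lowered_signature(scale))
# {'x': 'float64', 'factor': 'int32', 'return': 'float64'}
```

Because `Annotated[float, ...]` is treated as `float` by type checkers, existing mypy checking keeps working while a compiler gains the extra precision it needs.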

Nuitka compiles Python code into Python extensions and stand-alone native code.

Numba is a dynamic compiler that compiles array-based Python code to native code (including native GPU code) and integrates with Python at run-time. It enables Python developers to create very fast implementations of algorithms using Python syntax.

EPython can be seen as a hybrid of Cython and Numba informed by the typing module, mypy, and Nuitka as visualized in the following image.

Funding Needed

We are seeking commitments of at least $6 million over 5 years, with a 10% down payment to kick off the project and additional 10% payments every 6 months as the project hits milestones. To ensure this is a community-first project with many stakeholders, we are seeking at least 6, and as many as 30, organizations that can each commit $250k to $1 million over the lifetime of the project.
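The payment terms can be checked with a short calculation. This sketch assumes the 10% installments stop once the full commitment has been paid; the function name `payment_schedule` and its parameters are hypothetical, and integer arithmetic is used so dollar amounts are exact.

```python
def payment_schedule(total: int, down_pct: int = 10, step_pct: int = 10,
                     period_months: int = 6) -> list[tuple[int, int]]:
    """Return (month, amount) pairs for a commitment of `total` dollars.

    Assumption: down_pct percent is paid at month 0, then step_pct percent
    every period_months months until the total has been paid.
    """
    if step_pct <= 0:
        raise ValueError("step_pct must be positive")
    down = total * down_pct // 100
    schedule = [(0, down)]
    paid = down
    month = 0
    while paid < total:
        month += period_months
        # Never pay more than what remains of the commitment.
        amount = min(total * step_pct // 100, total - paid)
        schedule.append((month, amount))
        paid += amount
    return schedule


for month, amount in payment_schedule(6_000_000):
    print(f"month {month:2d}: ${amount:,}")
```

Under these assumptions a $6 million commitment is paid in ten installments of $600,000, the last at month 54, inside the 5-year window.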

Work on the project will be managed by Quansight Labs and fulfilled by employees and contractors of the Labs.

Core Developer -- 2
Community evangelists -- 2
Project management -- 0.5
Tech Writer -- 0.5