DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING 300 CREDITS, SECOND CYCLE STOCKHOLM, SWEDEN 2016

Benchmarking Python Interpreters

MEASURING PERFORMANCE OF CPYTHON, CYTHON, JYTHON AND PYPY

ALEXANDER ROGHULT

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Benchmarking Python Interpreters

Measuring Performance of CPython, Cython, Jython and PyPy

ALEXANDER ROGHULT

Master's Thesis in Computer Science
School of Computer Science and Communication (CSC)
Royal Institute of Technology, Stockholm
Supervisor: Douglas Wikström
Examiner: Johan Håstad

Abstract

For the Python programming language there are several different interpreters and implementations. In this thesis project the performance regarding execution time is evaluated for four of these: CPython, Cython, Jython and PyPy. The performance was measured in a test suite, created during the project, comprised of tests for Python dictionaries, lists, tuples, generators and objects. Each test was run with both integers and objects as test data with varying problem size. Each test was implemented using Python code. For Cython and Jython separate versions of the tests were also implemented which contained syntax and data types specific for that interpreter. The results showed that Jython and PyPy were fastest for a majority of the tests when running code with only Python syntax and data types. Cython uses the Python C/API and is therefore dependent on CPython. The performance of Cython was therefore often similar to CPython. Cython did perform better on some of the tests when using Cython syntax and data types, as it could thereby decrease its reliance on CPython. Each interpreter was able to perform fastest on at least one test, showing that there is not an interpreter that is best for all problems.

Referat

Comparison of the Python Interpreters CPython, Cython, Jython and PyPy

There exist several different implementations of and interpreters for the Python programming language. In this degree project the performance regarding execution time is evaluated for four of these: CPython, Cython, Jython and PyPy. The performance was measured in a test suite created in this project. The test suite consisted of tests for Python's dictionary, list, tuple, generator and objects. Each test was run with both integers and objects as test data with varying problem size. Each test was implemented in the Python programming language. For Cython and Jython an additional version of the tests was implemented which contained syntax and data types specific to these interpreters. The results showed that Jython and PyPy were fastest for a majority of the tests that only used Python's syntax and data types. Cython uses Python's C/API and is therefore dependent on CPython. The performance of Cython was therefore similar to CPython's. Cython performed better on some of the tests that made use of Cython's syntax and data types, since it could thereby reduce its dependence on CPython. Each interpreter managed to perform fastest on at least one test. This shows that there is no single interpreter that is best suited for all problems.

Contents

1 Introduction
  1.1 Purpose
  1.2 Motivation
  1.3 Limitations
  1.4 Project Principal

2 Background
  2.1 Python
  2.2 Cython
  2.3 PyPy
  2.4 Jython
  2.5 Python Data Types
    2.5.1 List
    2.5.2 Tuple
    2.5.3 Dictionary
    2.5.4 Generators
  2.6 Profiling
    2.6.1 gtime
    2.6.2 cProfile
    2.6.3 profilehooks
  2.7 Benchmarking

3 Method
  3.1 Project Breakdown
  3.2 Hardware
    3.2.1 Mac Pro Intel® Xeon® W3530 2.80 GHz 16 GB RAM
    3.2.2 Dell Intel® Core™ i7-5600U CPU @ 2.60GHz 8 GB RAM
  3.3 Software Environment
  3.4 Profiling
    3.4.1 gnu-time
    3.4.2 cProfile
  3.5 Generalizing the Problem
    3.5.1 func_d
    3.5.2 func_e
    3.5.3 func_n-func_q
  3.6 Tests
    3.6.1 Dictionary
    3.6.2 List
    3.6.3 Tuple
    3.6.4 Generator
    3.6.5 Objects
  3.7 Test Data
  3.8 Interpreter Specific Code

4 Results
  4.1 Discrepancies
    4.1.1 Jython
    4.1.2 PyPy
  4.2 Inconsistent Results
  4.3 CPython
  4.4 Cython
  4.5 Jython
  4.6 PyPy

5 Discussion
  5.1 Cython
  5.2 Jython
  5.3 PyPy
  5.4 Compatibility
  5.5 Hardware Dependencies

6 Conclusion

7 Future Work
  7.0.1 Memory Benchmarks
  7.0.2 Other Interpreters

Bibliography

Appendices

A Cython Code Compilation
  A.1 List Comprehension Using Python Object
  A.2 List Comprehension Using Cython Extension Type

B Decompiled Class File Code
  B.1 Dictionary Insert Using Python Dictionary
  B.2 Dictionary Insert Using java.util.HashMap

C PyPy JIT Optimizations
  C.1 List Comprehension Optimizations

D Charts
  D.1 Dictionary Insert Tests
  D.2 Dictionary Merge All Keys Match Tests
  D.3 Dictionary Merge Half Keys Match Tests
  D.4 Dictionary Merge No Keys Match Tests
  D.5 Dictionary Overwrite Tests
  D.6 Dictionary Read Tests
  D.7 Generator Tests
  D.8 List Append Tests
  D.9 List Comprehension Tests
  D.10 List Sort Tests
  D.11 Objects Add Tests
  D.12 Objects Generate Tests
  D.13 Tuple Append Tests
  D.14 Tuple Sort Tests

Chapter 1

Introduction

Python is a dynamic programming language that was created in the early 1990s. Python uses an interpreter to execute Python code. Through the years several different interpreters and implementations have been created. In this degree project four of these interpreters will be analyzed and benchmarked in order to determine their performance for different problems.

1.1 Purpose

The objective of this degree project is to analyze the variation in performance when using a different Python interpreter than the original CPython implementation. The interpreters that will be evaluated are Cython [1], Jython [2] and PyPy [3]. The analysis will be conducted as described in chapter 3. The scientific question that is aimed to be answered is:

How do the different Python interpreters Cython, Jython and PyPy differ in performance compared to CPython, and what are the causes for this?

1.2 Motivation

Python is a high-level programming language that allows the programmer to translate ideas into working code without much difficulty. As with many other programming languages that have an abstraction layer, one of the downsides of Python is performance [4]. Over the years, alternative interpreters have been created in attempts to maintain Python's easy syntax while increasing its performance. A performance evaluation of Python interpreters has already been done by Riccardo Murri [5]; that evaluation did not include Jython and involved versions of PyPy and Cython that are now outdated (2.1 and 0.19 respectively). The current versions of PyPy and Cython are 4.0.1 and 0.23.4, which will be used in this project. It is the hypothesis of the author that the interpreters PyPy and Cython in this study will perform faster than CPython for each of the tests, as these interpreters

were designed for this task. It is, on the other hand, difficult to make assumptions about which interpreter will have the highest performance.

1.3 Limitations

In this project the tests that were created were limited to being written in the Python programming language or in the tested interpreter's own programming language. This means that all code was written in Python or Cython. Cython and Jython allow importing modules written in C/C++ and Java respectively, but when implementing the interpreter specific tests this was not done. It is the wish of TriOptima to continue using Python and the tests were therefore limited to this.

1.4 Project Principal

This degree project was conducted at TriOptima, a global software company delivering services for the financial sector. One of their products, triReduce, is a multilateral portfolio compression service for OTC derivatives, helping clients manage post-trade risk. A part of triReduce is written in Python 2 and outputs a proposal agreement to be signed by all the participating banks, based on the data in the clients' portfolios. A proposal is passed through several verification steps to ensure its correctness. As proposals only remain valid for a fixed amount of time due to the market moving, it is important that triReduce can generate a correct proposal within a very short time frame. As client portfolios have grown larger, so has the execution time. As TriOptima wishes to continue using Python, it is in their interest to evaluate whether using a different interpreter is a viable way to increase performance in their systems.

Chapter 2

Background

2.1 Python

Python is a programming language created by Guido van Rossum in the early 1990s [6]. Python supports imperative, functional and object-oriented programming paradigms with a focus on software quality, simple syntax and developer productivity [7]. Python is a dynamically typed, interpreted programming language, meaning that one does not need to declare a variable's type in the code and the code does not need to be compiled beforehand. Instead, when a Python program is run, the Python runtime has an interpreter that compiles the Python code into bytecode, which the Python Virtual Machine (PVM) then executes. The PVM simply iterates over the instructions in the bytecode, executing them one after the other, until the program exits. During execution the PVM determines the types of the variables and translates the bytecode instructions to machine instructions that the CPU understands [8]. Both the translation to bytecode and the execution performed by the PVM are included in the Python interpreter. In short the Python runtime structure can be summed up as two tasks: Python code is compiled to bytecode, which is executed by the PVM. The benefit of this method of execution is that Python code is platform independent; it is only the interpreter that needs to know the specifics of different platforms. Translating to bytecode before execution is also an optimization step, as bytecode can be executed faster than the original source code [7]. A downside is that interpretation of bytecode and determining variable types during runtime takes its toll on performance compared to running precompiled code [7]. CPython is a source code interpreter and the original and standard implementation of Python [7]. It is open source and implemented in portable ANSI C [9]. CPython is available from https://www.python.org/. As with most popular open source projects, CPython has a large development community with several contributors and core developers. Often when Python is mentioned in books or websites it is CPython that is being referred to [7]. CPython provides a C-level API known as the Python/C API [10]. It allows developers to create extension modules or to

embed Python into other applications. A Python extension module is a compiled C dynamic library that can be run by the Python interpreter. While working like a standard Python module, extension modules are compiled into machine code by a standard C compiler. When the PVM executes code in an extension module it no longer interprets bytecode. Instead it directly runs the compiled code in the extension module. This removes the overhead of the interpreter [8]. The intellectual property rights of Python are held by the Python Software Foundation, with Guido van Rossum as president [11]. The Python Software Foundation is a non-profit organization that works to "promote, protect, and advance the Python programming language...". Though CPython is the standard implementation of Python, there exist other interpreters and language extensions. The interpreters evaluated in this project, Cython, Jython and PyPy, are among these and will be presented in the following sections.

2.2 Cython

Cython is a Python language extension based on Pyrex [12], a programming language that mixes Python with C data types and a compiler which compiles it into C extensions for Python. Cython was officially forked from Pyrex as a new project in 2007 by developers at Sage, now known as SageMath [13] [14]. Cython is both a programming language and an optimizing static compiler [1]. The programming language is a superset of the Python language which allows easy access to C/C++ functions as well as C types. Because of this, Python code is, with a few exceptions, already valid Cython code. The Cython programming language adds a few keywords, for example the cdef keyword in order to add static typing. The Cython compiler transforms Cython code into efficient C/C++ code which can be compiled into a Python extension module or a standalone executable. Writing an extension module normally requires knowledge of the Python/C API. Cython removes this knowledge requirement, allowing the user to write Cython code which the Cython compiler compiles into platform-specific extension modules. Cython uses the Python/C API and is therefore dependent on CPython. As Robert Bradshaw, one of the core developers of Cython, describes it in a presentation of Cython [13], the goal of Cython is to be an optimizing Python compiler. Developers using Cython should be able to provide the Cython compiler with Python code. The compiler then produces equivalent C code that is as fast as possible, without the need for the developer to know the Python/C API [15]. If one wants the code to run even faster, Cython allows adding static types to variables [16]. C is a statically typed language in which the compiler requires the developer to declare the type of each variable, whereas Python is a dynamically typed language. Dynamically typed languages are often slower than statically typed languages, but allow the developer to write code that does more with less code [17]. Cython is not limited to typing only C types; it also allows statically typing Python types. It is for example

possible to statically type a variable as a Python list or tuple. By adding static typing to Cython variables, it is possible for the interpreter to execute code without the need to determine the types of these variables. According to Robert Bradshaw this can lead to hundredfold speedups [13]. Cython is not only valuable for Python developers wishing to speed up certain areas of their code. Cython can be used by C/C++ developers who wish to wrap their libraries for use in Python. Cython has the ability to automatically infer the types of variables. As a precaution this is by default only done when typing variables cannot change the semantics of the code. By using the infer_types compiler directive the user can instruct Cython to be more liberal in its automatic typing. Using this option puts more responsibility on the user. It is important that the user is sure that semantics do not change or that integers do not overflow. The speedup achieved by Cython depends on what the code is designed to do. According to the Cython FAQ, Cython is very good at increasing performance for if-elif-else control structures, for-loops as well as common built-in types, such as lists, dicts and strings [18]. According to Smith, Cython also performs very well, compared to CPython, for mathematical operations and function calls [8]. As previously mentioned, adding static type declarations prevents the interpreter from needing to look up the type of a variable. When, for example, two objects, let us call them a and b, are added together in Python, the interpreter first needs to determine their type and call its __add__ method with a and b as parameters. In the __add__ method the parameters need to be unpacked to extract the underlying C type. If a and b were, for example, floats, their underlying C type would be C doubles. The actual add operation can only be performed when the underlying C type is unpacked. In Python, numbers, strings and tuples are immutable, meaning that their value is unchangeable [19]. This means that the result needs to be stored in a new float object. This whole procedure is performed during runtime and is a lot of overhead. When static typing is used this process is not needed, as the types of a and b are already known. The addition is compiled into one machine code instruction which is then called during runtime. In Python, functions are first-class citizens. This means that they behave just like objects, containing a state, and have a more advanced behavior than C functions [8]. Function pointers in C add some advanced functionality, but Python functions can be created both at import time and dynamically at runtime, created anonymously using the lambda keyword, defined inside other functions, called with keyword arguments and defined with default parameter values, all of which is not possible in C. A downside of this is that it comes at a cost in performance. Calling Python functions is several orders of magnitude slower than calling C functions [8]. Cython allows typing of functions, which are then able to perform faster than Python functions as they are compiled into highly optimized C code and able to directly call functions in the Python/C API. Just like for variables, the overhead performed by the interpreter is bypassed. In fact, calling a statically typed function in Cython is as efficient as calling a pure C function [8]. The cdef keyword used to add static typing to variables and functions can also

be used in front of class definitions. This indicates that the class is to be compiled into a type at the C level [8], called an extension type. Extension types access class data and methods directly through the compiled C code, bypassing the interpreter overhead.
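As an illustration of the cdef syntax described above, the following minimal Cython sketch (not taken from the thesis test suite; the module, function and class names are invented for illustration) shows a statically typed C-level function, a Python-callable function with typed arguments, and an extension type:

    # example.pyx - hypothetical Cython module illustrating cdef
    cdef double square(double x):          # C-level function with static C types
        return x * x

    def sum_of_squares(int n):             # callable from Python, argument typed as C int
        cdef int i                         # loop variable kept as a C int
        cdef double total = 0.0
        for i in range(n):
            total += square(i)
        return total

    cdef class Point:                      # extension type: data stored at the C level
        cdef public double x, y
        def __init__(self, double x, double y):
            self.x = x
            self.y = y

Compiling this with the Cython compiler produces a C extension module in which the typed loop and function calls bypass most of the Python/C API overhead discussed above.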

2.3 PyPy

PyPy is an implementation of the Python programming language that is written in RPython, a subset of the Python language, with its own interpreter [20] [21] [22]. It implements Python 2.7.10 and passes the Python test suite with some minor modifications [23]. The project was started in 2003 by Armin Rigo and Holger Krekel but has since then gathered several contributors and core developers [24]. PyPy is designed to perform faster than CPython by using a tracing Just-in-Time (JIT) compiler. A JIT compiler, as the name suggests, compiles code during execution instead of before, as an Ahead-of-Time (AOT) compiler would do. The JIT used in PyPy is a meta-tracing JIT compiler. It does not encode any language semantics or profile the execution of the program. Instead it profiles the execution of the interpreter running the program [25]. The PyPy team chose to implement the JIT this way in order to make it independent of programming languages. By tracing the execution of the interpreter instead of the program execution, the JIT picks up the language semantics from the interpreter. The interpreter, on the other hand, has to be written in RPython for PyPy to be able to do this [20]. By tracing the interpreter, the JIT compiler attempts to find areas of code that are executed several times, so called hot spots. These are found by counting the number of times certain parts of the code are run. When a hot spot is found, PyPy switches to a special mode, called tracing mode, where all the operations of the next execution are recorded into what is called a trace. The trace is a list of operations, their operands as well as their results from one run of the hot spot, for example one run of a loop. The trace is compiled into optimized assembler code, which runs extremely fast, and is ready to be used the next time the hot spot is executed [20]. PyPy uses several optimization techniques in its compiler: constant folding, common subexpression elimination, function inlining and loop invariant code motion among others [25]. The trace also contains guards for each point in the recorded code that could branch off in another direction, for example at an if-statement. When the trace is compiled to machine code, each guard is compiled into a check that the execution is still correct. If it is not, the interpreter once again takes over execution. If a guard failure occurs more times than a certain limit, PyPy will attempt to compile the new execution branch as well. By default this value is set at 200. PyPy uses a set of parameters to determine when it should enter tracing mode. These are viewable through the command "pypy --jit help". This shows for example that loops are by default counted as hot after 1039 iterations and that functions are traced from the start after 1619 runs. The guard failure limit, also known

as trace_eagerness, is also set through this command. According to Maciej Fijalkowski, the reason the values 1039 and 1619 are used is that they are prime numbers. Using prime numbers decreases the chance of the tracer hitting obscure cases and increases the tracer's independence [26]. Increasing the aggressiveness of the JIT compiler also increases the overhead of the tracer and compiler. If the work needed for tracing and compiling exceeds the gains of running the compiled machine code, PyPy will perform slower than the regular Python interpreter. Therefore how well PyPy performs depends on how well these parameters fit the running program [27]. Despite this, the mindset of the PyPy developers is that if a program runs slower in PyPy than in CPython, it is a bug in PyPy [28].
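The following sketch (not part of the thesis code) illustrates the kind of hot loop the tracing JIT targets; with the default parameters described above, the loop body is interpreted for roughly the first 1039 iterations, after which PyPy records a trace and executes compiled machine code for the remaining iterations:

    # hot_loop.py - run with PyPy; the loop becomes "hot" and is traced and compiled.
    def hot_loop(n):
        total = 0
        for i in range(n):
            total += i * i
        return total

    if __name__ == "__main__":
        print(hot_loop(5000000))

The thresholds can be overridden on the command line, for example with pypy --jit threshold=500 hot_loop.py, using the parameter names listed by pypy --jit help.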

2.4 Jython

Jython, originally known as JPython, was created by Jim Hugunin in 1997 [29]. Barry Warsaw, the primary maintainer of JPython after Jim Hugunin, changed the name to Jython in 1999 with the release of Jython 2.0. Jython is an implementation of the Python programming language. Unlike CPython, which is implemented in C/C++, Jython is implemented in Java. Jython code follows the Python syntax, but upon execution the Jython code is compiled into Java bytecode, which in turn is executed by the Java Virtual Machine (JVM) instead of the Python Virtual Machine. This allows Jython users to access any Java class as well as all libraries available for the Java platform from their Python code. Jython is not only valuable for Python developers wanting to utilize the strengths of Java. Java developers wishing to make use of Python's language semantics can also use Jython to mix Python code into their Java projects. The authors of the book Jython Essentials [30] argue that one of Python's strengths is the way it simplifies writing small utility tools and scripts, something that Java was not developed to be good at. With Jython, these can easily be written and incorporated into Java projects or simply run on the JVM. Jython was developed to be a bridge between Java and Python, not focusing on performance issues. In the book Jython Essentials the speed question is brought up in one of the chapters. Pedroni and Rappin state that CPython takes 75% of the time Jython takes. Note that this book was published in 2002, with the test using Python 2.1, Jython 2.1a3 and Sun's JDK 1.3. Since Jython runs on the JVM it is highly reliant on the performance of, and optimizations done by, the JVM. Over the last couple of years the JVM has seen several improvements to performance. In the Java SE 6 Performance White Paper [31] it can be read that performance improvements have been made to, among other things, garbage collection, multi-threading and array copying. Though the paper, being from October 2007, is quite old, it shows that performance has been on the agenda in the releases since Java 1.3. It is possible that improvements done to the JVM in the last couple of years will show different results in this project. The Oracle Java HotSpot JVM, used in

this project, is, just like PyPy, equipped with a JIT. The Java HotSpot VM gathers statistics on how the bytecode is executed, looking for methods which are run many times. Up until some of the later releases of Java SE 7 the Java HotSpot VM used two different compilation techniques, distinguishing between applications requiring a quick startup time and long-running applications gaining from more aggressive optimization techniques. One could choose which of these to use with either the -client or -server switch respectively. In Java SE 8, used in this project, the standard is to use what is called tiered compilation. Tiered compilation uses the client compilation mode at the start of the application and then switches over to the server compilation mode when the application is ”warmed up” [32]. This allows for a quick start of the program while later switching to an optimization technique that provides better performance in the long run. When running in server mode the Java HotSpot VM will regard methods as hot when they have been called 10 000 times [33]. Methods regarded as hot will be compiled to highly optimized machine code on a separate JVM thread while the interpreted code is being executed, allowing the application to continue executing without interruption [32]. When the method is compiled the JVM will switch over to using the compiled version. The Java HotSpot VM also uses guards in order to guarantee that the compiled version is correct and should be executed. If a guard fails the Java HotSpot VM reverts back to executing the interpreted code. When compiling to machine code the JIT uses several optimization techniques; inlining, monomorphic dispatch, loop optimization, type sharpening and dead-code elimination are just a few of these [32]. Each Jython 2.x release corresponds to the equivalent CPython 2.x release. This project will use Jython 2.7, which corresponds to CPython 2.7. According to the Jython General Information Wiki page, Jython 2.7 implements nearly all of the core Python standard library modules and uses the same regression tests, with some minor modifications, as CPython [29].
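A minimal sketch (not from the thesis code base) of the Java interoperability described above; when run with Jython, Java classes are imported and used directly from Python syntax:

    # interop.py - run with Jython
    from java.lang import StringBuilder, System

    builder = StringBuilder()
    builder.append("Current JVM time in milliseconds: ")
    builder.append(System.currentTimeMillis())   # calling a static Java method
    print(builder.toString())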

2.5 Python Data Types

This project will be using the built-in data types list, tuple and dictionary, as well as Python generators. A brief explanation of these can be found below.

2.5.1 List

Lists in Python are mutable containers where the most basic functionality is adding and removing items. Any sort of object is allowed within a list and the values are stored in a specific order, meaning that values can be retrieved from a certain position as well as ordered within the list. Python even allows mixing the types of objects that are stored within a list; it is for example possible to store integers, strings and instances of any class within the same list. Python also allows the creation of lists using a concept known as list comprehension. List comprehensions build

lists by iterating over each element in another iterable or sequence and adding each value to the new list, possibly with some operation performed on each value.
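For example, the following one-liners (illustrative only) build new lists from an existing sequence, with an operation applied to each element and, in the second case, a filter:

    values = [1, 2, 3, 4, 5]
    squares = [v * v for v in values]           # [1, 4, 9, 16, 25]
    evens = [v for v in values if v % 2 == 0]   # [2, 4]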

2.5.2 Tuple

The tuple data structure is a container of values, just like the list. In contrast to the list, however, tuples are immutable and therefore do not allow adding or removing items. Once a tuple has been created it cannot be modified. Tuples are also cached by the Python runtime [4]. The Python runtime does not always free the memory used by a tuple that is deallocated, but instead keeps the memory for later use. This way a request to the operating system to reserve memory may be skipped when a new tuple is created, if a previously saved memory slot can be used.

2.5.3 Dictionary

The Python dictionary is a container that uses keys to reference values. While keys used in a dictionary need to be immutable, the values may be mutable. Dictionaries themselves are mutable, so adding and removing values as well as modifying values after creation is possible.

2.5.4 Generators

Generators are functions that behave like iterable data types. Generator functions use the yield keyword to indicate the next value in the sequence. By using generators it is possible to iterate over large sets of data without the need to store all the values in memory at the same time [34]. This is useful for applications that iterate over sets of data so large that storing all the values in memory might not be feasible.
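A minimal example (not the generator used in the test suite) of a generator function and how values are retrieved from it one at a time:

    def yield_values(values):
        # Yield each element lazily instead of building a new container in memory.
        for value in values:
            yield value

    gen = yield_values([10, 20, 30])
    print(next(gen))   # 10
    print(next(gen))   # 20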

2.6 Profiling

Profiling of code is an analysis which measures the characteristics of a program. Profiling can be done to analyze different aspects of a program. One could for instance measure a program's memory usage, program execution time or the amount of time spent in different areas of the code. Optimization is often the main goal when profiling software. When attempting to increase the performance of an algorithm or system, it is most efficient to start improving the areas where any gains will be most noticeable, in other words where the algorithm or system is spending most of its time during execution or where bottlenecks occur. In order to know which areas of code are taking the longest time and acting as bottlenecks, it is important to use profiling tools. Though one may sometimes be able to guess where the code is slow by analyzing it, such guesses are often wrong. Spending time improving the performance of an area which has a very small impact on the whole execution time is time-consuming work that does not give relevant results. By using a profiling

tool one can get detailed information on a program's execution time, memory usage, functions called and so on. This is important information to use when determining which areas to focus on when attempting to improve performance.

2.6.1 gtime

The gtime command prints timing statistics about a program. The OS X time command [35] does not provide a method of displaying verbose output like the GNU time command [36]. Therefore, in this project, GNU time version 1.7 was installed using Homebrew. The GNU time command will from here on out be referred to as gnu-time. As the manual page for gnu-time states, calling the command with a verbose flag allows output of more detailed information that may be valuable when analyzing performance of a program.

2.6.2 cProfile

The cProfile profiler provides deterministic profiling for Python programs [37]. It outputs the following information about each function called during execution:

ncalls - the number of calls to the function.
tottime - the total time spent in the function, excluding time spent in calls to subfunctions.
percall - the quotient of tottime divided by ncalls.
cumtime - the cumulative time spent in the function, including time spent in all subfunctions.
percall - the quotient of cumtime divided by primitive calls, i.e. calls not induced via recursion.
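A minimal example of invoking cProfile on a function (not part of the thesis code); the printed table contains one row per called function with the columns listed above:

    import cProfile

    def busy_work(n):
        return sum(i * i for i in range(n))

    # Profiles the statement and prints ncalls, tottime, percall and cumtime per function.
    cProfile.run("busy_work(100000)")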

2.6.3 profilehooks

The profilehooks tool is a collection of decorators that ease profiling Python programs [38]. It allows profiling of single functions, timing functions and profiling each line in a function. As profilehooks is not a profiler, simply a set of decorators, it requires a profiler to do the actual profiling. For this, cProfile was used.
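A sketch of the decorator-based usage, assuming the profilehooks package is installed; profile is one of its documented decorators and delegates the actual measuring to cProfile:

    from profilehooks import profile

    @profile   # wraps the function with a profiler and prints a report of its calls
    def parse_rows(rows):
        return [row.split(",") for row in rows]

    parse_rows(["a,b,c"] * 1000)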

2.7 Benchmarking

When benchmarking the performance of a program or task it is important that timing is performed in the right place. Only the task or block of code that is relevant to the test should be included in the measurements. Without this precision, results could be obscured and could vary because of components other than the task in question.


It is also important to minimize the effect of external factors when performing the tests. As described in the book High Performance Python: ”Your computer will be performing other tasks while running your code, such as accessing the network, disk, or RAM, and these factors can cause variations in the execution time of your program” [4]. Therefore all user-initiated programs and non-required processes, except the program in question, should be exited when benchmarks are performed, and all network connections should be disconnected and disabled.


Chapter 3

Method

3.1 Project Breakdown

To reach a result the project was broken down into different tasks, detailed below.

Code Analysis

This was the first task performed in the project. It consisted of understanding and profiling a code base in order to find generally applicable use cases that could be broken out into tests used in the benchmarks. The code base was provided by TriOptima. Profiling helped to find areas of the code that were slow, which should always be prioritized when optimizing. The bottlenecks of the code that were found through profiling were isolated and modified into generally applicable use cases. The reason for this was to perform the study on programming problems that are general and relevant to the whole programming community, and not specifically for TriOptima. These independent modules were the test cases for all benchmarks and analysis in the study.

Setup Development Environment

All work was performed on two computers provided by TriOptima. Before benchmarking or coding could commence, the project's development environment needed to be set up. This included installing the interpreters to be used during the project, setting up version control and testing that the code could be run.

Coding

The practical part of this project was to implement the independent modules in such a manner that each interpreter's performance could be measured and recorded. For Cython and Jython a second version of each test was implemented with interpreter specific code. Cython, for example, allows static typing of variables, while Jython enables the use of Java data types. The reason for this was to compare the difference

between running each interpreter with pure Python code and running with interpreter specific syntax and data types.

Benchmarking

Benchmarks were performed for each interpreter, measuring the execution time of each test. The benchmarks were run on two different computers provided by TriOptima. The data gathered from the benchmarks is the result presented in this report, showing the performance of each interpreter for the created test suite.

3.2 Hardware

All experiments and work for this project have been performed on a Mac Pro (Mid 2010) and a Dell Latitude E7250 (from here on out referred to as ”Dell”), both provided by TriOptima, with the specifications described below.

3.2.1 Mac Pro Intel® Xeon® W3530 2.80 GHz 16 GB RAM

Processor: Intel® Xeon® W3530 2.80 GHz
Cores: 4
L2 Cache: 256 KB
L3 Cache: 8 MB
Memory: 16 GB (2x8 GB) 1066 MHz DDR3 ECC

3.2.2 Dell Intel® Core™ i7-5600U CPU @ 2.60GHz 8 GB RAM

Processor: Intel® Core™ i7-5600U CPU @ 2.60GHz
Cores: 2
L2 Cache: 256 KB
L3 Cache: 4096 KB
Memory: 8 GB (1x8 GB) 1600MHz DDR3

3.3 Software Environment

The software used in this project and their respective versions are listed in this section.

3.4 Profiling

Profiling of the TriOptima code base was performed in different steps. During each profiling run, all other running user-initiated processes were terminated in order to decrease contention for system resources. Each step was run three times. The purpose of this profiling was to gain knowledge of where most of the execution time was spent.


Figure 3.1. Software used in this project.

The tests were run with realistic test data that, for disclosure reasons, cannot be described in this report.

3.4.1 gnu-time

First, gnu-time was used mainly to get information regarding execution time and CPU usage. With the --verbose flag, gnu-time outputs user time (seconds), system time (seconds) and the percentage of CPU the job got. As can be seen in figure 3.2, over three runs an average of 44% of the CPU was given to the jobs. This is an indication either that other processes were contending for the CPU or that a large amount of time was spent waiting on I/O.

3.4.2 cProfile

The code provided by TriOptima that was analyzed is part of a large Django [39] project. Therefore, when the program was profiled, profilehooks became extremely valuable, allowing profiling of specific functions instead of the whole execution of the program. Despite this, several Django functions showed up in the profiling output, since data is fetched from the database through the Django ORM. As these functions were not to be modified in any way, they were disregarded. If a function accounted for less than 10 percent of the total execution time in any of the runs, it was disregarded in every run. The results of running cProfile over three runs can be seen in figure 3.3. Profiling using cProfile gave information about each function's execution time. Most parts of TriOptima's code are not public and can therefore not be published in this work.


Figure 3.2. Profiling results using gtime.

The functions have been renamed in order to be able to show the results from the profiling. Using profilehooks, the profiling was started in the program's main entry point, called func_a. Profiling was started here in order to disregard several Django function calls prior to this function. When examining the output, as seen in figure 3.3, it is important to be aware that several of the functions called are Python generators [34]. These functions are marked with ”(g)”. cProfile will count each retrieval of the next value from a generator as a call to the function. This explains the high values in ncalls for generators. The functions that drew the author's attention were func_d, func_e, func_n, func_o, func_p and func_q. These have either a high cumtime, a high tottime or both. These functions provide the basis for the general modules that are to be created later on in the project. Even though functions func_f to func_j have high cumtimes, these are not relevant to this study, as analyzing the code shows that these functions mainly perform retrievals from a database.

3.5 Generalizing the Problem

In this section the code from the relevant functions from figure 3.3 will be described.

Figure 3.3. Profiling results using cProfile.


3.5.1 func_d

This is a function that takes an iterable as input. Each value in the iterable is a tuple with a key and a value. The function adds each value into a dict at the position given by the corresponding key. The pseudocode for func_d can be seen in algorithm 1.

Algorithm 1: func_d - reducing values into a dictionary
  input : Iterable i containing key, value pairs
  output: Dictionary d with reduced values in each key

  1 foreach key, value pair in i do
  2     d[key] ← d[key] + value
  3 end
  4 return d
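The pseudocode corresponds roughly to the following Python sketch; the real function is proprietary TriOptima code, so the name func_d and the use of defaultdict here are assumptions made for illustration:

    from collections import defaultdict

    def func_d(pairs):
        # Reduce (key, value) pairs into a dictionary by accumulating values per key.
        d = defaultdict(int)
        for key, value in pairs:
            d[key] += value    # d[key] <- d[key] + value
        return d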

3.5.2 func_e

This function is a generator that takes an iterable as input. A set of factory classes is retrieved. For each value of the iterable, the generator yields a tuple of two objects which are created using each factory. The pseudocode for func_e can be seen in algorithm 2.

Algorithm 2: func_e - generator yielding key, value pairs
  input : Iterable i containing instances of a data class
  output: tuples of two objects

  1 factories ← RetrieveFactories()
  2 foreach data_instance in i do
  3     foreach factory in factories do
  4         object1 ← factory.createObject1(data_instance)
  5         object2 ← factory.createObject2(data_instance)
  6         yield (object1, object2)
  7     end
  8 end

3.5.3 func_n-func_q

These functions are all part of the factory classes used in func_e. The functions create objects and set different values depending on the created object.


3.6 Tests

From the examined functions, described in section 3.5, the following problems were recognized:

• dictionary usage,

• list iteration,

• tuple iteration,

• generators, and

• creating a large collection of objects.

From these problems a set of tests was created in order to determine the performance of each interpreter. The tests are described in sections 3.6.1 to 3.6.5. Note that for each test there exist two implementations. Each test was first implemented using pure Python code, referred to as pure Python code in this report. For Cython and Jython a second implementation of each test was created using data types and syntax which these interpreters allow. These implementations are referred to as interpreter specific code. For each test, the test data is generated beforehand and is not part of the time measurement. The test data is identical for each interpreter. For tests that perform an operation several times, as part of a loop, a maximum running time of 15 minutes is allowed to prevent extremely long running times. The maximum running time is not upheld for tests that perform one single operation, for example the update operation performed in the merge test for Python dictionaries, as it would not be possible to measure the operation time during execution without affecting the operation. Time measurement was done using the time.clock() function, as this is the recommended function to use for benchmarking Python [40]. Each execution of a test was done by starting a new process of the desired interpreter with the desired amount of data to use in the test. After a test had finished, it exited its running process and the next test was started by the script. This was done in order to execute each test in isolation from the others. If all tests were run in sequence in the same process, it would be possible for a JIT to utilize compiled code from a previously run test, obscuring the result of that test. All user-initiated programs except the terminal running the tests were terminated before benchmarks were performed, and all network connections were disconnected and disabled. This was done in order to minimize the amount of contention for the CPU. Below are descriptions of each test. The code for these is available in the project git repository [41]. Under src/actions are three folders: cython, jython and python. The python folder contains the pure Python code for all tests. The jython and cython folders contain the interpreter specific code for Jython and Cython

respectively. As PyPy has no interpreter specific tests, PyPy simply uses the tests in the python folder. All tests for Jython were run with the option -J-Xms8g. This option sets the initial and minimum Java heap size to 8 gigabytes. This was done since the problem sizes in the tests were quite large, and so that both the Dell and Mac computers would start with the same heap size.
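The isolation scheme described above can be pictured with the following sketch of a driver script; the interpreter names, script name and arguments are assumptions for illustration and not the actual repository code:

    import subprocess

    # Cython tests are compiled extension modules and are run under CPython.
    INTERPRETERS = ["python", "jython", "pypy"]
    PROBLEM_SIZES = [100, 1000, 10000, 100000, 1000000, 5000000]

    for interpreter in INTERPRETERS:
        for n in PROBLEM_SIZES:
            # A fresh process per run, so a JIT cannot reuse code compiled for an earlier test.
            subprocess.call([interpreter, "run_test.py", "dictionary_insert", str(n)])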

3.6.1 Dictionary

The tests that were performed on the Python dictionary data type, dict, were insertion, overwriting values for existing keys, merging two dictionaries, and reading values for a given key. All dictionary tests can be found in the dictionary_actions.py files in the project git repository [41].

Insert

Code for this test can be found in the function dictionary_insert. The insertion test was constructed to test how well each interpreter is able to insert a value for a key not previously occupied. The test is started with an empty dictionary instance and test data, stored in a list of tuples where each tuple contains the key and value to be inserted. All data values are iterated and inserted one by one using the __setitem__ method ([]), the standard way of setting a value in Python dictionaries. Timestamps are recorded before and after the insert operation and the elapsed time between these values is added to a sum over each iteration. This way only the time of the insertions is recorded, excluding the time of looping through the data. As this test consists of looping through the test data, the test will abort if it continues past the maximum running time.
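The per-operation timing pattern described above looks roughly like the following sketch (not the actual code in dictionary_actions.py; time.clock() is used here as in the thesis, which targets Python 2):

    import time

    def dictionary_insert_timed(pairs):
        d = {}
        total = 0.0
        for key, value in pairs:
            start = time.clock()
            d[key] = value                  # only the insert itself is measured
            total += time.clock() - start
        return total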

Overwrite

Code for this test can be found in the function dictionary_insert. The overwrite test was constructed to test how well each interpreter is able to change the value at a key already containing a value. The test is started with a dictionary instance and test data, stored in a list of tuples where each tuple contains the key and value to be inserted. All values in the test data are already present in the dictionary instance when the test begins. Apart from this the test is identical to the insertion test.

Merge

Code for this test can be found in the function dictionary_merge. The merge test was constructed to test how well each interpreter is able to merge two dictionary instances using the dictionary method update. The test is started with two dictionary instances. The update method is then called on the first dictionary with the second dictionary as input, meaning that the second dictionary is

merged into the first. This test is run in three different scenarios: using dictionaries that have no matching keys, where half of the keys match, and where all of the keys match. Each scenario is run separately in order to prevent the tests from affecting each other's performance. Timestamps are recorded before and after the update method and the elapsed time between these values is recorded. As this test consists of only the update method, it is not possible to limit this test to the maximum running time.

Read

Code for this test can be found in the function dictionary_read_all_values. The read test was constructed to test how well each interpreter is able to read values from a dictionary. The test is started with a dictionary already containing the test data. The value stored for each existing key in the dictionary is retrieved using the __getitem__ method ([]). Timestamps are recorded before and after the get operation and the elapsed time between these values is added to a sum of each iteration. As this test consists of looping through the test data, the test will abort if it continues past the maximum running time.

3.6.2 List

The tests that were performed on the Python list data type, list, were building lists from list comprehension, appending new values, and sorting. All list tests can be found in the list_actions.py files in the project git repository [41].

List Comprehension

Code for this test can be found in the function list_comprehension. The list comprehension test was constructed to test how well each interpreter is able to create a list using list comprehension. The test is started with an integer representing the number of elements that the list should consist of. Timestamps are recorded before and after the list comprehension and the elapsed time between these values is recorded.

Append

Code for this test can be found in the function list_append. The append test was constructed to test how well each interpreter is able to append values to a list. The test is started with a list of values that are iteratively appended to another list which is initially empty. Timestamps are recorded before and after each append operation and the elapsed time between these values is added to a sum of each iteration. As this test consists of looping through the test data, the test will abort if it continues past the maximum running time.

Sort

Code for this test can be found in the function list_sort.


The sort test was constructed to test how well each interpreter is able to sort a list. The test is started with an unsorted list containing either integers or objects. The test uses the Python function sorted to sort the list. Timestamps are recorded before and after the call to sorted and the elapsed time between these values is recorded. As this test consists of only calling sorted, it is not possible to limit this test to the maximum running time.

3.6.3 Tuple

The tests that were performed on the Python tuple data type were appending new values and sorting. All tuple tests can be found in the tuple_actions.py files in the project git repository [41].

Append

Code for this test can be found in the function tuple_append. The append test was constructed to test how well each interpreter is able to append values to a tuple. The test is started with a list of values that are iteratively appended to a tuple which is initially empty. Since tuples are immutable, the append action actually creates a new tuple which is a copy of the previous tuple plus the new value. Timestamps are recorded before and after the append action and the elapsed time between these values is added to a sum of each iteration. As this test consists of looping through the test data, the test will abort if it continues past the maximum running time.
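Because tuples are immutable, each append in this test builds a complete new tuple, roughly as in the following sketch (illustrative, not the repository code); this is also why the test becomes increasingly expensive as the tuple grows:

    def tuple_append(values):
        t = ()
        for v in values:
            t = t + (v,)    # copies the old tuple and adds the new value
        return t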

Sort

Code for this test can be found in the function tuple_sort. The sort test was constructed to test how well each interpreter is able to sort a tuple. The test is started with an unsorted tuple containing either integers or objects. The test uses the Python function sorted to sort the tuple. Timestamps are recorded before and after the call to sorted and the elapsed time between these values is recorded. As this test consists of only calling sorted, it is not possible to limit this test to the maximum running time.

3.6.4 Generator

Code for this test can be found in the function generators_iterate in the generator_actions.py files, located in the project git repository [41]. The test that was performed on Python generators was constructed to test how well each interpreter is able to retrieve the next value from a generator that does nothing but yield the next value of the iterator. The test is started with a list of values that are iterated in the generator. As long as there are values to be fetched from the generator, the test function gets the next value with the next method. Timestamps are recorded before and after the call to next and the elapsed time

between these values is added to a sum of each iteration. As this test consists of looping through the test data, the test will abort if it continues past the maximum running time.

3.6.5 Objects

The tests that were performed on objects were creating instances of a class as well as adding two objects together. When two objects are added together in Python the __add__ method is called. This method was therefore overloaded in the class. The __str__ and __repr__ methods are called when a string representation of the object is needed and were also overloaded in the class to make debugging easier. Other than these methods the class only consists of an integer index. All object tests can be found in the object_actions.py files in the project git repository [41].

Create

Code for this test can be found in the function objects_generate. The create test was constructed to test how well each interpreter is able to create objects of a class. The test is started with the number of objects to be created. Timestamps are recorded before and after each creation of an instance and the elapsed time between these values is added to a sum of each iteration. As this test consists of looping through the test data, the test will abort if it continues past the maximum running time.

Add

Code for this test can be found in the function objects_add. The add test was constructed to test how well each interpreter is able to add two objects of a class together and store the result in a new variable. The test is started with two lists containing instances of the class. Each object at position i in the first list is added to the object at position i in the other list and the result is stored in a variable. Timestamps are recorded before and after the add operation and the elapsed time between these values is added to a sum of each iteration. As this test consists of looping through the test data, the test will abort if it continues past the maximum running time.

3.7 Test Data

All tests were run with both integers and instances of a class, created for this project, as values. The class can be found in src/data/__init__.py as well as in src/data/cython/data.pyx for the Cython interpreter specific tests in the project git repository [41]. The problem size used for each test, referred to as n, was varied between different runs of each test. Each test was run with n equal to 100, 1 000, 10 000, 100 000, 1 000 000 and 5 000 000, except for the append test on tuples

which did not run for n equal to 5 000 000, as the test would run for an extremely long time. The reason for varying n was to register the difference in performance for both small and large sets of data. By varying n it was also believed that the difference in performance for interpreters using a JIT would become visible within the same test. The keys used for tests performed on the Python dictionary were generated in advance of the tests and stored in text files. The keys are strings of random length between 1 and 1 000 inclusive, consisting of a random selection of upper case characters, lower case characters and digits. This was done in order to guarantee that the same keys were used across all computers and tests, decreasing the variance in each test run. A result of this was also a speed-up in data generation, since each run did not require generation of new keys.
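A sketch of the kind of key generation described above; the character set and length range are taken from the text, while the exact generation code in the repository may differ:

    import random
    import string

    ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits

    def generate_key():
        length = random.randint(1, 1000)   # random length between 1 and 1 000 inclusive
        return "".join(random.choice(ALPHABET) for _ in range(length))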

3.8 Interpreter Specific Code

Each test was modified into interpreter specific versions where possible. The Cython and Jython interpreters allow usage of non-Python libraries and, in the case of Cython, non-Python syntax. PyPy on the other hand does not extend the Python language and therefore does not require any change to the original Python tests. Below are descriptions of the changes made to the Cython and Jython specific implementations.

Cython

Calling the cython command line program with -a on a Cython file generates an HTML file with the Cython code interleaved with the generated C code. In the generated report some lines are highlighted in a shade of yellow. The darker the yellow, the more calls to the Python/C API. Lines that are not highlighted translate to pure C code. This tool was used to add static typing in appropriate areas to decrease the dependence on the Python/C API. For most tests this meant adding static typing to variables. When tests were run with objects it was not possible to statically type these. Therefore a check is performed at the beginning of several tests to see whether the test is for integers or objects. If the test is for integers, the test uses the statically typed variables. The Python class was modified to be an extension type, in order to decrease the dependence on the Python/C API. In the tests performed on lists a C++ vector was used instead of the Python list.

Jython

For the Jython specific implementation the Python built-in types were replaced by Java counterparts, as these can be imported into the Python source file without the need to write any Java code. The Python dictionary was replaced with java.util.HashMap and the Python list with java.util.ArrayList. By default Python dictionaries translate to a java.util.concurrent.ConcurrentHashMap [42] and Python lists translate to a java.util.ArrayList [43].
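A sketch of the substitution described above (assumed form; the actual Jython specific tests are in the jython folder of the repository), using the Java collections' own methods:

    # Run with Jython: Python built-ins replaced by Java collections.
    from java.util import HashMap, ArrayList

    d = HashMap()
    d.put("some_key", 1)    # instead of d["some_key"] = 1 on a Python dict

    items = ArrayList()
    items.add(42)           # instead of items.append(42) on a Python list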

Chapter 4

Results

In this section the results from the performed benchmarks will be presented. Note that graphs for all benchmarks will not be presented in this report. The data from the benchmarks performed on the Dell computer will not be presented when the graphs follow the same pattern as the graphs for the benchmark performed on the Mac. When the graphs do not follow the same pattern, the data is presented in section 4.1. As previously mentioned, the benchmarks were performed with different problem sizes, referred to as n. The case at TriOptima showed that performance was critical when handling large amounts of data. Therefore only the results for n equal to 100 000, 1 000 000 and 5 000 000 are presented. The graphs are presented in appendix D. All data, including that which is not presented in the report, is available in the project git repository, under data/saved_time/final/ [41]. Figures 4.1 and 4.2 show which interpreter performed best on each test for n equal to 5 000 000 on both the Mac and Dell computers.

4.1 Discrepancies

While the majority of the tests gave similar results on both the Dell and Mac computers, there were some differences, as can be seen in figures 4.1 and 4.2. In sections 4.1.1 and 4.1.2 the noticeable differences between the two computers that the tests were run on are presented.

4.1.1 Jython

For the tests listed in this section, Jython performed much better on the Mac than on the Dell relative to the other interpreters, not in absolute running time.

Dictionary Insert

Figures D.1 to D.4, D.6 and D.7 show the results for the dictionary insert tests, except for the test running interpreter specific code with objects. For n equal to 5 000 000, these figures show that Jython performs fastest of all the interpreters on the Mac, while it performs slowest of all the interpreters on the Dell.


Figure 4.1. Fastest interpreter per test running pure Python code, n equal to 5 000 000.


Figure 4.2. Fastest interpreter per test running interpreter specific code, n equal to 5 000 000 values.


Dictionary Overwrite

In figs. D.32 to D.39 one can see that Jython performs better on the Mac than on the Dell. When overwriting integers using pure Python code, Jython is the slowest interpreter on the Dell while being the fastest on the Mac.

List Append

In the list append tests running interpreter specific code, figs. D.48, D.49, D.51 and D.52, one can see that Jython performs noticeably better on the Mac than on the Dell.

Objects Add

Figs. D.62 to D.65 show the results for the objects add test. In these figures one can see that Jython is fastest on the Mac when running pure Python code. However, on the Dell, PyPy is faster than Jython for the same test. For the interpreter specific code, Jython is slowest on the Dell while being faster than CPython on the Mac.

4.1.2 PyPy

There is also some noticeable variation for PyPy. For the dictionary merge test where half of the keys match, PyPy is slowest on the Mac while being fastest or second fastest on the Dell. The results can be seen in figs. D.20 to D.23.

4.2 Inconsistent Results

Even though other processes were terminated before running the tests, in an attempt to lower contention for the CPUs, there were variations in running time between the five runs of each test. All variations can be found in the project git repository under the variations folder [41]. The 20 largest variations in seconds consisted mostly of Jython tests, both for the Mac and Dell computers. When looking at the relative difference between the largest and lowest recorded values, (Tmax − Tmin)/Tmin, there was a larger variation between the Dell and Mac computers. When running the pure Python code, the Dell computer showed a high relative difference when running PyPy while the Mac was much more stable, as can be seen in figures 4.3 and 4.4.
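The relative difference can be computed as in the following minimal sketch, assuming five recorded execution times per test (the values shown are made up for illustration):

    def relative_variation(times):
        # relative difference between the slowest and the fastest recorded run
        t_min, t_max = min(times), max(times)
        return (t_max - t_min) / t_min

    # five illustrative execution times (seconds) for one test
    runs = [4.21, 4.35, 4.19, 4.80, 4.27]
    print(relative_variation(runs))    # prints approximately 0.146, i.e. about 14.6 %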

Figure 4.3. 20 largest variations in percent, Dell computer running pure Python code.

Figure 4.4. 20 largest variations in percent, Mac computer running pure Python code.


4.3 CPython

CPython is only the fastest interpreter in a few cases, and then only barely faster than Cython, for example in figure D.17. On the other hand, it several times performed at almost the same speed as Cython, for example when sorting integers, figure D.59, when yielding integers from a generator, figure D.44, and when inserting integers into a dictionary, figure D.2. CPython was not able to complete the tuple append test for n equal to 1 000 000 within the set time frame.

4.4 Cython

Cython's performance was for many of the tests similar to that of CPython, even when running interpreter specific code. For operations such as merging dictionaries, inserting new values or sorting lists, Cython uses the CPython function, resulting in similar performance. This can be seen in figs. D.4, D.15 and D.59. Creating lists with a list comprehension containing a class declared as an extension type showed a considerable speedup compared to all other interpreters, as seen in fig. D.56. Cython was not able to complete the tuple append test for n equal to 1 000 000 within the set time frame.

4.5 Jython

Jython was the only interpreter not to complete all of the tests. Looking at figs. D.8 to D.31, D.36, D.37, D.42 and D.43 shows that Jython is missing results for all of the dictionary merge tests, the dictionary overwrite test for dictionaries containing objects and the dictionary read test for objects. The dictionary read test for objects was stopped due to exceeding the time limit. The other tests were not able to complete since they crashed due to a Java memory error. Java threw a java.lang.OutOfMemoryError with the message "GC overhead limit exceeded". Java throws this exception when the Java process is spending more than 98% of its time performing garbage collection while freeing less than 2% of the heap for the last five consecutive garbage collections [44]. For the merge tests the error occurred when the second dictionary object was being created, in other words before the test was actually able to start. The overwrite test, on the other hand, failed during the test, after a value was inserted into the dictionary. In the dictionary read tests for integers, figs. D.40 and D.41, the dictionary overwrite tests, figs. D.33, D.35, D.37 and D.39, as well as the list append tests, figs. D.49, D.50, D.52 and D.53, one can see that the tests running pure Python code perform better than the tests running interpreter specific code. Remember that in these tests the interpreter specific code uses java.util.HashMap and java.util.ArrayList instead of the Python dict and list. Jython was fastest for several of the tests running pure Python code, including dictionary read for integers, dictionary overwrite with objects and sorting lists of

objects, as can be seen in figs. D.39, D.41 and D.61. Jython was also the only interpreter to complete the tuple append test for n equal to 1 000 000, figs. D.68 to D.71. All the other interpreters were stopped due to exceeding the time limit.

4.6 PyPy

PyPy was the interpreter that overall performed best across all tests and computers, only being beaten by Jython on the Mac running pure Python code. When running interpreter specific code PyPy is fastest out of all interpreters in 13 of 26 tests, followed by Cython which is fastest in 7 tests on the Mac and 8 tests on the Dell. Running pure Python code PyPy was the fastest interpreter for 10 tests on the Mac and 11 tests on the Dell. PyPy performed extremely well for the list comprehension, generator and creating objects tests as well as sorting integers, as can be seen in figs. D.44 to D.47, D.54 to D.59, D.66, D.67, D.72 and D.73. PyPy was not able to complete the tuple append test for n equal to 1 000 000 within the set time frame.

Chapter 5

Discussion

In this chapter the results presented in the previous chapter will be explained and motivated.

5.1 Cython

As was mentioned in the results, Cython's performance is for many of the tests very similar to that of CPython. This is mainly due to Cython's reliance on CPython and how the tests were built. The goal of the tests was to determine how well each interpreter was able to perform certain operations. Therefore only those operations were timed, not the entire test suite or even the loop that the operation was performed in. Cython is dependent on CPython and uses several of its functions and data types. Looking for example at the list sort test for regular Python code, figure D.59, one can see the similarity between Cython and CPython. The real speedup of Cython becomes obvious when the code is able to run pure C code without the need for CPython functions, data types or the Python C/API. For the tests in this project this becomes most apparent in the tests utilizing the extension type, for example when creating a list using list comprehension, adding objects together or merging dictionaries, as seen in figs. D.13, D.56 and D.63. Running the list comprehension test with a Cython extension type provided a speedup of approximately a factor of 19. The C code generated by Cython can be seen in appendix A. Listing A.1 shows the code for list comprehension using Python objects and listing A.2 shows the code for list comprehension using a Cython extension type. Comparing these two, one can see a great difference in the amount of code generated. The code with Python objects has a total of 134 statements, 87 of which are within the for-loop which creates the list. The code using the Cython extension type has a total of 66 statements, 55 of which are within the for-loop which creates the list. This shows that Cython is able to generate C code with far fewer statements when using extension types and statically typed variables. When using a Python object the for-loop also has no fixed bound; instead it contains several checks to determine whether the loop should continue, and breaks if it should not.
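For reference, the source code behind these two listings is roughly of the following form; this is an approximate reconstruction based on the generated code, not the exact test code from the project.

    # Approximate reconstruction -- not the exact thesis code.

    class Foo(object):                     # pure Python test: ordinary Python class
        def __init__(self, x):
            self.x = x

    def create_list_python(amount):        # corresponds to listing A.1
        return [Foo(x) for x in range(amount)]

    cdef class CFoo:                       # Cython-specific test: extension type
        cdef int x
        def __init__(self, int x):
            self.x = x

    def create_list_typed(int amount):     # corresponds to listing A.2
        cdef int x                         # statically typed loop variable
        return [CFoo(x) for x in range(amount)]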


For the interpreter specific test using an extension type, the size of the list is also statically typed as an integer. Cython is then able to use this variable to set the bound of the for-loop, completely removing the need for the checks within the loop body. Since a majority of the statements are within the for-loop, decreasing the number of statements within the loop body greatly decreases the total number of statements that are executed. Several of the statements within the loop body of the code using Python objects are also calls to the Python C/API; by removing these the increase in performance becomes even more noticeable. Cython's means of increasing performance is just that: decreasing the reliance on CPython and instead running pure C/C++ code. As mentioned in the background of this report, one of Cython's strengths is increasing the performance of loops. If time had been measured over the complete tests and not just single operations, it is possible that the results would have been more in Cython's favor, as the optimizations performed on the loops would then be included in the time measurements. The results show that when using Cython it is important to make sure that the changes one is making to the code actually impact performance in a positive way. Looking at the dictionary insert test for integers, figures D.2 and D.4, or the generator test for objects, figures D.46 and D.47, one can see that the test running regular Python code is actually faster than the test running the Cython code with statically typed variables. When optimizing code it is important to continually check that the changes one is making are in fact improving the performance. Otherwise there is a risk of simply adding more overhead to the code.
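The difference between the two measurement strategies can be sketched as follows, assuming the thesis-style approach of timing only the single operation with time.clock (the helper names are illustrative):

    import time

    def time_operation_only(d, keys, values):
        # thesis-style measurement: only the insert operation itself is timed
        total = 0.0
        for key, value in zip(keys, values):
            start = time.clock()
            d[key] = value                 # the measured operation
            total += time.clock() - start
        return total

    def time_whole_loop(d, keys, values):
        # alternative: the loop is included, which would favour Cython's loop optimizations
        start = time.clock()
        for key, value in zip(keys, values):
            d[key] = value
        return time.clock() - start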

5.2 Jython

It is surprising that running the tests with explicit Java types actually decreased performance; there are only a few cases where explicitly using a Java data type is faster than running pure Python code. The natural initial hypothesis of the author was that by using java.util.HashMap instead of the Python dict, which essentially is a wrapped java.util.concurrent.ConcurrentHashMap, a minor speed increase could be achieved by removing overhead and unnecessary wrapping. As can be seen in the dictionary insert test for objects, figures D.5 and D.7, this was not the case. In this test, explicitly using java.util.HashMap decreases performance by more than a factor of 2. The Java class file which is created when running a Python script with Jython can be decompiled into a Java file. Appendix B contains the relevant lines of decompiled code for the dictionary insert test. Listing B.1 shows the code for the test using a Python dictionary and listing B.2 shows the code for the test using java.util.HashMap. The main difference between these two can be seen on line 6 of listing B.1 and on line 5 of listing B.2. When using the Python dictionary the generated Java code is able to directly access the insert method by calling the __setitem__ method. When using java.util.HashMap on the other

hand, the generated code does not seem to have direct access to java.util.HashMap's put method, but instead needs to retrieve it using the __getattr__ method. The __getattr__ method is in Jython used to look up attributes of a class [45]. By calling the insert method directly, instead of first having to look it up within the class and then call it, Jython is able to finish the tests faster when using the Python dictionary than when using java.util.HashMap. That Jython was unable to complete all of the tests shows that Jython is much more sensitive to applications requiring large amounts of memory. Not being able to complete these tests is a poor result for Jython. Perhaps applying some configuration to the JVM, or modifying the code to be better suited to completing the test in Java, would have allowed the tests to complete. This would on the other hand be unfair to the other interpreters, which were able to complete the tests as they were written. The literature read regarding Jython during this project all stated that Jython was slower than CPython. In this literature an older version of the JVM was used than in this project, and it was therefore the hypothesis of the author that some improvement would be noticed in this project. This being said, the author was still surprised by how well Jython actually performed for several of the tests. Looking back at figures 4.1 and 4.2, one can see that Jython was actually able to perform fastest in 8 and 12 out of the 26 tests when running regular Python code, on the Dell and Mac computers respectively. When running interpreter specific code Jython was fastest in 4 and 5 out of the 26 tests, on the Dell and Mac computers respectively. This is most likely because of the improvements made to the JVM as well as optimizations that the JIT is able to make. Despite Jython being fastest in several of the tests, one should be aware of the fact that it was not able to complete all of the tests. The results show that Jython would not be suited for applications using large dictionaries.
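The difference between the two call paths can be written directly in Jython code; the JVM flag mentioned in the comment is an assumption about how the memory limit could be raised and was not used in this project.

    from java.util import HashMap

    d = {}                      # Python dict: an insert compiles to a direct __setitem__ call
    d["key"] = "value"

    m = HashMap()               # java.util.HashMap: "put" is first looked up with __getattr__
    m.put("key", "value")       # and only then called, adding an extra step per insert

    # Assumption, not from the thesis: the JVM heap could probably be raised when
    # starting Jython, for example  jython -J-Xmx4g test.py  , at the cost of an
    # unequal comparison with the other interpreters.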

5.3 PyPy

PyPy's results show that it is a powerful interpreter when performance is crucial. Since PyPy was developed to be a faster alternative to CPython, the initial hypothesis was that PyPy would be able to perform better than CPython on all of the tests, which was not the case. On several of the dictionary tests, as well as when sorting lists of objects, CPython was faster than PyPy. The PyPy team has created a tool called jitviewer which visualizes what the PyPy JIT is doing with the code [46]. Running a program with this tool gives information on which areas of the code have been optimized by the JIT as well as how many times this code has been run. In appendix C the JIT optimized code from jitviewer is presented. The list comprehension test with integers was one of the tests where PyPy was fastest out of all interpreters. Listing C.1 shows the optimized code created by the JIT for the list comprehension test where n is equal to 5 000 000. It shows that a vast majority of the bytecode operations have been optimized into assembler code, resulting in an increase in PyPy's performance.


For the dictionary tests PyPy was slower than CPython in 8 out of 12 tests. In the dictionary merge test for objects where none of the keys match, PyPy was a factor of 3 slower than CPython. Running this test with jitviewer shows that the PyPy JIT has not made any optimizations to the code in the test; optimizations have only been made when preparing the test data. This would explain why PyPy's performance is slower for this test. On the other hand, in the dictionary merge test for objects where all of the keys match, fig. D.15, PyPy was faster than CPython by a factor of 2. Running this test with jitviewer also shows that no optimizations were made during the test but only during preparation of the test data. If only taking the JIT optimizations into consideration, one would expect PyPy to perform worse than CPython for this test as well. The PyPy project team has made changes to its dictionary data type compared to CPython's. It is possible that this implementation difference is the cause of PyPy's faster performance when merging dictionaries where all keys match. Looking at the results for the merge tests, the more keys that match between the two dictionaries being merged, the faster PyPy performs compared to CPython. It can be read on the PyPy website under Performance that one of the weaknesses of PyPy is long-running runtime functions, for example sorting large lists [47]. While still performing well when sorting lists and tuples of integers in the tests developed in this project, PyPy was not very efficient, compared to the other interpreters, when sorting lists and tuples containing objects.
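A minimal sketch of the kind of merge operation discussed above, under the assumption that the merge is performed with dict.update (the exact thesis code may differ); the offset controls how many keys match between the two dictionaries:

    def build_dicts(n, offset):
        # offset = 0 gives dictionaries where all keys match, offset = n gives no matching keys
        d1 = {i: i for i in range(n)}
        d2 = {i + offset: i for i in range(n)}
        return d1, d2

    d1, d2 = build_dicts(1000, 500)    # half of the keys match
    d1.update(d2)                      # the merge operation being measured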

5.4 Compatibility

One of the strengths of the Python language is the vast amount of third party libraries and packages that are available. Thanks to the Python community, solutions to common problems are available to all developers. TriOptima is no exception when it comes to using third party libraries; throughout their products several tools developed by others are used. By using these tools TriOptima can focus development time on what is important in their products. Though many Python packages are written entirely in Python, there are those that contain C/C++ code and use the Python C/API. Packages written entirely in Python will in almost all cases work across all interpreters. A common tool for installing Python packages is pip [48]. This tool works across all interpreters evaluated in this project. The packages that contain C/C++ code, on the other hand, are by default not compatible with PyPy and Jython as these do not support the Python C/API. PyPy does provide workarounds for some of the packages and is working on support for the Python C/API, which at the time of this report is at alpha/beta level [49]. Compatibility with the Python C/API does not seem to be worked on for Jython, but there exists a project called JyNI which is a compatibility layer between native CPython extensions and Jython. JyNI is currently at alpha level and does not yet support the complete Python C/API. There also exists a tool called jip for Jython which can be used to install jars for one's Jython environment [50].



5.5 Hardware Dependencies

It is clear from the results that hardware does have an effect on the interpreters' performance, especially for PyPy and Jython. Looking for example at the dictionary merge tests, figs. D.20 and D.21, the results show an extreme difference in PyPy's performance between the Mac and Dell computers. Figs. D.34 and D.35 show the results for the dictionary overwrite tests for integers. The graphs show that Jython performs fastest out of all the interpreters on the Mac computer while performing slowest on the Dell computer.


Chapter 6

Conclusion

In this thesis project the author has created a test suite and benchmarked the Python interpreters CPython, Cython, Jython and PyPy with it. The tests are inspired by problems found in an existing Python code base. For Cython and Jython a second version of each test was created which used data types and syntax that these interpreters allow. The tests were run with varying problem size for each interpreter, while the execution time the interpreter required to complete the test was recorded. All tests were run on two different computers, a Dell computer and a Mac running OS X Yosemite. All data gathered from the benchmarks, as well as the source code, is available at the project git repository [41]. The results showed that no single interpreter had the highest performance on all of the tests; the interpreters differed in which problems they excelled at. Each interpreter was also able to have the highest performance on at least one test, disproving the hypothesis of the author that CPython would not be fastest in any of the tests. On some of the tests there was a difference in results between the Mac and Dell computers. The largest difference could be seen for the Jython interpreter, which varied greatly between the computers when running tests such as dictionary insert, dictionary overwrite, list append as well as adding objects together. When running the tests that consisted of regular Python code, Jython and PyPy recorded the fastest time for the most tests. This shows that the best speedup without applying any change to the code is possible by using one of these interpreters. Looking for example at the tests for adding objects together, fig. D.65, shows that Jython and PyPy finish the test a bit over two times faster than CPython and Cython when running the same code. When running the tests that used interpreter specific data types and syntax, Jython performed worse on several tests, showing that using Java data types does not necessarily mean an increase in performance compared to using the Python data types. Cython showed great performance when it was able to perform operations without the use of the Python C/API, for example when creating lists containing extension types or adding extension types together. For the tests where Cython

is bound to using the Python C/API, the performance was very similar to that of CPython. While the results from these benchmarks provide a good insight into the performance of the interpreters, it is still difficult to determine which interpreter would be best suited when one wishes to increase performance. If one does not wish to change the code base, it is clear that PyPy or Jython would be great options. Jython was on the other hand the only interpreter which was not able to complete all of the tests, due to memory issues. It is possible that many of the third party libraries that one wishes to use do not work without the Python C/API. If this is the case, then changing to PyPy or Jython might require substantial work, changing libraries or developing new ones, in order to maintain system functionality. Cython does still show an increase in performance for many of the tests, especially when using Cython syntax and data types. Using Cython it is possible to isolate the most performance critical code areas and only modify those; the other areas can be left unchanged. As Cython uses CPython, all third party libraries would still work as usual.

Chapter 7

Future Work

The benchmarks run in this project only test a small portion of the Python language. The author would like to see more tests added in the future, resulting in a more extensive test suite. In this project the tests only record performance for certain operations, and leave out measuring the total execution time of the process. This means that the startup time for a test is disregarded, something that might be of importance to some. The total time spent in a test, including looping, is also not measured in this project. It would be interesting to compare these results to a test suite that included a full algorithm, measuring several operations and loops in the same test.

7.0.1 Memory Benchmarks

It is interesting to see the differences in memory usage for programming languages that have automatic memory management. While Cython uses the same garbage collector as CPython, PyPy uses its own garbage collector and Jython uses the Java garbage collector. Memory usage was not measured in this project since, at the time of the project, there was no Python library available that works for each interpreter and measures the memory allocated for the running process. Nor is there an available method, working for all interpreters, to halt the garbage collector at a specific point in the source code. Attempting to measure memory allocation in the test suite created in this project would therefore result in measurements that could not be guaranteed to be correct.

7.0.2 Other Interpreters

There exist several other Python interpreters and implementations besides the ones used in this thesis project, for example IronPython and Pyston. The test suite created in this thesis project could act as a base for anyone wishing to run benchmarks on any of these interpreters. Adding these interpreters to the test suite would give a wider view of the performance of different Python interpreters and implementations.


Bibliography

[1] Cython. Cython Homepage. [Online]. Available: http://cython.org/

[2] Jython. Jython Homepage. [Online]. Available: http://www.jython.org/

[3] PyPy Project. PyPy Homepage. [Online]. Available: http://pypy.org/

[4] M. Gorelick and I. Ozsvald, High Performance Python: Practical Performant Programming for Humans, 1st ed. O'Reilly Media.

[5] R. Murri, "Performance of Python runtimes on a non-numeric scientific code," p. 5. [Online]. Available: http://arxiv.org/pdf/1404.6388v2.pdf

[6] Python Software Foundation. History and License. [Online]. Available: https://docs.python.org/3/license.html

[7] M. Lutz, Learning Python, 5th ed. O'Reilly Media, Inc. [Online]. Available: http://proquest.safaribooksonline.com.focus.lib.kth.se/book/programming/python/9781449355722

[8] K. W. Smith, Cython. O'Reilly Media, Inc. [Online]. Available: http://proquest.safaribooksonline.com.focus.lib.kth.se/book/programming/9781491901731

[9] Python Software Foundation. CPython Mercurial Repository. [Online]. Available: https://hg.python.org/cpython/

[10] ——. Python C API. [Online]. Available: https://docs.python.org/2/c-api/intro.html

[11] ——. Python Software Foundation. [Online]. Available: https://www.python.org/psf/

[12] G. Ewing. Pyrex. [Online]. Available: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/

[13] "Sage Days 29 talk: Robert Bradshaw - Cython." [Online]. Available: https://www.youtube.com/watch?v=osjSS2Rrvm0

[14] SageMath. SageMath Mathematical Software System - Sage. [Online]. Available: http://www.sagemath.org/

[15] S. Behnel, R. Bradshaw, C. Citro, L. Dalcin, D. Seljebotn, and K. Smith, "Cython: The Best of Both Worlds," vol. 13, no. 2, pp. 31–39.

[16] Cython. Faster code via static typing. [Online]. Available: http://docs.cython.org/src/quickstart/cythonize.html

[17] B. Lubanovic, Introducing Python. O'Reilly Media, Inc. [Online]. Available: http://proquest.safaribooksonline.com.focus.lib.kth.se/book/programming/python/9781449361167

[18] Cython. Is Cython faster than CPython? [Online]. Available: https://github.com/cython/cython/wiki/FAQ#is-cython-faster-than-

[19] Python Software Foundation. Data model. [Online]. Available: https://docs.python.org/2/reference/datamodel.html

[20] C. F. Bolz, A. Cuni, M. Fijalkowski, and A. Rigo, "Tracing the Meta-level: PyPy's Tracing JIT Compiler," in Proceedings of the 4th Workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages and Programming Systems, ser. ICOOOLPS '09. ACM, pp. 18–25. [Online]. Available: http://doi.acm.org/10.1145/1565824.1565827

[21] PyPy Project. What is PyPy? [Online]. Available: http://pypy.org/

[22] ——. RPython documentation. [Online]. Available: https://rpython.readthedocs.org/en/latest/

[23] ——. Python compatibility. [Online]. Available: http://pypy.org/compat.html

[24] ——. People of PyPy. [Online]. Available: http://pypy.org/people.html

[25] C. F. Bolz, A. Cuni, M. Fijalkowski, M. Leuschel, S. Pedroni, and A. Rigo, "Runtime Feedback in a Meta-tracing JIT for Efficient Dynamic Languages," in Proceedings of the 6th Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems, ser. ICOOOLPS '11. ACM, pp. 9:1–9:8. [Online]. Available: http://doi.acm.org/10.1145/2069172.2069181

[26] "How to get the most out of your PyPy." [Online]. Available: https://www.youtube.com/watch?v=oZw8m_lyhvo&feature=youtu.be

[27] S. Yoo, "Amortised Optimisation of Non-functional Properties in Production Environments," in Search-Based Software Engineering, ser. Lecture Notes in Computer Science, M. Barros and Y. Labiche, Eds. Springer International Publishing, no. 9275, pp. 31–46. [Online]. Available: http://link.springer.com.focus.lib.kth.se/chapter/10.1007/978-3-319-22183-0_3

[28] "How PyPy runs your program." [Online]. Available: https://www.youtube.com/watch?v=mHTu723RDNI

[29] Jython. General Information, Jython Wiki. [Online]. Available: https://wiki.python.org/jython/JythonFaq/GeneralInfo

[30] S. Pedroni and N. Rappin, Jython Essentials. O'Reilly Media, Inc. [Online]. Available: http://proquest.safaribooksonline.com.focus.lib.kth.se/book/programming/python/9781449397364

[31] , Inc., "Java SE 6 Performance White Paper." [Online]. Available: http://www.oracle.com/technetwork/java/6-performance-137236.html

[32] B. Evans, "Understanding Java JIT Compilation with JITWatch, Part 1." [Online]. Available: http://www.oracle.com/technetwork/articles/java/architect-evans-pt1-2266278.html

[33] B. Evans and P. Lawrey, "Introduction to JIT Compilation in Java HotSpot VM," vol. May/June 2012, p. 66. [Online]. Available: http://www.oraclejavamagazine-digital.com/javamagazine_open/20120506?pg=49#pg49

[34] Python Software Foundation. Python Generators. [Online]. Available: https://wiki.python.org/moin/Generators

[35] Apple. time. [Online]. Available: https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/time.1.html

[36] GNU. time. [Online]. Available: http://man7.org/linux/man-pages/man1/time.1.html

[37] Python Software Foundation. The Python Profilers. [Online]. Available: https://docs.python.org/2/library/profile.html

[38] M. Gedminas. mgedmin/profilehooks. [Online]. Available: https://github.com/mgedmin/profilehooks

[39] Django Software Foundation. Django. [Online]. Available: https://www.djangoproject.com/

[40] Python Software Foundation. time.clock documentation. [Online]. Available: https://docs.python.org/2.7/library/time.html#time.clock

[41] A. Roghult. Project git repository. [Online]. Available: https://bitbucket.org/roghult/exjobb_2015/src

[42] Jython. PyDictionary.java - Bitbucket. [Online]. Available: https://bitbucket.org/jython/jython/src/6855fa289d48cf741a3daed2a0bb9c9c87cdf07f/src/org/python/core/PyDictionary.java?at=default&fileviewer=file-view-default

[43] ——. PyList.java - Bitbucket. [Online]. Available: https://bitbucket.org/jython/jython/src/6855fa289d48cf741a3daed2a0bb9c9c87cdf07f/src/org/python/core/PyList.java?at=default&fileviewer=file-view-default

[44] Oracle. Understand the OutOfMemoryError Exception. [Online]. Available: https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks002.html

[45] J. Juneau, J. Baker, V. Ng, L. Soto, and F. Wierzbicki, The Definitive Guide to Jython: Python for the Java Platform. Apress. [Online]. Available: http://proquest.safaribooksonline.com.focus.lib.kth.se/9781430225270

[46] PyPy Project. pypy / jitviewer. [Online]. Available: https://bitbucket.org/pypy/jitviewer

[47] ——. Performance. [Online]. Available: http://pypy.org/performance.html

[48] PyPA. pip. [Online]. Available: https://pip.readthedocs.org/en/stable/

[49] PyPy Project. PyPy - Python compatibility. [Online]. Available: http://pypy.org/compat.html

[50] S. Ning. jip 0.9.6 : Python Package Index. [Online]. Available: https://pypi.python.org/pypi/jip

Appendix A

Cython Code Compilation

A.1 List Comprehension Using Python Object

Listing A.1. List Comprehension using Python object 1__pyx_t_2=PyList_New(0);if (unlikely(!__pyx_t_2)) { __pyx_filename = __pyx_f [ 0 ] ; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 2__Pyx_GOTREF(__pyx_t_2); 3__pyx_t_4=PyTuple_New(1);if (unlikely(!__pyx_t_4)) { __pyx_filename = __pyx_f [ 0 ] ; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 4__Pyx_GOTREF(__pyx_t_4); 5__Pyx_INCREF(__pyx_v_amount); 6__Pyx_GIVEREF(__pyx_v_amount); 7PyTuple_SET_ITEM(__pyx_t_4,0,__pyx_v_amount); 8__pyx_t_3=__Pyx_PyObject_Call(__pyx_builtin_range, __pyx_t_4 , NULL) ; if (unlikely(!__pyx_t_3)) { __pyx_filename = __pyx_f [ 0 ] ; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 9__Pyx_GOTREF(__pyx_t_3); 10 __Pyx_DECREF(__pyx_t_4) ; __pyx_t_4 = 0 ; 11 if (likely(PyList_CheckExact(__pyx_t_3)) || PyTuple_CheckExact(__pyx_t_3) ) { 12 __pyx_t_4 = __pyx_t_3 ; __Pyx_INCREF(__pyx_t_4) ; __pyx_t_5 =0; 13 __pyx_t_6 = NULL; 14 } else { 15 __pyx_t_5 = 1; __pyx_t_4 = PyObject_GetIter (__pyx_t_3) ; ≠ if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;}


16 __Pyx_GOTREF(__pyx_t_4) ; 17 __pyx_t_6 = Py_TYPE(__pyx_t_4) >tp_iternext ; if (unlikely ≠ (!__pyx_t_6)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 18 } 19 __Pyx_DECREF(__pyx_t_3) ; __pyx_t_3 = 0 ; 20 for (;;) { 21 if (likely(!__pyx_t_6)) { 22 if (likely(PyList_CheckExact(__pyx_t_4))) { 23 if (__pyx_t_5 >= PyList_GET_SIZE(__pyx_t_4)) break ; 24 #if CYTHON_COMPILING_IN_CPYTHON 25 __pyx_t_3 = PyList_GET_ITEM(__pyx_t_4 , __pyx_t_5) ; __Pyx_INCREF( __pyx_t_3 ) ; __pyx_t_5++; if (unlikely (0 < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno =22;__pyx_clineno=__LINE__; goto __pyx_L1_error ;} 26 #else 27 __pyx_t_3 = PySequence_ITEM(__pyx_t_4 , __pyx_t_5) ; __pyx_t_5++; if (unlikely(!__pyx_t_3)) { __pyx_filename = __pyx_f [ 0 ] ; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 28 __Pyx_GOTREF(__pyx_t_3) ; 29 #e n d i f 30 } else { 31 if (__pyx_t_5 >= PyTuple_GET_SIZE(__pyx_t_4)) break ; 32 #if CYTHON_COMPILING_IN_CPYTHON 33 __pyx_t_3 = PyTuple_GET_ITEM(__pyx_t_4 , __pyx_t_5) ; __Pyx_INCREF( __pyx_t_3 ) ; __pyx_t_5++; if (unlikely (0 < 0)) {__pyx_filename = __pyx_f[0]; __pyx_lineno =22;__pyx_clineno=__LINE__; goto __pyx_L1_error ;} 34 #else 35 __pyx_t_3 = PySequence_ITEM(__pyx_t_4 , __pyx_t_5) ; __pyx_t_5++; if (unlikely(!__pyx_t_3)) { __pyx_filename = __pyx_f [ 0 ] ; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 36 __Pyx_GOTREF(__pyx_t_3) ; 37 #e n d i f 38 } 39 } else { 40 __pyx_t_3 = __pyx_t_6(__pyx_t_4) ; 41 if (unlikely(!__pyx_t_3)) { 42 PyObject exc_type = PyErr_Occurred() ;


43 if (exc_type) { 44 if (likely(exc_type ==PyExc_StopIteration || PyErr_GivenExceptionMatches(exc_type , PyExc_StopIteration))) PyErr_Clear() ; 45 else {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 46 } 47 break ; 48 } 49 __Pyx_GOTREF(__pyx_t_3) ; 50 } 51 __Pyx_XDECREF_SET(__pyx_v_x, __pyx_t_3) ; 52 __pyx_t_3 = 0 ; 53 __pyx_t_7 = __Pyx_GetModuleGlobalName (__pyx_n_s_Foo) ; if ( unlikely (!__pyx_t_7)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 54 __Pyx_GOTREF(__pyx_t_7) ; 55 __pyx_t_8 = NULL; 56 if (CYTHON_COMPILING_IN_CPYTHON && u n l i k e l y (PyMethod_Check (__pyx_t_7))) { 57 __pyx_t_8 = PyMethod_GET_SELF(__pyx_t_7) ; 58 if (likely(__pyx_t_8)) { 59 PyObject function = PyMethod_GET_FUNCTION(__pyx_t_7) ; 60 __Pyx_INCREF(__pyx_t_8) ; 61 __Pyx_INCREF( f u n c t i o n ) ; 62 __Pyx_DECREF_SET(__pyx_t_7 , f u n c t i o n ) ; 63 } 64 } 65 if (!__pyx_t_8) { 66 __pyx_t_3 = __Pyx_PyObject_CallOneArg (__pyx_t_7 , __pyx_v_x) ; if (unlikely(!__pyx_t_3)) {__pyx_filename =__pyx_f[0];__pyx_lineno=22;__pyx_clineno= __LINE__ ; goto __pyx_L1_error ;} 67 __Pyx_GOTREF(__pyx_t_3) ; 68 } else { 69 __pyx_t_9 = PyTuple_New(1+1); if (unlikely(!__pyx_t_9)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 70 __Pyx_GOTREF(__pyx_t_9) ; 71 __Pyx_GIVEREF(__pyx_t_8) ; PyTuple_SET_ITEM(__pyx_t_9 , 0 , __pyx_t_8) ; __pyx_t_8 = NULL; 72 __Pyx_INCREF(__pyx_v_x) ;


73 __Pyx_GIVEREF(__pyx_v_x);
74 PyTuple_SET_ITEM(__pyx_t_9, 0+1, __pyx_v_x);
75 __pyx_t_3 = __Pyx_PyObject_Call(__pyx_t_7, __pyx_t_9, NULL); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
76 __Pyx_GOTREF(__pyx_t_3);
77 __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
78 }
79 __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
80 if (unlikely(__Pyx_ListComp_Append(__pyx_t_2, (PyObject*)__pyx_t_3))) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
81 __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
82 }
83 __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
84 __pyx_v_l = ((PyObject*)__pyx_t_2);
85 __pyx_t_2 = 0;

A.2 List Comprehension Using Cython Extension Type

Listing A.2. List Comprehension using Cython extension type 1__pyx_t_2=PyList_New(0);if (unlikely(!__pyx_t_2)) { __pyx_filename = __pyx_f [ 0 ] ; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 2__Pyx_GOTREF(__pyx_t_2); 3__pyx_t_5=__pyx_v_amount; 4 for (__pyx_t_6 = 0; __pyx_t_6 < __pyx_t_5; __pyx_t_6+=1) { 5__pyx_v_x=__pyx_t_6; 6__pyx_t_3=__Pyx_GetModuleGlobalName(__pyx_n_s_Foo); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f [0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 7__Pyx_GOTREF(__pyx_t_3); 8__pyx_t_7=__Pyx_PyInt_From_int(__pyx_v_x);if ( unlikely (!__pyx_t_7)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 9__Pyx_GOTREF(__pyx_t_7); 10 __pyx_t_8 = NULL; 11 if (CYTHON_COMPILING_IN_CPYTHON && u n l i k e l y ( PyMethod_Check(__pyx_t_3) ) ) {


12 __pyx_t_8 = PyMethod_GET_SELF(__pyx_t_3) ; 13 if (likely(__pyx_t_8)) { 14 PyObject function = PyMethod_GET_FUNCTION( __pyx_t_3) ; 15 __Pyx_INCREF(__pyx_t_8) ; 16 __Pyx_INCREF( f u n c t i o n ) ; 17 __Pyx_DECREF_SET(__pyx_t_3 , f u n c t i o n ) ; 18 } 19 } 20 if (!__pyx_t_8) { 21 __pyx_t_4 = __Pyx_PyObject_CallOneArg (__pyx_t_3 , __pyx_t_7) ; if (unlikely(!__pyx_t_4)) { __pyx_filename = __pyx_f [ 0 ] ; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 22 __Pyx_DECREF(__pyx_t_7) ; __pyx_t_7 = 0 ; 23 __Pyx_GOTREF(__pyx_t_4) ; 24 } else { 25 __pyx_t_9 = PyTuple_New(1+1); if (unlikely(! __pyx_t_9) ) {__pyx_filename = __pyx_f [ 0 ] ; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 26 __Pyx_GOTREF(__pyx_t_9) ; 27 __Pyx_GIVEREF(__pyx_t_8) ; PyTuple_SET_ITEM(__pyx_t_9 ,0,__pyx_t_8);__pyx_t_8=NULL; 28 __Pyx_GIVEREF(__pyx_t_7) ; 29 PyTuple_SET_ITEM(__pyx_t_9 , 0+1, __pyx_t_7) ; 30 __pyx_t_7 = 0 ; 31 __pyx_t_4 = __Pyx_PyObject_Call (__pyx_t_3 , __pyx_t_9 ,NULL); if (unlikely(!__pyx_t_4)) { __pyx_filename = __pyx_f [ 0 ] ; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 32 __Pyx_GOTREF(__pyx_t_4) ; 33 __Pyx_DECREF(__pyx_t_9) ; __pyx_t_9 = 0 ; 34 } 35 __Pyx_DECREF(__pyx_t_3) ; __pyx_t_3 = 0 ; 36 if (unlikely(__Pyx_ListComp_Append(__pyx_t_2, ( PyObject)__pyx_t_4))) {__pyx_filename = __pyx_f [0]; __pyx_lineno = 22; __pyx_clineno = __LINE__; goto __pyx_L1_error ;} 37 __Pyx_DECREF(__pyx_t_4) ; __pyx_t_4 = 0 ; 38 } 39 __pyx_v_l = ( ( PyObject )__pyx_t_2) ; 40 __pyx_t_2 = 0 ;


Appendix B

Decompiled Java Class File Code

B.1 Dictionary Insert Using Python Dictionary

Listing B.1. Decompiled code from Java .class-file for dictionary insert test using Python dictionary
1 var5_5 = var1_1.getglobal("time").__getattr__("clock").__call__(var2_2);
2 var1_1.setlocal(6, var5_5);
3 var5_5 = null;
4 var1_1.setline(12);
5 var5_5 = var1_1.getlocal(5);
6 var1_1.getlocal(0).__setitem__(var1_1.getlocal(4), var5_5);
7 var5_5 = null;
8 var1_1.setline(13);
9 var5_5 = var1_1.getglobal("time").__getattr__("clock").__call__(var2_2);

B.2 Dictionary Insert Using java.util.HashMap

Listing B.2. Decompiled code from Java .class-file for dictionary insert test using java.util.HashMap
1 var5_5 = var1_1.getglobal("time").__getattr__("clock").__call__(var2_2);
2 var1_1.setlocal(7, var5_5);
3 var5_5 = null;
4 var1_1.setline(14);
5 var1_1.getlocal(3).__getattr__("put").__call__(var2_2, var1_1.getlocal(5), var1_1.getlocal(6));
6 var1_1.setline(15);
7 var5_5 = var1_1.getglobal("time").__getattr__("clock").__call__(var2_2);


Appendix C

PyPy JIT Optimizations

C.1 List Comprehension Optimizations

Listing C.1. PyPy JIT optimizations for list comprehension using integers 1FOR_ITERto110 2i66=i65>=i41 3103bc2f1d:4839facmprdx,rdi 4103bc2f20:0f8d66020000jge0x103bc318c 5guard(i66isfalse) 6i67=i65+1 7103bc2f26:4c8d7201lear14,[rdx+0x1] 8STORE_FASTx 9LOAD_FASTx 10 LIST_APPEND 2 11 i6 8 = i 5 2 + 1 12 103bc2f2a:4c8d4901 lea r9,[rcx+0x1] 13 i69 = arraylen_gc(p59, descr=) 14 103bc2f2e: 4c8b6808 mov r13,QWORDPTR [rax+0x8] 15 i7 0 = i 6 9 < i6 8 16 103bc2f32:4d39cd cmp r13,r9 17 cond_call(i70, ConstClass( _ll_list_resize_hint_really_look_inside_iff__listPtr_Signed_Bool ), p49, i68, 1, descr=) 18 103bc2f35:7d3c jge 0x103bc2f73 19 103bc2f37: 49 bb 38 2c bc 03 01 movabs r11,0 x103bc2c38 20 103bc2f41: 4c895d20 mov QWORDPTR[rbp +0x20 ] , r11 21 103bc2f45: 48897d78 mov QWORDPTR[rbp +0x78 ] , r d i


22 103bc2f49: 48895560 mov QWORDPTR[rbp +0x60 ] , rdx 23 103bc2f4d: 48894d50 mov QWORDPTR[rbp +0x50 ] , r c x 24 103bc2f51:4889df mov rdi,rbx 25 103bc2f54:4c89ce mov rsi,r9 26 103bc2f57:ba01000000 mov edx,0x1 27 103bc2f5c: 48 b8 60 35 04 00 01 movabs rax,0 x100043560 28 103bc2f66: 49 bb 20 23 bc 03 01 movabs r11,0 x103bc2320 29 103bc2f70:41ffd3 call r11 30 ((pypy.objspace.std.iterobject.W_AbstractSeqIterObject) p27) . inst_index = i67 31 103bc2f73: 4d89 74 24 08 mov QWORDPTR[r12 +0x8 ] , r14 32 103bc2f78: 49 bb 60 80 8d 02 01 movabs r11,0 x1028d8060 33 103bc2f82: 49833b00 cmp QWORDPTR[r11 ],0x0 34 103bc2f86: 0f 85 25 02 00 00 jne 0x103bc31b1 35 guard_no_exception(descr=) 36 p71 = getfield_gc_r(p49, descr=) 37 103bc2f8c: 4c8b6b10 mov r13,QWORDPTR [rbx+0x10] 38 setarrayitem_gc(p71, i52, i65, descr=) 39 103bc2f90: 49 89 54 cd 10 mov QWORDPTR[r13 +rcx8+0x10 ] , rdx 40 JUMP_ABSOLUTE 95 41 ((list)p49).length = i68 42 103bc2f95: 4c894b08 mov QWORDPTR[rbx +0x8 ] , r9 43 guard_not_invalidated(descr=) 44 i72 = getfield_raw_i(4337855760, descr=) 45 103bc2f99: 49 bb 10 6d 8e 02 01 movabs r11,0 x1028e6d10 46 103bc2fa3:498b0b mov rcx,QWORDPTR [r11] 47 i7 3 = i 7 2 < 0 48 103bc2fa6:4883f900 cmp rcx,0x0 49 103bc2faa: 0f 8c4b020000 jl 0x103bc31fb 50 guard ( i 7 3 i s f a l s e ) 51 FOR_ITER to 110


52 jump(p0, p1, p6, p7, p8, p11, p13, p15, i65, p19, p21, p23 , p25 , p27 , p31 , p36 , i67 , i41 , p49 , i68 , p71 , descr=TargetToken(4357171472)) 53 103bc2fb0: 48 89 95 70 01 00 00 mov QWORDPTR [rbp +0x170 ] , rdx 54 103bc2fb7:4c89f2 mov rdx,r14 55 103bc2fba:4c89c9 mov rcx,r9 56 103bc2fbd:4c89e8 mov rax,r13 57 103bc2fc0: e94b ff ff ff jmp 0x103bc2f10 58 103bc2fc5: 66 66 2e 0f 1f 84 00 data16 nopWORDPTR cs :[ rax+rax1+0x0 ]


Appendix D

Charts

Each chart plots execution time against problem size n for the interpreters CPython, PyPy, Jython and Cython.

D.1 Dictionary Insert Tests

Figure D.1. Dictionary, Insert Integers, 100000-5000000 Interpreter Code, Dell

Figure D.2. Dictionary, Insert Integers, 100000-5000000 Interpreter Code, Mac Pro

Figure D.3. Dictionary, Insert Integers, 100000-5000000 Python Code, Dell

Figure D.4. Dictionary, Insert Integers, 100000-5000000 Python Code, Mac Pro

Figure D.5. Dictionary, Insert Objects, 100000-5000000 Interpreter Code, Mac Pro

Figure D.6. Dictionary, Insert Objects, 100000-5000000 Python Code, Dell

Figure D.7. Dictionary, Insert Objects, 100000-5000000 Python Code, Mac Pro

D.2 Dictionary Merge All Keys Match Tests

Figure D.8. Dictionary, Merge All Match Integers, 100000-5000000 Interpreter Code, Dell

Figure D.9. Dictionary, Merge All Match Integers, 100000-5000000 Interpreter Code, Mac Pro

Figure D.10. Dictionary, Merge All Match Integers, 100000-5000000 Python Code, Dell

Figure D.11. Dictionary, Merge All Match Integers, 100000-5000000 Python Code, Mac Pro

Figure D.12. Dictionary, Merge All Match Objects, 100000-5000000 Interpreter Code, Dell

Figure D.13. Dictionary, Merge All Match Objects, 100000-5000000 Interpreter Code, Mac Pro

Figure D.14. Dictionary, Merge All Match Objects, 100000-5000000 Python Code, Dell

Figure D.15. Dictionary, Merge All Match Objects, 100000-5000000 Python Code, Mac Pro

D.3 Dictionary Merge Half Keys Match Tests

Figure D.16. Dictionary, Merge Half Match Integers, 100000-5000000 Interpreter Code, Dell

Figure D.17. Dictionary, Merge Half Match Integers, 100000-5000000 Interpreter Code, Mac Pro

Figure D.18. Dictionary, Merge Half Match Integers, 100000-5000000 Python Code, Dell

Figure D.19. Dictionary, Merge Half Match Integers, 100000-5000000 Python Code, Mac Pro

Figure D.20. Dictionary, Merge Half Match Objects, 100000-5000000 Interpreter Code, Dell

Figure D.21. Dictionary, Merge Half Match Objects, 100000-5000000 Interpreter Code, Mac Pro

Figure D.22. Dictionary, Merge Half Match Objects, 100000-5000000 Python Code, Dell

Figure D.23. Dictionary, Merge Half Match Objects, 100000-5000000 Python Code, Mac Pro

D.4 Dictionary Merge No Keys Match Tests

Figure D.24. Dictionary, Merge None Match Integers, 100000-5000000 Interpreter Code, Dell

Figure D.25. Dictionary, Merge None Match Integers, 100000-5000000 Interpreter Code, Mac Pro

Figure D.26. Dictionary, Merge None Match Integers, 100000-5000000 Python Code, Dell

Figure D.27. Dictionary, Merge None Match Integers, 100000-5000000 Python Code, Mac Pro

Figure D.28. Dictionary, Merge None Match Objects, 100000-5000000 Interpreter Code, Dell

Figure D.29. Dictionary, Merge None Match Objects, 100000-5000000 Interpreter Code, Mac Pro

Figure D.30. Dictionary, Merge None Match Objects, 100000-5000000 Python Code, Dell

Figure D.31. Dictionary, Merge None Match Objects, 100000-5000000 Python Code, Mac Pro

D.5 Dictionary Overwrite Tests

Figure D.32. Dictionary, Overwrite Integers, 100000-5000000 Interpreter Code, Dell

Figure D.33. Dictionary, Overwrite Integers, 100000-5000000 Interpreter Code, Mac Pro

Figure D.34. Dictionary, Overwrite Integers, 100000-5000000 Python Code, Dell

Figure D.35. Dictionary, Overwrite Integers, 100000-5000000 Python Code, Mac Pro

Figure D.36. Dictionary, Overwrite Objects, 100000-5000000 Interpreter Code, Dell

Figure D.37. Dictionary, Overwrite Objects, 100000-5000000 Interpreter Code, Mac Pro

Figure D.38. Dictionary, Overwrite Objects, 100000-5000000 Python Code, Dell

Figure D.39. Dictionary, Overwrite Objects, 100000-5000000 Python Code, Mac Pro

D.6 Dictionary Read Tests

Figure D.40. Dictionary, Read Integers, 100000-5000000 Interpreter Code, Mac Pro

Figure D.41. Dictionary, Read Integers, 100000-5000000 Python Code, Mac Pro

Figure D.42. Dictionary, Read Objects, 100000-5000000 Interpreter Code, Mac Pro (Jython unable to complete within time frame for n equal to 5 000 000).

Figure D.43. Dictionary, Read Objects, 100000-5000000 Python Code, Mac Pro (Jython unable to complete within time frame for n equal to 5 000 000).

D.7 Generator Tests

Figure D.44. Generator, Generate Integers, 100000-5000000 Interpreter Code, Mac Pro

Figure D.45. Generator, Generate Integers, 100000-5000000 Python Code, Mac Pro

Figure D.46. Generator, Generate Objects, 100000-5000000 Interpreter Code, Mac Pro

Figure D.47. Generator, Generate Objects, 100000-5000000 Python Code, Mac Pro

D.8 List Append Tests

Figure D.48. List, Append, Integers 100000-5000000 Interpreter Code, Dell

Figure D.49. List, Append, Integers 100000-5000000 Interpreter Code, Mac Pro

Figure D.50. List, Append, Integers 100000-5000000 Python Code, Mac Pro

Figure D.51. List, Append, Objects 100000-5000000 Interpreter Code, Dell

Figure D.52. List, Append, Objects 100000-5000000 Interpreter Code, Mac Pro

Figure D.53. List, Append, Objects 100000-5000000 Python Code, Mac Pro

D.9 List Comprehension Tests

Figure D.54. List, List Comprehension, Integers 100000-5000000 Interpreter Code, Mac Pro

Figure D.55. List, List Comprehension, Integers 100000-5000000 Python Code, Mac Pro

Figure D.56. List, List Comprehension, Objects 100000-5000000 Interpreter Code, Mac Pro

Figure D.57. List, List Comprehension, Objects 100000-5000000 Python Code, Mac Pro

D.10 List Sort Tests

Figure D.58. List, Sort, Integers 100000-5000000 Interpreter Code, Mac Pro

Figure D.59. List, Sort, Integers 100000-5000000 Python Code, Mac Pro

Figure D.60. List, Sort, Objects 100000-5000000 Interpreter Code, Mac Pro

Figure D.61. List, Sort, Objects 100000-5000000 Python Code, Mac Pro

D.11 Objects Add Tests

Figure D.62. Objects, Add 100000-5000000 Interpreter Code, Dell

Figure D.63. Objects, Add 100000-5000000 Interpreter Code, Mac Pro

Figure D.64. Objects, Add 100000-5000000 Python Code, Dell

Figure D.65. Objects, Add 100000-5000000 Python Code, Mac Pro

D.12 Objects Generate Tests

Figure D.66. Objects, Generate 100000-5000000 Interpreter Code, Mac Pro

Figure D.67. Objects, Generate 100000-5000000 Python Code, Mac Pro

D.13 Tuple Append Tests

Figure D.68. Tuple, Append, Integers 100000-1000000 Interpreter Code, Mac Pro

Figure D.69. Tuple, Append, Integers 100000-1000000 Python Code, Mac Pro

Figure D.70. Tuple, Append, Objects 100000-1000000 Interpreter Code, Mac Pro

Figure D.71. Tuple, Append, Objects 100000-1000000 Python Code, Mac Pro

D.14 Tuple Sort Tests

Figure D.72. Tuple, Sort, Integers 100000-5000000 Interpreter Code, Mac Pro

Figure D.73. Tuple, Sort, Integers 100000-5000000 Python Code, Mac Pro

Figure D.74. Tuple, Sort, Objects 100000-5000000 Interpreter Code, Mac Pro

Figure D.75. Tuple, Sort, Objects 100000-5000000 Python Code, Mac Pro