Understanding the Potential of Interpreter-Based Optimizations for Python

Understanding the Potential of Interpreter-Based Optimizations for Python

Understanding the Potential of Interpreter-based Optimizations for Python UCSB Technical Report #2010-14 August 11, 2010 Nagy Mostafa Chandra Krintz Calin Cascaval David Edelsohn Priya Nagpurkar Peng Wu Computer Science Department IBM T.J.Watson Research Center University of California, Santa Barbara Yorktown Heights, NY {nagy,ckrintz}@cs.ucsb.edu {cascaval,edelsohn,pnagpurkar,pengwu}@us.ibm.com ABSTRACT used for embedded server-side and client-side embedded scripting, The increasing popularity of scripting languages as general pur- respectively; Google uses JavaScript for their desktop applications, pose programming environments calls for more efficient execu- including GMail and Google Docs, while PHP is the most com- tion. Most of these languages, such as Python, Ruby, PHP, and monly used language for writing wiki software [30]. JavaScript are interpreted. Interpretation is a natural implemen- Development frameworks (e.g. Rails for Ruby and PHP tation given the dynamic nature of these languages and interpreter (TRAX), Django for Python, etc.), high level syntax, and language portability has facilitated wide-spread use. In this work, we analyze dynamism (runtime resolution of variable types and extensive use the performance of CPython, a commonly used Python interpreter, of runtime reflection) all play a key role in the use of dynamic lan- to identify major sources of overhead. Based on our findings, we guages for increasingly complex applications. investigate the efficiency of a number of optimizations and explore Although researchers and open source efforts continue to pur- the design options and trade-offs involved. sue ways of dynamically compiling these programs [21, 22, 26, 29, 11, 9], the most popular versions of the distributed runtimes Keywords for dynamic scripting languages employ interpreters for execution. dynamic scripting language, performance profiling, language inter- Interpretation plays a key role in the wide-spread use of these pretation languages – the ease of development facilitates rapid prototyping and language evolution. Moreover, the runtime is architecture- 1. INTRODUCTION independent (e.g., written in C) and thus can be compiled to any system without significant porting effort, as opposed to retreating a Scripting languages such as Python, Ruby, PHP, and JavaScript compiler or JIT system. Even when compilation is an option, dy- have experienced rapid uptake in the commercial sector for soft- namic languages are challenging to compile because types are not ware development and general widespread use in recent years. The known statically, and must be inferred and speculatively translated. popularity of these languages stems from two factors: 1) rapid pro- Interpretation, however, is an inherently inefficient execution en- totyping through high level syntax, language flexibility and dy- gine. The interpreter must decode and dispatch each bytecode namism, and a large collection of frameworks and libraries, thus (compact representation of source instructions). This incurs an enabling programmer productivity; and 2) cross-platform portabil- overhead not present in the execution of native (compiled) code. In ity – interpreters and runtime systems are available on a large num- addition, interpretation of each bytecode is done independent of ev- ber of systems. These features make it easy for non-experts and ery other, resulting in poor code quality compared to code produced experts alike to employ these languages quickly for a wide variety by even the simplest code generator. There has been significant of tasks and to deploy fairly complex applications on a variety of research to identify ways of optimizing the interpretation process platforms. for non-scripting, statically typed, high-level languages (e.g. Java, Many of these languages were originally designed for “script- OCaML), while maintaining the portability of the runtime [23, 17, ing”, i.e., for writing short programs for text processing or as “glue 7, 31]. Such approaches target the interpreter dispatch loop since code” between (or embedded within) modules or components writ- significant time is spent there for these languages. ten in other languages. Recently, they are increasingly employed Modern scripting languages however are very different from for more complex, general-purpose, and self-contained applica- OCaML and Java given their dynamic nature. In this paper, we tions. For example, Python is used in Linux for software pack- show that dispatch optimization provides only minimal benefits to aging and distribution, for peer-to-peer sharing [3] and computer Python programs. We, hence, investigate the primary sources of aided design [8, 19], for cloud computing [10, 5] as well as compu- overhead and based on our findings, we identify and evaluate a set tationally intensive tasks [24]. PHP and JavaScript are commonly of optimizations that target these sources of overhead in an attempt to improve the performance of Python programs. Using caching of attributes to avoid name lookup, elimination of loads/stores to/from the operand stack, and inlining, we are able to extract performance improvement of up to 28%. 2. BACKGROUND Python is a general-purpose high-level object-oriented program- ming language with dynamic typing. There are several implemen- Benchmark Description 9 2to3 A Python 2 to Python 3 translator translating itself 8 django Django Python web framework building a 150x150-cell HTML table 7 html5lib Parse the HTML 5 specification using html5lib 6 pickle Use the pure-Python pickle module to pickle a variety of datasets 5 pybench Run the standard Python PyBench benchmark suite richards The classic Richards benchmark 4 rietveld Macrobenchmark for Django using the Rietveld [20] code review app % 3 spambayes Run a canned mailbox through a SpamBayes [25] ham/spam classifier 2 unpickle Uses the cPickle module to un-serialize a variety of datasets 1 0 Table 1: The Unladen-Swallow Benchmarks -1 tations of the Python language [13, 14, 27]. In this work, we focus on the most widely used reference implementation: CPython [6]. The CPython is a simple implementation of the language written Figure 1: Effect of Direct Threaded Interpretation on CPython in C that is portable and easy to understand and extend. CPython employs switch-based interpretation and a combination of simple reference counting and cycle-detecting garbage collection. It com- DTI is a technique that replaces the opcode of every bytecode piles Python source to high-level type-generic bytecodes that run (upon first execution or ahead-of-time) with the address or label of on a stack-based virtual machine. Being type-generic bytecodes, its handler [1]. This replacement increases the size of each instruc- they must defer associating abstract operations with specific im- tion (and still requires replicated code for decoding at the end of plementations until runtime. This is carried out via a sequence each handler) but reduces the overhead of indirect branches in two of type-checks and indirect-branches which makes bytecode han- ways. (1) For every indirect branch, the number of possible targets dling slower than for statically-typed languages. CPython performs decreases, which enhances target prediction. (2) Any biased rela- name-based late binding on variables and methods (attributes). Ev- tion between opcodes is exploited. Such optimization and others ery Python object has a dictionary (a hash table), that maps at- reduce the overhead of conditional and indirect branches, which tributes names to values. When loading an attribute, a chain of hash are typically hard to predict and are thus costly on modern archi- table lookups are performed on the receiver object and its type and tectures, and proves efficient for statically-typed languages where super-types. Not only attributes are stored in dictionaries, so are the dispatch process is the bottleneck [7, 2]. global variables. Figure 1 shows the effect of Direct-Threaded interpretation on CPython also implements descriptors which are setter/getter ob- CPython performance. On average, there is a small speedup of jects that are used to associate an action with attribute access. 2.5%. For most benchmarks, speedup is around 2% and in one Among these actions is binding methods to objects dynamically at case we actually get a slowdown. This results validates our claim runtime. When a method is loaded from an object dictionary, its de- and shows that bytecode dispatching is not a bottleneck anymore scriptor is found instead which, when invoked, created the method for interpreters with high-level of abstraction. object on-the-fly and binds it to the receiver. To understand, at a high level, where time is being spent in Python programs, we classify the bytecodes into several classes 3. METHODOLOGY based on their actions. For each class, we measure the number To understand the behavior of Python programs, their sources of of bytecodes executed and the percentage of time spent there. The overhead and the impact of interpreter optimizations on CPython, classes we use are as follows: we evaluate the performance of benchmarks out of the Un- Type Resolution and Field Access includes LOAD/STORE_ATTR LOAD/STORE_GLOBAL laden Swallow project [28] which are listed and described in Ta- and . When given LOAD/STORE_ATTR ble 1. These benchmarks exercise common activities found in an attribute name and a receiver object, Python programs including parsing and translation, (de-)serializing perform the necessary work to resolve the attribute name. This datasets, and HTML manipulation. Also included in this list is operation is expensive and involves a number of indirections pybench, which implements a set of microbenchmarks that ex- and dictionary lookups, specially if the attribute lies up in the ercise low level Python activities including function calls, compar- inheritance chain. LOAD/STORE_GLOBAL ison operators, looping constructs, string manipulation, basic arith- perform the same task for global vari- metic, and others [18]. ables. They look for the attribute name in the global dictio- We modified the CPython-2.6 [6] source to collect a variety of nary, if not found they look for it in the builtins dictionary. profiles and to experiment with different optimizations.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    10 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us