Optimizing Tcl Bytecode

Optimizing Tcl Bytecode Donal Fellows The University of Manchester / Tcl Core Team [email protected] 1. Abstract The Tcl interpreter has an evaluation strategy of parsing a script into a sequence of commands, and compiling each of those commands into a sequence of bytecodes that will produce the result of the command. I have made a number of extensions to the scope of commands that are handled this way over the years, but in 2012 I started looking at a new way to do the compilation, with an aim to eventually creating an "interpreter" suitable for Tcl 9. This paper looks at the changes made (some of which are present in 8.6.0, and the rest of which will appear in 8.6.1) and the prospects for future directions. and continue, both of which had long been 1. Introduction known to be problematic in cases previously During the development of Tcl 8.6, Kevin only known in an operational sense. Kenny and Ozgur Dogan Ugurlu demonstrat- When we considered the implications of this, ed[1] (through the implementation of the we realized that it also made it substantially command ::tcl::unsupported::assemble) that it easier to consider compilation of Tcl to actual was possible to create an assembler for Tcl native code. Previous attempts[2] had focused bytecodes that was sufficiently safe that it was on a simplistic transformation of the existing suitable for exposure in a Safe Interpreter. In bytecodes to their machine-code equivalents, particular, it became clear that there were a but that is a strategy that is unlikely to yield set of constraints that could be applied that significant benefits for several reasons: would ensure that a Tcl assembler would never generate code that could crash; access to 1. The bytecode that they are starting parts of the stack not “owned” by the code from is significantly non-optimal in the was prohibited. Though infinite loops were first place. still possible — excluding infinite loops re- 2. Tcl commands are potentially highly quires either totally emasculating the capabili- dynamic, with the option for their im- ties of Tcl or the possibility of proving excep- plementations to be changed substan- tionally complex mathematical theorems — tially as a script executes. those loops would never exceed properly cal- 3. The value-model of Tcl is very strong- culable stack bounds. ly rooted in the concept of an immuta- I found this absolutely fascinating, as it ble reference with typed views, which showed that the bytecode that we had been is substantially different to that of ma- generating was not nearly as unruly as I had chine code (mutable references to ma- previously feared; my problems with writing chine words) or languages like C (mu- compilation commands such as for switch and table references to typed variables). dict had been more due to my not being aware The combination of these issues means that of those implicit constraints, rather than their compilation of Tcl to machine code is a sig- absence. It also allowed us to quantify exactly nificant challenge. This paper will be primari- what was wrong with the compilation of break ly looking at dealing with the first part of this the result of the command being pushed on problem, so that the bytecode that the compi- the stack; in the simplest case, a command is lation starts from is at least a stronger founda- compiled into a push of all the argument tion. The advantage of working on this part is words and an invoke via Tcl’s basic com- that the results of doing this can be made mand dispatch, which will in turn replace available to the community more rapidly than those argument words with the single result the other parts; full compilation will require a value. The pushing of the argument words lot more work to support than tweaking may be non-trivial if the words are complex things to work better within current con- compounds of various substitutions, but the straints. overall model is comparatively simple. In this paper, I present a summary of Tcl’s When this simple universal execution strategy bytecode system in Section 2. In Section 3, I is used, the command in question is referred describe how I have been improving the cov- to as uncompiled. This is obviously not actual- erage of commands that bytecode is generated ly true — we have just discussed what the for. In Section 4, I describe the improvements compilation is! — but the key is that the actu- I have made to the bytecode generated for a al embodiment of the semantics of the com- number of existing Tcl commands. In Section mand is still the standard implementation 5, I talk about an improvement to the intro- function; all that is bytecode-compiled is the spection tools for Tcl bytecode so as to more assembly of the arguments and the lookup simply expose the information that is there to and invocation of that function. With a scripts. In Section 6, I describe the simple op- “compiled” command, the semantics of the timizer that I have created for Tcl bytecode, command are embodied by a direct sequence and in Section 7 I present some performance of bytecode instructions. Those instructions measurements to examine whether I am mak- will produce the result of the command with- ing any progress on the performance front. out (typically) going through command dis- Finally, I examine possible future directions patch. in Section 8. Command compilation 2. About Bytecode Each compiled command has its own strategy Tcl’s current system of compilation uses a for producing instructions, the command custom target called Bytecode[3], designed compiler, which is asked to consider how to primarily to be an in-memory and on-disk da- do the compilation in a particular case. Any ta structure1 that has minimal space consump- failure of the command compiler will cause tion as a primary goal. the standard compilation to be used, which is also used when the command name itself is The fundamental model of bytecode execu- dynamically generated (e.g., the value of a tion is that there is a stack of values that rep- variable or the result of a command), as at resent intermediate working values, argu- that point Tcl is unable to statically determine ments and results. Every command becomes the command compiler to use.2 a sequence of instructions that ends up with 1 Support for what became the TclPro compiler 2 It is a consequence of this that TclOO instance and the tbcload extension was part of the original dispatch is unlikely to ever be compiled; a key us- mandate, though it is not part that is officially age pattern of TclOO is to hold the name of the supported for general free use. However, the con- instance to invoke in a variable, clearly a case that sequences of that support are subtly scattered cannot ever be compiled to anything substantially through the code. better than the current dispatch mechanism. Command compilers can fail for many rea- 3. Improving Coverage sons. One of the main reasons is if one of In order to improve the overall generation of their arguments is not a literal despite the bytecode by Tcl, it is necessary to increase the compiler requiring it to be; this is what hap- fraction of Tcl code that can be compiled to pens when any of the arguments to while is pure bytecode. That is, an instruction se- not a literal. The other two common reasons quence is pure bytecode if it does not contain for failure are if there is a lack of a local vari- any of the instructions invokeStk1, invokeStk4 able table (LVT) in the compilation context, (commonly just referred to as invokeStk, as or if given the wrong number of arguments, they are a linked pair) or 3. though this is very much not an exhaustive invokeExpanded set. Script fragments that have their instruction sequences entirely free of those instructions The lack of an LVT case requires some expla- are entirely predictable in their behaviour, at nation. The local variable table is a numerical- least at an operational/type-theoretic level. ly indexed collection of variables that is used to But what was the status of Tcl’s compilation hold the formal arguments to a procedure, the of commands back in mid-2012? Well, there local variables inside that procedure, and had been some adjustments done during the whatever extra information is necessary (local development of 8.6, but they had been rather temporary variables that hold values in pat- piecemeal. After the introduction of compila- terns that would interfere with the stack). tion strategies for subst (in 2009), unset (in Some instructions for compiling commands 2010) and (in 2011), not much had only exist in a form that accesses the local dict with variable table, so compilations that necessari- really changed; the set of scripts that would be ly use those instructions cannot be done with- likely to become pure bytecode was indeed out an LVT present: this is exactly why very small. foreach is not efficient except when used in a Improving key loop types procedure (or other procedure-like entity, such as a lambda term or TclOO method). To change this, I instead took a different tack and looked at scripts where I wanted them to Exposure of bytecode in Tcl scripts become pure bytecode. An example of the sort There has been a disassembler for Tcl of script that I wanted to be pure was an inner bytecodes since they were introduced in Tcl loop of a simple value-generating coroutine.

Load more