<<

1 Static TypeScript 56 2 57 3 An Implementation of a Static for the TypeScript Language 58 4 59 5 60 6 Thomas Ball Peli de Halleux Michał Moskal 61 7 Research Microsoft Research 62 8 Redmond, WA, United States Redmond, WA, United States Redmond, WA, United States 63 9 [email protected] [email protected] [email protected] 64 10 Abstract 65 11 66 12 While the programming of microcontroller-based embed- 67 13 dable devices typically is the realm of the language, such 68 14 devices are now finding their way into the classroom forCS 69 15 education, even at the level of middle school. As a result, the 70 16 use of scripting languages (such as JavaScript and Python) 71 17 for microcontrollers is on the rise. 72 18 We present Static TypeScript (STS), a subset of TypeScript (a) (b) 73 19 (itself, a gradually typed superset of JavaScript), and its com- 74 20 piler/linker toolchain, which is implemented fully in Type- Figure 1. Two Cortex-M0 microcontroller-based educational 75 21 Script and runs in the . STS is designed to be use- devices: (a) the BBC micro:bit has a Nordic nRF51822 MCU 76 22 ful in practice (especially in education), while being amenable with 16 kB RAM and 256 kB flash; (b) Adafruit’s Circuit Play- 77 23 to static compilation targeting small devices. A user’s STS ground Express (https://adafruit.com/products/3333) has an 78 24 program is compiled to machine code in the browser and Atmel SAMD21 MCU with 32 kB RAM and 256 kB flash. 79 25 linked against a precompiled C++ runtime, producing an ex- 80 26 ecutable that is more efficient than the prevalent embedded 81 approach, extending battery life and making it 27 girls, increases confidence in both students and teachers, and 82 possible to run on devices with as little as 16 kB of RAM 28 makes lessons more fun [2, 16]. 83 (such as the BBC micro:bit). 29 To keep costs low for schools, these devices typically em- 84 This paper is primarily a description of the STS system 30 ploy 32 bit ARM Cortex-M microcontrollers (MCUs) with 85 and the technical challenges of implementing embedded 31 16-256kB of RAM and are programmed using an external 86 programming platforms in the classroom. 32 computer (usually a laptop or desktop). Programming such 87 33 Keywords JavaScript, TypeScript, compiler, interpreter, mi- devices in a classroom presents a number of technical chal- 88 34 crocontrollers, virtual machine lenges: 89 35 90 (1) the selection/design of an age-appropriate program- 36 91 ming language and environment; 37 1 Introduction 92 (2) classroom computers running outdated operating sys- 38 Recently, physical computing has been making headway in 93 tems, having intermittent and slow internet connec- 39 the classroom, engaging children to build simple interactive 94 tivity, and locked down by school IT administrators, 40 embedded systems. For example, Figure 1(a) shows the BBC 95 which makes native app installation difficult; 41 micro:bit [1], a small programmable Arduino-inspired com- 96 (3) the transfer of the student’s program from the com- 42 puter with an integrated 5x5 LED display, several sensors 97 puter to the device, where it can run on battery power 43 and Bluetooth Low Energy (BLE) radio technology. The de- 98 (as many projects embed the device in an experiment 44 vice first rolled out in 2015 to all year 7 students (age 10to 99 or “make”). 45 11) in the UK and has since gone global, with four million 100 46 units distributed worldwide to date via the micro:bit Educa- With respect to these challenges, there are various embed- 101 47 tion Foundation (https://microbit.org). Figure 1(b) shows a ded interpreters for popular scripting languages, such as 102 48 different educational device featuring RGB LEDs: Adafruit’s JavaScript (JerryScript [8, 15], Duktape [22], Espruino [23], 103 49 Circuit Playground Express (CPX). mJS [20], and MuJS [19]) and Python (MicroPython [9] and 104 50 Research suggests that using such devices in computer its fork CircuitPython [12]). The interpreters run directly on 105 51 science education increases engagement, especially among the MCU, requiring just the transfer of program text from the 106 52 host computer, but forego the benefits of advanced optimiz- 107 53 MPLR 2019, Under submission, ing JIT (such as V8) that require about two orders 108 54 . of magnitude more memory than is available on MCUs. 109 55 1 110 MPLR 2019, Under submission, Thomas Ball, Peli de Halleux, and Michał Moskal

111 • the STS compiler generates surprisingly efficient and 166 112 compact machine code, which unlocks a range of ap- 167 113 plication domains such as game programming for low- 168 114 resource devices such as those in Figure 2, all of which 169 115 were enabled by STS. 170 116 Deployment of STS user programs to embedded devices 171 117 does not require app or installation, just access 172 118 to a web browser. Compiled programs appear as downloads, 173 119 which are then transferred manually by the user to the device, 174 120 which appears as a USB mass storage device, via file copy 175 121 (or directly through WebUSB, an upcoming standard for 176 122 connecting websites to physical devices). 177 123 The relatively simple compilation scheme for STS (pre- 178 124 sented in Section 3) leads to surprisingly good performance 179 125 on a collection of small JavaScript benchmarks, often com- 180 126 parable to advanced, state of the art JIT compilers like V8, 181 127 with orders of magnitude smaller memory requirements (see 182 128 Section 4). It is also at least an order of magnitude faster 183 129 Figure 2. Three microcontroller-based game handhelds with than the embedded interpreted approach. A novel aspect of 184 130 160x120 color screens. These boards use ARM’s Cortex-M4F evaluation is a comparison of different strategies for dealing 185 131 core: the ATSAMD51G19 (192kB RAM, running at 120Mhz) with field/method lookup spanning classes, interfaces, and 186 132 and STM32F401RE (96kB RAM, running at 84Mhz). dynamic maps. 187 133 188 134 1.2 MakeCode: Easy Embedded for Education 189 135 Unfortunately, such embedded interpreters are between 190 STS is the core language supported by the MakeCode Frame- 136 one and three orders of magnitude slower than V8 (see Sec- 191 work.1 MakeCode enables creation of custom programming 137 tion 4), affecting responsiveness and battery life. Even more 192 experiences for MCU-based devices. Each MakeCode experi- 138 importantly, due to the representation of objects in memory 193 ence (we often call them editors, though they also bundle a 139 as dynamic key-value mappings, the memory footprint can 194 simulator, APIs, tutorials, documentation, etc.) targets pro- 140 be several times that of an equivalent C program. This can 195 gramming of a specific device or device class via STS. [7] 141 severely limit the applications that can be deployed on low- 196 Most MakeCode editors are deployed primarily as web 142 memory devices such as the micro:bit (16 kB RAM) and CPX 197 apps, including a full-featured text editor for developing STS 143 (32 kB RAM). 198 programs based on Monaco (the editor component of Visual 144 199 1.1 Static TypeScript Studio Code), as well as a graphical programming 145 200 based on Google’s Blockly framework (STS metadata in com- 146 As an alternative to embedded interpreters, we present Static 201 ments defines the mapping from STS APIs to Blockly and 147 TypeScript (STS), a syntactic subset of TypeScript,[3] sup- 202 MakeCode translates between Blockly and STS). 148 ported by a compiler (written in TypeScript) that generates 203 The MakeCode editors, including the primary coding ex- 149 machine code that runs efficiently on MCUs in the target 204 periences for BBC micro:bit and for Adafruit CPX,2 have 150 RAM range of 16-256kB. The design of STS and its compiler 205 been used by millions of students and teachers worldwide 151 and supporting runtime were dictated primarily by the above 206 to date. 152 three challenges. In particular: 207 STS supports the concept of a package, a collection of STS, 153 • STS eliminates most of the “bad parts” of JavaScript; 208 C++ and assembly files, that also can list other packages as 154 following StrongScript [14], STS uses nominal typing 209 dependencies. This capability has been used by third parties 155 for statically declared classes and supports efficient 210 to extend the MakeCode editors, mainly to accommodate 156 compilation of classes using classic techniques for v- 211 hardware peripherals for various boards.3 Notably, most 157 tables. 212 of the packages avoid pitfalls of unsafe C/C++ completely 158 • the STS toolchain runs offline, once loaded into aweb 213 and are authored solely in STS, due to the efficiency of the 159 browser, without the need for a C/C++ compiler – the 214 STS compiler and the availability of low-level STS APIs for 160 toolchain, implemented in TypeScript, compiles STS 215 161 to Thumb machine code and links this code against 216 1See https://makecode.com. The framework, along with many editors, is 162 217 a pre-compiled C++ runtime in the browser, which open source under MIT license, see https://github.com/microsoft/pxt. 163 is often the only available execution environment in 2See https://makecode.microbit.org and https://makecode.adafruit.com. 218 164 schools. 3For example for micro:bit, see https://makecode.microbit.org/extensions 219 165 2 220 Static TypeScript MPLR 2019, Under submission,

221 276 222 277 223 278 224 279 225 280 226 281 227 282 228 283 229 284 230 285 231 286 232 287 233 288 234 289 235 290 236 291 237 292 238 293 239 294 240 295 241 296 242 297 243 Figure 3. MakeCode Arcade editor. The left pane is the simulator for the arcade device; the middle pane is the categories of 298 244 APIs available in the editor; the right pane is the Monaco editor with STS user code for a platformer game (https://makecode. 299 245 com/85409-23773-98992-33605). 300 246 The toggle on top is used switch between Blocks and Static TypeScript (labelled JavaScript for marketing reasons). 301 247 302 248 303 accessing hardware via digital/analog pins (GPIO, PWM and added, as needed, which enables better IDE support and error 249 304 servos) and serial protocols (I2C and SPI). checking for larger JavaScript programs. By design, Type- 250 305 Figure 3 shows the MakeCode Arcade editor created for Script provides no type soundness guarantees. Object types 251 306 programming the handheld gaming devices from Figure 2 provide a unification of maps, functions, and classes; struc- 252 307 (in fact, the STS program shown in the editor is the one tural subtyping between object types defines substitutability 253 308 deployed to the three devices - it is a simple platformer and compatibility checks. Type erasure (and minor syntactic 254 309 game). MakeCode Arcade includes a game engine written transformations) yields a raw JavaScript program. 255 310 almost completely in STS, and thus places high requirements STS is a subset of TypeScript that excludes many of the 256 311 on code efficiency to achieve pleasing visual effects athigh highly dynamic parts of JavaScript: the with statement, the 257 312 frame rates. The game engine includes the game loop, stack eval expression, prototype-based inheritance, the this 258 313 of event contexts, physics engine, text and line drawing, pointer outside classes, the arguments keyword, and the 259 314 as well as game-specific frameworks (eg., for platformer apply method. STS retains features such as dynamic maps, 260 315 games), in all about 10,000 lines of STS code, with only the but keeps them separate from the nominal class abstraction. 261 316 most image-blitting primitives implemented in C++. Such restrictions are acceptable because most beginners’ 262 317 The games built with Arcade run either in the web browser programs are quite simple, and there are very few existing 263 318 (on desktop or mobile), or on various models of dedicated JavaScript or TypeScript libraries in the embedded space, so 264 319 hardware featuring a 160x120 pixel 16 color screen and an the chance of these using JavaScript features such as monkey 265 320 MCU running at around 100MHz with around 100kB of RAM. patching is small. 266 321 The primary contribution of this paper is a description of Our goal is not to grow STS to the point of supporting all 267 322 a widely deployed system and ways in which it addresses of TypeScript. Instead, we follow the pragmatic approach of 268 323 the classroom-specific problems listed above. adding features useful in embedded context, as guided by 269 324 user feedback. 270 325 In contrast to TypeScript, where all object types are bags 271 2 Static TypeScript (STS) 326 of properties, STS has at runtime four kinds of unrelated 272 TypeScript [3] is a gradually-typed [18] superset of the Java- 327 object types: 273 Script language. This means that every JavaScript program 328 274 is a TypeScript program and that types can be optionally 329 275 3 330 MPLR 2019, Under submission, Thomas Ball, Peli de Halleux, and Michał Moskal

331 1. a dynamic map type has named (string-indexed) prop- STS supports calling functions from C++ to STS and vice 386 332 erties that can hold values of any type; versa. To simplify this process, STS uses a simple code- 387 333 2.a function (closure) type; generation scheme, where a special comment on a C++ func- 388 334 3. a class type describes instances of a class, which are tion indicates that it should be exported to STS (//% ), as 389 335 treated nominally, via an efficient runtime subtype well as to Blockly (//% block). A build step parses the 390 336 check on each field/method access, as described below; C++ code in search of these comments, it also collects the 391 337 4. an array (collection) type. prototypes (signatures) of these functions, so appropriate 392 338 conversions can be generated when calling them. For exam- 393 339 In this sense, STS is much closer to spirit to Java and C# in ple: 394 340 their treatment of types as “protectors of abstractions”, unlike // source C++ code: 395 341 JavaScript which allows a much more freeform treatment of control { 396 342 an object’s role. As discussed in Section 3.4, runtime type /** Register an event handler */ 397 343 tags are used to distinguish the different kinds of built-in //% block 398 void onEvent(int eventType, Action handler) { 344 object types listed above (as well as primitives like boxed 399 345 // arrange for pxt::runAction0(handler) 400 numbers and strings). // to be called when eventType is triggered 346 As in TypeScript, type casts do not generate any code, and 401 347 } 402 thus always succeed. Instead, STS protects the nominal class } 348 abstraction at the point of field/method access. Just as x. 403 349 f causes a runtime error in JavaScript when x == null, // generated TypeScript: 404 350 executing (x as T).f will cause a runtime error in STS declare namespace control { 405 351 if T is a class with field f, and the dynamic type of x is not /** Register an event handler */ 406 //% block shim=control::onEvent 352 a nominal subtype of T. If T is an interface, any, or some 407 function onEvent(eventType: number, 353 complex type (eg., union or intersection), then the field will 408 354 handler: () => void): void; 409 be looked up by name regardless of dynamic type of x. } 355 Other abstractions are also protected at runtime, as in 410 356 JavaScript: for example, making a function call on something The C++ and function names are mapped di- 411 357 that is not of function type. Currently, it is an error to dy- rectly to their STS equivalents and the documentation com- 412 358 //% block 413 namically add a new property to any type other than the ment is copied verbatim. The comment , which 359 map type. Restrictions to the dynamic JavaScript semantics indicates that the function should be exposed as a Blockly 414 360 may be lifted in the future, depending on user feedback. To graphical block is also copied. There are many other possi- 415 361 date, these restrictions have generated no concerns among ble comments controlling the look and feel of the graphical 416 362 shim 417 our user community (both educators and developers). equivalent. The STS function also gets an additional 363 STS primitive types are treated according to JavaScript annotation, which indicates the name of the corresponding 418 364 semantics. In particulars, all numbers are logically IEEE 64 C++ function (in some cases the STS declaration is written 419 365 bit floating point, but 31-bit signed tagged integers are used by hand, and the name of the C++ function does not have to 420 366 where possible for performance. Implementation of - match the STS function). 421 367 tors, like addition or comparison, branch on the dynamic The C++ types are mapped to STS types. Since in STS all 422 368 4 int 423 types of values to follow JavaScript semantics, with the fast numbers are conceptually doubles , the C++ is mapped 369 number 424 integer path hand-implemented in assembly. to STS . When the C++ function is called, the STS 370 The design goal of STS to be both syntactic and semantic compiler makes sure to convert the value passed to an integer. 425 371 uint16_t 426 subset of TypeScript, by which we that if a program Other C++ integer types (eg., ) are supported in 372 Action 427 compiles successfully in STS, it will have the same semantics a similar way. The C++ type represents a reference 373 pxt::runAction0 428 as the TypeScript program, or it will crash (in the circum- to a closure, which can be called with 374 () 429 stances listed above). . 375 While class methods are not supported directly, regular 430 376 functions can be used to implement objects. For example: 431 377 2.1 C++ Interop // C++ 432 378 433 STS programs running on MCUs are supported by a runtime typedef BoxedBuffer *Buffer; 379 namespace BufferMethods { 434 implemented in C++, C, and assembly. The runtime imple- 380 //% 435 ments language primitives (operators, collections, support 381 int getByte(Buffer self, uint32_t i) { 436 for classes, dynamic maps, etc.), as well as allows access to the 382 return i < self->len ? self->data[i] : 0; 437 underlying device hardware. The runtime can be extended 383 4 438 by packages as explained in Section 2.2. There is some support to store numbers as integers of various sizes to save 384 memory, but it only applies to storage, not intermediate computations. 439 385 4 440 Static TypeScript MPLR 2019, Under submission,

441 } conflicts due to lack of namespace enforcement is low, andit 496 442 } allows for fitting new APIs naturally in existing namespaces. 497 443 498 444 // TypeScript 3 Compiler and Runtime 499 445 interface Buffer { 500 //% shim=BufferMethods::getByte The STS compiler and toolchain (linker, etc.) are written 446 501 getByte(i: number); solely in TypeScript. There currently is no support for sepa- 447 502 } rate compilation of STS files: STS is a whole program com- 448 piler (with support for caching precompiled packages, which 503 449 All functions in BufferMethods namespace must take 504 includes the C++ runtime). The STS device runtime is mainly 450 Buffer as first argument and are exposed as members of 505 written in C++ and includes a bespoke garbage collector. As 451 the Buffer class on the STS side. When such members 506 mentioned before, it is not a goal to generalize STS to support 452 are called, the STS compiler will make sure the first argu- 507 full JavaScript. 453 ment is non-null and a subtype of Foo. These interfaces 508 454 are conceptually better understood as non-extensible classes 3.1 Compiler Toolchain 509 455 with opaque representation, that is they cannot be imple- 510 The source TypeScript program is processed by the regular 456 mented by regular classes, and member resolution is static. 511 TypeScript compiler to perform syntactic and semantic anal- 457 The interface syntax was chosen because TypeScript allows 512 ysis, including type checking. This produces type-annotated 458 extending interfaces with new methods across files. We allow 513 abstract syntax trees (ASTs) that are then checked for con- 459 such additions, provided the new methods have the shim 514 structs outside of STS (eval, arguments, etc.). The AST 460 =... annotation described above, or alternatively an analog 515 is then transformed into a custom intermediate represen- 461 annotation specifying a TypeScript, not C++, replacement 516 tation (IR) with language constructs desugared to calls to 462 function. This usually only concerns authors of advanced 517 runtime functions. This IR is later transformed into one of 463 C++ packages (see below). 518 three forms: 464 519 465 2.2 Packages 1. continuation passing JavaScript for execution within 520 466 STS supports multiple input files. It also supports Type- the browser (inside of a separate “simulator” IFrame) 521 467 Script namespace syntax for scoping. Files do not intro- 2. ARM Thumb machine code, linked with pre-compiled 522 468 duce scopes and JavaScript modules are currently not sup- C++ runtime and executed on bare-metal hardware or 523 469 ported. The input files come from one or more packages. inside of an operating system 524 470 There is one main package, which can list other packages as 3. bytecode for a custom VM interpreter, meant for plat- 525 471 dependencies, which can in turn list further dependencies. forms where where dynamic code loading/generation 526 472 There are multiple ways of specifying versions of packages, is impossible (like or iOS) 527 473 including built-in packages, file paths when operating from Both the ARM Thumb and the custom bytecode are gener- 528 474 command line, and URLs of GitHub repositories. There can ated in form of assembly code, and translated to machine 529 475 be only one version for each package (otherwise we would code by a custom assembler. In this section we focus on the 530 476 likely get redefinition errors). native 32-bit ARM Thumb target (though we compare the 531 477 MakeCode editor builders will generally decide to include performance of the VM, see Section 4.2). 532 478 a number of built-in packages, which ship with the editor. The regular TypeScript compiler, the STS code generators, 533 479 These can be further extended with packages coming from assembler, and linker are all implemented in TypeScript and 534 480 GitHub. The MakeCode web app has support for authoring run both in the web browser and on command line. 535 481 and publishing packages to GitHub. Because namespaces are 536 482 independent of files, it is easy for packages to extend existing 3.2 Linking 537 483 namespaces. Currently, STS does not enforce any discipline The generated machine code is linked to a pre-compiled 538 484 here. C++ runtime. C++ compilation takes place in a cloud service, 539 485 MakeCode comes with a number of packages that editor with the resulting runtime cached both in the cloud content 540 486 builders can include (common packages). They provide sup- delivery network, and in the browser (caching is based on a 541 487 port for various hardware features (pins, buttons, buzzer, strong hash of all the C++ sources, options etc.). Typically, 542 488 screen, etc.), as well as higher-level concepts like sprite- the C++ runtime does not change while the user is working 543 489 handling game . Some of these packages come in vari- on their program, allowing for offline operation.5 544 490 ants, sharing interface but with different implementations 545 491 (eg., drivers for different screens). 546 5While it could be possible to compile the C++ code locally using 492 547 External (GitHub) packages typically provide support for or similar technologies, the compilation toolchain, header files, and libraries 493 other hardware peripherals. Users typically do not use too would likely require tens of megabytes of download straining offline storage 548 494 many external packages at once, so we feel the risk of name in the browser. 549 495 5 550 MPLR 2019, Under submission, Thomas Ball, Peli de Halleux, and Michał Moskal

551 The generated machine code is generally appended at the pointer to the interface table and interface hash (see below), 606 552 end of the compiled runtime. Depending on the file format followed by pointers to methods. 607 553 (in particular for ELF) of the target device, the resulting file The first four method slots are pre-allocated for runtime 608 554 needs to be patched slightly. To generate code, the assembler and GC-related functions related to object marking and de- 609 555 needs to know addresses of runtime functions. These are allocation, which follow C++ calling conventions. The re- 610 556 extracted from the runtime binary. maining functions follow STS calling convention. First of 611 557 It is also possible for packages to include C++ code that ex- STS method pointers is toString() if defined7. Next there 612 558 tends the runtime. Each such combination of C++-containing are methods from the base class if any, followed by methods 613 559 packages have to compiled and cached separately. This is of the current class. If a method is never called with dynamic 614 560 done transparently by the cloud service, and the potential dispatch (for example, because it is never overridden, or 615 561 exponential number of combinations have so far has not never called), it is not included in the table (for example, in 616 562 proven to be a problem in practice, as students do not use the MakeCode Arcade game engine only 13% of methods use 617 563 many external packages at once, and our experience has dynamic dispatch). 618 564 shown that few of the packages written for MakeCode uti- The interface table contains descriptors for all fields and 619 565 lize native C++. methods of the class in order of definition. Each descriptor 620 566 contains the member index (assigned globally to all member 621 567 3.3 Representation of values names in the program) and a function pointer. Field descrip- 622 568 The compiled programs use a regular C++ stack for local tors also contain the offset of the field within the class. The 623 569 variables, return addresses, and temporary computation. Val- descriptors are used in member lookup and also when iter- 624 570 ues stored on the stack and inside of objects generally use ating over object properties (eg., using Object.keys()). 625 571 a uniform, type-independent representation, that is always The descriptors are indexed by a simple hash table, where 626 572 32-bits long. the keys are computed by multiplying the member index by 627 573 The number n such that −230 ≤ n < 230 is represented the interface hash multiplier (computed per-class at compile 628 574 by the 32-bit value 2n + 1, so that the lowest-order bit is set. time) fetched from the virtual table. 629 575 Other numbers, are represented as regular objects, specifi- STS employs three methods of member lookup: 630 576 cally boxed 64 bit doubles. • when the receiver is statically of a class type C, the 631 577 Special constants, such as true, false, and null are compiler generates a runtime subtype check against 632 578 represented by specific 32-bit values, such that the lowest- C (which may fail, as the dynamic type may not be a 633 579 order bit is cleared, and the next to lowest-order is set (there subtype of C) and then uses a direct offset into either 634 580 is encoding space for 230 such values, but only a few are the virtual table for methods, or into the object itself 635 581 currently used). The JavaScript undefined is represented for field accesses; 636 582 by 0 (in other words, the C++ NULL), since it is the default • otherwise, if the receiver is dynamically of a class type 637 583 value for memory allocation and also for JavaScript fields (statically it could be an interface, any, or some more 638 584 and variables. complex structural type), the member descriptor is 639 585 All other values are represented as pointers to either con- looked up in the interface table using the member 640 586 stant flash-allocated objects or objects allocated on the heap. index and the interface hash key; 641 587 All pointers are word-aligned (32 bit), so the lower-order • otherwise, it is a type-specific function for objects 642 588 two bits are cleared. For objects created by the STS runtime implemented in the C++ runtime, in particular the 643 589 the first word is occupied by a pointer to a virtual table (see dynamic map used when compiling object literals or 644 590 Section 3.4).6 new Object(). 645 591 646 Section 4.3 compares the performance of the three methods. 592 3.4 Virtual table layout 647 593 648 STS classes are compiled similarly to Java or C# with single 3.5 Arithmetic operators 594 649 inheritance and static memory layout. The first word of an For some arithmetic operators, the fast integer path is written 595 650 object points to a virtual table, and subsequent words are in assembly for speed. For example, here is the implementa- 596 651 allocated for fields (with fields of the base class, if any, com- tion of arithmetic +: 597 652 ing first). The virtual table contains object size, a statically 598 _numops_adds: 653 allocated class number (this includes the tags for the builtin 599 ands r2, r0, r1 ; r2 := r0 & r1 654 types supported by STS, as well as user-defined types), a ands r2, #1 ; r2 &= 1 600 655 beq .boxed ; last bit clear? 601 656 subs r2, r1, #1 ; r2 := r1 - 1 602 6An alternative representation, where numbers are represented as 2n and 657 603 pointers as p + 1 would be problematic since ARM Thumb only has word- 7toString() and valueOf() play special role in JavaScript runtime 658 604 loading instructions with offsets that are multiples of 4. conversion semantic. valueOf() is currently not supported. 659 605 6 660 Static TypeScript MPLR 2019, Under submission,

661 adds r2, r0, r2 ; r2 := r0 + r2 string (they are both 12 bytes). This makes string concatena- 716 662 bvs .boxed ; overflow? tion, which is fairly common in JavaScript, a constant time 717 663 mov r0, r2 ; r0 := r2 operation. This is an established optimization technique [4], 718 664 bx lr ; return used in all major JavaScript engines. 719 .boxed: 665 The first three kind of strings are emitted by the compiler 720 mov r4, lr ; save return address 666 in flash, while all four can be constructed dynamically. 721 bl numops::adds ; call into runtime 667 722 bx r4 ; return 668 Closures are represented as a pointer to static code of the 723 function followed by read-only locals captured from outer 669 Other operators with specialized implementations are -, 724 scopes (top-level globals are not included). If a variable can 670 |, &, ^ and conversion to integer (used when calling C++ 725 be written after capture, it is transformed (in all scopes) into 671 runtime functions). This specialized assembly gives about 726 a pointer to a heap object that holds its value. A specific reg- 672 2x speedup compared to always calling the C++ function, 727 ister is statically allocated to hold the pointer to the closure 673 which we do for example for multiplication (see Section 4.3). 728 object, during closure execution. 674 Other operators are implemented in C++ as functions taking 729 Functions that are used as values, but do not capture any- 675 abstract values with branches for integers, boxed numbers, 730 thing, are allocated statically in the above format in the flash. 676 and other types. 731 677 Dynamic maps are simply two vectors, one of keys and 732 678 3.6 Representation of built-in objects one of values. Lookup is linear. 733 679 Arrays are similar to C++ standard vectors, but with more 734 680 conservative growing strategy. Sparse arrays are not sup- 3.7 Peep-hole optimizations 735 681 ported. Simple array accesses are implemented directly in After code generation, the resulting assembly is run through 736 682 assembly, including range checking. Cases that involve con- a simple peep-hole optimizer. The particular instructions 737 683 versions of indices or growing the array are handled by the sequences to be optimized were identified by a script run 738 684 C++ runtime. on large piece of generated code. The script would identify 739 685 Buffers are simply continuous chunks of memory, with 2-3 instruction sequences and sort them by number of oc- 740 686 assembly byte accessors, and a number of additional utility currence. The top ones were then inspected by hand to see 741 687 methods implemented in C++. if they could be simplified. Typical peep-hole rules are as 742 688 follows: 743 689 Strings come in four different representations, each with 744 push {lr}; push {X, ...} -> push {lr, X, ...} 690 a v-table pointer at the beginning. All strings are currently 745 pop {X, ...}; pop {pc} -> pop {X, ..., pc} 691 limited to 65,535 bytes. ASCII strings (where all characters push {rX}; pop {rX} -> nothing 746 692 are in range 0-127) are represented using a length prefix push {rX}; pop {rY} -> mov rY, rX 747 693 followed by NUL-terminated character data (there can still pop {rX}; push {rX} -> ldr rX, [sp, #0] 748 694 be NUL characters inside, but the final NUL is added for push {rX}; ldr rX, [sp, #0] -> push {rX} 749 695 convenience of C++ functions). Short Unicode strings are push {rX}; movs rY, #V; pop {rX} -> 750 696 represented similarly, using UTF-8 and length expressed in movs rY, #V (when X != Y) 751 697 bytes, but with a different v-table. The indexing methods For branches, the compiler always generates a fully gen- 752 698 decode the UTF-8 on the fly. eral, but cumbersome instruction sequence, which is then 753 699 Longer Unicode strings are represented as the length of simplified by the peep-hole optimizer, eg. when X is in short 754 700 the string in characters, the size of the string in bytes, and jump range: 755 701 a pointer to data, that contains the actual character data in 756 beq .skip; b X; .skip: -> bne X 702 UTF-8 (again, NUL-terminated), and a skip list that contains 757 703 a byte offset for every character offset divisible by 16. Index- In fact, the b itself also has limited range and sometimes 758 704 ing methods start at closest preceding skip list offset and needs to be done with the bl. See Section 4.3 for evaluation. 759 705 decode UTF-8 data from there. This encoding is used instead 760 706 of standard UTF-16 to save space. It also ensures that all 3.8 Garbage collector 761 707 strings (except for temporary ones described below) contain The STS runtime employs a custom, simple, precise, non- 762 708 a valid UTF-8 NUL-terminated string, making it easier for the compacting, mark-and-sweep garbage collector. The runtime 763 709 C++ runtime functions to handle them without additional keeps track of STS parts of stacks of all threads of execu- 764 710 conversions. tion. STS stacks contain only values in the uniform format 765 711 Finally, cons-strings are allocated when two long strings described above. During collection, pointers in these stacks, 766 712 are concatenated together. They consist of two pointers to as well as in global variables, and in any locations registered 767 713 strings (which themselves can be cons-strings). When a cons- dynamically by the C++ runtime are considered live, and 768 714 string is indexed, it is transformed in-place into a skip-list recursively scanned for further live pointers. All objects start 769 715 7 770 MPLR 2019, Under submission, Thomas Ball, Peli de Halleux, and Michał Moskal

771 with a pointer to v-table, which has a method to determine • a pure C implementation, compiled with gcc, as a base- 826 772 the size of an object. There is no additional space overhead line for comparison; 827 773 incurred by the GC: the lowest bit of the v-table pointer is • Duktape 2.3, an embedded JavaScript interpreter; 828 774 used to mark reachable objects, and the size method8 is used • IoT.js 1.0, the JerryScript embedded JavaScript inter- 829 775 to walk the heap in the sweep phase. preter; 830 776 A per-thread object records the STS stack pointer when • Python 3.6, the regular, full-fledged Python inter- 831 777 an STS function is first called (there can be a C++ stack preter; 832 778 underneath it), and the stack pointer is stored just before • Node.js 11.0, which includes V8, a state of the art JIT 833 779 any C++ function is called. If a C++ function in turn calls engine; 834 780 STS again, which is quite , a linked list segment is added • MicroPython 1.9.4, an embedded Python interpreter. 835 781 to the per-thread object. These per-thread objects are kept We use three different commercially available ARM-based 836 782 by our thread scheduler. systems for testing: 837 783 838 To limit heap fragmentation, we trigger collections more • GHI Brainpad, which uses the STM32F401RE, an 784 839 often than strictly necessary. In particular, after every col- ARM Cortex-M4F core with 96kB of RAM and 512kB 785 840 lection we mark the first quarter of free memory as good of flash running at 84MHz; 786 841 for allocation. When an object cannot be allocated there, a • Adafruit Pybadge, using the ATSAMD51G19, an ARM 787 842 new collection is triggered, likely moving the location of the Cortex-M4F core with 192kB of RAM and 512kB of 788 843 “good” quarter. We then allocate the object regardless if it fits flash running at 120MHz; 789 844 in the (new) good part. This risks triggering a bit too many • Raspberry Pi Zero, using the BCM2835, an ARM11 790 845 collections, but we have found it to limit heap fragmentation, core with 512MB of RAM running at constant 700MHz 791 846 which was critical when a larger object (eg. a screen buffer) (dynamic CPU frequency scaling was disabled). 792 needed to be allocated later in course of the execution. Addi- 847 The Pi can run Node.js, which places it outside of our 793 tionally, given the relatively small size of memory and fast 848 target memory range, but we used it for reference compari- 794 processor speed of the target MCUs, the collections are quite 849 son of performance against V8. Generally, memory access is 795 cheap. 850 much slower on the ARM11 core than on M4F cores (where 796 The C++ code can either allocate regular GC-able objects 851 the entire RAM is similar in performance to L1 cache). The 797 to be used on the STS side, or use traditional malloc()/ 852 FPU on the M4F cores is only single-precision, so it was not 798 free() for other purposes. Such malloc-blocks use a special 853 used in benchmarks. The FPU on the ARM11 core was used. 799 encoding including the size in place of the v-table pointer and 854 Duktape was compiled with default options on the BCM. 800 are never collected, until they are freed. This heap sharing 855 On the STM the default profile immediately runs out of 801 lets us avoid splitting the memory into C++ and GC heaps 856 memory, so we used the low memory profile (we additionally 802 upfront. We still keep a small C++ heap to be used inside of 857 enabled fast integer option). We used the official ARMv6 803 interrupt service routines, as the GC cannot be used there. 858 Node.js binary. We used Python that comes with PiCore 804 Arguments to C++ functions called from STS are, in addi- 859 Linux. We used IoT.js compiled by one of its developers (the 805 tion to registers, also placed on the STS stack, so that they 860 heap seems hard-set to 512kB, hence the OOM in results). 806 are not GC-ed, while the function is running. Any allocation 861 We used the official MicroPython binary for STM32. 807 can trigger a collection, so intermediate objects allocated in 862 808 863 a C++ function, have to be temporarily registered with the 4.1 Benchmarks 809 GC, before they are returned to STS. 864 810 We used a number of benchmarks, described below. We 865 811 Reference-counting We have previously used reference tried to use equivalent code in TypeScript/JavaScript and 866 812 counting for memory management. We generally incremented Python, as we are comparing run times and not program- 867 9 813 reference counts of all arguments to C++ functions for the ming languages. The C programs by necessity are not di- 868 814 duration of their execution. This incurred dramatic time over- rectly feature-comparable, as they lack memory safety and 869 815 heads compared to the GC described above (see Figure 7). use a static integer representation. 870 816 Richards: Martin Richard’s benchmark simulates an oper- 871 817 4 Evaluation ating system’s scheduler queue. We took JavaScript sources 872 818 We evaluate STS compiled machine code (as well as the STS from the JetStream benchmark suite [11] and translated 873 819 virtual machine backend, referenced as VM) on a number them to TypeScript by replacing prototype initialization with 874 820 875 of well-known small performance-intensive benchmarks, 9 821 More specifically, Python fann was using a different algorithm than the C 876 comparing the performance against: or JavaScript versions. We changed it to use the same algorithm and direct 822 877 array accesses. Python binary was changed to use classes and not tuples. 823 We made these alterations because we wanted to measure array accesses 878 824 8The indirect call overhead is only 4-8 cycles on a Cortex-M4. and class implementation. We did not change richards or nbody. 879 825 8 880 Static TypeScript MPLR 2019, Under submission,

881 BCM2835 at 700MHz (Pi Zero) STM32F401RE at 84MHz 936 882 [ms] [times slower than GCC] [ms] [times slower than GCC] 937 883 Benchmark GCC Duktape iotjs Python Node STS VM GCC Duktape µPython STS 938 884 binary(7) 25 320 40 2.4 939 885 binary(8) 20 40 35 75 8.0 2.5 8 74 OOM OOM 2.3 940 886 binary(9) 40 38 43 78 4.5 2.3 9 149 OOM OOM 2.1 941 887 binary(10) 90 46 67 94 2.8 2.2 11 394 OOM OOM OOM 942 888 binary(11) 180 50 OOM 91 1.7 2.2 12 943 889 richards(10k) 10 680 490 1210 68 22 113 90 532 287 14 944 890 richards(100k) 120 591 400 1048 9 16 94 945 891 fann(8) 20 215 284 415 12 23 136 119 198 166 21 946 892 fann(9) 230 190 262 402 3 20 124 947 893 nbody(1k) 3 227 247 170 113 97 213 511 11 6.7 3.7 948 894 nbody(10k) 30 230 246 148 14 85 168 949 895 nbody(100k) 310 226 239 143 4 81 167 950 896 951 897 Figure 4. Benchmark execution times on BCM and STM32 MCUs. VM is the STS virtual machine. OOM = Out of memory. 952 898 953 899 954 equivalent class code, adding type information where neces- On the benchmarks heavy on property access (binary and 900 955 sary. The benchmark uses a number of classes with derived richards) STS is well over an order of magnitude faster than 901 956 methods for different scheduler tasks, which is meant to the interpreters, and less than two times slower than Node.js. 902 957 test object property access performance. We also used the C Moreover, the interpreters run out of memory much quicker 903 958 and Python versions of this benchmark, which use a single than STS on binary. For straight combinatorial computation 904 959 structure for the tasks, together with a switch statement (fann), STS is about an order of magnitude faster than the 905 960 and a function pointer. We developed equivalent TypeScript interpreters but several times slower than Node.js (which 906 961 code (labeled richards2), but the class-based version (richards) employs quite an advanced optimizer in V8). In floating point 907 962 seems to better reflect typical JavaScript coding patterns. The computation (nbody), STS is still significantly faster than 908 963 benchmark parameter is the number of scheduling iterations interpreters, but 20x slower than Node.js, which utilizes the 909 964 in thousands. FPU much better. 910 965 Generally, the longer running benchmarks are more in- 911 Binary trees: this program comes from The Computer Lan- 966 dicative of performance. We have also included the results 912 guage Benchmarks Game [10]. Within a loop, it allocates a 967 for small iteration counts to be able to compare the BCM 913 full binary tree of a given depth, performs a simple computa- 968 and STM directly on the same programs; on such programs 914 tion on it, and then deallocates it. The benchmark is designed 969 the JIT warmup time is significant in Node.js results. 915 to test memory allocation subsystem performance. The C, 970 The STS VM is about 5-6x slower than ARM Thumb STS, 916 Python, and TypeScript versions come from the Benchmarks 971 except for the floating-pointing-heavy nbody benchmark, 917 Game. The Python version was modified to use a class, in 972 where most of the work is performed in C++ runtime func- 918 order to match the TypeScript version closer. 973 tions. Yet, the VM is still much faster than the interpreters, 919 974 Fannkuch redux: this is another program from the Bench- suggesting that the static memory layout of classes con- 920 975 marks Game that counts the number of specific permutations tribute significantly to overall STS performance. 921 976 and tests array access and integer performance. The Python The results on STM32 and SAMD have very small variance 922 977 program was modified to use the same algorithm with ex- (around 0.1%). On BCM we ran benchmarks 10 times and 923 978 plicit, single-element array accesses, the way that the C and chosen the fastest results. 924 979 TypeScript versions do. 925 980 4.3 Performance of member access 926 N-body: this program is a physics simulation of planets in 981 927 the solar system, also from the Benchmarks Game, to mea- Figure 5 shows measurements of various forms (see Sec- 982 928 sure floating point performance. tion 3.4) of member access in isolation. The first four rows 983 929 show the number of cycles needed for a field access on 984 930 4.2 STS and VM performance this (just an indexed memory lookup), an access with dy- 985 931 Figure 4 compares STS performance against various other namic class subtype check, an access resolved via an interface 986 932 systems. We present the run time of a C program directly in lookup, and finally an access on a dynamic map object. 987 933 milliseconds and otherwise as slowdown (number of times) For functions, the figure lists a direct call to a procedure, 988 934 with respect to C. with no type checks, a call into non-virtual class method, a 989 935 9 990 MPLR 2019, Under submission, Thomas Ball, Peli de Halleux, and Michał Moskal

991 Access STM32 SAMD BCM The fourth row shows the slowdown when objects are repre- 1046 992 this.field 6.5 7.5 11.8 sented as dynamic maps with linear lookup, in benchmarks 1047 993 (x as Class).field 23.2 24.1 35.2 that allow for enabling this in a non-invasive way. The slow- 1048 994 (x as Iface).field 52.2 51.6 76.0 downs for field accesses are rather dramatic, on the order of 1049 995 ({ ... }).field 136.9 136.8 156.3 2x. 1050 996 staticFunction(x) 14.1 15.1 20.6 The next row show effects of dropping dynamic subtype 1051 997 this.nonVirtual() 14.2 15.1 20.7 checks when accessing member of this (as its type was 1052 998 this.method() 28.3 29.2 41.7 already checked when entering the method). Disabling this 1053 999 (x as Class).nonVirt() 34.4 35.2 45.2 optimization would allow us to fall back to dynamic lookup 1054 1000 (x as Class).method() 34.5 35.2 49.8 also in methods. 1055 1001 (x as Iface).method() 57.7 55.4 85.4 The next line compares performance of the simple mark- 1056 1002 and-sweep garbage collector to reference counting (see Sec- 1057 1003 Figure 5. Measured cycles taken by various forms of member tion 3.8). 1058 1004 access on the three MCUs. Each operation was run a million Next, we show impact of folding argument conversions 1059 1005 times and average time consumed was converted to cycles. into helper functions. It is very small, but it saves 4% of 1060 1006 code size of Arcade. Then, we show the effects of the peep- 1061 1007 Operation STM32 SAMD BCM hole optimizer, which saves 14% of code size on Arcade, and 1062 1008 x = 1 1.1 1.8 6.3 improves performance by a few percent. 1063 1009 x = y 2.0 1.9 4.6 The final line measures the impact of all dynamic subtype 1064 1010 x++ 16.2 16.2 20.6 checks (enabled in Baseline) that we employ to protect the 1065 1011 x += y 16.2 16.0 22.1 class abstraction. This is the price we pay for following Type- 1066 1012 x *= y 41.4 41.6 58.7 Script semantics, where casts always succeed, but member 1067 1013 x = Math.idiv(x, y) 56.0 56.1 162.0 access may fail. 1068 1014 x = {} (allocation) 212.5 206.0 294.5 1069 1015 1070 x += y (double) 414.0 406.5 413.5 4.4 Performance of language primitives 1016 x *= y (double) 412.0 402.0 442.0 1071 Figure 6 shows cycle measurements of various language 1017 x /= y (double) 968.5 963.0 443.5 1072 1018 primitives. As with Figure 5, we timed one million operations 1073 of each type and subtracted the time consumed by an empty 1019 Figure 6. Measured cycles taken by various language primi- 1074 loop. The time was then converted to main clock cycles for 1020 tives on the three MCUs. Double math operations include 1075 comparison. 1021 allocation and amortized GC (boxing). 1076 1022 The counts are quite similar, but not the same, for the two 1077 1023 M4 cores—in both cases programs run from flash, which is 1078 1024 call into a virtual class method, and finally an interface call. much slower than RAM, and the MCUs cache it differently. 1079 1025 We also distinguish between calls on this, which do not The BCM numbers are worse, likely because main memory 1080 1026 require a dynamic type check, and other non-interface calls access takes more cycles there, and caches do not always fix 1081 1027 that do. it. This is particularly visible for allocation (which includes 1082 1028 The non-virtual and virtual calls on a variable are dom- amortized cost of GC), which then also dominate the floating 1083 1029 inated by the subtype check, which is performed slightly point operations. Also, there is no hardware integer division 1084 1030 differently, ending up with almost exactly the same perfor- on ARM11. 1085 1031 mance. The interface lookup does not need a type check The integer division and multiplication do not currently 1086 1032 (because the method is looked up in the virtual table of the have a fast path implementation in assembly (due to differ- 1087 1033 object on which it is called), which makes it not that much ences between ARM MCUs), so they are slower than addition, 1088 1034 slower than a regular call. The dynamic map lookup depends subtraction and bit operations. 1089 1035 on the number of fields, but other timings are mostly input- 1090 1036 independent. 4.5 MakeCode Arcade Performance 1091 1037 Figure 7 shows the performance impact of these different 1092 The MakeCode Arcade handhelds, as shown in Figure 2, are 1038 ways of accessing members in a full benchmark. The first 1093 about 100 times faster than the original arcade machines of 1039 row (Baseline) lists run times in the default mode, where 1094 the 1980s. However, the games in 1980s were programmed 1040 classes have Java-like virtual tables, in addition to slower 1095 in assembly, with heavy use of accelerated graphics (sprites 1041 interface tables, and fields of classes are accessed by memory 1096 etc.). Arcade, on the other hand, offers very high-level and 1042 lookup at statically computed offset. 1097 beginner-friendly APIs in a fully managed, garbage-collected 1043 The second and third row show slowdowns incurred when 1098 language. As a result, the more complex games (eg., the one 1044 methods and fields are looked up via the interface tables. 1099 1045 10 1100 Static TypeScript MPLR 2019, Under submission,

1101 Compiler modification richards richards2 bintree nbody 1156 1102 Baseline 1265ms 1172ms 324ms 1869ms 1157 1103 Methods via interface 13% 0% 4% 0% 1158 1104 Methods+fields via interface 102% 34% 39% 10% 1159 1105 Objects as dynamic maps 143% 1160 1106 Subtype checks on this 19% 0% 4% 1% 1161 1107 Reference counting 165% 128% 100% 64% 1162 1108 Inline conversions 0.1% -0.7% 0.0% -0.1% 1163 1109 No peep hole 6% 6% 5% 0.3% 1164 1110 Disable all subtype checks -12% -17% -7% -3% 1165 1111 1166 1112 Figure 7. Increases in run time when disabling certain optimizations on STM32. 1167 1113 1168 1114 from Fig. 3) are quite playable with STS at 30fps; extrapolat- above. StrongScript allows an object of class type to be ex- 1169 1115 ing from our benchmarks, an embedded interpreter would tended with extra properties. STS currently does not allow 1170 1116 run at 1-5fps. such an extension; 1171 1117 We have also evaluated code size on Arcade games. The Hop.js [17] is a static compiler, with advanced type in- 1172 1118 STS compiler generates about 37 bytes of ARM Thumb ma- ference, that translates JavaScript programs to Scheme and 1173 1119 chine code per input (JavaScript) statement, which works then compiles them using Bigloo. Reported performance 1174 1120 out to about 20 bytes per input line of code. Note that unused numbers indicate much better than STS performance on the 1175 1121 parts of code are removed, so the game engine is never fully combinatorial benchmarks (like fann) and somewhat worse 1176 1122 present in the generated binaries. As a comparison, the STS on richards (however, these numbers are from x86 and we 1177 1123 VM, which uses a specialized 16/32 bit encoding achieves estimate from numbers relative to V8; we were unable to run 1178 1124 about twice the code density. tests on RPi ourselves). 1179 1125 The C++ runtime occupies about 125kB of flash, while the SJS [5, 6] is a similar system, except that it is implemented 1180 1126 bootloader and space for internal filesystem take another in Java, and generates C instead of Scheme. Similarly, it 1181 1127 64kB, so there is about 320k left for the program and game seems to exhibit better performance than STS on combina- 1182 1128 assets. In practice, this limits the user application (excluding torial benchmarks, and slightly worse on richards. Neither 1183 1129 game engine) to 5-10k lines of code, which we have not yet of these is suitable for running in a web browser. In general, 1184 1130 found to be a significant limitation. it seems these compilers are able to produce very efficient 1185 1131 code, when they can infer static types, falling back to much 1186 1132 slower runtime path when they cannot. STS, on the other 1187 1133 hand, exhibits rather flat and predictable performance. This 1188 1134 5 Related Work should allow for combining the approaches in future. 1189 1135 Safe TypeScript [13] and StrongScript [14] both shore up There are other embedded interpreters in addition to the 1190 1136 TypeScript’s with soundness guarantees, backed ones we have compared against such as XS7 [21] or Espru- 1191 1137 by runtime checking. Our work is closest to StrongScript, as ino [23]. These seem to have similar performance perfor- 1192 1138 STS uses a nominal interpretation of classes for code gener- mance as the ones we have used. 1193 1139 ation and the STS runtime distinguishes between dynamic 1194 1140 objects, created with { x = ... } syntax in JavaScript, 1195 1141 and class objects, created with new C(...) syntax. 6 Conclusion 1196 1142 1197 STS differs from StrongScript in a variety of ways. First, Static TypeScript (STS) fills an interesting niche in compilers 1143 1198 StrongScript’s system guarantees that in the absence of for embedded systems. Implemented fully in TypeScript it- 1144 1199 downcasts, a variable of concrete class type C is guaran- self, the STS toolchain can run in a web browser and produce 1145 1200 teed to refer to an object of a nominal subtype of C or to ARM (Thumb) machine code for a large subset of TypeScript. 1146 1201 null. STS uses TypeScript’s and checking For efficiency of compiled code, STS relies on a nominal in- 1147 1202 as is, with no modification. In STS, a variable of class type C terpretation of classes. Via MakeCode, STS has been widely 1148 1203 is checked dynamically (upon lookup of a field/method) to deployed across a range of devices with small amounts of 1149 1204 determine if its value is a nominal subtype of class type C. RAM. Evaluation of STS on a set of small benchmarks shows 1150 1205 The check may fail. There is no type checking performed dy- that STS’s generated code is much faster than various em- 1151 1206 namically on arguments. Second, STS erases casts, whereas bedded intepreters for scripting languages. The largest STS 1152 1207 StrongScript checks them at runtime; instead, STS performs application to date is MakeCode Arcade, whose game engine 1153 1208 checks at a dereference (member/field lookup), as noted comprises over 10,000 lines of STS. 1154 1209 1155 11 1210 MPLR 2019, Under submission, Thomas Ball, Peli de Halleux, and Michał Moskal

1211 Acknowledgments We would like to thank current and [16] Sue Sentance, Jane Waite, Steve Hodges, Emily MacLeod, and Lucy 1266 1212 former members of the MakeCode team: Abhijith Chatra, Yeomans. 2017. "Creating Cool Stuff": Pupils’ Experience of the BBC 1267 1213 Sam El-Husseini, Caitlin Hennessy, Steve Hodges, Guillaume Micro:Bit. In Proceedings of the 2017 ACM SIGCSE Technical Symposium 1268 on Computer Science Education (SIGCSE ’17). ACM, 531–536. https: 1214 1269 Jenkins, Shannon Kao, Richard Knoll, Jacqueline Russell, and //doi.org/10.1145/3017680.3017749 1215 Daryl Zuniga. We also express our gratitude to James Devine [17] Manuel Serrano. 2018. JavaScript AOT compilation. In Proceedings 1270 1216 and Joe Finney at Lancaster University, the authors of CO- of the 14th ACM SIGPLAN International Symposium on Dynamic Lan- 1271 1217 DAL used as a layer of our C++ runtime. Finally, we would guages. ACM, 50–63. 1272 1218 like to thank the anonymous reviewers for their helpful [18] Jeremy G. Siek and Walid Taha. 2007. for Objects. In 1273 ECOOP 2007 - Object-Oriented Programming, 21st European Conference, 1219 1274 comments, and Edd Barrett for his help in getting the final Berlin, Germany, July 30 - August 3, 2007, Proceedings. 2–27. https: 1220 version of this paper ready. //doi.org/10.1007/978-3-540-73589-2_2 1275 1221 [19] Artifex Software. 2018. MuJS. https://mujs.com/. 1276 1222 [20] Cesanta Software. 2018. mJS. https://github.com/cesanta/mjs. 1277 1223 References [21] Patrick Soquet. 2017. XS7. https://www.moddable.com/XS7-TC-39. 1278 [22] Sami Vaarala. 2018. DukTape. https://duktape.org/. 1224 [1] Jonny Austin, Howard Baker, Thomas Ball, James Devine, Joe Finney, 1279 Peli de Halleux, Steve Hodges, Michal Moskal, and Gareth Stockdale. [23] Gordon Williams. 2017. Making Things Smart: Easy Embedded 1225 1280 2019. The BBC micro:bit – from the UK to the World. Commun. ACM JavaScript Programming for Making Everyday Objects into Intelligent 1226 (to appear) (2019). Machines. Maker Media. 1281 1227 [2] BBC. 2017. BBC micro:bit celebrates huge impact in first year, with 1282 1228 90% of students saying it helped show that anyone can code. https: 1283 1229 //www.bbc.co.uk/mediacentre/latestnews/2017/microbit-first-year. 1284 [3] Gavin M. Bierman, Martín Abadi, and Mads Torgersen. 2014. Under- 1230 1285 standing TypeScript. In ECOOP 2014 - Object-Oriented Programming - 1231 28th European Conference, Uppsala, Sweden, July 28 - August 1, 2014. 1286 1232 Proceedings. 257–281. https://doi.org/10.1007/978-3-662-44202-9_11 1287 1233 [4] Hans-J Boehm, Russ Atkinson, and Michael Plass. 1995. Ropes: an 1288 1234 alternative to strings. Software: Practice and Experience 25, 12 (1995), 1289 1315–1330. 1235 1290 [5] Satish Chandra, Colin S. Gordon, Jean-Baptiste Jeannin, Cole 1236 Schlesinger, Manu Sridharan, Frank Tip, and Young-Il Choi. 2016. 1291 1237 Type inference for static compilation of JavaScript. In Proceedings of 1292 1238 the 2016 ACM SIGPLAN International Conference on Object-Oriented 1293 1239 Programming, Systems, Languages, and Applications, OOPSLA 2016, part 1294 of SPLASH 2016, Amsterdam, The Netherlands, October 30 - November 4, 1240 1295 2016. 410–429. https://doi.org/10.1145/2983990.2984017 1241 [6] Wontae Choi, Satish Chandra, George Necula, and Koushik Sen. 2015. 1296 1242 SJS: A type system for JavaScript with fixed object layout. In Interna- 1297 1243 tional Static Analysis Symposium. Springer, 181–198. 1298 1244 [7] James Devine, Joe Finney, Peli de Halleux, Michal Moskal, Thomas Ball, 1299 and Steve Hodges. 2018. MakeCode and CODAL: intuitive and efficient 1245 1300 embedded systems programming for education. In Proceedings of the 1246 19th ACM SIGPLAN/SIGBED International Conference on Languages, 1301 1247 Compilers, and Tools for Embedded Systems, LCTES 2018, Philadelphia, 1302 1248 PA, USA, June 19-20, 2018. 19–30. 1303 1249 [8] Evgeny Gavrin, Sung-Jae Lee, Ruben Ayrapetyan, and Andrey Shitov. 1304 2015. Ultra Lightweight JavaScript Engine for Internet of Things. In 1250 1305 SPLASH Companion 2015. 19–20. 1251 [9] Damien George. 2018. MicroPython. http://www.micropython.org. 1306 1252 [10] Isaac Gouy. 2018. The Computer Language Benchmarks Game. https: 1307 1253 //benchmarksgame-team.pages.debian.net/benchmarksgame/. 1308 1254 [11] Apple Inc. 2018. JetStream Benchmarks 1.1. https://www. 1309 browserbench.org/JetStream/in-depth.html. 1255 1310 [12] Adafruit Industries. 2018. CircuitPython. https://github.com/adafruit/ 1256 circuitpython. 1311 1257 [13] A. Rastogi, N. Swamy, C. Fournet, G. M. Bierman, and P. Vekris. 2015. 1312 1258 Safe & Efficient Gradual Typing for TypeScript. In Proceedings of the 1313 1259 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Pro- 1314 gramming Languages. 167–180. http://doi.acm.org/10.1145/2676726. 1260 1315 2676971 1261 [14] G. Richards, F. Z. Nardelli, and J. Vitek. 2015. Concrete Types for Type- 1316 1262 Script. In 29th European Conference on Object-Oriented Programming, 1317 1263 ECOOP 2015. 76–100. https://doi.org/10.4230/LIPIcs.ECOOP.2015.76 1318 1264 [15] Samsung. 2018. JerryScript. http://jerryscript.org. 1319 1265 12 1320