Optimizing for the Modern Web on the ARM Architecture
Evens Pan, Strategic Software Alliance, ARM
1 Question… What is the most popular programming language?
Source: American art and American art collections; essays on artistic subjects by the best art writers. Volume 2. Boston, E.W. Walker & Co. 1889. Author: Walter Montgomery
2 The 3 pillars of the modern Web
. 4 billion Web pages in the world today
. Or is that 4 billion Web Apps?
Creative Commons Attribution-Share Alike 3.0 Unported license. Author: Matthias Kabel
3 From Browser to OS Platform
. Web Apps are now:
. Offline first
. Out of browser
. Rich and immersive
. >100,000 lines of JavaScript
. Providing access to device peripherals
4 What Web App performance really means
. Benchmarking is irrelevant!*
. Neither Google nor Mozilla cares that much about benchmark scores
. What matters is end-user experience
. Dropped frames are the currency of performance
. https://wiki.mozilla.org/Project_Eideticker
. http://jankfree.org/
* This is a systems-integration perspective that assumes you have already done your best with the individual components
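If dropped frames are the currency, a trace of frame presentation times can be scored directly. The sketch below is a minimal illustration, assuming a fixed 60 Hz display (one vsync every 1000/60 ≈ 16.67 ms) and a list of presentation timestamps in milliseconds; how those timestamps are captured (for example with Eideticker-style camera capture) is out of scope, and the function names are hypothetical.

```python
VSYNC_MS = 1000.0 / 60.0  # assumed 60 Hz display refresh


def dropped_frames(timestamps_ms, vsync_ms=VSYNC_MS):
    """Count vsync slots that showed a stale frame instead of a new one."""
    dropped = 0
    for prev, cur in zip(timestamps_ms, timestamps_ms[1:]):
        # How many vsync periods this frame took, rounded to the nearest one.
        intervals = round((cur - prev) / vsync_ms)
        # A frame that took k periods means k - 1 vsyncs repeated the old frame.
        dropped += max(intervals - 1, 0)
    return dropped


def jank_ratio(timestamps_ms, vsync_ms=VSYNC_MS):
    """Fraction of vsync slots in the trace that did not get a fresh frame."""
    span = timestamps_ms[-1] - timestamps_ms[0]
    slots = round(span / vsync_ms)
    return dropped_frames(timestamps_ms, vsync_ms) / slots if slots else 0.0
```

In a trace such as `[0.0, 16.7, 33.3, 83.3, 100.0]`, the 50 ms gap spans three vsyncs, so two frames are dropped out of six slots; a benchmark that only reports average throughput would hide that stall.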
5 Building the new Smartphone OSes
云OS (YunOS)
6 JavaScript: The Assembly Language for the Web
Diagram: Java → Google Web Toolkit (GWT) → JavaScript; C/C++ → LLVM compiler + emscripten (Mozilla) → JavaScript
7 JavaScript Improvements
. Single Page Apps (SPAs) exceed 100,000 lines of JavaScript
. Google Web Apps spend 50-70% of their time in the V8 JavaScript engine
. ARM has been working on the Google V8 JavaScript engine for 3 years
. 2010: Cortex-A9 was 35% slower than Atom on the V8 benchmark*
. 2013: Cortex-A9 is 25% faster than Atom on the V8 benchmark*
. 2012-2013: JavaScript performance improved 24% on desktop and 57% on mobile+
. Cortex-A15 optimizations to V8 made this possible
. The ARM team now has commit rights to the V8 JavaScript codebase
* Clock-for-clock comparison
+ Source: Google I/O 2013
8 Profiling JavaScript HTML5 Execution
. ARM created an extension to the Mozilla and WebKit browsers
. Developers can see hotspot analysis while specific JavaScript is executing
. You can zero in on key areas to optimize in your browser engine for Web Apps
. You can find bottlenecks in specific Web Apps
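Hotspot analysis of this kind usually boils down to aggregating sampled call stacks. The sketch below is a generic illustration, not the ARM browser extension or Streamline: it assumes each sample is a tuple of frame names ordered outermost to innermost, and the frame names in the usage note are hypothetical.

```python
from collections import Counter


def hotspots(samples, top=3):
    """Rank functions by self time from sampled call stacks.

    Each sample is a tuple of frame names, outermost -> innermost.
    Returns [(function, self_pct, inclusive_pct), ...], highest self_pct first.
    """
    if not samples:
        return []
    self_counts = Counter()
    incl_counts = Counter()
    for stack in samples:
        self_counts[stack[-1]] += 1  # the leaf frame was executing when sampled
        for fn in set(stack):        # count recursive frames once per sample
            incl_counts[fn] += 1
    total = len(samples)
    # Sort by descending self count, then by name for a stable tie order.
    ranked = sorted(self_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(fn, 100.0 * n / total, 100.0 * incl_counts[fn] / total)
            for fn, n in ranked[:top]]
```

With samples such as `("main", "render", "layout")` taken twice plus one `paint` and one `parse` leaf, `layout` ranks first at 50% self time. Self time points at the code to optimize; inclusive time (75% for `render` in that example) points at the subsystem responsible.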
9 Firefox Mobile OS: A True Web-based Platform
. Firefox Mobile OS uses the Android kernel
. ARM has integrated Streamline
. Full profiling of Firefox Mobile OS, from Web Apps down to the kernel
Diagram (software stack, top to bottom):
  User Interface & Apps
  Mozilla Gecko Web Engine
  Standard APIs (JavaScript): Contacts, NFC, Camera, Bluetooth, SMS, Telephony, Audio, Location, Settings
  OS (e.g. Android & Linux, etc.): Android Kernel, Device Driver Framework
  Device Hardware
. Improved performance from 5 fps to 25 fps
12/17/2013
10 LLVM for Native Web Apps
. Portable Native Client (PNaCl)
. Compiles C/C++ code to LLVM bitcode
. Some restrictions on constructs
. Bitcode is translated to native code on the device
. Runs in the browser sandbox on the device
. Better than 80% of native performance
. PNaCl will hit the stable channel with Chrome 31 in a few weeks
Diagram: C/C++ code → NaCl SDK PNaCl cross compiler → pexe portable executable (LLVM bitcode) → HTML5 / Internet → Browser: LLVM backend translator → ARMv7 CPU (VFP, NEON)
11 Optimizing WebRTC via VP9
“WebRTC is a new front in the long war for an open and unencumbered Web.” Brendan Eich Mozilla CTO
. Already supported by 1 billion browsers worldwide
. DTLS-encrypted connection by default
. Video, voice and arbitrary reliable/unreliable low-latency data
. Peer-to-peer as well as peer-to-server
. Google Hangouts will use WebRTC in the future
. WebRTC uses the VP9 video codec in the Chrome browser
. Linaro has optimized the VP9 decoder using NEON technology
. Improved performance in some paths by up to 20%
. http://www.webmproject.org/code/contribute
12 Performance Ping Pong Continues
Diagram: Graphics ↔ JavaScript
13 Improving Graphics - Optimizing the other 30%-50%
. University of Szeged WebKit nullport (aka GL2D port) research
. Replaces everything below the GraphicsContext API with a new OpenGL ES 2.0 library
GL2D Skia
14 Font Rendering: A Multi-core Prototype
. A prototype test shows the relationship between performance increase and the number of glyph queries
Figure: normalized time vs. number of glyph queries (1 to 13); series: single 24px, dual 24px
. A modification in chromium/skia also shows about a 40% performance increase on a pure CJK text webpage on first load
. For European languages, most glyphs can be found in the cache
. For CJK text, most glyphs can be found in the cache if the text is always displayed at the same size
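The cache behaviour above can be illustrated with a glyph cache keyed by (glyph, pixel size), where only cache misses are handed to worker threads. This is a hypothetical sketch, not Skia's glyph cache: `rasterize` stands in for a real rasterizer, and two workers are assumed to mirror the dual-core prototype.

```python
from concurrent.futures import ThreadPoolExecutor


def rasterize(glyph, size_px):
    """Hypothetical stand-in for an expensive glyph rasterizer."""
    return f"bitmap({glyph}, {size_px}px)"


class GlyphCache:
    """Cache keyed by (glyph, pixel size); misses are rasterized in parallel."""

    def __init__(self, workers=2):
        self.workers = workers
        self.entries = {}
        self.misses = 0

    def lookup(self, text, size_px):
        # Unique glyphs in first-seen order that are not cached at this size.
        missing = [g for g in dict.fromkeys(text)
                   if (g, size_px) not in self.entries]
        self.misses += len(missing)
        if missing:
            # Spread the misses over worker threads; map() preserves order.
            with ThreadPoolExecutor(max_workers=self.workers) as pool:
                bitmaps = pool.map(rasterize, missing, [size_px] * len(missing))
                for g, bitmap in zip(missing, bitmaps):
                    self.entries[(g, size_px)] = bitmap
        return [self.entries[(g, size_px)] for g in text]
```

Because the key includes the size, the same CJK glyph at 24px and 32px costs two rasterizations, while European text reuses a small alphabet and quickly stops missing. In CPython the threads only overlap if the real rasterizer releases the GIL (as native code typically does); the sketch shows where the parallelism goes, not the speedup.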
15 Path Filling: A Multi-core CPU Patch for Scanline Filling
. Case 1: composite 4 complex polygons
  . Original: 2.22 ms
  . Patched (2 threads): 2.01 ms
  . Improvement: 10% (2.22 / 2.01 ≈ 1.10x speedup)
. Case 2: composite 4 polygons (large size)
  . Original: 5.6 ms
  . Patched (2 threads): 4.0 ms
  . Improvement: 40% (5.6 / 4.0 = 1.40x speedup)
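The patch itself is not shown here; the sketch below only illustrates the general idea, assuming an even-odd scanline filler whose rows are interleaved across threads. Each thread writes disjoint rows, so no locking is needed. The function names and the pixel-centre sampling rule are assumptions for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def scanline_spans(polygon, y):
    """Return [start, end) pixel spans covered on row y (even-odd rule)."""
    yc = y + 0.5  # sample at the pixel centre
    xs = []
    n = len(polygon)
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
        if (y0 <= yc) != (y1 <= yc):  # edge crosses this row (never horizontal)
            xs.append(x0 + (yc - y0) * (x1 - x0) / (y1 - y0))
    xs.sort()
    spans = []
    for a, b in zip(xs[0::2], xs[1::2]):
        # Cover pixels whose centres x + 0.5 lie in [a, b).
        start, end = math.ceil(a - 0.5), math.ceil(b - 0.5)
        if end > start:
            spans.append((start, end))
    return spans


def fill(polygon, width, height, threads=2):
    """Rasterize a polygon into a 0/1 image, splitting rows across threads."""
    image = [[0] * width for _ in range(height)]

    def fill_rows(rows):
        for y in rows:
            for start, end in scanline_spans(polygon, y):
                for x in range(max(start, 0), min(end, width)):
                    image[y][x] = 1

    # Interleave rows (0, 2, 4, ... and 1, 3, 5, ...) to balance the work.
    bands = [range(t, height, threads) for t in range(threads)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(fill_rows, bands))
    return image
```

Interleaving rows, rather than giving each thread one contiguous band, keeps the load balanced when a polygon is much wider at one end. As with any such split, small or simple shapes gain little because thread overhead dominates, which matches the smaller gain in Case 1. In CPython the GIL serializes this pure-Python loop; a native rasterizer would run the bands truly in parallel.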
16 New Beginnings of Multicore Browsers
. 2010-2012 WebKit
  . ARM & Szeged found limited improvement
  . Large codebase
  . Limited SMP ability
. 2009-2011 Gecko
  . Electrolysis project to split into threads
  . Improved stability
  . Limited performance improvement
. 2013: Google announcement
. 2013: Mozilla announces Servo
17 The 3 pillars of the modern browser
Performance Critical Ingredient Technologies
Skia
Optimization, Stabilization, Improvement
18 JIT Tooling for ARM Architecture V8
19 VIXL: A64 Dynamic Code Generation Toolkit
. Macro assembler
  . Instruction generation with a helpful macro assembler
  . Functions for abstracting, e.g., immediate generation
. Disassembler
  . Disassembles everything supported by the assembler
. Simulator
  . High-speed AArch64 processor simulation on 64-bit platforms
  . Supports all instructions generated by the assembler
. Debugger
  . Supports stepping, register and memory examination, breakpoints
. Test suite
  . Functionality and disassembly tests for all supported instructions
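"Abstracting immediate generation" means the macro assembler hides the fact that A64 instructions carry only 16-bit move immediates. The sketch below is not VIXL's API; it is a simplified, hypothetical illustration of one common strategy, building a 64-bit constant from MOVZ plus MOVK instructions and producing both the disassembly text and the 32-bit encodings (64-bit forms: MOVZ base 0xD2800000, MOVK base 0xF2800000, `hw` field at bits 22:21, `imm16` at bits 20:5, `Rd` at bits 4:0).

```python
MOVZ_X = 0xD2800000  # MOVZ Xd, #imm16, LSL #(16*hw), 64-bit form
MOVK_X = 0xF2800000  # MOVK Xd, #imm16, LSL #(16*hw), 64-bit form


def synthesize_imm64(rd, value):
    """Return (assembly, encoding) pairs that materialize `value` in X<rd>.

    Simplified strategy: MOVZ for the lowest non-zero halfword, then MOVK for
    each remaining non-zero halfword. A production macro assembler would also
    consider MOVN and bitmask (ORR) immediates to shorten the sequence.
    """
    if not (0 <= rd <= 30 and 0 <= value < 1 << 64):
        raise ValueError("rd must be X0..X30 and value must fit in 64 bits")
    seq = []
    for hw in range(4):
        imm16 = (value >> (16 * hw)) & 0xFFFF
        # Skip zero halfwords (MOVZ already cleared them), except that the
        # value 0 itself still needs a single MOVZ.
        if imm16 == 0 and not (value == 0 and hw == 0):
            continue
        base, mnemonic = (MOVK_X, "movk") if seq else (MOVZ_X, "movz")
        text = f"{mnemonic} x{rd}, #0x{imm16:x}" + (f", lsl #{16 * hw}" if hw else "")
        seq.append((text, base | (hw << 21) | (imm16 << 5) | rd))
    return seq
```

For `synthesize_imm64(0, 0x12345678)` this yields `movz x0, #0x5678` (0xD28ACF00) followed by `movk x0, #0x1234, lsl #16` (0xF2A24680). Returning text and encoding together mirrors why an assembler and disassembler are tested as a pair.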
20 VIXL Embedded in Virtual Machine
Diagram: a PC runs a virtual machine whose JIT is built for x86*; embedded alongside it are the ARM64 runtime assembler, ARM64 ISA simulator, debugger and disassembler
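Embedding a simulator lets a JIT emit ARM64 code and run it on an x86 host before ARM64 hardware is available. The toy decode-and-execute loop below only illustrates that idea; it is not VIXL's simulator, and it understands just three 64-bit A64 encodings (MOVZ, MOVK, ADD immediate). The constant and function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
OPCODE_MASK = 0xFF800000  # fixed bits 31..23 of the forms decoded below
MOVZ_X, MOVK_X, ADD_X_IMM = 0xD2800000, 0xF2800000, 0x91000000


def simulate(program, regs=None):
    """Run a list of 32-bit A64 instruction words and return X0..X30.

    Toy decoder: only MOVZ, MOVK and ADD (immediate), 64-bit forms.
    Register 31 (SP/XZR) is not modelled.
    """
    x = list(regs) if regs is not None else [0] * 31
    for word in program:
        opcode = word & OPCODE_MASK
        rd = word & 0x1F
        if rd == 31:
            raise NotImplementedError("SP/XZR not modelled")
        if opcode in (MOVZ_X, MOVK_X):
            shift = 16 * ((word >> 21) & 0x3)  # hw field selects the halfword
            imm16 = (word >> 5) & 0xFFFF
            if opcode == MOVZ_X:
                x[rd] = imm16 << shift  # MOVZ zeroes all other bits
            else:
                keep = ~(0xFFFF << shift) & MASK64  # MOVK keeps other halfwords
                x[rd] = (x[rd] & keep) | (imm16 << shift)
        elif opcode == ADD_X_IMM:
            rn = (word >> 5) & 0x1F
            if rn == 31:
                raise NotImplementedError("SP/XZR not modelled")
            imm = (word >> 10) & 0xFFF
            if (word >> 22) & 1:  # sh bit: imm12 shifted left by 12
                imm <<= 12
            x[rd] = (x[rn] + imm) & MASK64
        else:
            raise NotImplementedError(f"unsupported instruction 0x{word:08x}")
    return x
```

Running `[0xD28ACF00, 0xF2A24680, 0x91000400]` (`movz x0, #0x5678`; `movk x0, #0x1234, lsl #16`; `add x0, x0, #1`) leaves 0x12345679 in X0. A real simulator adds the full decoder, flags, memory and the debugger hooks listed on the previous slide.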
21 Where to Use VIXL
. JITs: JavaScript, Java, Python, other scripting languages
. Dynamic code generation of optimized routines
. Testing:
  . Random Instruction Stream (RIS) testing
  . Toolchain testing
. ISA experimentation: try out features of the new A64 ISA
. Benefits:
  . A simple, fast, tested API
  . Integrated suite, ready to use on a new JIT project
  . Supported by ARM
  . Liberal 3-clause BSD license
22 Conclusion
. The Web has become an important software platform, and ARM understands this
. ARM's extensive R&D effort is delivering higher browser performance
. More contributions and collaboration from ARM partners, please
. Try this at home: it's all open source
23 Thank You
The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. Any other marks featured may be trademarks of their respective owners