Optimizing for the Modern Web on the ARM Architecture

Evens Pan, Strategic Software Alliance, ARM

1 Question… What is the most popular Programming Language?

Source: American art and American art collections; essays on artistic subjects by the best art writers. Volume 2. Boston, E.W. Walker & Co. 1889 Author Walter Montgomery

2 The 3 pillars of the modern Web

. 4 Billion Web Pages in the World today

. Or is that 4 Billion Web Apps?

Creative Commons Attribution-Share Alike 3.0 Unported license. Author: Matthias Kabel

3 From Browser to OS Platform

. Web Apps are now:

. Offline first

. Out of browser

. Rich and immersive

. >100,000 lines of JavaScript

. Providing access to device peripherals

4 What Web App performance really means

. Benchmarking is irrelevant!*

. Neither or care that much

. What matters is end-user experience

. Dropped frames are the currency of performance

. ://wiki.mozilla.org/Project_Eideticker

. http://jankfree.org/

* This is a systems integration perspective that assumes you have done your best with the components already
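To make "dropped frames" concrete, here is a minimal sketch of how they might be counted from a list of frame presentation times. The function name, the 60 Hz refresh rate, and the sample timestamps are all assumptions for illustration; tools such as Eideticker measure this from captured video rather than from timestamps.

```python
def count_dropped_frames(timestamps_ms, refresh_hz=60.0):
    """Count display refresh intervals that passed without a new frame."""
    budget_ms = 1000.0 / refresh_hz  # about 16.7 ms per frame at 60 Hz
    dropped = 0
    for prev, cur in zip(timestamps_ms, timestamps_ms[1:]):
        # A gap spanning N refresh intervals means N - 1 frames were missed.
        intervals = round((cur - prev) / budget_ms)
        dropped += max(0, intervals - 1)
    return dropped


# Hypothetical presentation times in milliseconds (not measured data).
frames = [0.0, 16.7, 33.3, 66.7, 83.3, 133.3]
print("dropped frames:", count_dropped_frames(frames))
```

In this made-up trace the gap of about 33 ms costs one frame and the gap of 50 ms costs two, which is the kind of stutter an end user notices even when benchmark scores look healthy.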

5 Building the new Smartphone OSes

云OS

6 JavaScript: The for the Web

[Diagram: Java via Google GWT, and C/C++ via LLVM (Mozilla), both compile to JavaScript]

7 JavaScript Improvements

. Single Page Apps (SPA) exceed 100,000 lines of JavaScript

. Google Web Apps spend 50-70% of their time in V8 JavaScript Engine

. ARM working on Google V8 JavaScript Engine for 3 years

. 2010: Cortex-A9 was 35% slower than on V8 benchmark*

. 2013: Cortex-A9 is 25% faster than Atom on V8 benchmark*

. 2012-2013 JavaScript on Desktop improved 24% and 57% in Mobile+

. Cortex-A15 optimizations to V8 made this possible

. ARM Team now has Commit Rights to V8 Script codebase

* Clock-for-clock

+ Google I/O 2013

8 Profiling JavaScript HTML5 Execution

. ARM created an extension to Mozilla and Webkit Browsers

. Developers can see hotspot analysis while specific JavaScript is executing

. You can zero-in on key areas to optimize in your for web Apps

. You can find bottlenecks in specific web Apps

9 Mobile OS: A True Web-based Platform

. Firefox Mobile OS uses Android Kernel

. ARM has integrated Streamline

. Full profiling of Firefox Mobile OS from Web Apps to Kernel

[Diagram: software stack, top to bottom]

User Interface & Apps

Mozilla Web Engine

Standard APIs (JavaScript): Contacts, NFC, Camera, Bluetooth, SMS, Telephony, Audio, Location, Settings

Android Kernel & Device Driver Framework (OS kernel, e.g., Android, Linux, etc.)

Device Hardware

. Improved performance from 5fps to 25fps

12/17/2013 10 LLVM for Native Web Apps

. Portable Native Client (PNaCl)

. Compiles C/C++ code to LLVM bitcode

. Some restrictions on constructs

. Bitcode is translated on the device to native code

. Runs in browser sandbox on device

. Better than 80% of native performance

. PNaCl will hit the stable channel with Chrome 31 in a few weeks

[Diagram: C/C++ Code, NaCl SDK, PNaCl portable pexe (LLVM Bitcode), Internet, Browser (HTML5), LLVM Backend, ARMv7 CPU (VFP, NEON)]

11 Optimizing WebRTC via VP9

“WebRTC is a new front in the long war for an open and unencumbered Web.” Mozilla CTO

. Already supported by 1 Billion Browsers worldwide

. DTLS Encrypted connection by default

. Video, Voice and arbitrary reliable/unreliable low latency data

. Peer-to-Peer as well as Peer-to-Server

. will use WebRTC in the future

. WebRTC uses the VP9 video codec in the Chrome Browser

. Linaro have optimized VP9 decoder using NEON technology

. Improved performance in some paths by up to 20%

. http://www.webmproject.org/code/contribute

12 Performance Ping Pong Continues

Graphics

JavaScript

13 Improving Graphics - Optimizing the other 30%-50%

. University of Szeged Webkit nullport (aka GL2D port) research

. Replacing everything below Graphics Context API with a new OpenGL ES 2.0 library

GL2D Skia

14 Font rendering – a prototype of multi-core

. A prototype test shows the relationship between performance increase and the number of glyph queries

[Chart: normalized time (0 to 1.4) versus # of glyph queries (1 to 13), for series "single 24px" and "dual 24px"]

. Modification in /skia also shows that we can get about 40% performance increase on a pure CJK text webpage on the first load

. Most glyphs can be found in cache for European languages

. Most glyphs can be found in cache if CJK text is always displayed at the same size
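The cache behaviour described on this slide can be illustrated with a toy model. The sketch below is not Skia's glyph cache; `GlyphCache`, its capacity, and the sample texts are hypothetical. It only assumes that rasterized glyphs are cached under a (character, pixel size) key with least-recently-used eviction.

```python
from collections import OrderedDict


class GlyphCache:
    """Toy LRU cache of rasterized glyphs keyed by (character, pixel size)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, char, size_px):
        key = (char, size_px)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        bitmap = f"bitmap({char!r}, {size_px}px)"  # stand-in for real rasterization
        self.entries[key] = bitmap
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used entry
        return bitmap


def hit_rate(text, size_px, capacity=256):
    """Fraction of glyph lookups served from cache on a first page load."""
    cache = GlyphCache(capacity)
    for ch in text:
        cache.get(ch, size_px)
    return cache.hits / (cache.hits + cache.misses)


latin = "the quick brown fox jumps over the lazy dog " * 20
cjk = "".join(chr(0x4E00 + i) for i in range(500))  # 500 distinct ideographs

print(f"Latin hit rate: {hit_rate(latin, 24):.2f}")
print(f"CJK hit rate:   {hit_rate(cjk, 24):.2f}")
```

With a small alphabet nearly every lookup hits, while a first load of CJK text with many distinct characters misses on almost every glyph, so rasterization cost dominates. That is the case where spreading glyph generation across cores has the most room to help.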

15 Path filling – A patch for scanline by CPU multi-core

. Case 1: Composite 4 complex polygons

  . Original: 2.22 ms

  . Patched (2 threads): 2.01 ms

  . Improvements: 10%

. Case 2: Composite 4 polygons (large size)

  . Original: 5.6 ms

  . Patched (2 threads): 4.0 ms

  . Improvements: 40%
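The improvement percentages above appear to be speedup ratios (original time divided by patched time) rather than the fraction of time saved. This reading is an inference from the numbers, not something the slide states. A short check, with hypothetical helper names:

```python
def speedup_percent(original_ms, patched_ms):
    """Throughput gain: how much more work fits in the same time."""
    return (original_ms / patched_ms - 1.0) * 100.0


def time_saved_percent(original_ms, patched_ms):
    """Reduction in elapsed time relative to the original."""
    return (1.0 - patched_ms / original_ms) * 100.0


# Timings taken from the two cases on this slide.
cases = {"Case 1": (2.22, 2.01), "Case 2": (5.6, 4.0)}
for name, (original, patched) in cases.items():
    print(
        f"{name}: speedup {speedup_percent(original, patched):.1f}%, "
        f"time saved {time_saved_percent(original, patched):.1f}%"
    )
```

Under the speedup reading, the two cases come out at roughly 10% and 40%, matching the slide. Measured as time saved, they would be closer to 9% and 29%.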

16 New Beginnings of Multicore Browsers

. 2010-2012 Webkit

  . ARM & Szeged found limited improvement

  . Codebase large

  . Limited SMP ability

. 2009-2011 Gecko

  . Electrolysis project to split into threads

  . Improved stability

  . Limited performance improvement

. 2013 Google Announce

  . Webkit fork to underlie Chrome/Chromium

  . Focus on improvements for modern SoCs

. 2013 Mozilla Announce

  . New experimental browser for modern SoCs

  . Designed for multicore

  . Built with new language Rust

  . Rust compiler uses LLVM

17 The 3 pillars of the modern browser

Performance Critical Ingredient Technologies

Skia

Optimization, Stabilization, Improvement

Creative Commons Attribution-Share Alike 3.0 Unported license. Author: Matthias Kabel

18 JIT Tooling for ARM Architecture V8

19 VIXL A64 dynamic code generation toolkit

. Macro Assembler

  . Instruction generation with helpful macro assembler

  . Functions for abstracting e.g. immediate generation

. Disassembler

  . Disassembles everything supported by the assembler

. Simulator

  . High-speed AArch64 processor simulation on 64-bit platforms

  . Supports all instructions generated by the assembler

  . Supports stepping, register and memory examination, breakpoints

. Test suite

  . Functionality and disassembly tests for all supported instructions
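As an illustration of what "abstracting immediate generation" means, the sketch below shows one simple way a macro assembler can load an arbitrary 64-bit constant on A64: a MOVZ for the first non-zero 16-bit chunk, then a MOVK for each remaining one. This is not VIXL code and not its actual algorithm (a production assembler would also consider MOVN and bitmask immediates); the function `materialize_immediate` is hypothetical and only emits assembly text.

```python
def materialize_immediate(value, reg="x0"):
    """Return A64 assembly lines that load a 64-bit constant into reg."""
    value &= 0xFFFFFFFFFFFFFFFF  # treat the input as an unsigned 64-bit value
    chunks = [(shift, (value >> shift) & 0xFFFF) for shift in (0, 16, 32, 48)]
    nonzero = [(shift, chunk) for shift, chunk in chunks if chunk != 0]
    if not nonzero:
        return [f"movz {reg}, #0x0"]  # zero needs a single instruction
    lines = []
    for index, (shift, chunk) in enumerate(nonzero):
        # MOVZ writes one chunk and zeroes the rest; MOVK keeps the other bits.
        op = "movz" if index == 0 else "movk"
        suffix = f", lsl #{shift}" if shift else ""
        lines.append(f"{op} {reg}, #{chunk:#x}{suffix}")
    return lines


for line in materialize_immediate(0x12345678):
    print(line)
```

The caller asks for a constant in a register and the helper decides how many instructions that takes, which is the convenience a JIT author gets from a macro assembler layer.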

20 VIXL Embedded in

[Diagram: a Virtual Machine on a PC, with a JIT built for ARM64 on top of the Runtime Assembler, ARM64 ISA Simulator, and Debug Disassembler]

21 Where to Use VIXL

. JITs: JavaScript, Java, Python, other scripting languages

. Dynamic code generation of optimized routines

. Testing

  . Random Instruction Stream (RIS) testing

  . Toolchain testing

. ISA experimentation: try out features of the new A64 ISA

. Benefits

  . A simple, fast, tested API

  . Integrated suite, ready to use on a new JIT project

  . Supported by ARM

  . Liberal 3-clause BSD license

22 Conclusion

. The Web has become an important Software Platform and ARM understands this

. The extensive R&D effort by ARM is delivering higher browser performance

. More contributions and collaboration from ARM partners please

. Try this at home - it’s all Open Source

23 Thank You

The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. Any other marks featured may be trademarks of their respective owners