Optimizing for the Modern Web on the ARM Architecture
Evens Pan
Strategic Software Alliance, ARM

Question…
What is the most popular programming language?
Image source: American art and American art collections; essays on artistic subjects by the best art writers. Volume 2. Boston, E.W. Walker & Co., 1889. Author: Walter Montgomery.

The 3 pillars of the modern Web
• 4 billion Web pages in the world today
• Or is that 4 billion Web apps?
Image: Creative Commons Attribution-Share Alike 3.0 Unported license. Author: Matthias Kabel.

From Browser to OS Platform
• Web Apps are now:
  - Offline first
  - Out of browser
  - Rich and immersive
  - >100,000 lines of JavaScript
  - Providing access to device peripherals

What Web App performance really means
• Benchmarking is irrelevant!*
• Neither Google nor Mozilla cares that much
• What matters is end-user experience
• Dropped frames are the currency of performance
  - https://wiki.mozilla.org/Project_Eideticker
  - http://jankfree.org/
* This is a systems integration perspective that assumes you have already done your best with the components.

Building the new Smartphone OSes
• 云OS (Yun OS, literally "Cloud OS")

JavaScript: The Assembly Language for the Web
• Google: Java → Google Web Toolkit (GWT) → JavaScript
• Mozilla: C/C++ → emscripten (LLVM compiler) → JavaScript

JavaScript Improvements
• Single Page Apps (SPA) exceed 100,000 lines of JavaScript
• Google Web Apps spend 50-70% of their time in the V8 JavaScript engine
• ARM has been working on the Google V8 JavaScript engine for 3 years
• 2010: Cortex-A9 was 35% slower than Atom on the V8 benchmark*
• 2013: Cortex-A9 is 25% faster than Atom on the V8 benchmark*
• 2012-2013: JavaScript performance improved 24% on desktop and 57% on mobile+
• Cortex-A15 optimizations to V8 made this possible
• The ARM team now has commit rights to the V8 JavaScript codebase
* Clock-for-clock
+ Google I/O 2013

Profiling JavaScript HTML5 Execution
• ARM created an extension to Mozilla and WebKit browsers
• Developers can see hotspot analysis while specific JavaScript is executing
• You can zero in on key areas to optimize in your browser engine for Web Apps
• You can find bottlenecks in specific Web Apps

Firefox Mobile OS: A True Web-based Platform
• Firefox Mobile OS uses the Android kernel
• ARM has integrated Streamline
• Full profiling of Firefox Mobile OS from Web Apps to kernel
• Improved performance from 5 fps to 25 fps
Diagram (software stack, top to bottom):
  - User Interface & Apps
  - Mozilla Gecko Web Engine
  - Standard APIs (JavaScript): Contacts, NFC, Camera, Bluetooth, SMS, Telephony, Audio, Location, Settings
  - OS kernel (e.g., Android & Linux, etc.); on the slide, the Android kernel
  - Device Driver Framework
  - Device Hardware
(Slide footer date: 12/17/2013)

LLVM for Native Web Apps
• Portable Native Client (PNaCl)
• Compiles C/C++ code to LLVM bitcode
• Some restrictions on constructs
• Bitcode is translated on the device to native code
• Runs in the browser sandbox on the device
• Better than 80% of native performance
• PNaCl will hit the stable channel with Chrome 31 in a few weeks
Diagram (toolchain flow): C/C++ Code → NaCl SDK (PNaCl Cross Compiler) → pexe portable executable (LLVM bitcode) + HTML5 → Internet Browser (LLVM Backend Translator) → ARMv7 CPU (VFP, NEON)

Optimizing WebRTC via VP9
“WebRTC is a new front in the long war for an open and unencumbered Web.” Brendan Eich, Mozilla CTO
• Already supported by 1 billion browsers worldwide
• DTLS-encrypted connection by default
• Video, voice and arbitrary reliable/unreliable low-latency data
• Peer-to-Peer as well as Peer-to-Server
• Google Hangouts will use WebRTC in the future
• WebRTC uses the VP9 video codec in the Chrome browser
• Linaro have optimized the VP9 decoder using NEON technology
• Improved performance in some paths by up to 20%
• http://www.webmproject.org/code/contribute

Performance Ping Pong Continues
• Graphics
• JavaScript

Improving Graphics - Optimizing the other 30%-50%
• University of Szeged research: WebKit nullport, aka the GL2D port
• Replacing everything below the Graphics Context API with a new OpenGL ES 2.0 library
Diagram labels: GL2D, Skia

Font rendering: a prototype of multi-core (chart y-axis: normalized time)
• A prototype test shows the relationship between performance increase and the number of glyph queries
[Chart: x-axis is the number of glyph queries (1 to 13); y-axis runs from 0 to 1.4; series: single 24px, dual 24px]
• A modification in chromium/skia also shows that we can get about a 40% performance increase on a pure CJK text webpage on the first load
• Most glyphs can be found in the cache for European languages
• Most glyphs can be found in the cache if CJK text is always displayed at the same size

Path filling: a patch for scanline by CPU multi-core
• Case 1: Composite 4 complex polygons
  - Original: 2.22 ms
  - Patched (2 threads): 2.01 ms
  - Improvement: 10%
• Case 2: Composite 4 polygons (large size)
  - Original: 5.6 ms
  - Patched (2 threads): 4.0 ms
  - Improvement: 40%

New Beginnings of Multicore Browsers
• 2010-2012: WebKit
  - ARM & Szeged found limited improvement
  - Codebase large
  - Limited SMP ability
• 2009-2011: Gecko
  - Electrolysis project to split into threads
  - Improved stability
  - Limited performance improvement
• 2013: Google announces Blink
  - WebKit fork to underlie Chrome/Chromium
  - Focus on improvements for modern SoCs
• 2013: Mozilla announces Servo
  - New experimental browser for modern SoCs
  - Designed for multicore
  - Built with the new language Rust
  - Rust compiler uses LLVM

The 3 pillars of the modern browser
• Performance-critical ingredient technologies (diagram label: Skia)
• Optimization, Stabilization, Improvement
Image: Creative Commons Attribution-Share Alike 3.0 Unported license. Author: Matthias Kabel.

JIT Tooling for ARM Architecture
Diagram label: V8

VIXL: A64 dynamic code generation toolkit
• Macro Assembler
  - Instruction generation with a helpful macro assembler
  - Functions for abstracting, e.g., immediate generation
• Disassembler
  - Disassembles everything supported by the assembler
• Simulator
  - High-speed AArch64 processor simulation on 64-bit platforms
  - Supports all instructions generated by the assembler
• Debugger
  - Supports stepping, register and memory examination, breakpoints
• Test suite
  - Functionality and disassembly tests for all supported instructions

VIXL Embedded in Virtual Machine
Diagram labels: PC; Virtual Machine; JIT; Built for x86*; ARM64 Runtime; Assembler; ARM64 ISA Simulator; Debug; Disassembler

Where to Use VIXL
• JITs: JavaScript, Java, Python, other scripting languages
• Dynamic code generation of optimized routines
• Testing:
  - Random Instruction Stream (RIS) testing
  - Toolchain testing
• ISA experimentation: try out features of the new A64 ISA
• Benefits
  - A simple, fast, tested API
  - Integrated suite, ready to use on a new JIT project
  - Supported by ARM
  - Liberal 3-clause BSD license

Conclusion
• The Web has become an important software platform and ARM understands this
• The extensive R&D effort by ARM is delivering higher browser performance
• More contributions and collaboration from ARM partners, please
• Try this at home: it's all Open Source

Thank You
The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. Any other marks featured may be trademarks of their respective owners.
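
Worked check of the path-filling figures: the deck quotes 2.22 ms to 2.01 ms as a 10% improvement and 5.6 ms to 4.0 ms as a 40% improvement. The sketch below is only an illustration, not something from the presentation. It assumes the quoted percentages are throughput speedups (original time divided by patched time, minus one) rather than reductions in elapsed time, and the helper names `speedup_percent` and `time_saved_percent` are hypothetical.

```python
def speedup_percent(original_ms: float, patched_ms: float) -> float:
    """Throughput gain: how much more work per unit time the patched code does."""
    return (original_ms / patched_ms - 1.0) * 100.0


def time_saved_percent(original_ms: float, patched_ms: float) -> float:
    """Latency reduction: the share of the original time that was removed."""
    return (1.0 - patched_ms / original_ms) * 100.0


# Timings quoted on the path-filling slide: (original, patched with 2 threads).
cases = {
    "Case 1: 4 complex polygons": (2.22, 2.01),
    "Case 2: 4 large polygons": (5.6, 4.0),
}

for name, (original_ms, patched_ms) in cases.items():
    print(
        f"{name}: speedup {speedup_percent(original_ms, patched_ms):.1f}%, "
        f"time saved {time_saved_percent(original_ms, patched_ms):.1f}%"
    )
```

Under that assumption the speedup definition reproduces the quoted 10% and 40%. Measured as time saved, the same timings give roughly 9.5% and 28.6%, so it is worth stating which convention is used when comparing results like these.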