Optimizing for the Modern Web on the ARM Architecture

Evens Pan
Strategic Software Alliance, ARM

Question… What is the most popular programming language?
Source: American art and American art collections; essays on artistic subjects by the best art writers. Volume 2. Boston, E.W. Walker & Co., 1889. Author: Walter Montgomery

The 3 pillars of the modern Web
• 4 billion Web pages in the world today
• Or is that 4 billion Web Apps?
Image: Creative Commons Attribution-Share Alike 3.0 Unported license. Author: Matthias Kabel

From Browser to OS Platform
• Web Apps are now:
    • Offline first
    • Out of browser
    • Rich and immersive
    • >100,000 lines of JavaScript
    • Providing access to device peripherals

What Web App performance really means
• Benchmarking is irrelevant!*
• Neither Google nor Mozilla cares that much
• What matters is end-user experience
• Dropped frames are the currency of performance
• https://wiki.mozilla.org/Project_Eideticker
• http://jankfree.org/
* This is a systems integration perspective that assumes you have done your best with the components already

Building the new Smartphone OSes
云OS

JavaScript: The Assembly Language for the Web
[Diagram: Mozilla emscripten: C/C++ -> LLVM Compiler -> JavaScript; Google Web Toolkit: Java -> Google GWT -> JavaScript]

JavaScript Improvements
• Single Page Apps (SPA) exceed 100,000 lines of JavaScript
• Google Web Apps spend 50-70% of their time in the V8 JavaScript engine
• ARM has been working on the Google V8 JavaScript engine for 3 years
• 2010: Cortex-A9 was 35% slower than Atom on the V8 benchmark*
• 2013: Cortex-A9 is 25% faster than Atom on the V8 benchmark*
• 2012-2013: JavaScript improved 24% on desktop and 57% on mobile+
• Cortex-A15 optimizations to V8 made this possible
• The ARM team now has commit rights to the V8 JavaScript codebase
* Clock-for-clock
+ Google IO 2013

Profiling JavaScript HTML5 Execution
• ARM created an extension to Mozilla and Webkit browsers
• Developers can see hotspot analysis while specific JavaScript is executing
• You can zero in on key areas to optimize in your browser engine for Web Apps
• You can find bottlenecks in specific Web Apps

Firefox Mobile OS: A True Web-based Platform
• Firefox Mobile OS uses the Android kernel
• ARM has integrated Streamline
• Full profiling of Firefox Mobile OS from Web Apps to kernel
[Diagram: software stack, top to bottom: User Interface & Apps; Mozilla Gecko Web Engine; Standard APIs (JavaScript): Contacts, NFC, Camera, Bluetooth, SMS, Telephony, Audio, Location, Settings; OS Kernel (e.g., Android & Linux, etc.), labeled "Android Kernel"; Device Driver Framework; Device Hardware]
Improved performance from 5fps to 25fps
(Slide footer date: 12/17/2013)

LLVM for Native Web Apps
• Portable Native Client (PNaCl)
• Compiles C/C++ code to LLVM bitcode
• Some restrictions on constructs
• Bitcode is translated on the device to native code
• Runs in the browser sandbox on the device
• Better than 80% of native performance
• PNaCl will hit the stable channel with Chrome 31 in a few weeks
[Diagram: C/C++ Code -> NaCl SDK (PNaCl Cross Compiler) -> pexe portable executable (LLVM Bitcode) -> HTML5 Internet Browser -> LLVM Backend Translator -> ARMv7 CPU (VFP, NEON)]

Optimizing WebRTC via VP9
“WebRTC is a new front in the long war for an open and unencumbered Web.” Brendan Eich, Mozilla CTO
• Already supported by 1 billion browsers worldwide
• DTLS-encrypted connection by default
• Video, voice and arbitrary reliable/unreliable low-latency data
• Peer-to-peer as well as peer-to-server
• Google Hangouts will use WebRTC in the future
• WebRTC uses the VP9 video codec in the Chrome browser
• Linaro has optimized the VP9 decoder using NEON technology
• Improved performance in some paths by up to 20%
• http://www.webmproject.org/code/contribute

Performance Ping Pong Continues
[Diagram labels: Graphics; JavaScript]

Improving Graphics - Optimizing the other 30%-50%
• University of Szeged Webkit nullport, aka GL2D port (research)
• Replacing everything below the Graphics Context API with a new OpenGL ES 2.0 library
[Diagram labels: GL2D; Skia]

Font rendering: a multi-core prototype
[Chart y-axis: normalized time]
• A prototype test shows the relationship between performance increase and the number of glyph queries
[Chart: x-axis is the number of glyph queries (1 to 13); y-axis ticks run from 0 to 1.4; series are "single 24px" and "dual 24px"]
• Modification in chromium/skia also shows that we can get about a 40% performance increase on a pure CJK text webpage on the first load
• Most glyphs can be found in cache for European languages
• Most glyphs can be found in cache if CJK text is always displayed at the same size

Path filling: a patch for scanline by CPU multi-core
• Case 1: Composite 4 complex polygons
    • Original: 2.22 ms
    • Patched (2 threads): 2.01 ms
    • Improvement: 10%
• Case 2: Composite 4 polygons (large size)
    • Original: 5.6 ms
    • Patched (2 threads): 4.0 ms
    • Improvement: 40%

New Beginnings of Multicore Browsers
• 2010-2012 Webkit
    • ARM & Szeged found limited improvement
    • Codebase large
    • Limited SMP ability
• 2009-2011 Gecko
    • Electrolysis project to split into threads
    • Improved stability
    • Limited performance improvement
• 2013 Google announces <blink> (Blink)
    • Webkit fork to underlie Chrome/Chromium
    • Focus on improvements for modern SoCs
• 2013 Mozilla announces Servo
    • New experimental browser for modern SoCs
    • Designed for multicore
    • Built with new language Rust
    • Rust compiler uses LLVM

The 3 pillars of the modern browser
Performance Critical Ingredient Technologies
[Diagram label: Skia]
Optimization, Stabilization, Improvement
Image: Creative Commons Attribution-Share Alike 3.0 Unported license. Author: Matthias Kabel

JIT Tooling for ARM Architecture
[Diagram label: V8]

VIXL A64 dynamic code generation toolkit
• Macro Assembler
    • Instruction generation with helpful macro assembler
    • Functions for abstracting, e.g., immediate generation
• Disassembler
    • Disassembles everything supported by the assembler
• Simulator
    • High-speed AArch64 processor simulation on 64-bit platforms
    • Supports all instructions generated by the assembler
• Debugger
    • Supports stepping, register and memory examination, breakpoints
• Test suite
    • Functionality and disassembly tests for all supported instructions

VIXL Embedded in Virtual Machine
[Diagram labels: PC; Virtual Machine; JIT; Built for x86*; ARM64 Runtime Assembler; ARM64 ISA Simulator; Debug; Disassembler]

Where to Use VIXL
• JITs: JavaScript, Java, Python, other scripting languages
• Dynamic code generation of optimized routines
• Testing:
    • Random Instruction Stream (RIS) testing
    • Toolchain testing
• ISA experimentation: try out features of the new A64 ISA
• Benefits
    • A simple, fast, tested API
    • Integrated suite, ready to use on a new JIT project
    • Supported by ARM
    • Liberal 3-clause BSD license

Conclusion
• The Web has become an important software platform and ARM understands this
• The extensive R&D effort by ARM is delivering higher browser performance
• More contributions and collaboration from ARM partners, please
• Try this at home: it's all Open Source

Thank You
The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. Any other marks featured may be trademarks of their respective owners.
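
Note on the timing figures: the deck quotes performance as milliseconds per operation (path filling), frames per second (Firefox Mobile OS, 5fps to 25fps), and dropped frames. The sketch below shows one way these units relate. It is an illustration under stated assumptions, not a method from the slides: the function names are hypothetical, and the 60 Hz display refresh rate is assumed rather than taken from the deck. With these definitions, the path-filling "Improvement" figures appear to match a throughput gain (original time divided by patched time, minus one) rather than the fraction of time saved.

```python
"""Sketch relating the deck's timing figures to frame budgets.

All function names are hypothetical, and the 60 Hz refresh rate is an
assumption; neither comes from the slides.
"""
from typing import Sequence


def speedup_percent(original_ms: float, patched_ms: float) -> float:
    """Throughput gain in percent: (original / patched - 1) * 100."""
    return (original_ms / patched_ms - 1.0) * 100.0


def time_saved_percent(original_ms: float, patched_ms: float) -> float:
    """Reduction in elapsed time in percent: (1 - patched / original) * 100."""
    return (1.0 - patched_ms / original_ms) * 100.0


def frame_time_ms(fps: float) -> float:
    """Average time available per frame at a given frame rate."""
    return 1000.0 / fps


def count_dropped_frames(timestamps_ms: Sequence[float],
                         refresh_hz: float = 60.0) -> int:
    """Count display refreshes that passed without a new frame.

    timestamps_ms holds the times at which frames were presented.
    A gap of about one refresh interval means no drop; a gap of about
    two intervals means one refresh was missed, and so on.
    """
    budget_ms = 1000.0 / refresh_hz  # assumed refresh interval
    dropped = 0
    for previous, current in zip(timestamps_ms, timestamps_ms[1:]):
        refreshes = round((current - previous) / budget_ms)
        dropped += max(0, refreshes - 1)
    return dropped


if __name__ == "__main__":
    # Path-filling timings quoted in the deck (original ms, patched ms).
    for original, patched in [(2.22, 2.01), (5.6, 4.0)]:
        print(f"{original} ms -> {patched} ms: "
              f"{speedup_percent(original, patched):.1f}% throughput gain, "
              f"{time_saved_percent(original, patched):.1f}% less time")

    # Frame rates quoted for Firefox Mobile OS profiling.
    for fps in (5, 25):
        print(f"{fps} fps = {frame_time_ms(fps):.0f} ms per frame")

    # Made-up presentation timestamps with one long gap in the middle.
    sample = [0.0, 16.7, 33.3, 66.7, 83.3]
    print("dropped frames:", count_dropped_frames(sample))
```

Counting missed refreshes rather than averaging frame rate keeps a single long stall visible, which fits the deck's point that dropped frames, not benchmark scores, reflect the end-user experience.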
