Accelerate Your Mobile Apps for Android On
The Developer Summit at ARM® TechCon™ 2013

Accelerate Your Mobile Apps and Games for Android™ on ARM
Matthew Du Puy, Software Engineer, ARM

Presenter
Matthew Du Puy, Software Engineer, ARM

Matthew Du Puy is a software engineer at ARM, currently working on ensuring mobile app performance on the latest ARM technologies. Previously he was a self-employed embedded systems software contractor, working primarily on the Linux kernel, and a mountain climber.

Contact Details:
Email: [email protected]

Title: Accelerate Your Mobile Apps and Games for Android on ARM

Overview: Learn to perform application-level and system-level analysis on Android apps and platforms using tools from Google, ARM, AT&T and others. Find bottlenecks in both SDK and NDK activities, and learn different approaches to fixing those bottlenecks and making better use of platform technologies and APIs.

Problem: This is not a desktop
▪ Mobile apps require special design considerations that aren’t always clear, and tools for analyzing increasingly complex systems are limited
▪ Animations and games drop frames
▪ Networking, display, and real-time audio and video processing eat battery
▪ App won’t fit within memory constraints

Analysis
▪ Fortunately Google, ARM and many others are developing analysis tools and solutions to these problems
▪ Is my app … ?
  ▪ CPU/GPGPU bound
  ▪ I/O or memory constrained
  ▪ Power efficient
▪ What can I do to fix it? (short of buying everyone who runs my app a Quad-core ARM® Cortex™-A15 processor & ARM Mali™-T604 processor or Octo phone)

In emerging markets, not everyone has access to the latest and greatest devices, but they still want to game, shop, socialize and learn with their mobiles. Consider this even if you think your app is efficient on modern devices.

Analysis of Java SDK Android Apps
▪ Static analysis with SDK Lint tool
▪ Dynamic analysis with DDMS
▪ Allocation/heap
▪ Process and thread utilization
▪ Traceview (method)
▪ Network
▪ Hierarchy Viewer
▪ Systrace

But ask yourself these questions
▪ Is this performance bottleneck parallelizable?
▪ Is this Java or native? Would it be better the other way around?
▪ Has this been done before? Don’t reinvent the wheel.
▪ Am I being smart with resources?
▪ What version of Android should I target?

Yes, native is almost ALWAYS faster (what did you expect to hear from an ARM engineer?), BUT there are cases where the transition through JNI is made so frequently, for such small tasks, that you thrash and undo the benefit.

Even if you target Android 2.2, you should use 4.1 for analysis, as some of these tools only exist in Jelly Bean or have been greatly improved in it. Buyers of new Android devices are likely running the latest and greatest, and they are also likely to be new app buyers.

Starting EASY! Static analysis: LINT

Static slides in case live demo is not possible or working:

It seems obvious to most that static analysis helps you avoid logic flaws and points out potentially hazardous data handling, but Google has also built a solid library of other issues, with a description of each issue and even suggestions for fixing it.

This is from SGTPuzzles, a free, open source game. You can try this at home. There are some allocations going on during the draw calls. Bad application. Bad.

Static analysis: LINT

This is from glTron, another open source game. Avoid type conversion between doubles and floats by using Android-specific math libraries like android.util.FloatMath instead of java.lang.Math calls.

Here, in the “Category” column, you can see they’ve grouped the analysis into Performance, Security and Correctness, amongst others.

Beyond static analysis
Dalvik Debug Monitor Server (DDMS)
▪ DDMS Thread analysis (like “top” but better)

Fire up DDMS, select your Android process and peer inside the process to see each thread it touches and how much time you spend in each thread.
Here, again, is glTron. We can see we’re spending most of the time in the GLThread, but this doesn’t tell us everything. We don’t know why we’re spending so much time there. Is it allocating frequently? Is it thrashing on frequently blocked calls?

DDMS: Traceview
How much CPU time is each method consuming?
▪ Traceview (start method profiling button)

Traceview allows you to see more information. You can view how much time is being spent in each Java method. You can drill down through children and up the parent tree and see how much CPU time each method is consuming. Try to identify a few key methods that are consuming most of the CPU. We can see that most of these calls are related to graphics rendering, animation or signal processing.

Allocations and HEAP
Are you allocating in a high-frequency method?

This is the allocation tracker. Find out if you’re making allocations in frequently run methods. I once saw a program (Google) that was allocating exception strings on the VSYNC pulse… allocating 60 times per second is a great way to ensure Dalvik wastes time doing garbage collection.

HEAP:
Is your app running out of memory?

Older versions of Android limited the heap to 16MB per process. Just because things run smoothly on your fancy new device doesn’t mean they’ll run smoothly on all devices. This will give you a high-level overview of how much memory you’re using in real time. If you suspect an activity is using too much memory at a given point, run to that point, trigger a garbage collection manually with the button, and see. Caveat: some methods for improving performance, like multi-threading, will consume more memory.

Network Statistics
▪ Save battery, look for short spikes that can be delayed
▪ TrafficStats API allows you to tag individual sockets

There is a built-in network statistics analyzer. It works fine, but I’ll show you an even better (also free) tool called ARO later.
By monitoring the frequency of your data transfers and the amount of data transferred during each connection, you can identify areas of your application that can be made more battery-efficient. Generally, you should look for short spikes that can be delayed, or that should cause a later transfer to be pre-empted. To better identify the cause of transfer spikes, the TrafficStats API allows you to tag the data transfers occurring within a thread and its sockets.

Ever wonder if your popular ad-supported app is caching ads, sending analytics and retrieving songs at the same time, or simply running up the network stack and radio to grab ads on a timer regardless of other downloads? More on this later with Application Resource Optimizer (ARO, an open source project from AT&T).

adb shell dumpsys
▪ With dumpsys you can check:
  ▪ Event Hub State
  ▪ Input Reader State
  ▪ Input Dispatcher State
  ▪ any number of other systems, e.g. dumpsys gfxinfo

dumpsys gfxinfo
▪ Drop dumpsys data columns into a spreadsheet and visualize… e.g. Will my animation drop frames?

Here we can see, in blue, an animation we’ve made in a Java thread. Then we stack on top of it the amount of time Android spends processing the display list and swapping buffers. In a VSYNCed system there are roughly 60 frames per second, which means we have about 16 ms (1000 ms / 60 ≈ 16.7 ms) to get our animation frame drawn, processed and ready to be displayed. With this simple visualization of dumpsys data, we can see whether we’re going to spend too much time drawing in our app and cause the system to drop a frame. This is just one example of what we can do with dumpsys data.
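
The same check can be done in a few lines of code instead of a spreadsheet. The sketch below is a minimal illustration, not part of the original talk. It assumes each frame is one row of whitespace-separated per-stage times in milliseconds (for example draw, process, execute), and that a 60 Hz display gives a budget of 1000 ms / 60 ≈ 16.7 ms per frame. The class name `FrameBudget`, its method names and the sample rows are all hypothetical; the numbers are made up, not measurements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FrameBudget {
    // Assumed budget for a 60 Hz display: 1000 ms / 60 frames, about 16.67 ms.
    static final double BUDGET_MS = 1000.0 / 60.0;

    /** Sums the per-stage times (in ms) found on one whitespace-separated row. */
    public static double frameTotalMs(String row) {
        double total = 0.0;
        for (String column : row.trim().split("\\s+")) {
            total += Double.parseDouble(column);
        }
        return total;
    }

    /** Returns the indices of the rows whose summed time exceeds the budget. */
    public static List<Integer> overBudgetFrames(List<String> rows) {
        List<Integer> slow = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (frameTotalMs(rows.get(i)) > BUDGET_MS) {
                slow.add(i);
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows (three stage times per frame), not real data.
        List<String> rows = Arrays.asList(
                "2.10 5.40 1.30",
                "6.75 9.20 2.40",
                "3.00 4.00 1.00");
        for (int index : overBudgetFrames(rows)) {
            System.out.println("frame " + index + " is over budget: "
                    + frameTotalMs(rows.get(index)) + " ms");
        }
    }
}
```

Summing the stages per frame mirrors the stacked chart described above: a frame is at risk of being dropped when the stacked total, not any single stage, crosses the budget line.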
Systrace
▪ I’ve done all I can to analyze inside my app but still can’t find the bottleneck. Systrace to the rescue!
▪ systrace.py will generate a 5-second system-level snapshot

As of Android 4.3, systrace.py has many more configuration options, including configurable trace length (more than 5 seconds now).

Systrace
HTML5 page of info: navigate with ‘w’, ‘a’, ‘s’, ‘d’

Systrace will show you a very high-level view of which processes are running on the CPU and some information about what active threads are doing. E.g. this game, glTron: we can see input events on the main thread (PID 29194), and we can also see its child thread rendering and triggering the OpenGL updater, and the resulting effects on SurfaceFlinger. We could zoom in and see if any part of the system is blocked waiting for another part. Here again, we can also see the 60fps VSYNC pulse.

Other system profilers to consider:
▪ Chainfire PerfMon App - Free on XDA-Developers
  ▪ Foreground App
  ▪ CPU
  ▪ Disk I/O
  ▪ Network I/O
▪ From Qualcomm
  ▪ Trepn Profiler App – overlay mode similar to PerfMon, but can monitor Android Intents, log states, and allows external control and power monitoring
  ▪ Adreno SDK and Profiler for profiling the Adreno GPU

Analyzing native C/C++ (NDK)
▪ But I didn’t use the Java SDK to write my app! How do I analyze my already wicked fast native code (or iOS app Objective-C port)?
▪ What about the Linux kernel part of system analysis?
▪ Notes of caution
  ▪ Applications that use the NDK well will be faster and slicker
  ▪ Ones that don’t will be cursed by unhappy users
  ▪ If you build .so libraries for ARM, only ARM devices will be able to run your apps
  ▪ Fortunately there are not many Android platforms that aren’t ARM
  ▪ Good use of the NDK will narrow the difference between high-end and low-end devices
  ▪ Moving inefficient code to native doesn’t magically make it better code

DS-5 CE for Android App Developers
▪ Friendly, Reliable App Debugger
▪ Powerful graphical user interface
▪ ADB integration for native debug
▪ Java* and native debug in the same IDE

[Diagram labels: DS-5 CE; Project Manager; Debugger; Application Performance Analyzer]