arXiv:2012.02328v2 [cs.LG] 26 Feb 2021
MLPerf Mobile Inference Benchmark

Vijay Janapa Reddi*, David Kanter†, Peter Mattson‡, Jared Duke‡, Thai Nguyen‡, Ramesh Chukka§, Kenneth Shiring¶, Koan-Sin Tan¶, Mark Charlebois||, William Chou||, Mostafa El-Khamy**, Jungwook Hong**, Michael Buch*, Cindy Trinh††, Thomas Atta-fosu§, Fatih Cakir**, Masoud Charkhabi‡, Xiaodong Chen**, Jimmy Chiang¶, Dave Dexter‡‡, Woncheol Heo‡, Guenther Schmuelling§§, Maryam Shabani§, Dylan Zika††

*Harvard University †MLCommons ‡Google §Intel ¶MediaTek ||Qualcomm **Samsung ††ENS Paris-Saclay ‡‡Arm §§Microsoft

Abstract

MLPerf Mobile is the first industry-standard open-source mobile benchmark developed by industry members and academic researchers to allow performance/accuracy evaluation of mobile devices with different AI chips and software stacks. The benchmark draws from the expertise of leading mobile-SoC vendors, ML-framework providers, and model producers. In this paper, we motivate the drive to demystify mobile-AI performance and present MLPerf Mobile’s design considerations, architecture, and implementation. The benchmark comprises a suite of models that operate with standard data sets, quality metrics, and run rules. For the first iteration, we developed an Android app to provide an “out-of-the-box” inference test for computer vision and natural-language processing on mobile devices. MLPerf Mobile Inference also supports non-smartphone devices such as laptops and mobile PCs. As a whole, it can serve as a framework for integrating future models, for customizing quality-target thresholds to evaluate system performance, for comparing software frameworks, and for assessing heterogeneous-hardware capabilities for machine learning, all fairly and faithfully with reproducible results.

1 Introduction

Mobile artificial-intelligence (AI) applications are increasingly important as AI technology becomes a critical differentiator in smartphones, laptops, and other mobile devices. Many consumer applications benefit from AI: image processing, voice processing, and text interpretation. It provides state-of-the-art solutions to these tasks with a quality that users will notice on their devices. More and more consumers are employing such applications, and they expect a high-quality experience, especially for applications with video or audio interactivity.

Consequently, mobile-device and chipset manufacturers are motivated to improve AI implementations. Support for the technology is becoming common in nearly all mobile segments, from cost-optimized devices to premium phones. The many AI approaches range from purely software-based techniques to hardware-supported machine learning that relies on tightly coupled libraries. Seeing through the mist of competing solutions is difficult for mobile consumers.

On the hardware front, laptops and smartphones have incorporated application-specific integrated circuits (ASICs) to support AI in an energy-efficient manner. For machine learning, this situation leads to custom hardware that ranges from specialized instruction-set-architecture (ISA) extensions on general-purpose CPUs to fixed-function accelerators dedicated to efficient machine learning. Also, because mobile devices are complex, they incorporate a variety of features to remain competitive, especially those that conserve battery life.

The software front includes many code paths and AI infrastructures to satisfy the desire to efficiently support machine-learning hardware. Most SoC vendors lean toward custom model compilation and deployment that integrates tightly with the hardware. Examples include Google’s Android Neural Network API (NNAPI) [15], Intel’s OpenVINO [5], MediaTek’s NeuroPilot [19], Qualcomm’s SNPE [23] and Samsung’s Exynos Neural Network SDK [21]. These frameworks handle different numerical formats (e.g., FP32, FP16, and INT8) for execution, and they provide run-time support for various machine-learning networks that best fit the application and platform.

Hardware and software support for mobile AI applications is becoming a differentiating capability, increasing the need to make AI-performance evaluation transparent. OEMs, SoC vendors, and consumers benefit when mobile devices employ AI in ways they can see and compare. A typical comparison point for smartphone makers and the technical press, for example, is CPUs and GPUs, both of which have associated benchmarks [6]. Similarly, mobile-AI performance can also benefit from benchmarks.

Quantifying AI performance is nontrivial, however. It is especially challenging because AI implementations come in a wide variety with differing capabilities. This variety, combined with a lack of software-interface standards, complicates the design of standard benchmarks. In edge devices, the quality of the results is often highly specific to each problem. In other words, the definition of high performance is often task specific. For interactive user devices, latency is normally the preferred performance metric. For noninteractive ones, throughput is usually preferred. The implementation for each task can generally trade off neural-network accuracy for lower latency. This tradeoff makes choosing a benchmark suite’s accuracy threshold critical.

To address these challenges, MLPerf (mlperf.org) takes an open-source approach. It is a consortium of industry and academic organizations with shared interests, yielding collective expertise on neural-network models, data sets, and submission rules to ensure the results are relevant to the industry and beneficial to consumers while being transparent and reproducible.

The following are important principles that inform the MLPerf Mobile benchmark:

• Measured performance should match the performance that end users perceive in commercial devices. We want to prevent the benchmark from implementing special code beyond what these users generally employ.

• The benchmark’s neural-network models should closely match typical mobile-device workloads. They should reflect real benefits to mobile-device users in daily situations.

• The models should represent diverse tasks. This approach yields a challenging test that resists extensive domain-specific optimizations.

• Testing conditions should closely match the environments in which mobile devices typically serve. Affected characteristics include ambient temperature, battery power, and special performance modes that are software adjustable.

• All benchmark submissions should undergo third-party validation. Since mobile devices are ubiquitous, results should be reproducible outside the submitting organization.

MLPerf’s approach to addressing the mobile-AI benchmark needs of smartphones is to build an Android app that all tests must use. As of the initial v0.7 release of MLPerf Mobile, the app employs a standard set of four neural-network models for three vision tasks and one NLP task and passes these models to the back-end layer. This layer is an abstraction that allows hardware vendors to optimize their implementations for neural networks. The app also has a presentation layer for wrapping the more technical benchmark layers and the Load Generator (“LoadGen”) [9]. MLPerf created the LoadGen [9] to allow representative testing of different inference platforms and use cases by generating inference requests in a pattern and measuring certain parameters (e.g., latency, throughput, or latency-bounded throughput). MLPerf additionally offers a headless version of the mobile application that enables laptops running non-mobile OSs to use the same benchmarks.

The first round of MLPerf Mobile submissions is complete [12]. Intel, MediaTek, Qualcomm, and Samsung participated in this round, and all passed the third-party-validation requirement (i.e., reproducibility) for their results. These results exhibit performance variations and illustrate the wide range of hardware and software approaches that vendors take to implement neural-network models on mobile devices. They also highlight a crucial takeaway: measuring mobile-AI performance is challenging but possible. It requires a deep understanding of the fragmented and heterogeneous mobile ecosystem as well as a strong commitment to fairness and reproducibility. MLPerf Mobile is a step toward better benchmark transparency.

2 Benchmarking Challenges

The mobile ecosystem is rife with hardware heterogeneity, software fragmentation, developer options, deployment scenarios, and OEM life cycles. Each by itself leads to hardware-performance variability, but the combination makes AI benchmarking on mobile systems extremely difficult. Figure 1 shows the various constituents and explains the implementation options and challenges facing each one.

2.1 Hardware Heterogeneity

Smartphones contain complex heterogeneous chipsets that provide many different compute units and accelerators. Any or all of these components can aid in machine-learning (ML) inference. As such, recognizing the variability of SoCs is crucial.

A typical mobile system-on-a-chip (SoC) complex includes a CPU cluster, GPU, DSP, neural processing unit (NPU), Hexagon Tensor Accelerator (HTA), Hexagon Vector Extensions (HVX), and so on. Many smartphones today are Arm based, but the CPU cores generally implement a heterogeneous “Big.Little” architecture [4]. Some SoCs even have big-CPU clusters where some CPUs clock faster than others. Also, devices fall into different tiers with different hardware capabilities at different prices, varying in their memory capacity and storage features.

Any processing engine can run ML workloads, but this flexibility also