Optimizing Performance of HPC Storage Systems

Optimizing performance for reads and writes.

Torben Kling Petersen, PhD
HPC Storage Solutions
Xyratex Ltd
Langstone Road, Havant, Hampshire PO9 1SA, UK
[email protected]

John Fragalla
HPC Storage Solutions
Xyratex International Inc.
46831 Lakeview Blvd, Fremont, California 94538, USA
[email protected]

Abstract — The performance of HPC storage systems depends on a variety of factors. The results of any of the standard storage benchmark suites depend not only on the storage architecture, but also on the type of disk drives, the type and design of the interconnect, and the type and number of clients. In addition, each benchmark suite has a number of parameters and test methodologies that require careful analysis to determine the optimal settings for a successful run. To reliably benchmark a storage solution, every stage of the solution needs to be analyzed, including block and file performance of the RAID, and network and client throughput to the entire filesystem and metadata servers. For a filesystem to perform at its peak, there needs to be a balance between the actual performance of the disk drives, the SAS chain supporting the RAID sets, the RAID code used (whether hardware RAID controllers or software MD-RAID), the interconnect and, finally, the clients. This paper describes these issues with respect to the Lustre filesystem and shows how benchmark results depend on the various parameters. Using a single storage enclosure consisting of 8 RAID sets (8+2 drives each), it is possible to achieve both read and write performance in excess of 6 GB/s, which translates to more than 36 GB/s of measured client-based throughput per rack. This paper focuses on using the Linux performance tool obdfilter-survey and IOR to measure the performance of a Lustre filesystem at different levels.

Keywords — HPC storage, benchmarking

I. INTRODUCTION

Benchmarking system performance has always been an important factor in the HPC industry. This is especially true during the procurement phase of a new system. Although benchmarks seldom, if ever, provide any indication of the real-life performance of an application (unless the application itself is benchmarked at the intended scale, which is usually hard before the final system is delivered), a benchmark allows an end user to compare an offered solution to others. In a number of RFPs, customers have written their own benchmarks, which are distributed to vendors with specific instructions on how they should be run. While this is a prudent approach, these benchmark codes rarely show the peak performance of a new solution, as they were often written on older, less capable systems. For compute systems on the Top500 list, the benchmark of choice is Linpack [1], which tests a large system's floating-point performance as a function of CPU architecture and system interconnect. For storage, however, there is no similar standard and, unfortunately, no equivalent list. This has led some vendors to state the performance of their products in ways such as aggregated theoretical disk bandwidth, which obviously says nothing about the system's actual performance.

In the storage arena, a number of different benchmark suites exist. These can be used to measure a system's I/O performance regardless of whether the solution is based on direct attached storage, network attached storage (such as NFS based solutions) or a parallel distributed filesystem (such as GPFS or Lustre) [2, 3]. For general filesystem I/O there are tools such as IOR [4], IOzone [5] and Bonnie++ [6], as well as tools for the metadata performance of a filesystem, such as MDtest [7]. For the purposes of this paper, we focused on obdfilter-survey and IOR.

II. LUSTRE PARALLEL FILESYSTEM

While there are a number of parallel filesystems in use in HPC solutions today, Lustre currently dominates, with 5 of the top 10 systems and more than 70% of the Top100 [8]. This is also why Xyratex bases all of its solutions on this filesystem, and hence this paper focuses on Lustre performance benchmarking.

Lustre is based on the concept of separating metadata from block I/O, using a pair of metadata servers (MDS) and a number of object storage servers (OSSes), all interconnected with a network fabric (Fig. 1).

Fig. 1 Schematic overview of the active components making up the Lustre filesystem.
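The hardware used in this study is described in Section IV, but from a client's point of view the separation of metadata and block I/O just described is hidden behind a single mount that points at the management server (MGS). The following is a minimal sketch of that client-side view; the node name, network type, filesystem name and mount point are illustrative assumptions, not values taken from this paper.

# Mount the Lustre filesystem on a client (hypothetical MGS NID and fsname).
mount -t lustre mgs01@o2ib:/cstor /mnt/lustre

# Confirm that all OSTs are visible and report their free space.
lfs df -h /mnt/lustre

Once mounted, file data moves directly between the client and the OSSes while the MDS handles only namespace operations, which is why the backend storage and the client-level throughput can be benchmarked separately, as is done in the remainder of the paper.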
III. STORAGE SYSTEM ARCHITECTURES

While the choice of filesystem probably has the greatest impact on I/O performance, it is important to remember that the specific architecture of the underlying storage solution must also be well understood in order to achieve good throughput. The factors that affect storage performance range from the choice of RAID system (hardware based, or software such as MD-RAID), the disk protocol (SATA or SAS), the disk type (even within a range of, say, 2 TB NL-SAS drives from the three major vendors, I/O characteristics differ considerably), the firmware levels of all components, the components of the software stack and the tuning parameters (important even on the client side), to the type of interconnect, to mention a few.

It is therefore important to benchmark the underlying hardware before starting client based tests, in order to set a baseline for the disk backend subsystem. The most commonly used tool for this (at the filesystem level) is obdfilter-survey [9].

IV. SYSTEM ARCHITECTURE

A. Storage Subsystem

The ClusterStor 6000 Lustre based engineered solution consists of a pair of active-passive failover servers with shared storage acting as the system's metadata server (MDS) and metadata management server (MGS). Each object storage server consists of a 5 RU enclosure, known as a Scalable Storage Unit (SSU), hosting 84 3.5" disk drive slots and two embedded OSS server modules in the back of the enclosure [10]. The following storage components were used:

• 82 Hitachi Mars-K nearline SAS drives (3.5" form factor) with a capacity of 2 TB each.
• 2 Hitachi Ralston Peak SSDs (2.5" form factor in a 3.5" carrier) with a capacity of 100 GB each.

The disk drives are configured as 8 RAID 6 (8+2) volumes formatted with ldiskfs, rendering them into 8 object storage targets (OSTs), with the remaining 2 drives acting as roaming spares. The 2 SSDs (configured as RAID 1) are used exclusively for write intent bitmaps (WIBs) and for journals.

The embedded server modules (2 per enclosure) are based on a single socket Intel Xeon CPU, 32 GB of RAM and an onboard Mellanox CX3 FDR capable HCA. The ClusterStor architecture uses Lustre version 2.1.3 plus Xyratex extensions. The ClusterStor solution benchmarked consists of a total of 2 SSUs, 4 OSS servers and 16 OSTs.

The storage subsystem is configured with the following Lustre parameters: writethrough cache enabled, read cache enabled and read cache max filesize = 1M. The MDT, MGS and OSTs use software RAID and no data caching: all data is stored directly to disk and committed before an ACK is sent back to the client.

B. Clients and interconnect

The test clients were 64 x86 compute nodes, each with the following configuration:

• 2x Intel Xeon Westmere processors
• 24 GB RAM
• QDR interface

All clients, as well as the ClusterStor SSUs, were connected to a Mellanox based FDR capable core switch fabric. The clients were configured with Lustre client version 1.8.8. Each client was also tuned according to industry best practice, with LRU and checksums disabled and the maximum number of RPCs in flight set to 32.
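Client tuning of this kind is normally applied with lctl on each node. The sketch below shows what such settings typically look like on a Lustre 1.8 era client; the exact values are illustrative assumptions rather than the commands used in the paper, and "LRU disabled" is interpreted here as pinning the DLM lock LRU to a static size.

# Disable client-side data checksums to reduce CPU overhead during benchmarking.
lctl set_param osc.*.checksums=0

# Allow up to 32 concurrent RPCs per OSC so each client can keep the OSTs busy.
lctl set_param osc.*.max_rpcs_in_flight=32

# Pin the lock LRU to a static size instead of dynamic resizing (example value).
lctl set_param ldlm.namespaces.*.lru_size=128

Settings applied with lctl set_param are not persistent across reboots, so benchmark harnesses usually reapply them on every client before each run.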
V. TEST METHODOLOGY

To define the optimal benchmark parameters, a number of settings were tested. This paper focuses on the following subset of all user modifiable test parameters.

A. Obdfilter-survey

Part of the Lustre I/O kit, the obdfilter-survey script generates sequential I/O from varying numbers of threads and objects (files) to simulate the I/O patterns of a Lustre client. While it can be run from a network attached client, it is commonly used as a server side tool.

# pdsh -g oss "TERM=linux thrlo=256 thrhi=256 nobjlo=1 nobjhi=1 rsz=1024K size=32768 obdfilter-survey"
cstor01n04: ost 4 sz 134217728K rsz 1024K obj 4 thr 1024 write 3032.89 [ 713.86, 926.89] rewrite 3064.15 [ 722.83, 848.93] read 3944.49 [ 912.83,1112.82]
cstor01n05: ost 4 sz 134217728K rsz 1024K obj 4 thr 1024 write 3022.43 [ 697.83, 819.86] rewrite 3019.55 [ 705.15, 827.87] read 3959.50 [ 945.20,1125.76]

Fig. 2 Results of an optimized obdfilter-survey run on a single SSU. The individual OSS results are highlighted in bold.

B. I/O mode

IOR supports a number of I/O modes, such as buffered I/O and direct I/O. In addition, the benchmark suite supports the simulation of several application characteristics, such as file per process (FPP) and single shared file (SSF). In this paper we used both buffered and direct I/O with the file per process methodology.

Transfer size is frequently used as a variable in published benchmark results, but most runs range from below 1 MB up to a maximum of 4 MB. As our previous performance experience with the ClusterStor solution indicated that the system architecture requires significant load to perform well, we decided to vary the transfer size (-t) between 1 and 64 megabytes.

E. Number of client threads

As IOR uses MPI for its execution on multiple nodes, it is possible to control the number of processes each node runs. This is set with the mpirun argument -np. To assess how much load each client needs to produce to achieve maximum throughput on the storage solution, a number of runs were performed varying the total number of processes.
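To make the IOR parameters discussed above concrete, a client-side run combining them might look like the sketch below. The process count, hostfile, block size and target path are illustrative assumptions rather than the values used in the paper; -F selects file per process, -t sets the transfer size, -b the amount of data per process and -B requests O_DIRECT in IOR's POSIX backend.

# Buffered, file-per-process run: 512 MPI processes, 4 MB transfers,
# 16 GB written and then read back per process.
mpirun -np 512 -hostfile clients.txt ./IOR -a POSIX -w -r -F -e \
    -t 4m -b 16g -o /mnt/lustre/ior_test/file

# The same workload using direct I/O, bypassing the client page cache.
mpirun -np 512 -hostfile clients.txt ./IOR -a POSIX -w -r -F -B \
    -t 4m -b 16g -o /mnt/lustre/ior_test/file

In the sweeps described in this paper, the transfer size (-t) and the process count (-np) become the loop variables of the benchmark harness, with each combination run against the same file-per-process workload.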