Minimizing Lookup RPCs in Lustre File System Using Metadata Delegation at Client Side


Vilobh Meshram, Xiangyong Ouyang and Dhabaleswar K. Panda
Department of Computer Science and Engineering, The Ohio State University
{meshram, ouyangx, panda}@cse.ohio-state.edu

Abstract

Lustre is a massively parallel distributed file system whose architecture scales well to large amounts of data. However, the performance of Lustre can be limited by the load of metadata operations at the Metadata Server (MDS). Because of their high capacity, parallel file systems are often used to store and access millions of small files. These small files may create a metadata bottleneck, especially for file systems that have only a single active metadata server. Moreover, in Lustre installations with either a Single Metadata Server or a Clustered Metadata Server, each client issues multiple LOOKUP Remote Procedure Calls (RPCs) for path traversal before receiving the actual metadata and extended attributes, adding further pressure on the MDS. In this paper we propose an approach to minimize the number of LOOKUP RPC calls to the MDS. We have designed a new scheme called Metadata Delegation at Client Side (MDCS), which delegates part of the metadata and the extended attributes of a file to the client, allowing us to load-balance traffic at the MDS. Initial experimental results show that we can achieve up to 45% improvement in metadata operation throughput for system calls such as file open. MDCS also reduces the latency to open a Lustre file by 33%.

1 Introduction

Modern distributed file system architectures like Lustre [5, 3], PVFS [7], the Google File System [13], and object-based storage file systems [12, 16] separate the management of metadata from the storage of the actual file data. These architectures have proven to scale easily in storage capacity and bandwidth. However, the management of metadata remains a bottleneck [10, 19]. Studies have shown that approximately 75% of all file system calls access metadata [18]. Therefore, the efficient management of metadata is crucial for overall system performance. In addition, it is very beneficial to minimize the time spent in communication between the interacting client and server nodes for the file system calls that access metadata.

Lustre is a POSIX-compliant, open-source distributed parallel file system. Due to its extremely scalable architecture, Lustre deployments are popular in scientific supercomputing, as well as in the oil and gas, manufacturing, rich media, and finance sectors. Lustre presents a POSIX interface to its clients with parallel access capabilities to the shared file objects. As of this writing, 18 of the top 30 fastest supercomputers in the world use the Lustre file system for high-performance scratch space.

Lustre is an object-based file system composed of three components: metadata servers (MDSs), object storage servers (OSSs), and clients. Figure 1 illustrates the Lustre architecture. Lustre uses block devices for file data and metadata storage, and each block device can be managed by only one Lustre service. The total data capacity of the Lustre file system is the sum of all individual OST capacities. Lustre clients access and concurrently use data through the standard POSIX I/O system calls.

The MDS provides metadata services. Correspondingly, an MDC (metadata client) is a client of those services. One MDS per file system manages one metadata target (MDT). Each MDT stores file metadata, such as file names, directory structures, and access permissions.

The OSS (object storage server) exposes block devices and serves data. Correspondingly, an OSC (object storage client) is a client of those services. Each OSS manages one or more object storage targets (OSTs), and the OSTs store file data objects.

In our studies with the Lustre file system, we noticed a potential performance hit in metadata operations caused by repeated LOOKUP RPC calls from Lustre clients to the MDS. Since path LOOKUP is one of the most common metadata operations in a typical file system workload, this issue may degrade Lustre MDS performance.

In this paper we propose a new approach called "Metadata Delegation at Client Side" (MDCS). With this strategy, we are able to minimize the number of LOOKUP RPC [9] calls and hence the request traffic on the MDS. We delegate metadata-related attributes to special clients which we call "Delegator Clients": the client nodes that initially open or create the files. In case of increased workload at the MDS, we distribute the load by assigning ownership to different clients depending upon which client first created or opened the file.

So in an environment where a file is subsequently accessed by many clients at regular intervals, we can shift workload from the MDS to the Delegator Client: the normal clients that subsequently access the file fetch the metadata-related attributes from the Delegator Client, and hence the load on the MDS is reduced.
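To make the delegation flow concrete, the following is a minimal sketch of the client-side routing decision MDCS introduces: before issuing a LOOKUP RPC to the MDS, a client consults a delegation table that maps a path to the Delegator Client currently holding its metadata. The table layout and all names (mdcs_lookup_target, the example paths and node names) are illustrative assumptions, not real Lustre client APIs.

```c
/* Sketch of the MDCS lookup decision, assuming a simple in-memory
 * delegation table.  Names are illustrative only and do not
 * correspond to actual Lustre client interfaces. */
#include <stdio.h>
#include <string.h>

#define MAX_DELEGATIONS 16

struct delegation {
    const char *path;      /* file whose metadata is delegated */
    const char *delegator; /* client node holding the metadata */
};

static struct delegation table[MAX_DELEGATIONS] = {
    { "/scratch/job1/input.dat", "client-42" }, /* hypothetical entry */
};
static int n_delegations = 1;

/* Resolve where a lookup should be sent: the Delegator Client if a
 * delegation exists, otherwise the MDS (the default Lustre path). */
static const char *mdcs_lookup_target(const char *path)
{
    for (int i = 0; i < n_delegations; i++)
        if (strcmp(table[i].path, path) == 0)
            return table[i].delegator;
    return "MDS";
}

int main(void)
{
    printf("%s -> %s\n", "/scratch/job1/input.dat",
           mdcs_lookup_target("/scratch/job1/input.dat"));
    printf("%s -> %s\n", "/scratch/job2/other.dat",
           mdcs_lookup_target("/scratch/job2/other.dat"));
    return 0;
}
```

In the design described above, the Delegator Client is simply the node that first opened or created the file; how the delegation table is populated and kept consistent is part of the detailed design in Section 4, and the sketch shows only the routing decision.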
In summary, our contributions in this paper are:

• We have identified and revealed a potential performance hit in the Lustre MDS that is caused by repeated LOOKUP RPC calls to the MDS.

• We have proposed a new approach (MDCS) to address this potential performance issue.

• We have implemented the MDCS mechanism in the Lustre file system. Initial experiments show up to 45% improvement in metadata operation throughput over the basic Lustre design.

The rest of the paper is organized as follows. In Section 2, we describe the background of metadata operations in the Lustre File System and some well-known problems in metadata access from the MDS. In Section 3, we explain how path name lookup is performed in Lustre, and how we minimize the number of RPCs by using the design presented in Section 4. In Section 4, we present our detailed designs and discuss our design choices. In Section 5, we conduct experiments to evaluate our designs and present results that indicate improvement. In Section 6, we discuss related work. Finally, we provide our conclusion and state the direction of the research we intend to conduct in the future.

[Figure 1: Basic Lustre Design — clients obtain configuration information, network connection details, and security management from an LDAP server; they contact the Meta-Data Server (MDS) for directory operations, metadata, and concurrency, and the Object Storage Targets (OSTs) for file I/O and locking; the MDS and OSTs coordinate for recovery, file status, and file creation.]

2 Background

2.1 Lustre Metadata Server and its components

The Lustre Metadata Server (MDS) [5, 8, 3] is the critical component of the Lustre File System. An important component of Lustre is the Lustre Distributed Lock Manager (LDLM), which ensures cached-metadata integrity, i.e., atomic access to metadata. Lustre also has an LDLM component at the OSS/OST [5, 8, 3], but since this paper focuses on metadata operations, we consider the LDLM present at the MDS. In a Single Metadata Server deployment, a single MDS manages the entire namespace; when a client wants to look up or create a name in that namespace, that MDS is the sole owner of the entire namespace. In a Clustered Metadata Server design, by contrast, each directory can be striped over multiple metadata servers, each of which contains a disjoint portion of the namespace. So when a client wants to look up or create a name in that namespace, it uses a hashing algorithm to determine which metadata server holds the information for that name.
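As a rough illustration of how a clustered design can map names to servers, the sketch below hashes a directory-entry name onto one of N metadata servers. The djb2 hash, the server count, and the example names are assumptions for this sketch, not the actual algorithm used by Lustre's clustered MDS.

```c
/* Illustrative name-to-MDS mapping for a clustered metadata design.
 * The djb2 hash and modulo placement are assumptions for this sketch,
 * not Lustre's actual hashing algorithm. */
#include <stdio.h>

#define NUM_MDS 4

/* djb2 string hash */
static unsigned long hash_name(const char *name)
{
    unsigned long h = 5381;
    for (; *name; name++)
        h = h * 33 + (unsigned char)*name;
    return h;
}

/* Each name deterministically maps to one server, so every client
 * resolves the same name to the same disjoint namespace portion. */
static int mds_for_name(const char *name)
{
    return (int)(hash_name(name) % NUM_MDS);
}

int main(void)
{
    const char *names[] = { "input.dat", "output.dat", "checkpoint.0" };
    for (int i = 0; i < 3; i++)
        printf("%-14s -> MDS %d\n", names[i], mds_for_name(names[i]));
    return 0;
}
```

A deterministic mapping like this lets every client independently route a lookup or create to the right server, without first consulting a central directory service.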
A single MDS server can be a bottleneck. The number of RPC calls made during path name lookup can also account for a significant portion of Lustre File System performance. In an environment where many clients perform interleaved access and I/O on the same file, the time spent in path name lookup can be substantial. We address this problem in this paper.

Table 1: Transaction throughput with a fixed file pool size of 1,000 files

  Number of transactions   Transactions per second
  1,000                    333
  5,000                    313
  10,000                   325
  20,000                   321

Table 2: Transaction throughput with a varying file pool

  Number of files in pool   Number of transactions   Transactions per second
  1,000                     1,000                    333
  5,000                     5,000                    116
  10,000                    10,000                   94
  20,000                    20,000                   79

2.2 Single Lustre Metadata Server (MDS) Bottlenecks

The MDS is currently restricted to a single node, with a fail-over MDS that becomes operational if the primary server becomes nonfunctional. Only one MDS is ever operational at a given time. This limitation poses a potential bottleneck as the number of clients and/or files increases. IOzone [1] is used to measure sequential file I/O throughput, and PostMark [6] is used to measure the scalability of MDS performance. Since MDS performance is the primary concern of this research, we discuss the PostMark experiment in more detail. PostMark is a file system benchmark that performs a large number of metadata-intensive operations. As Tables 1 and 2 show, throughput stays roughly flat with a fixed 1,000-file pool, but falls from 333 to 79 transactions per second as the file pool grows with the transaction count.

3 RPC Mechanism in Lustre Filesystem

In the following section we discuss the RPC mechanism in the Lustre file system. Our experiments show that nearly 1,500-2,000 microseconds are spent in an RPC on a TCP transport.

3.1 Existing RPC Processing with Lustre

When we consider RPC processing in Lustre, we must also consider how lock processing works in Lustre [5, 8, 3, 19] and how our modifications help minimize the number of LOOKUP RPCs.
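To put the per-RPC cost in perspective, the sketch below estimates the cumulative lookup latency of resolving a path, assuming one LOOKUP RPC per path component (the abstract notes that path traversal involves multiple LOOKUP RPCs) and assuming the 1,500-2,000 microsecond figure above is per round trip. The path depths are illustrative.

```c
/* Back-of-the-envelope LOOKUP latency for paths of varying depth,
 * assuming one LOOKUP RPC per path component and the measured
 * 1,500-2,000 us per RPC on a TCP transport (see Section 3). */
#include <stdio.h>

#define RPC_MIN_US 1500
#define RPC_MAX_US 2000

int main(void)
{
    /* Illustrative path depths, e.g. /a, /a/b/c/d, ... */
    int depths[] = { 1, 4, 8, 16 };

    for (int i = 0; i < 4; i++) {
        int d = depths[i];
        printf("depth %2d: %5d-%5d us in LOOKUP RPCs\n",
               d, d * RPC_MIN_US, d * RPC_MAX_US);
    }
    return 0;
}
```

Under these assumptions, a path of depth 8 already spends 12-16 ms in LOOKUP round trips for a single resolution; this repeated traversal cost is what MDCS targets by letting clients fetch delegated attributes without walking the path at the MDS each time.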