Eindhoven University of Technology MASTER Embedded Platform


Eindhoven University of Technology

MASTER

Embedded platform selection based on the Roofline model applied to video content analysis

Spierings, M.G.M.; van de Voort, R.W.P.M.

Award date: 2012

Disclaimer

This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.

Eindhoven University of Technology
Department of Electrical Engineering
Electronic Systems Group

Master’s Thesis

Embedded platform selection based on the Roofline model
Applied to video content analysis

By Martien Spierings & Rob van de Voort

Supervisors: Henk Corporaal, Cedric Nugteren, Tom Goossens, Nick de Koning

Science Park Eindhoven, December 8, 2011

Course: Graduation project
Master: Embedded Systems
Department: Electrical Engineering - Eindhoven University of Technology
Chair: Electronic Systems - Eindhoven University of Technology
Committee:
  Eindhoven University of Technology: Henk Corporaal, Pieter Jonker, Cedric Nugteren
  Prodrive B.V.: Tom Goossens
Authors: Rob van de Voort & Martien Spierings

Science Park Eindhoven 5501, 5692 EM Son

Copyright 2011 Prodrive B.V. All rights are reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission from the copyright owner.

Prodrive B.V.
P.O. Box 28030
Science Park 5501
5692 EM Son
The Netherlands
Tel.: +31 (0)40 2676 200
Fax: +31 (0)40 2676 201
Web: http://www.prodrive.nl
E-mail: [email protected]

Eindhoven University of Technology
P.O. Box 513
Den Dolech 2
5612 AZ Eindhoven
The Netherlands
Tel.: +31 (0)40 2479 111
Fax: +31 (0)40 2441 692
Web: http://www.tue.nl
E-mail: [email protected]

Abstract

An embedded platform is an information processing system designed to perform a dedicated function. These systems are embedded into many electronic products, e.g., watches, cars, ovens, and mobile phones. Embedded systems have become part of our daily life, and their number, functionality and complexity have increased over the years. For instance, in the past a phone only offered functions like calling and text messaging.
Nowadays, these basic functions are extended with video calling, multimedia messaging, internet browsing and gaming. The functionality of an embedded system is often controlled by software that executes on one or more processing units inside the embedded system.

Selecting hardware is an essential step in the development of an embedded system, especially selecting the appropriate processing units. The success of an embedded system depends on the processing units being able to meet the required performance. Performance can be estimated with a cycle-accurate simulator or by running benchmarks. However, running a simulation is time consuming and running benchmarks may not be possible. This causes a major problem when a combination of processing units needs to be selected in the early phase of a design. Therefore, we present a method for the selection of processing units based on documentation and a high-level application description. This allows for a fast selection, which decreases the time to market for embedded systems.

The method focuses on throughput-oriented systems and extends the Roofline model to heterogeneous platforms, in order to give an upper bound for the performance of an application on a platform while providing an insightful visualization of the attainable performance. A platform is the composition of multiple processing units. This method enables the modelling of embedded applications on different processing units without requiring detailed knowledge of the application. It consists of decomposing applications into Generic Computational Blocks (GCBs), from which the number of operations and data transfers are extracted. This decomposition is then used to match the GCBs to the available hardware model and to select a set of processing units to form a computational platform for the application.

The method is validated using micro-benchmarks and a real-life application from the video content analysis domain.
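The upper bound described above can be illustrated with a minimal sketch. It assumes the standard Roofline formulation, in which attainable throughput is the smaller of the peak compute rate and the product of peak memory bandwidth and operations per byte. The thesis's extensions for heterogeneous platforms are not modelled here, and the class name, function names and all numeric figures are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingUnit:
    """A processing unit described by two figures taken from documentation."""
    name: str
    peak_gops: float    # peak compute throughput in Gops/s (hypothetical value)
    peak_gbytes: float  # peak memory bandwidth in GB/s (hypothetical value)


def attainable_gops(unit: ProcessingUnit, ops_per_byte: float) -> float:
    """Roofline upper bound: the lower of the compute roof and the memory roof."""
    memory_roof = unit.peak_gbytes * ops_per_byte
    return min(unit.peak_gops, memory_roof)


def best_unit(units: list[ProcessingUnit], ops_per_byte: float) -> ProcessingUnit:
    """Pick the unit with the highest Roofline bound for one computational block."""
    return max(units, key=lambda u: attainable_gops(u, ops_per_byte))


# Two made-up units: one with modest compute but ample bandwidth,
# one with high compute but little bandwidth.
UNITS = [
    ProcessingUnit("unit_a", peak_gops=4.0, peak_gbytes=8.0),
    ProcessingUnit("unit_b", peak_gops=50.0, peak_gbytes=1.0),
]
```

With these made-up figures, a block that performs 0.25 operations per byte is memory bound on both units and is best served by `unit_a`, while a block that performs 16 operations per byte is best served by `unit_b`. This is the kind of trade-off that a per-block selection has to weigh.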
The theoretical estimates constructed with this method match the real-life performance of a processing unit with an accuracy of 95%. However, the memory estimate is only 40% to 80% accurate, due to the complex implementation of the memory controllers involved.

The method uses an extended version of the Roofline model. These extensions make the method suited for fast selection of processing units for an application. The guideline constructed from this method can be used by a designer to see different design trade-offs.

Contents

List of Figures
List of Tables
1 Introduction
  1.1 Related work
  1.2 Motivation
  1.3 Thesis outline
2 Method for selecting an embedded platform
  2.1 The Roofline model and its shortcomings
  2.2 Contributions to the Roofline Model
  2.3 Roofline estimation errors
  2.4 Determining a Roofline model
  2.5 Application utilisation Roofline
  2.6 Verifying a Roofline Model
3 CPU Roofline model
  3.1 Background
  3.2 Determining the theoretical Roofline
  3.3 Determining the practical Roofline
  3.4 Intel Atom E6xx
  3.5 Intel Xeon E5540
  3.6 Texas Instruments OMAP4430
  3.7 Performance restrictions
  3.8 Conclusions
4 DSP Roofline model
  4.1 Background
  4.2 Determining the Roofline for the Omap 137
  4.3 Additional properties
  4.4 Conclusion
5 GPU Roofline model
  5.1 Background
  5.2 Determining the theoretical Roofline
  5.3 Determining the practical Roofline
  5.4 Performance restrictions
  5.5 Conclusion
6 FPGA Roofline model
  6.1 FPGA basics
  6.2 A Roofline for the FPGA
  6.3 The FPGA computational Roofline
  6.4 Determining the practical Roofline
  6.5 Performance restrictions
  6.6 Conclusions
7 The decomposition of VCA applications
  7.1 VCA applications
  7.2 Describing Generic Computational Blocks
  7.3 Operations per byte of a GCB
  7.4 VCA use case
  7.5 Multiple GCBs on one processing unit
  7.6 Pipelining GCBs on one processing unit
  7.7 Platform
8 From application to platform, a guideline
  8.1 Guideline prerequisites
  8.2 Processing unit selection guideline
  8.3 Example usage of the guideline
9 Conclusions
  9.1 Future work

List of Figures

1.1 Design space exploration
1.2 Processor design space, image from [7]
2.1 Research Approach
2.2 Model assumed by the Roofline model
2.3 Original Roofline model
2.4 Differences between the original and the extended Roofline model
2.5 Different types of operations result in points in the Roofline model
2.6 Extra data sources in the Roofline model
2.7 Extensions to the original Roofline
2.8 Simplified processing unit model
2.9 Utilisation Roofline for an application and processing unit
2.10 Micro-benchmarking concept
3.1 Basic components of a stored instruction digital computer
3.2 Basic CPU concept with Arithmetic Logic Unit (ALU), memory and Control Unit (CU)
3.3 Modern CPUs contain multiple ALUs to process operands in parallel
3.4 Intel Atom tunnel-creek micro-architecture instruction ports
3.5 Intel E630 benchmark results for integer, single precision float and SIMD
3.6 Intel E630 benchmark results for intrinsics, assembly and cache
3.7 Intel Nehalem micro-architecture instruction ports
3.8 Intel Xeon 5540 benchmark results
3.9 Intel Xeon 5540 benchmark results for intrinsics, assembly and cache measurements
3.10 Cortex-A9 instruction ports
3.11 Texas Instruments OMAP4430 micro-benchmark results
3.12 X86 ISA registers
3.13 Register of the ARM Instruction Set Architecture (ISA) and extensions to it such as NEON and VFPv3
4.1 Early DSP architectures
4.2 TI DSP C674x architecture
4.3 Measured Roofline of DSP
4.4 Software pipelining
5.1 NVIDIA Architectures
5.2 AMD VLIW5 architecture
5.3 CUDA and OpenCL programming model
5.4 NVIDIA FX1700 ..