MIPS, MOPS, and MFLOPS

10.5 PREHISTORIC PERFORMANCE RATINGS: MIPS, MOPS, AND MFLOPS

Lord Kelvin might have been predicting processor performance measurements when he said, "When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind. It may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science." [3]

The need to rate processor performance is so great that, at first, microprocessor vendors grabbed any and all numbers at hand to rate performance. The first figures of merit used were likely clock rate and memory bandwidth. These prehistoric ratings measured processor performance divorced from any connection to running code. Consequently, these performance ratings are not benchmarks. Just as an engine's revolutions-per-minute (RPM) reading is not sufficient to measure vehicle performance (you need engine torque plus transmission gearing, differential gearing, and tire diameter to compute speed), the prehistoric, clock-related processor ratings of MIPS, MOPS, MFLOPS, and VAX MIPS (defined below) tell you almost nothing about a processor's true performance potential.

Before it was the name of a microprocessor architecture, the term "MIPS" was an acronym for millions of instructions per second. If all processors had the same instruction-set architecture (ISA) and used the same compiler to generate code, then a MIPS rating might possibly be used as a performance measure. However, all processors do not have the same ISA and they most definitely do not use the same compiler, so they are not equally efficient when it comes to task execution speed versus clock rate. In fact, microprocessors and processor cores have very different ISAs, and compilers for these processors are differently abled when generating code. Consequently, some processors can do more with one instruction than other processors, just as large automobile engines can do more than smaller ones running at the same RPM. This problem was already bad in the days when only complex-instruction-set computer (CISC) processors roamed the earth. The problem went from bad to worse when reduced-instruction-set computer (RISC) processors arrived on the scene. One CISC instruction would often do the work of several RISC instructions (by design), so a CISC microprocessor's MIPS rating did not correlate at all with a RISC microprocessor's MIPS rating because of the work differential between RISC's simple instructions and CISC's more complex instructions.

The next prehistoric step in creating a usable processor performance rating was to switch from MIPS to VAX MIPS, which was accomplished by setting the extremely successful VAX 11/780 minicomputer, introduced in 1977 by the now-defunct Digital Equipment Corporation (DEC), as the benchmark against which all other processors are measured. So, if a microprocessor executed a set of programs twice as fast as a VAX 11/780, it was said to be rated at 2 VAX MIPS. The original term "MIPS" then became "native MIPS," so as not to confuse the original ratings with VAX MIPS. DEC referred to VAX MIPS as VAX units of performance (VUPs), just to keep things interesting or confusing, depending on your point of view.
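The arithmetic behind both ratings is simple, which is part of the problem. A native MIPS figure is just instructions retired divided by execution time, and a VAX MIPS figure is just a runtime ratio against the reference machine. The following minimal C sketch shows both calculations; the counts and times are made-up illustrative values, not measurements of any real processor.

#include <stdio.h>

int main(void) {
    /* Native MIPS: instructions retired / (execution time * 10^6).
     * Illustrative numbers only. */
    double instructions = 250e6;  /* instructions the program retired */
    double seconds = 2.0;         /* measured execution time          */
    double native_mips = instructions / (seconds * 1e6);
    printf("native MIPS: %.1f\n", native_mips);

    /* VAX MIPS: how many times faster than a VAX 11/780 the processor
     * runs the same program set; finishing in half the VAX's time
     * rates 2 VAX MIPS. */
    double vax_seconds = 4.0;     /* hypothetical VAX 11/780 runtime */
    double vax_mips = vax_seconds / seconds;
    printf("VAX MIPS: %.1f\n", vax_mips);

    return 0;
}

Note that the native figure says nothing about how much work each instruction performs, which is exactly the CISC-versus-RISC pitfall described above.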
Both native and VAX MIPS are woefully inadequate measures of processor performance because they were usually provided without specifying the software (or even the programming language) used to create the rating. Because different programs have different instruction mixes, different memory usage patterns, and different data-movement patterns, the same processor could easily earn one MIPS rating on one set of programs and quite a different rating on another set. Because MIPS ratings are not linked to a specific benchmark program suite, the MIPS acronym now stands for "meaningless indication of performance" for those in the know.

A further problem with the VAX MIPS measure of processor performance is that the concept of using a VAX 11/780 minicomputer as the gold performance standard is an idea that is more than a bit long in the tooth in the twenty-first century. There are no longer many (or any) VAX 11/780s available for running benchmark code, and DEC effectively disappeared when Compaq Computer Corp. purchased what was left of the company in January 1998, following the decline of the minicomputer market. Hewlett-Packard absorbed Compaq in May 2002, submerging DEC's identity even further. VAX MIPS is now the processor equivalent of a furlongs-per-fortnight speed rating: woefully outdated.

Even more tenuous than the MIPS performance rating is the concept of MOPS, an acronym that stands for "millions of operations per second." Every algorithmic task requires the completion of a certain number of fundamental operations, which may or may not have a one-to-one correspondence with machine instructions. Count these fundamental operations in the millions and they become MOPS. If they are floating-point operations, you get MFLOPS. One thousand MFLOPS equals one GFLOPS. The MOPS, MFLOPS, and GFLOPS ratings suffer from the same drawback as the MIPS rating: there is no standard software to serve as the benchmark that produces the ratings. In addition, the conversion factor for computing how many operations a processor performs per clock (or how many processor instructions constitute one operation) is somewhat fluid as well, which means that the processor vendor is free to develop a conversion factor on its own. Consequently, MOPS and MFLOPS performance ratings exist for various processor cores, but they really do not help an IC design team pick a processor core because they are not true benchmarks.

10.6 CLASSIC PROCESSOR BENCHMARKS (THE STONE AGE)

Like ICs, standardized processor performance benchmarks predate the 1971 introduction of Intel's 4004 microprocessor, but just barely. The first benchmark suite to attain de facto standard status was a set of programs known as the Livermore Kernels (also popularly called the Livermore Loops).

10.6.1 LIVERMORE FORTRAN KERNELS/LIVERMORE LOOPS BENCHMARK

The Livermore Kernels were first developed in 1970 and consist of 14 numerically intensive application kernels written in FORTRAN. Ten more kernels were added during the early 1980s, and the final suite of benchmarks was discussed in a paper published in 1986 by F. H. McMahon of the Lawrence Livermore National Laboratory (LLNL), located in Livermore, CA [4]. The Livermore Kernels actually constitute a supercomputer benchmark, measuring a processor's floating-point computational performance in terms of MFLOPS (millions of floating-point operations per second).
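Every MOPS- or MFLOPS-style rating reduces to the same recipe that the Livermore Kernels formalize: pick a loop, decide how many operations it contains, time it, and divide. The following C sketch applies that recipe to an arbitrary dot-product loop (it is not one of the actual FORTRAN kernels); the line that sets the operation count is exactly where a vendor's "fluid" conversion factor sneaks in.

#include <stdio.h>
#include <time.h>

#define N 1000000
#define REPS 100

static double a[N], b[N];

int main(void) {
    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    volatile double sink = 0.0;        /* keeps the optimizer honest */
    clock_t start = clock();
    for (int r = 0; r < REPS; r++) {
        double sum = 0.0;
        for (int i = 0; i < N; i++)
            sum += a[i] * b[i];        /* one multiply + one add per pass */
        sink += sum;
    }
    clock_t stop = clock();

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    double flops = 2.0 * (double)N * REPS;  /* the chosen operation count */
    if (seconds > 0.0)
        printf("checksum %g, %.1f MFLOPS\n", sink, flops / (seconds * 1e6));
    return 0;
}

Divide the result by 1000 and you have GFLOPS; count non-floating-point operations instead and you have MOPS.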
Because of the somewhat frequent occurrence of floating-point errors in many computers, the Livermore Kernels test both the processor's speed and the system's computational accuracy. Today, the Livermore Kernels are called the Livermore FORTRAN Kernels (LFK) or the Livermore Loops. The Livermore Loops are real samples of floating-point computation taken from a diverse workload of scientific applications extracted from operational program code used at LLNL. The kernels were extracted from programs in use at LLNL because those programs were generally far too large to serve as useful benchmark programs; they included hardware-specific subroutines for performing functions such as I/O, memory management, and graphics that were not appropriate for benchmark testing of floating-point performance; and they were largely classified, due to the nature of the work done at LLNL.

Some kernels represent widely used, generic computations such as dot and matrix (SAXPY) products, polynomials, first sums and differences, first-order recurrences, matrix solvers, and array searches. Some kernels typify often-used FORTRAN constructs while others contain constructs that are difficult to compile into efficient machine code. These kernels were selected to represent both the best and worst cases of common FORTRAN programming practice, to produce results that measure a realistic floating-point performance range by challenging the FORTRAN compiler's ability to produce optimized machine code.

Table 10.1 lists the 24 Livermore Loops. A complete LFK run produces 72 timed results, produced by timing the execution of the 24 LFK kernels three times using different DO-loop lengths. The LFK kernels are a mixture of vectorizable and nonvectorizable loops and test both the computational capabilities of the processor hardware and the software tools' ability to generate efficient machine code. The Livermore Loops also test a processor's vector abilities and the associated software tools' abilities to vectorize code.

TABLE 10.1 Twenty-Four Kernels in the Livermore Loops

LFK Kernel Number   Kernel Description
Kernel 1            An excerpt from a hydrodynamic application
Kernel 2            An excerpt from an incomplete Cholesky-Conjugate Gradient program
Kernel 3            The standard inner-product function from linear algebra
Kernel 4            An excerpt from a banded linear equation routine
Kernel 5            An excerpt from a tridiagonal elimination routine
Kernel 6            An example of a general linear recurrence equation
Kernel 7            Equation of state code fragment (as used in nuclear weapons research)
Kernel 8            An excerpt of an alternating direction, implicit integration program
Kernel 9            An integrate predictor program
Kernel 10           A difference predictor program
Kernel 11           A first sum
Kernel 12           A first difference
Kernel 13           An excerpt from a 2D particle-in-cell program
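To see why the suite mixes vectorizable and nonvectorizable loops, consider C transliterations of two of the simplest kernels in Table 10.1 (the originals are FORTRAN; this rendering is only a sketch with simplified array bounds). Kernel 12's first difference has fully independent iterations, while Kernel 11's first sum carries a dependence from one iteration to the next.

#include <stddef.h>

/* Kernel 12 (first difference): x[k] = y[k+1] - y[k].
 * Each iteration is independent of the others, so a vectorizing
 * compiler can process many elements per instruction.
 * y must hold n + 1 elements. */
void first_difference(double *x, const double *y, size_t n) {
    for (size_t k = 0; k < n; k++)
        x[k] = y[k + 1] - y[k];
}

/* Kernel 11 (first sum): x[k] = x[k-1] + y[k], a running sum.
 * Each iteration consumes the previous iteration's result, so the
 * loop resists straightforward vectorization. */
void first_sum(double *x, const double *y, size_t n) {
    if (n == 0) return;
    x[0] = y[0];
    for (size_t k = 1; k < n; k++)
        x[k] = x[k - 1] + y[k];
}

A compiler that vectorizes the first loop but must serialize the second will post very different MFLOPS figures on the two, which is precisely the compiler-quality signal the suite was designed to expose.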