A Virtual Platform Environment for Exploring Power, Thermal and Reliability Management Control Strategies in High-Performance Multicores

Total Page:16

File Type:pdf, Size:1020Kb

A Virtual Platform Environment for Exploring Power, Thermal and Reliability Management Control Strategies in High-Performance Multicores A Virtual Platform Environment for Exploring Power, Thermal and Reliability Management Control Strategies in High-performance Multicores Andrea Bartolini†, Matteo Cacciari †, Andrea Tilli †, Matthias Gries‡, Luca Benini † University of Bologna, DEIS † Intel Labs ‡ Italy Germany Microprocessor Lab a.bartolini, matteo.cacciari, andrea.tilli, [email protected], [email protected] Outline • Introduction • Simulation Strategy • Virtual Platform • Platform Characterization • Control Development Cycle • Case Study • Conclusion Today and Future Multicores Tecnology scaling software System integration High performace Costs requirements Spatial and temporal Limitated workload variation High dissipation power capabilities densities NON UNIFORM: power, temperature, performance Leakage current Reliability lost, Hot spots, thermal Aging gradients and cycles Today and Future Multicores Tecnology scaling software System integration High performace Costs requirements Spatial and temporal Limitated workload variation High dissipation power Resource Management solution:capabilities densities on-line tuning of system performance and NON UNIFORM: temperaturepower, temperature, through closed performance-loop control management policies Leakage current Reliability lost, Hot spots, thermal Aging gradients and cycles Management Loop: Holistic view System information Workload from OS Task scheduling - migration CPU utilization, queue status Control algo migration policy SW HW Introspection: monitors Self-calibation: knobs (Vdd, Vb, f,on/off )‏ Reliability alarms Temperature Power Multicore Platform Core 1 Core N Proc. 1 Proc. 2 … Proc. N INTERCONNECT Private Private … Private Mem Mem Mem Evaluation by Simulation HW prototype and test-chips: accurate but expensive • Long and complex development stage System • Limited introspection Simulator • Incorrect control policy can lead to permanent HW damage Key features : • must accurately simulate the entire system evolution (program flow, data coherency conflicts, complex memory latency) • to avoid unrealistic simulation artifacts • to allow evaluation of control solution that fits in the system • must emulate the same performance knobs and introspective sensors of real HW • must co-simulate the physical effects we want the controller to be developed for • Power model, Thermal model, Reliability model • allows high-level control strategies co-simulation • acting on virtual performance knobs / reacting on virtual introspective sensors Simulator Strategy Trace driven Simulator [1]: • Not suitable for full system simulation (How to simulate O.S.?) • looses information on cross-dependencies resulting in degraded simulation accuracy Workload Multicore Simulator Power Model Control strategy Temperature Model [1] P Chaparro et al. Understanding the thermal implications of multi-core architectures. 2007 Simulator Strategy Trace driven Simulator [1]: • Not suitable for full system simulation (How to simulate O.S.?) • looses information on cross-dependencies Workload Set resulting in degraded simulation accuracy Close loop simulator: Multicore • Cycle accurate simulators [2] : Simulator • High modeling accuracy • support well-established power and temperature co- Execution Trace simulation based on analytical models and system database micro-architectural knowledge • Low simulation speed • Not suitable for full-system simulation Power Model • Functional and instruction set simulators: Control • allow full system simulation strategy • less internal precision Temperature Model • less detailed input data no micro-architectural analytical model • introduces the challenge of having accurate power and temperature physical models [1] P Chaparro et al. Understanding the thermal implications of multi-core architectures. 2007 [2] Benini L. et al. MPARM: Exploring the multi-processor SoC design space with SystemC 2005 Contributions Novel Virtual Platform: • Allows fast but physical accurate simulation • Based on a full-system multicore high-level functional simulator (Virtutech Simics [1]) • Integrates Power and Thermal model derived by real HW characterization • Enables fast control-algorithms prototyping, design and testing: • Allowing co-simulation of multicores SoCs with high-level description of control-algorithms (Mathworks Matlab/Simulink [2]) Challenges: • Identification, from experiments on a real general-purpose multicore platform, of accurate, but component-oriented, power and thermal models • Interfacing physical model with functional simulator engine • Development of a co-simulation bridge between the multicores simulator and Mathworks Matlab/Simulink. [1] http://www.virtutech.com/ [2] http://www.mathworks.com/ Virtual Platform App.1 App.N 1 1 N N T T T RUBY CORE ... T .... ... Stall Mem O.S. SW Access HW CPU1 CPU2 CPUN SIMICS L1 L1 L1 L2 L2 Simulator Network DRAM Simics by Virtutech: Virtual Platform • full system functional simulator Memory timing model • models the entire system: • RUBY – GEMS (University of Wisconsin)[1] peripherals, BIOS, network interfaces, cores, • Public cycle-accurate memory timing memories model • allows booting full OS, such as LinuxSMP • Different target memory architectures • supports different target CPU (arm, sparc, x86) • fully integrated with Virtutech Simics • x86 model: • written in C++ • in-order • we use it as skeleton to apply our add- • all instruction are retired in 1 cycle ons (as C++ object) • does not account for memory latency [1] Martin Milo M. K. et al. Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset 2005 Virtual Platform Performance knobs (DVFS) module: • Virtutech Simics support frequency change at run-time • RUBY does not support it: • does not have internal knowledge of frequency • We add a new DVFS module to support it : • ensures L2 cache and DRAM to have a constant clock frequency • L1 latency scale with Simics processor clock frequency App.1 App.N 1 1 N N T T T ... T .... ... fi ,VDD f O.S. SW i f,v f,v f,v HW DVFS RUBY CORE CPU1 CPU2 CPUN f Stall Mem L1 L1 L1 i Access L2 L2 SIMICS Network DRAM Simulator Virtual Platform Virtual Platform Performance knobs (DVFS) module: • Virtutech Simics support frequency change at run-time • RUBY does not support it: • does not have internal knowledge of frequency • We add a new DVFS module to support it : • ensures L2 cache and DRAM to have a constant clock frequency • L1 latency scale with Simics processor clock frequency App.1 App.N 1 1 N N T T T ... T .... ... fi ,VDD f O.S. SW i f,v f,v f,v HW DVFS RUBY CORE PC CPU1 CPU2 CPUN f Stall Mem #hlt,stall L1 L1 L1 i Access active,cycles L2 L2 SIMICS Network DRAM #L1 , #BUS , MISS ACCESS Virtual Platform Simulator CYCLEACTIVE,.... Virtual Platform Power model module: • At run-time estimate the power consumption of the target architecture • Core model PT = [PD(f,CPI) + PS(T, VDD)] *(1 − idleness) + idleness *(PIDLE) • PD experimentally calibrated analytical power model • Cache and memory power – access cost estimated with CACTI [1] PCORE, PL1, PL2 App.1 App.N POWER MODEL 1 1 N N T T T ... T .... ... fi ,VDD f O.S. SW i f,v f,v f,v HW DVFS RUBY CORE PC CPU1 CPU2 CPUN f Stall Mem #hlt,stall L1 L1 L1 i Access active,cycles L2 L2 SIMICS Network DRAM #L1 , #BUS , Simulator MISS ACCESS Virtual Platform CYCLEACTIVE,.... [1] Thoziyoor Shyamkumar et al. A comprehensive memory modeling tool and its application to the design and analysis of future memory hierarchies. 2008 Power Model . Power model interface i-th CPU # Istruction retired P # Stall Cycle O # HLT Istruction P W i-th L1 O # Line & WD Read E W f = k * f f = k * f # Line & WD Write R 1 1 nom 2 2 nom fN = kN * fnom E f ,V ,T R dd & CPU1 CPU2 CPU N M L1 L1 L1 i-th L2 E O N L2 L2 D # Line Read E # Line Write E R Network L G DRAM Simulation snap-shot Y DRAM Ruby & simics # Burst Read # Burst Write Modeling Real Platform – Power Real Power Measurement • Intel server system S7000FC4UR • 16 cores - 4 quad cores Intel® Xeon® X7350, 2.93GHz • 16GB FBDIMMs • Intel® Core™ 2 Duo architecture • At the wall Power consumption • test: • set of synthetic benchmarks with different memory pattern accesses • forcing all the cores to run at different performance levels • for each benchmark we extract the clocks per instruction metrics (CPI) and correlate it with the power consumption 2 kE PD kA VDD fCK kB (kC kD fCK )CPI • We relate the static power with the operating point by using an analytical model Modeling Real Platform – Power Real Power Measurement • Intel server system S7000FC4UR • 16 cores - 4 quad cores Intel® Xeon® X7350, 2.93GHz • 16GB FBDIMMs High accuracy at • Intel® Core™ 2 Duo architecture high and low CPI • At the wall Power consumption • test: • set of synthetic benchmarks with different memory pattern accesses • forcing all the cores to run at different performance levels • for each benchmark we extract the clocks per instruction metrics (CPI) and correlate it with the power consumption 2 kE PD kA VDD fCK kB (kC kD fCK )CPI • We relate the static power with the operating point by using an analytical model Virtual Platform Temperature model module: • we integrate our virtual platform with a thermal simulator [1] • Input: power dissipated by the main functional units composing the target platform • Output: Provides the temperature distribution along the simulated multicore die area as output TEMPERATURE MODEL T PCORE, PL1, PL2 App.1 App.N POWER MODEL 1 1 N N T T T ... T .... ... fi ,VDD f O.S. SW i f,v f,v f,v HW DVFS RUBY CORE PC CPU1 CPU2 CPUN f Stall Mem #hlt,stall L1 L1
Recommended publications
  • Fax86: an Open-Source FPGA-Accelerated X86 Full-System Emulator
    FAx86: An Open-Source FPGA-Accelerated x86 Full-System Emulator by Elias El Ferezli A thesis submitted in conformity with the requirements for the degree of Master of Applied Science (M.A.Sc.) Graduate Department of Electrical and Computer Engineering University of Toronto Copyright c 2011 by Elias El Ferezli Abstract FAx86: An Open-Source FPGA-Accelerated x86 Full-System Emulator Elias El Ferezli Master of Applied Science (M.A.Sc.) Graduate Department of Electrical and Computer Engineering University of Toronto 2011 This thesis presents FAx86, a hardware/software full-system emulator of commodity computer systems using x86 processors. FAx86 is based upon the open-source IA-32 full-system simulator Bochs and is implemented over a single Virtex-5 FPGA. Our first prototype uses an embedded PowerPC to run the software portion of Bochs and off- loads the instruction decoding function to a low-cost hardware decoder since instruction decode was measured to be the most time consuming part of the software-only emulation. Instruction decoding for x86 architectures is non-trivial due to their variable length and instruction encoding format. The decoder requires only 3% of the total LUTs and 5% of the BRAMs of the FPGA's resources making the design feasible to replicate for many- core emulator implementations. FAx86 prototype boots Linux Debian version 2.6 and runs SPEC CPU 2006 benchmarks. FAx86 improves simulation performance over the default Bochs by 5 to 9% depending on the workload. ii Acknowledgements I would like to begin by thanking my supervisor, Professor Andreas Moshovos, for his patient guidance and continuous support throughout this work.
    [Show full text]
  • Ramp: Research Accelerator for Multiple Processors
    ..................................................................................................................................................................................................................................................... RAMP: RESEARCH ACCELERATOR FOR MULTIPLE PROCESSORS ..................................................................................................................................................................................................................................................... THE RAMP PROJECT’S GOAL IS TO ENABLE THE INTENSIVE, MULTIDISCIPLINARY INNOVATION John Wawrzynek THAT THE COMPUTING INDUSTRY WILL NEED TO TACKLE THE PROBLEMS OF PARALLEL David Patterson PROCESSING. RAMP ITSELF IS AN OPEN-SOURCE, COMMUNITY-DEVELOPED, FPGA-BASED University of California, EMULATOR OF PARALLEL ARCHITECTURES. ITS DESIGN FRAMEWORK LETS A LARGE, Berkeley COLLABORATIVE COMMUNITY DEVELOP AND CONTRIBUTE REUSABLE, COMPOSABLE DESIGN Mark Oskin MODULES. THREE COMPLETE DESIGNS—FOR TRANSACTIONAL MEMORY, DISTRIBUTED University of Washington SYSTEMS, AND DISTRIBUTED-SHARED MEMORY—DEMONSTRATE THE PLATFORM’S POTENTIAL. Shih-Lien Lu ...... In 2005, the computer hardware N Prototyping a new architecture in Intel industry took a historic change of direction: hardware takes approximately four The major microprocessor companies all an- years and many millions of dollars, nounced that their future products would even at only research quality. Christoforos Kozyrakis be single-chip multiprocessors, and that N Software
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]
  • Selected Project Reports, Spring 2005 Advanced OS & Distributed Systems
    Selected Project Reports, Spring 2005 Advanced OS & Distributed Systems (15-712) edited by Garth A. Gibson and Hyang-Ah Kim Jangwoo Kim††, Eriko Nurvitadhi††, Eric Chung††; Alex Nizhner†, Andrew Biggadike†, Jad Chamcham†; Srinath Sridhar∗, Jeffrey Stylos∗, Noam Zeilberger∗; Gregg Economou∗, Raja R. Sambasivan∗, Terrence Wong∗; Elaine Shi∗, Yong Lu∗, Matt Reid††; Amber Palekar†, Rahul Iyer† May 2005 CMU-CS-05-138 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 †Information Networking Institute ∗Department of Computer Science ††Department of Electrical and Computer Engineering Abstract This technical report contains six final project reports contributed by participants in CMU’s Spring 2005 Advanced Operating Systems and Distributed Systems course (15-712) offered by professor Garth Gibson. This course examines the design and analysis of various aspects of operating systems and distributed systems through a series of background lectures, paper readings, and group projects. Projects were done in groups of two or three, required some kind of implementation and evalution pertaining to the classrom material, but with the topic of these projects left up to each group. Final reports were held to the standard of a systems conference paper submission; a standard well met by the majority of completed projects. Some of the projects will be extended for future submissions to major system conferences. The reports that follow cover a broad range of topics. These reports present a characterization of synchronization behavior
    [Show full text]
  • First Steps Simics/GEMS on Ubuntu 7.10
    First Steps Simics/GEMS on Ubuntu 7.10 Author: Aleksandar Vitorovic, [email protected], student of Faculty of Electrical Engineering, University of Belgrade, Serbia This document is not a product of the Multifacet Group. It is a GEMS user contribution, and this document is provided as-is, without warranty or guarantee of any kind, and without support. This document may serve as a useful guide for many users, but is neither tested nor verified by any authors of the GEMS infrastructure. 1 Introduction This tutorial explains basic steps for users to get familiar with GEMS/Simics environment. It is idiot- proof tutorial. Basic think you should know to start simulator is showed. In every fork I decided to choose one option and explain it in detail. For further details and other options you should consult appropriate documentation. In chapter 2 is detailed installation process for GEMS 2.1 and Simics 3.0.30 on Ubuntu 7.10 x86. In chapter 3 is detailed installation process for Simics 2.2.19 on Ubuntu 7.10 x86. In chapter 4 there is different manuals for Frequently used Commands. In chapter 5 we explain how to add benchmark to your target machine and how to start that. In chapter 6 there is list of used references. This tutorial are provided AS IS with NO WARRANTY OF ANY KIND, INCLUDING THE WARRANTY OF DESIGN, MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE. 2 GEMS 2.1/Simics 3.0.30 Installation It is possible to install GEMS 2.1 based on Simics 3.0.30 on Ubuntu 7.10 x86 with GCC 4.1.3, although GEMS manufacturers internally work with GCC 3.4.6.
    [Show full text]
  • Automotive MRD Overview
    Developing Multicore Software is Easy, If… Freescale Technical Forum - Summer 2009 Emeka Nwafor Director, Product Management Development Tools © 2009 Wind River Systems, Inc. Abstract Multicore is here. There has been much written and discussed about how challenging developing Multicore Software can be. Most of us have heard the saying "you shouldn't bring a knife to a gunfight". This principle applies to developing Multicore Software. We will talk about how Wind River's development tools help to make developing Multicore Software easier on Freescale's family of multicore processors. We will also talk about directions that Wind River is taking to make developing Multicore Software "easy". 2 © 2009 Wind River Systems, Inc. Agenda Context: Where are we with multicore, today? What is Wind River's Multicore Software solution? Simplifying development on Freescale's multicore devices Directions for Multicore Software 3 © 2009 Wind River Systems, Inc. Some Market Trends around Multicore 12% Decline in single processor projects since 2006 18% Increase in dual-processor based designs since 2006 42% Percentage of embedded developers that say that chip technology will represent the most dramatic area change in embedded software development over the next 2-5 years 34% Percentage of embedded developers saying that embedded development / design processes will change significantly in the next 2-5 years 84% State that in-house code from previous projects is reused in new projects Source: 2009 TechInsights Embedded Market Study 4 © 2009 Wind River Systems, Inc. Some Qualitative Observations “We found that things were obscenely sensitive while we were trying to optimize the usage of our cache…” “It is very easy to write a slow multicore application and very hard to figure why it is slow.” “Software Engineers didn’t ask for multicore – it was forced on them…” “We are absolutely terrified about multicore and the challenges that parallelism and concurrency are introducing.
    [Show full text]
  • MC-Sim: an Efficient Simulation Tool for Mpsoc Designs
    MC-Sim: An Efficient Simulation Tool for MPSoC Designs Jason Cong, Karthik Gururaj, Guoling Han, Adam Kaplan, Mishali Naik, Glenn Reinman Computer Science Department, University of California, Los Angeles Los Angeles, CA 90095, USA ABSTRACT Chip (MPSoC). To manage the massive increase in the design The ability to integrate diverse components such as processor space available for system designers, simulation tools, which cores, memories, custom hardware blocks and complex can provide rapid and accurate estimations of the performance network-on-chip (NoC) communication frameworks onto a and cost of design instances, are required. Specifically, the tool single chip has greatly increased the design space available for should possess the capability to rapidly and accurately evaluate system-on-chip (SoC) designers. Efficient and accurate the performance of multi-core designs, model a variety of performance estimation tools are needed to assist the designer in memory systems and NoC configurations, simulate coprocessors making design decisions. In this paper, we present MC-Sim, a efficiently and allow the designer to migrate from one design heterogeneous multi-core simulator framework which is capable instance to another easily. of accurately simulating a variety of processor, memory, NoC configurations and application specific coprocessors. We also 1.1 Previous Work describe a methodology to automatically generate fast, cycle- Single processor simulators true behavioral, C-based simulators for coprocessors using a Traditional single-core simulators, such as SimpleScalar [7], high-level synthesis tool and integrate them with MC-Sim, thus were designed to model out-of-order superscalar processor augmenting it with the capacity to simulate coprocessors. Our C- pipelines, and lack the ability to model multiple application based simulators provide on an average 45x improvement in workloads or shared-memory parallel applications, let alone the simulation speed over that of RTL descriptions.
    [Show full text]
  • Simics Brief Introduction Presented By: Chunrong Lai Software Engineer Simics Technology Center (STC), Software and Services Group
    Simics Brief Introduction Presented By: Chunrong Lai Software Engineer Simics Technology Center (STC), Software and Services Group 26 Oct, 2012 What is Simics? Simics is a high-performance, full system simulator used by software developers to simulate large and complex electronic systems. Wind River Simics • Simulate any size of target system • Run unmodified binaries • Other use cases as add-ons Simics allows you to break the rules of embedded systems development Software and Services Group ‹#› Simics History Research project at SICS started in 1991 Virtutech was founded in Sweden in 1998 and incorporated in USA in 2003. Acquired by Intel in 2010 Marketed and sold through Wind River (subsidiary) Software and Services Group ‹#› Simics Customers Marketing, Field & Sales Simics users Simics Technology Center Widley used acroos different Intel Simics External Customers, include groups/projects as an uniform network, communication, space, software platform simulator semiconductor and HPC field Software and Services Group ‹#› What Wind River Simics Customers Say Computing Server Networking Industrial Space . Shortens bring-up . Finds and solves . Prevents the loss . Saves $6M in time from 22 weeks problems in 30 of $10M by getting target hardware to 29 days minutes instead the ASIC right labs of three weeks before and three people manufacturing . Configures complex labs in minutes instead of weeks Software and Services Group ‹#› 5 Some Simics Features and Capabilities • High performance • Modeling language for fast development • Synchronized
    [Show full text]
  • Embedded Systems Computer Embeddedembedded Architecture Systemssystems
    Embedded Systems Computer EmbeddedEmbedded Architecture SystemsSystems Jakob Engblom, PhD Uppsala University & Virtutech Inc. [email protected] [email protected] virtuvirtutechtech 14 Nov 2003 Embedded Computer Architecture 2 Embedded Systems Embedded Systems Now what is this elephant thing? You’re all wrong, it is a »“A computer that doesn’t fan! No, a look like a computer” It is a wall! snake! »Interacts with world »Primitive or no user interface »Part of other products No, it is a Part of other products treetrunk! No, a pillar! 14 Nov 2003 Embedded Computer Architecture 3 14 Nov 2003 Embedded Computer Architecture 4 Embedded Systems Processor Market »Single purpose products »Embedded = most processors! ¡Not general purpose like desktop PCs ¡200 million PC and server ¡Do one thing very efficiently ¡8000 million embedded »Software very important: ¡Gives character to product Used to differentiate inside a “platform” ¡Can be changed late "Embedded" 98% ¡Processor cheaper than special HW "Desktop" ¡Today, dominates dev cost 2% 14 Nov 2003 Embedded Computer Architecture 5 14 Nov 2003 Embedded Computer Architecture 6 Processor Market Real-Time System » Processors: 100% »Timing as important as result ¡ 50% of all DSP DSP semiconductor revenue 90% 4-bit 4-bit »Hard real-time: ¡ Explains why everyone 80% 8-bit wants to do processors 70% 16-bit ¡Hard deadlines » 32-bit dominant 60% ¡Dead if missed deadline ¡ 30% of total 50% 8-bit ¡Worst-case semiconductors 40% » PC processors: 30% 32-bit »Soft real-time: ¡ 50% of CPU revenue 20% 16-bit
    [Show full text]
  • HPC Virtualization with Xen on Itanium
    HPC Virtualization with Xen on Itanium H˚avard K. F. Bjerke July 18, 2005 Abstract The Xen Virtual Machine Monitor has proven to achieve higher efficiency in virtualizing the x86 architecture than competing x86 virtualization technolo- gies. This makes virtualization on the x86 platform more feasible in High- Performance and mainframe computing, where virtualization can offer attractive solutions for managing resources between users. Virtualization is also attractive on the Itanium architecture. Future x86 and Itanium computer architectures include extensions which make virtualization more efficient. Moving to virtu- alizing resources through Xen may ready computer centers for the possibilities offered by these extensions. The Itanium architecture is “uncooperative” in terms of virtualization. Privi- lege-sensitive instructions make full virtualization inefficient and impose the need for para-virtualization. Para-virtualizing Linux involves changing certain native operations in the guest kernel in order to adapt it to the Xen virtual ar- chitecture. Minimum para-virtualizing impact on Linux is achieved by, instead of replacing illegal instructions, trapping them by the hypervisor, which then emulates them. Transparent para-virtualization allows the same Linux kernel binary to run on top of Xen and on physical hardware. Itanium region registers allow more graceful distribution of memory between guest operating systems, while not disturbing the Translation Lookaside Buffer. The Extensible Firmware Interface provides a standardized interface to hard- ware functions, and is easier to virtualize than legacy hardware interfaces. The overhead of running para-virtualized Linux on Itanium is reasonably small and measured to be around 4.9 %. Also, the overhead of running transpar- ently para-virtualized Linux on physical hardware is reasonably small compared to non-virtualized Linux.
    [Show full text]
  • Evaluation of CPU Architecture by Simulation Technologies And
    International Journal of Scientific & Engineering Research, Volume 7, Issue 11, November-2016 1271 ISSN 2229-5518 Evaluation of CPU architecture by simulation technologies and benchmark computer systems Madiha Yousafa, Muhammad Harisb abCOMSATS Institute of Information Technology Park Road, Tarlai Kalan, Islamabad-45550, Pakistan [email protected] Abstract— The processor architecture designers face major challenges to improve the processor’s performance. To measure the performance of the processor there are many parameter like performance of cache, TLB, IO operations, bus speed etc. Different companies launch series of processors with same base configuration and a little change of variations base on cache size, cache levels, share and separate cache and many other parameters like that. There are many simulators available to measure processor's performance theoretically with several parameters. We simulate some models of Intel Pentium 4 and UltraSPARC II processors and analyze the performance of the processors regarding cache and TLB. Keywords: ISTC (Instruction Simulator Translation Caches), DSTC (Data Simulator Translation Caches) —————————— —————————— 1 INTRODUCTION LSI design and developer companies like Intel, SUN, It stands for multimedia extensions and was developed by V AMD and IBM are also working in the field of processor- the Intel to increase the speed of multi media design. Before moving on we must first discuss that what operations. [6] is processor design and the factors on which it depends. Designing a computer processor in such a way that cycle time, In addition to above mentioned technologies, the architec- reliability, performance, chip area and many other issues are tures like P4 Netburst were very helpful to improve proces- resolved.
    [Show full text]
  • Simics-On-Simics
    Jakob Engblom, PhD, Intel, Stockholm, Sweden [email protected] My Background Jakob Engblom Blog at the Intel Developer Zone . MSc, Computer Science, Uppsala . https://software.intel.com/en-us/meet- . PhD, Real-Time Systems, Uppsala the-developers/evangelists/team/jakob- Currently: engblom . Product Management Engineer, Simics Core team, at Intel in Stockholm, Sweden My own blog, since 2007: . Software Evangelist – Simulation . https://jakob.engbloms.se Previously: . https://www.engbloms.se/jakob.html . IAR Systems, Virtutech, Wind River Very rarely touch actual hardware . Product management, product marketing, technical sales, technical marketing, business when doing development. Very rarely. development, training development, demos, ... Copyright Intel 2019 | SAMOS 2019 2 What Does Intel Do? • Intel® Core® • Intel® Xeon® • SSD • Ethernet • Intel® Atom™ • Processors • 3D XPoint™ • WiFi • Chipsets • Chipsets • Intel® Optane™ • Bluetooth • Smart NICs • GNSS Laptop and Server Storage Connectivity desktop • SoC-FPGA • Movidius • Processors • Development tools • FPGA • Nervana • Gateways • Compilers • FPGA-CPU combo • MobilEye • Security • Simulation solutions • Intel® Xeon® • Management • Linux & Windows drivers • UEFI & BIOS FPGA AI and ML IoT Software Copyright Intel 2019 | SAMOS 2019 3 Copyright Intel 2019 | SAMOS 2019 Hardware: A Hard Development Platform? Copyright Intel 2019 | SAMOS 2019 5 Hardware is Hard When it is... Not yet available Flaky prototype stage Not available anymore Copyright Intel 2019 | SAMOS 2019 6 Hardware is
    [Show full text]