Advance CPU Architecture

Computer Architectures
Advance CPU Design
Tien-Fu Chen
National Chung Cheng Univ.
© by Tien-Fu Chen @ CCU

MMX technology
• Basic concepts
  - small native data types
  - compute-intensive operations
  - a lot of inherent parallelism => single-instruction, multiple-data (SIMD)
• Features
  - packed data types
  - a rich set of MMX instructions that perform parallel operations
  - saturation arithmetic: unlike regular arithmetic, results do not truncate or wrap around; they clamp to the largest or smallest representable number
  - parallel compare
  - overlapped operations
  - pack/unpack data types
  - compatible extension architectures

Packed Data Types (small data types packed into one register)
• Dual usage of the floating-point registers
• Enhanced instruction set operating in parallel fashion
  - A total of 57 MMX instructions are added to IA.

Fast DSP computation

Performance of Matrix Multiplication
Performance comparison between IA and MMX: working example on matrix and vector multiplication

                                     Traditional IA      MMX
Vector-vector multiplication
  No. of loads                       32                  8
  No. of multiplies                  16                  4
  No. of adds                        15                  3
  * Loop control                     12                  0
  Other overhead                     0                   3
  Final result save                  1                   1
  Instruction count                  76                  19
  ** Cycle count                     200                 12
Matrix-vector multiplication
  Total instructions                 4(4x76+3) = 1228    4(4x19+3) = 316
  Cycles (both in optimized mode)    1200                207
Comparison result: speedup of 5.8 times

* Assume we perform 4 MACs (out of 16) per loop iteration of our code:
    for (K = 1; K < 5; K++) { Mac(K); }
  So each loop iteration costs 3 loop-control instructions: increment, compare, and branch.
** 1) The cycle count is dominated by the non-pipelined, 11-cycle integer multiply operation.
   2) There are 4 mispredictions in total when exiting the loops.
   3) All data are in on-chip caches.

More Parallelisms
• Streaming SIMD Extensions (SSE), since Pentium III.
  - Physically adds eight new 128-bit XMM registers and 70 new instructions.
  - New machine state is introduced.
  - Supports four 32-bit single-precision floating-point operations in parallel. Recall that the MMX SIMD instructions operate only on integers.
• Streaming SIMD Extensions 2 (SSE2), since Pentium 4.
  - Uses the XMM registers; no new machine state.
  - 144 new instructions added.
  - Supports double-precision floating-point parallel operations.
• IA-64 Itanium™ architecture.
  - Enable, enhance, express, and exploit parallelism: at the process/thread level for programmers, and at the instruction level for compilers. All explicitly.

Objectives of IA-64 Instruction Set Architecture (ISA)
• Intel and HP technology alliance
• Enable industry-leading system performance
  - Breakthrough performance
  - Headroom
• Enable compatibility with today's IA-32 software and PA-RISC software
• Allow scalability over a wide range of implementations
• Full 64-bit computing

Next Generation Terminology
• EPIC (Explicitly Parallel Instruction Computing): the next-generation processor technology
  - in the same category as, e.g., RISC, CISC
• IA-64 (Intel Architecture, 64-bit): the architecture that incorporates EPIC technology
  - in the same category as, e.g., IA-32, PA-RISC
• Merced processor: the project name for Intel's first IA-64-based implementation
  - in the same category as, e.g., Pentium II, PA-8500

Features of IA-64 Architecture
• Explicit parallelism
  - ILP is explicit in the machine code
  - The compiler analyzes and identifies parallelism at compile time
• Predication enhances parallelism
• Speculation minimizes the effect of memory latency
• IA-64 processors are massively resourced
  - Many registers
  - Many functional units
  - Inherently scalable
• Performance, headroom, binary compatibility

Predication: Features and Benefits
• Compiler is given a larger scheduling scope
  - Nearly all instructions can be predicated
  - State is updated if an instruction's predicate is true; otherwise the instruction acts as a NOP
  - The compiler assigns predicates; compare instructions set them
  - The architecture provides 64 1-bit predicate registers (PR)
• Predicated execution removes branches
  - Converts a control dependence into a data dependence
  - Reduces mispredict penalties
• Parallel execution through larger basic blocks
  - Effective use of parallel hardware

Intel/HP IA-64 "Explicitly Parallel Instruction Computer (EPIC)"
• IA-64 is the instruction set architecture; EPIC is the type
  - EPIC = 2nd-generation VLIW?
• Itanium™ is the first implementation (2001)
  - Highly parallel and deeply pipelined hardware
  - 6-wide, 10-stage pipeline at 800 MHz on a 0.18 µm process
• 128 64-bit integer registers + 128 82-bit floating-point registers
  - Not separate register files per functional unit as in old VLIW
• Hardware checks dependencies (interlocks => binary compatibility over time)
• Predicated execution (select 1 out of 64 1-bit flags) => 40% fewer mispredictions?

Binary Compatibility
[Figure labels: high-level language (C, C++, Fortran, COBOL); IA-32 object code; PA-RISC object code; native compiler and optimizer; dynamic translator; native IA-64 code; IA-64; HP-UX and NT]
• Application source compatible
• Design criteria
• C, C++ and FTN
• Systems architecture
• Transparent to user
• Default
Play: Next generation ISA

VLIW Processor Architectures for DSP
• Why VLIW architecture?
  - VLIW is especially suitable for DSP applications
  - DSP algorithms are dominated by data-parallel computation and consist of tight core loops executed repeatedly.
      - Convolution, FFT
  - Single-chip high-performance VLIW processors with multiple FUs are commercially available.

VLIW Architecture
• Instruction-Level Parallelism (ILP)
  - Multiple different FUs operate in parallel.
  - Each instruction contains an operation code for each FU.
• Data-Level Parallelism (DLP)
  - A single FU is partitioned to perform the same operation on multiple smaller-precision data items.
• Instruction Set Architecture
  - Each processor has its own special instructions to further enhance performance.
  - e.g., Complex_multiply for FFT and autocorrelation algorithms
• Memory I/O
  - Via DMA controller
  - Predictable access time
  - Hide the data transfer time behind the processing time by doing independent work
  - Real-time requirement

TI TMS320C62
• 256 bits per instruction (8 x 32-bit)
• 2 clusters
  - Each with 4 FUs
  - Each with 16 32-bit registers
  - One cross-cluster read port each way
• Two integer ALUs support partitioned instructions.
• Programmable DMA controller with two 32-KB memories

TI TMS320C80
• ILP, DLP, multiple processors on a single chip
• 4 ADSP (DSP+VLIW)
  - A 16-bit MUL, a 3-input 32-bit ALU, a branch unit, 2 load/store units.
  - 3 zero-overhead loop controllers
  - One 2-KB I-cache, four 2-KB D-caches
• RISC processor
  - FPU: FPMAC
  - A 4-KB I-cache, a 4-KB D-cache
• DMA (Transfer Controller)
  - Supports various types of data transfers with complex address calculation.
• No support for some powerful instructions
  - SAD, inner product

Philips Trimedia TM1000
• 27 FUs, coprocessor for MPEG-2 decoding
• No DMA controller, 16-KB D-cache, 32-KB I-cache
• One PCI port, MM I/O
• Issues 5 simultaneous instructions per cycle
• DSPALU: partitioned instructions
• DSPMUL: partitioned instructions, inner product

Transmeta's Crusoe Processor, TM5400
• General-purpose microprocessor based on VLIW.
  - Difficulties: binary code compatibility, very complicated compiler
• Supports x86 (MS Windows, Linux):
  - x86 Code Morphing software using dynamic binary code translation.
• 2 integer units, 1 FPU, 1 load/store unit, 1 branch unit
  - 64-KB 16-way L1 D-cache
  - 64-KB 8-way I-cache
  - 256-KB L2 cache
  - 64 32-bit GPRs
  - VLIW instruction size: 64 or 128 bits, 4 instructions per cycle. Supports partitioned instructions.

Crusoe: A low-power x86 processor
• Crusoe processor = software + hardware
• Code Morphing software (labeled 3/4 in the figure)
  - Dynamically translates x86 instructions into VLIW instructions
  - Provides x86 compatibility
  - Optimization and scheduling by software
• VLIW hardware (labeled 1/4 in the figure)
  - 128-bit very long instruction word processor
  - Simple and fast
  - Fewer transistors
• Low power, x86 compatibility, PC performance

Crusoe VLIW

Code Morphing Software
A dynamic translation system that resides in ROM; it is the first program to start executing at boot.
• Drawing the hardware/software line
  - Software: decodes x86 instructions and generates parallel molecules
  - Hardware: executes them using a simple, high-speed VLIW engine
• Decoding and scheduling
  - Translation cache: CMS translates instructions once and saves the resulting translation for reuse
      - The translation step is skipped the next time the same code runs
Play: Transmeta Crusoe
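
The translation cache idea from the Code Morphing Software slide can be illustrated with a small sketch. This is only a toy model in Python under stated assumptions, not Transmeta's implementation: the names TranslationCache, toy_translate, and run_loop, the string-based instruction format, and the guest address 0x1000 are all hypothetical and chosen for the example. It shows the one behavior the slide describes: a block of guest code is translated the first time it is seen and the saved translation is reused afterwards.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a guest block is a tuple of x86-like instruction
# strings, and a translation is a list of native operation strings.
GuestBlock = Tuple[str, ...]
Translation = List[str]


class TranslationCache:
    """Toy translation cache keyed by guest address (illustrative only)."""

    def __init__(self, translate: Callable[[GuestBlock], Translation]) -> None:
        self._translate = translate
        self._cache: Dict[int, Translation] = {}
        self.translations_done = 0  # how many times the translator actually ran

    def lookup(self, address: int, block: GuestBlock) -> Translation:
        cached = self._cache.get(address)
        if cached is not None:
            return cached  # hit: the translation step is skipped
        translated = self._translate(block)  # miss: translate once
        self.translations_done += 1
        self._cache[address] = translated  # save the result for reuse
        return translated


def toy_translate(block: GuestBlock) -> Translation:
    # Placeholder translator: wraps each guest instruction in a native tag.
    return [f"native<{insn}>" for insn in block]


def run_loop(cache: TranslationCache, iterations: int) -> int:
    """Look up the same guest block repeatedly; return translator invocations."""
    block: GuestBlock = ("add eax, 1", "cmp eax, 4", "jne loop")
    for _ in range(iterations):
        cache.lookup(0x1000, block)
    return cache.translations_done
```

Calling run_loop on a fresh cache with any positive iteration count invokes the translator once, because every later lookup of address 0x1000 is served from the cache. A real system must also discard cached translations when the guest code changes; the sketch leaves that out.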