The Pentium 4 and the G4e: an Architectural Comparison
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Inside Intel® Core™ Microarchitecture Setting New Standards for Energy-Efficient Performance
White Paper Inside Intel® Core™ Microarchitecture Setting New Standards for Energy-Efficient Performance Ofri Wechsler Intel Fellow, Mobility Group Director, Mobility Microprocessor Architecture Intel Corporation White Paper Inside Intel®Core™ Microarchitecture Introduction Introduction 2 The Intel® Core™ microarchitecture is a new foundation for Intel®Core™ Microarchitecture Design Goals 3 Intel® architecture-based desktop, mobile, and mainstream server multi-core processors. This state-of-the-art multi-core optimized Delivering Energy-Efficient Performance 4 and power-efficient microarchitecture is designed to deliver Intel®Core™ Microarchitecture Innovations 5 increased performance and performance-per-watt—thus increasing Intel® Wide Dynamic Execution 6 overall energy efficiency. This new microarchitecture extends the energy efficient philosophy first delivered in Intel's mobile Intel® Intelligent Power Capability 8 microarchitecture found in the Intel® Pentium® M processor, and Intel® Advanced Smart Cache 8 greatly enhances it with many new and leading edge microar- Intel® Smart Memory Access 9 chitectural innovations as well as existing Intel NetBurst® microarchitecture features. What’s more, it incorporates many Intel® Advanced Digital Media Boost 10 new and significant innovations designed to optimize the Intel®Core™ Microarchitecture and Software 11 power, performance, and scalability of multi-core processors. Summary 12 The Intel Core microarchitecture shows Intel’s continued Learn More 12 innovation by delivering both greater energy efficiency Author Biographies 12 and compute capability required for the new workloads and usage models now making their way across computing. With its higher performance and low power, the new Intel Core microarchitecture will be the basis for many new solutions and form factors. In the home, these include higher performing, ultra-quiet, sleek and low-power computer designs, and new advances in more sophisticated, user-friendly entertainment systems. -
Pentium II Processor Performance Brief
PentiumÒ II Processor Performance Brief January 1998 Order Number: 243336-004 Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel’s Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The Pentium® II processor may contain design defects or errors known as errata. Current characterized errata are available on request. MPEG is an international standard for video compression/decompression promoted by ISO. Implementations of MPEG CODECs, or MPEG enabled platforms may require licenses from various entities, including Intel Corporation. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an ordering number and are referenced in this document, or other Intel literature, may be obtained from by calling 1-800-548-4725 or by visiting Intel’s website at http://www.intel.com. -
Intel Architecture Optimization Manual
Intel Architecture Optimization Manual Order Number 242816-003 1997 5/5/97 11:38 AM FRONT.DOC Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The Pentium®, Pentium Pro and Pentium II processors may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Such errata are not covered by Intel’s warranty. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications before placing your product order. Copies of documents which have an ordering number and are referenced in this document, or other Intel literature, may be obtained from: Intel Corporation P.O. -
Validating the Intel® Pentium® 4 Microprocessor
16.1 9DOLGDWLQJ WKH ,QWHO 3HQWLXP 0LFURSURFHVVRU %RE%HQWOH\ ,QWHO&RUSRUDWLRQ 1(WK$YHQXH +LOOVERUR2UHJRQ86$ EREEHQWOH\#LQWHOFRP ABSTRACT 2. OVERVIEW Developing a new leading edge IA-32 microprocessor is an The Pentium® 4 processor is Intel’s most advanced IA-32 immensely complicated undertaking. In the case of the Pentium® microprocessor, incorporating a host of new microarchitectural 4 processor, the microarchitecture is significantly more complex features including a 400-MHz system bus, hyper-pipelined than any previous IA-32 microprocessor and the implementation technology, advanced dynamic execution, rapid execution engine, borrows almost nothing from any previous implementation. This advanced transfer cache, execution trace cache, and Streaming paper describes how we went about the task of finding bugs in the SIMD (Single Instruction, Multiple Data) Extensions 2 (SSE2). Pentium® 4 processor design prior to initial silicon, and what we For the most part, we applied similar tools and methodologies to found along the way. validating the Pentium® 4 processor that we had used previously General Terms on the Pentium® Pro processor. However, we developed new Management, Verification. methodologies and tools in response to lessons learnt from previous projects, and to address some of the new challenges that the Pentium® 4 processor design presented from a validation 1. INTRODUCTION perspective. In particular, the use of Formal Verification, Cluster Validation case studies are relatively rare in the literature of Test Environments and focused Power Reduction Validation were computer architecture and design ([1] and [2] contain lists of either new or greatly extended from previous projects; each of some recent papers) and case studies of commercial these is discussed in more detail in a section of this paper microprocessors are even rarer. -
The Microarchitecture of the Pentium 4 Processor
The Microarchitecture of the Pentium 4 Processor Glenn Hinton, Desktop Platforms Group, Intel Corp. Dave Sager, Desktop Platforms Group, Intel Corp. Mike Upton, Desktop Platforms Group, Intel Corp. Darrell Boggs, Desktop Platforms Group, Intel Corp. Doug Carmean, Desktop Platforms Group, Intel Corp. Alan Kyker, Desktop Platforms Group, Intel Corp. Patrice Roussel, Desktop Platforms Group, Intel Corp. Index words: Pentium® 4 processor, NetBurst™ microarchitecture, Trace Cache, double-pumped ALU, deep pipelining provides an in-depth examination of the features and ABSTRACT functions of the Intel NetBurst microarchitecture. This paper describes the Intel® NetBurst™ ® The Pentium 4 processor is designed to deliver microarchitecture of Intel’s new flagship Pentium 4 performance across applications where end users can truly processor. This microarchitecture is the basis of a new appreciate and experience its performance. For example, family of processors from Intel starting with the Pentium it allows a much better user experience in areas such as 4 processor. The Pentium 4 processor provides a Internet audio and streaming video, image processing, substantial performance gain for many key application video content creation, speech recognition, 3D areas where the end user can truly appreciate the applications and games, multi-media, and multi-tasking difference. user environments. The Pentium 4 processor enables real- In this paper we describe the main features and functions time MPEG2 video encoding and near real-time MPEG4 of the NetBurst microarchitecture. We present the front- encoding, allowing efficient video editing and video end of the machine, including its new form of instruction conferencing. It delivers world-class performance on 3D cache called the Execution Trace Cache. -
Intel(R) Pentium(R) 4 Processor on 90 Nm Process Datasheet
Intel® Pentium® 4 Processor on 90 nm Process Datasheet 2.80 GHz – 3.40 GHz Frequencies Supporting Hyper-Threading Technology1 for All Frequencies with 800 MHz Front Side Bus February 2005 Document Number: 300561-003 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, or life sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The Intel® Pentium® 4 processor on 90 nm process may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. 1Hyper-Threading Technology requires a computer system with an Intel® Pentium® 4 processor supporting HT Technology and a Hyper-Threading Technology enabled chipset, BIOS and operating system. -
Design and VHDL Implementation of an Application-Specific Instruction Set Processor
Design and VHDL Implementation of an Application-Specific Instruction Set Processor Lauri Isola School of Electrical Engineering Thesis submitted for examination for the degree of Master of Science in Technology. Espoo 19.12.2019 Supervisor Prof. Jussi Ryynänen Advisor D.Sc. (Tech.) Marko Kosunen Copyright © 2019 Lauri Isola Aalto University, P.O. BOX 11000, 00076 AALTO www.aalto.fi Abstract of the master’s thesis Author Lauri Isola Title Design and VHDL Implementation of an Application-Specific Instruction Set Processor Degree programme Computer, Communication and Information Sciences Major Signal, Speech and Language Processing Code of major ELEC3031 Supervisor Prof. Jussi Ryynänen Advisor D.Sc. (Tech.) Marko Kosunen Date 19.12.2019 Number of pages 66+45 Language English Abstract Open source processors are becoming more popular. They are a cost-effective option in hardware designs, because using the processor does not require an expensive license. However, a limited number of open source processors are still available. This is especially true for Application-Specific Instruction Set Processors (ASIPs). In this work, an ASIP processor was designed and implemented in VHDL hardware description language. The design was based on goals that make the processor easily customizable, and to have a low resource consumption in a System- on-Chip (SoC) design. Finally, the processor was implemented on an FPGA circuit, where it was tested with a specially designed VGA graphics controller. Necessary software tools, such as an assembler were also implemented for the processor. The assembler was used to write comprehensive test programs for testing and verifying the functionality of the processor. This work also examined some future upgrades of the designed processor. -
Comparative Architectures
Comparative Architectures CST Part II, 16 lectures Lent Term 2006 David Greaves [email protected] Slides Lectures 1-13 (C) 2006 IAP + DJG Course Outline 1. Comparing Implementations • Developments fabrication technology • Cost, power, performance, compatibility • Benchmarking 2. Instruction Set Architecture (ISA) • Classic CISC and RISC traits • ISA evolution 3. Microarchitecture • Pipelining • Super-scalar { static & out-of-order • Multi-threading • Effects of ISA on µarchitecture and vice versa 4. Memory System Architecture • Memory Hierarchy 5. Multi-processor systems • Cache coherent and message passing Understanding design tradeoffs 2 Reading material • OHP slides, articles • Recommended Book: John Hennessy & David Patterson, Computer Architecture: a Quantitative Approach (3rd ed.) 2002 Morgan Kaufmann • MIT Open Courseware: 6.823 Computer System Architecture, by Krste Asanovic • The Web http://bwrc.eecs.berkeley.edu/CIC/ http://www.chip-architect.com/ http://www.geek.com/procspec/procspec.htm http://www.realworldtech.com/ http://www.anandtech.com/ http://www.arstechnica.com/ http://open.specbench.org/ • comp.arch News Group 3 Further Reading and Reference • M Johnson Superscalar microprocessor design 1991 Prentice-Hall • P Markstein IA-64 and Elementary Functions 2000 Prentice-Hall • A Tannenbaum, Structured Computer Organization (2nd ed.) 1990 Prentice-Hall • A Someren & C Atack, The ARM RISC Chip, 1994 Addison-Wesley • R Sites, Alpha Architecture Reference Manual, 1992 Digital Press • G Kane & J Heinrich, MIPS RISC Architecture -
CS152: Computer Systems Architecture Pipelining
CS152: Computer Systems Architecture Pipelining Sang-Woo Jun Winter 2021 Large amount of material adapted from MIT 6.004, “Computation Structures”, Morgan Kaufmann “Computer Organization and Design: The Hardware/Software Interface: RISC-V Edition”, and CS 152 Slides by Isaac Scherson Eight great ideas ❑ Design for Moore’s Law ❑ Use abstraction to simplify design ❑ Make the common case fast ❑ Performance via parallelism ❑ Performance via pipelining ❑ Performance via prediction ❑ Hierarchy of memories ❑ Dependability via redundancy But before we start… Performance Measures ❑ Two metrics when designing a system 1. Latency: The delay from when an input enters the system until its associated output is produced 2. Throughput: The rate at which inputs or outputs are processed ❑ The metric to prioritize depends on the application o Embedded system for airbag deployment? Latency o General-purpose processor? Throughput Performance of Combinational Circuits ❑ For combinational logic o latency = tPD o throughput = 1/t F and G not doing work! PD Just holding output data X F(X) X Y G(X) H(X) Is this an efficient way of using hardware? Source: MIT 6.004 2019 L12 Pipelined Circuits ❑ Pipelining by adding registers to hold F and G’s output o Now F & G can be working on input Xi+1 while H is performing computation on Xi o A 2-stage pipeline! o For input X during clock cycle j, corresponding output is emitted during clock j+2. Assuming ideal registers Assuming latencies of 15, 20, 25… 15 Y F(X) G(X) 20 H(X) 25 Source: MIT 6.004 2019 L12 Pipelined Circuits 20+25=45 25+25=50 Latency Throughput Unpipelined 45 1/45 2-stage pipelined 50 (Worse!) 1/25 (Better!) Source: MIT 6.004 2019 L12 Pipeline conventions ❑ Definition: o A well-formed K-Stage Pipeline (“K-pipeline”) is an acyclic circuit having exactly K registers on every path from an input to an output. -
Quantum Chemical Computational Methods Have Proved to Be An
34671 K. Immanuvel Arokia James et al./ Elixir Comp. Sci. & Engg. 85 (2015) 34671-34676 Available online at www.elixirpublishers.com (Elixir International Journal) Computer Science and Engineering Elixir Comp. Sci. & Engg. 85 (2015) 34671-34676 Outerloop pipelining in FFT using single dimension software pipelining K. Immanuvel Arokia James1 and M. J. Joyce Kiruba2 1Department of EEE VEL Tech Multi Tech Dr. RR Dr. SR Engg College, Chennai, India. 2Department of ECE, Dr. MGR University, Chennai, India. ARTICLE INFO ABSTRACT Article history: The aim of the paper is to produce faster results when pipelining above the inner most loop. Received: 19 May 2012; The concept of outerloop pipelining is tested here with Fast Fourier Transform. Reduction Received in revised form: in the number of cycles spent flushing and filling the pipeline and the potential for data reuse 22 August 2015; is another advantage in the outerloop pipelining. In this work we extend and adapt the Accepted: 29 August 2015; existing SSP approach to better suit the generation of schedules for hardware, specifically FPGAs. We also introduce a search scheme to find the shortest schedule available within the Keywords pipelining framework to maximize the gains in pipelining above the innermost loop. The FFT, hardware compilers apply loop pipelining to increase the parallelism achieved, but Pipelining, VHDL, pipelining is restricted to the only innermost level in nested loop. In this work we extend and Nested loop, adapt an existing outer loop pipelining approach known as Single Dimension Software FPGA, Pipelining to generate schedules for FPGA hardware coprocessors. Each loop level in nine Integer Linear Programming (ILP). -
The Intel X86 Microarchitectures Map Version 2.0
The Intel x86 Microarchitectures Map Version 2.0 P6 (1995, 0.50 to 0.35 μm) 8086 (1978, 3 µm) 80386 (1985, 1.5 to 1 µm) P5 (1993, 0.80 to 0.35 μm) NetBurst (2000 , 180 to 130 nm) Skylake (2015, 14 nm) Alternative Names: i686 Series: Alternative Names: iAPX 386, 386, i386 Alternative Names: Pentium, 80586, 586, i586 Alternative Names: Pentium 4, Pentium IV, P4 Alternative Names: SKL (Desktop and Mobile), SKX (Server) Series: Pentium Pro (used in desktops and servers) • 16-bit data bus: 8086 (iAPX Series: Series: Series: Series: • Variant: Klamath (1997, 0.35 μm) 86) • Desktop/Server: i386DX Desktop/Server: P5, P54C • Desktop: Willamette (180 nm) • Desktop: Desktop 6th Generation Core i5 (Skylake-S and Skylake-H) • Alternative Names: Pentium II, PII • 8-bit data bus: 8088 (iAPX • Desktop lower-performance: i386SX Desktop/Server higher-performance: P54CQS, P54CS • Desktop higher-performance: Northwood Pentium 4 (130 nm), Northwood B Pentium 4 HT (130 nm), • Desktop higher-performance: Desktop 6th Generation Core i7 (Skylake-S and Skylake-H), Desktop 7th Generation Core i7 X (Skylake-X), • Series: Klamath (used in desktops) 88) • Mobile: i386SL, 80376, i386EX, Mobile: P54C, P54LM Northwood C Pentium 4 HT (130 nm), Gallatin (Pentium 4 Extreme Edition 130 nm) Desktop 7th Generation Core i9 X (Skylake-X), Desktop 9th Generation Core i7 X (Skylake-X), Desktop 9th Generation Core i9 X (Skylake-X) • Variant: Deschutes (1998, 0.25 to 0.18 μm) i386CXSA, i386SXSA, i386CXSB Compatibility: Pentium OverDrive • Desktop lower-performance: Willamette-128 -
I386-Based Computer Architecture and Elementary Data Operations
Leonardo Journal of Sciences Issue 3, July-December 2003 ISSN 1583-0233 p. 9-23 I386-Based Computer Architecture and Elementary Data Operations Lorentz JÄNTSCHI Technical University of Cluj-Napoca, Romania http://lori.academicdirect.ro Abstract Computers using in a very large field of sciences and not only in sciences is a reality now. Research, evidence, automation, entertainment, communication are makes by computer. To create easy to use, professional, and efficient applications is not an easy task. Compatibility problems, when data are ports from different applications, are frequently solves using operating system modules (such as ODBC – open database connectivity). The aim of this paper was to describe i.386-based computer architecture and to debate the elementary data operators. Keywords i386 computer architecture, elementary data operations Introduction Computers using in a very large field of sciences and not only in sciences is a reality now. Research, evidence, automation, entertainment, communication are makes by computer. The peoples who work with the computer are splits in two categories. Users want at key applications to exploit them. May be the majority of computer are simply users that uses 9 http://ljs.academicdirect.ro i386-Based Computer Architecture and Elementary Data Operations Lorentz JÄNTSCHI at key applications at office in a job specific action. Developers are computer specialists, which use the software and the hardware knowledge to create, test, and upgrade computers and programs. In computers industry (and not only) there exists so called brands. Most of the peoples it heard about Microsoft [1] or IBM [2]. These are brands. A brand is generally a corporation, frequently a multinational one, which produce a significant quantity (percents of total) of specific products for world users.