Work Factorization for Efficient Throughput Architectures

Sylvain Collange
Departamento de Ciência da Computação, ICEx
Universidade Federal de Minas Gerais
February 01, 2012

GPGPU in HPC, today
Graphics Processing Unit (GPU)
- Made for video games: mass market
- Low unit price, amortized R&D
- Inexpensive, high-performance parallel processor
2002: General-Purpose computation on GPU (GPGPU)
2012: 3 of the top 5 supercomputers use GPUs
- #2 Tianhe-1A (China)
- #4 Dawning Nebulae (China)
- #5 Tsubame 2.0 (Japan)

GPGPU in the future?
- Yesterday (2000-2010): homogeneous multi-core; the Central Processing Unit (CPU) and the Graphics Processing Unit (GPU) are discrete components
- Today (2011-...): chip-level integration: Intel Sandy Bridge, AMD Fusion, NVIDIA Denver/Maxwell project…, many embedded SoCs
- Tomorrow: heterogeneous multi-core, combining latency-optimized cores, throughput-optimized cores and hardware accelerators
Will GPUs blend into the throughput-optimized cores of heterogeneous multi-core chips?

Outline
- Background: GPU architecture requirements
- From a sequential processor to a GPU
- Parallel control regularity
- Parallel memory locality
- Parallel value locality

What do we need GPUs for?
1. 3D graphics rendering for games: complex texture mapping, lighting computations…
2. Computer Aided Design workstations: complex geometry
3. GPGPU: complex synchronization, data movements
One chip to rule them all: find the common denominator.

The (simplest) graphics rendering pipeline
[Figure: Vertices → Vertex shader → Primitives (triangles…) → Clipping, Rasterization, Attribute interpolation → Fragments → Fragment shader (reads Textures) → Z-Compare (Z-Buffer), Blending → Pixels → Framebuffer. The vertex and fragment shaders are programmable stages; the other stages are parametrizable.]

How much performance do we need to run 3DMark 11 at 50 frames/second?
Element        Per frame   Per second
Vertices       12.0M       600M
Primitives     12.6M       630M
Fragments      180M        9.0G
Instructions   14.4G       720G

An Intel Core i7 2700K peaks at 56 Ginsn/s, so we need to go about 13x faster (720 / 56 ≈ 12.9).
Source: Damien Triolet, Hardware.fr

Constraints
- Memory wall: memory speed does not increase as fast as computing speed, so memory latency is increasingly difficult to hide.
- Power wall: the power consumption of transistors does not decrease as fast as their density increases, so performance is now limited by power consumption.

Latency vs. throughput
- Latency: time to solution (CPUs). Minimize time, at the expense of power.
- Throughput: quantity of tasks processed per unit of time (GPUs). Assumes unlimited parallelism; minimize energy per operation.

Amdahl's law
Bounds the speedup attainable on a parallel machine:
$$S = \frac{1}{(1 - P) + \frac{P}{N}}$$
where S is the speedup, P the fraction of the single-processor run time spent in parallel portions, and N the number of processors; 1 - P is the time to run the sequential portions and P/N the time to run the parallel portions.
[Figure: S (speedup) as a function of N (available processors)]
G. Amdahl. Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities. AFIPS 1967.

Why heterogeneous architectures?
With the same model, $S = \frac{1}{(1 - P) + \frac{P}{N}}$ (sequential time 1 - P, parallel time P/N):
- Latency-optimized multi-core (CPU): low efficiency on parallel portions, spends too many resources
- Throughput-optimized multi-core (GPU): low performance on sequential portions
- Heterogeneous multi-core (CPU+GPU): use the right tool for the right job; allows aggressive optimization for latency or for throughput
M. Hill, M. Marty. Amdahl's law in the multicore era. IEEE Computer, 2008.

Programming model: multi-threading
- Graphics: 1 vertex = 1 thread, which computes spatial coordinates, texture coordinates…; 1 fragment = 1 thread, which computes color, lighting…
- GPGPU: Bulk-Synchronous Parallel (BSP) model (NVIDIA CUDA, OpenCL…): threads run the same program and synchronize at barriers
- The program describes the operation to apply
SPMD: Single Program, Multiple Data
L. Valiant. A bridging model for parallel computation. Comm. ACM 1990.
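Amdahl's law above is easy to evaluate numerically. The following C sketch is illustrative only: the function names `amdahl_speedup` and `amdahl_limit` are hypothetical and do not come from the slides, and the formula is exactly the one stated above, with P and N as defined there.

```c
/* Amdahl's law: speedup S on N processors of a program whose
   fraction P of the single-processor run time is parallel.
   (1 - P) is the sequential time, P / N the parallel time. */
static double amdahl_speedup(double P, double N) {
    return 1.0 / ((1.0 - P) + P / N);
}

/* Limit of amdahl_speedup as N grows without bound: 1 / (1 - P).
   Only meaningful for P < 1 (for P == 1 the speedup is unbounded). */
static double amdahl_limit(double P) {
    return 1.0 / (1.0 - P);
}
```

For example, with P = 0.9 the speedup on 16 processors is 1 / (0.1 + 0.9/16) = 6.4, and no number of processors can push it past 1 / (1 - P) = 10. The sequential 10% dominates, which is the argument for pairing throughput-optimized cores with a latency-optimized one.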
Threading granularity
Coarse-grained threading (e.g. MPI, OpenMP)
- Decouple tasks to reduce conflicts and inter-thread communication
- T0 processes X[0..3], T1 X[4..7], T2 X[8..11], T3 X[12..15]
Fine-grained threading (e.g. OpenCL, CUDA)
- Interleave tasks: T0 processes X[0], X[4], X[8], X[12]; T1 processes X[1], X[5], X[9], X[13]; and so on
- Exhibit locality: neighbor threads share memory
- Exhibit regularity: neighbor threads have a similar behavior

Parallel regularity
Similarity in behavior between threads, illustrated with 4 threads:
- Control regularity, switch(i) { case 2: ... case 17: ... case 21: ... }: regular if all threads have i=17; irregular if they have i=21, 4, 17, 2
- Memory regularity, r=A[i]: regular if the threads load A[8], A[9], A[10], A[11]; irregular if they load A[8], A[0], A[11], A[3]
- Data regularity, r=a*b: regular if all threads have a=32, b=52; irregular if a=17, -5, 11, 42 and b=15, 0, -2, 52

[Section: From a sequential processor to a GPU]

First step: sequential, pipelined processor
Let's build a GPU. Our application: scalar-vector multiplication, X ← a∙X.
First idea: run each thread sequentially.
Source code:
    for i = 0 to n-1
        X[i] ← a * X[i]
Machine code:
        move i ← 0
    loop:
        load t ← X[i]
        mul t ← a×t
        store X[i] ← t
        add i ← i+1
        branch i<n? loop
[Figure: sequential CPU with Fetch, Decode, Execute and Memory stages; add i ← 18 is being fetched while store X[17] is in the memory stage]

Homogeneous multi-core
Replication of the complete execution engine. Each thread runs the same machine code on its own slice of X:
        move i ← slice_begin
    loop:
        load t ← X[i]
        mul t ← a×t
        store X[i] ← t
        add i ← i+1
        branch i<slice_end? loop
[Figure: two complete pipelines (Fetch, Decode, Execute, load-store unit, Memory) running threads T0 and T1; T0 is at add i ← 18 / store X[17], T1 at add i ← 50 / store X[49]]
Improves throughput thanks to explicit parallelism.

Interleaved multi-threading
Time-multiplexing of processing units; same software view (same machine code as above).
[Figure: a single pipeline (Fetch, Decode, Execute, Memory, load-store unit) shared by threads T0 to T3; instructions from different threads, e.g. mul, add i ← 73, add i ← 50, load X[89], store X[72], load X[17], store X[49], occupy the stages at the same time]
Hides latency thanks to explicit parallelism.

Single Instruction, Multiple Threads (SIMT)
Factorization of fetch/decode and load-store units:
- Fetch 1 instruction on behalf of several threads
- Read 1 memory location and broadcast it to several registers
[Figure: threads T0 to T3 share one Fetch/Decode unit and one Memory unit, each with its own Execute lane; a single load, store or mul is issued for lanes (0) to (3)]
In NVIDIA-speak:
- SIMT: Single Instruction, Multiple Threads
- Convoy of synchronized threads: warp
Improves area and power efficiency thanks to regularity. Consolidates memory transactions: less memory pressure.

What about SIMD?
Single Instruction, Multiple Data
Source code:
    for i = 0 to n-1 step 4
        X[i..i+3] ← a * X[i..i+3]
Machine code:
    loop:
        vload T ← X[i]
        vmul T ← a×T
        vstore X[i] ← T
        add i ← i+4
        branch i<n? loop
[Figure: SIMD CPU; one pipeline with a 4-wide vector execute stage, e.g. vstore X[16..19] in the memory stage while add i ← 20 is being fetched]
- Synchronous programs, PRAM-like model
- Vectors, not threads: no "true" thread divergence allowed

Flynn's taxonomy
Classification of parallel architectures by number of instruction streams and data streams:
                   Single Instruction   Multiple Instruction
    Single Data    SISD                 MISD
    Multiple Data  SIMD                 MIMD
M. Flynn. Some Computer Organizations and Their Effectiveness. IEEE TC 1972.
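The SIMT execution of X ← a∙X, combined with the fine-grained interleaving from the threading-granularity slide, can be mimicked on a CPU. Below is a minimal sequential C model, not GPU code, under these assumptions: a single warp of `WARP_SIZE` = 4 threads (NVIDIA warps have 32) and a hypothetical function `simt_scale`. Each instruction of the loop body (load, mul, store) is applied to all active lanes before the next one, which is what a single shared fetch/decode implies; lanes past the end of X are switched off by an activity mask.

```c
#include <stddef.h>

/* Illustrative warp width; NVIDIA warps are 32 threads wide. */
#define WARP_SIZE 4

/* Sequential model of X <- a*X run by one warp of WARP_SIZE threads with
   fine-grained interleaving: thread `lane` handles X[lane],
   X[lane + WARP_SIZE], X[lane + 2*WARP_SIZE], ...
   Each instruction is issued once and executed by every active lane
   before the next instruction starts, as under SIMT. */
static void simt_scale(float a, float *X, size_t n) {
    for (size_t base = 0; base < n; base += WARP_SIZE) {
        int active[WARP_SIZE];  /* activity mask: lanes past the end of X sit idle */
        float t[WARP_SIZE];     /* per-thread register t */

        for (size_t lane = 0; lane < WARP_SIZE; lane++)
            active[lane] = base + lane < n;

        /* load t <- X[i]: neighbor lanes touch neighbor addresses */
        for (size_t lane = 0; lane < WARP_SIZE; lane++)
            if (active[lane]) t[lane] = X[base + lane];

        /* mul t <- a*t */
        for (size_t lane = 0; lane < WARP_SIZE; lane++)
            if (active[lane]) t[lane] = a * t[lane];

        /* store X[i] <- t */
        for (size_t lane = 0; lane < WARP_SIZE; lane++)
            if (active[lane]) X[base + lane] = t[lane];
    }
}
```

Because lane k accesses X[base + k], the 4 loads of one step hit consecutive addresses: this is the memory regularity that lets SIMT hardware consolidate them into fewer memory transactions.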
Flynn's taxonomy revisited
…to account for multi-threading. Each pipeline resource can be single (shared by the threads) or multiple (replicated):
                        Instruction (Fetch)   RF, Execute (Data)   Memory (Address)
    Single resource     SIMT                  SDMT                 SAMT
    Multiple resources  MIMT                  MDMT                 MAMT
The three axes are mostly orthogonal: mix and match to build your own _I_D_A_T pipeline!

Examples: conventional design points
- Multi-core (most CPUs of today): MIMD(MAMT); each thread T0, T1, T2 has its own Fetch, Execute and Memory
- Short-vector SIMD (multimedia instruction set extensions in CPUs): SIMD(SAST); a single thread T0 with one Fetch, several Execute lanes and one Memory port
- GPU (NVIDIA, AMD, Intel... GPUs): SI(MDSA)MT; threads T0 to T2 share one Fetch and one Memory unit, each with its own Execute lane

GPU design space: not just SIMT
How can we run SPMD threads?
- Spatial (horizontal): MIMD (multi-core), with one Fetch per Execute unit; SIMT, with one Fetch for several Execute units
- Temporal (vertical): fine-grained multi-threading; switch-on-event multi-threading
Examples: NVIDIA GeForce GTX 280 (2008), NVIDIA GTX 480 (2010), AMD Radeon 5870 (2011), AMD Radeon 7870 (2012), NVIDIA Echelon project (2017?)
Programmer's point of view: only threads.

Example GPU: NVIDIA GeForce GTX 580
- SIMT: warps of 32 threads
- 16 SMs per chip
- 2×16 cores per SM, 48 warps per SM
- Up to 24576 threads in flight (16 SMs × 48 warps × 32 threads)
- 1580 Gflop/s
[Figure: SM1 to SM16; in each SM, odd-numbered warps (1, 3, …, 47) run on cores 1 to 16 and even-numbered warps (2, 4, …, 48) on cores 17 to 32, interleaved over time]

[Section: Parallel control regularity]

Divergence statistics
- 50% to 85% of branches are uniform: inside a warp, all threads take the branch or none do
- Easy case, but costly predication must be avoided on uniform branches
- Fully dynamic, hardware implementation

How to keep threads synchronized?
Issue: control divergence
    x = 0;
    // Uniform condition
    if(tid > 17) {
        x = 1;
    }
    // Divergent conditions
    if(tid < 2) {
        if(tid == 0) {
            x = 2;
        }
        else {
            x = 3;
        }
    }
Rules of the game:
- One thread per Processing Element (PE)
- All PEs execute the same instruction
- PEs can be individually disabled
[Figure: threads 0 to 3 mapped to PE 0 to PE 3, all receiving 1 instruction]

The standard way: mask stack
Hardware mask stack: 1 activity bit per thread (tid=0 to tid=3, written left to right). Stack contents after each step, top of stack on the right:
    x = 0;                   1111
    if(tid > 17) {           skip (uniform: no thread takes the branch)
        x = 1;
    }                        1111
    if(tid < 2) {            push: 1111 1100
        if(tid == 0) {       push: 1111 1100 1000
            x = 2;
        }                    pop:  1111 1100
        else {               push: 1111 1100 0100
            x = 3;
        }                    pop:  1111 1100
    }                        pop:  1111
A.
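The mask-stack walk-through above can be replayed in software. This is a minimal C sketch under stated assumptions: a warp of 4 threads, masks stored as bitfields, and hypothetical names (`warp_t`, `do_if`, `do_else`, `do_endif`, `MAX_DEPTH`) that do not come from any real GPU or toolkit; it models the idea, not a specific vendor's hardware. One design choice differs from the slide's trace: instead of popping and pushing around the `else`, the sketch recomputes the else mask from the saved mask, which yields the same activity masks.

```c
#include <stdint.h>

#define WARP_SIZE 4   /* the 4 threads of the slide's example */
#define MAX_DEPTH 8   /* hypothetical nesting limit of the stack */

/* Hypothetical model of one warp: bit t of a mask is the activity bit of
   thread t (so the slide's 1100, thread 0 written first, is 0x3 here). */
typedef struct {
    uint32_t active;             /* current activity mask */
    uint32_t stack[MAX_DEPTH];   /* masks saved at divergent branches */
    int top;                     /* number of saved masks */
} warp_t;

/* Evaluate a per-thread condition; only active threads can set their bit. */
static uint32_t eval(const warp_t *w, int (*cond)(int tid)) {
    uint32_t m = 0;
    for (int t = 0; t < WARP_SIZE; t++)
        if (cond(t)) m |= 1u << t;
    return m & w->active;
}

static void do_if(warp_t *w, uint32_t taken) {      /* push, enter the if path */
    w->stack[w->top++] = w->active;
    w->active = taken;
}

static void do_else(warp_t *w, uint32_t taken) {    /* switch to the else path */
    w->active = w->stack[w->top - 1] & ~taken;
}

static void do_endif(warp_t *w) {                   /* pop: threads reconverge */
    w->active = w->stack[--w->top];
}

/* One SIMT instruction "x = value": only active threads write. */
static void assign(const warp_t *w, int *x, int value) {
    for (int t = 0; t < WARP_SIZE; t++)
        if (w->active & (1u << t)) x[t] = value;
}

static int gt17(int tid) { return tid > 17; }
static int lt2(int tid)  { return tid < 2; }
static int eq0(int tid)  { return tid == 0; }

/* The slide's example on a warp of 4 threads; x[t] is thread t's x.
   Masks in comments use the slide's notation (thread 0 first). */
static void run_example(int x[WARP_SIZE]) {
    warp_t w = { .active = (1u << WARP_SIZE) - 1, .top = 0 };  /* 1111 */
    assign(&w, x, 0);                       /* x = 0; */

    uint32_t c = eval(&w, gt17);            /* if(tid > 17) */
    if (c != 0) {                           /* uniform: no thread takes it, so skip */
        do_if(&w, c);
        assign(&w, x, 1);
        do_endif(&w);
    }

    do_if(&w, eval(&w, lt2));               /* if(tid < 2): active = 1100 */
    uint32_t c0 = eval(&w, eq0);
    do_if(&w, c0);                          /* if(tid == 0): active = 1000 */
    assign(&w, x, 2);
    do_else(&w, c0);                        /* else: active = 0100 */
    assign(&w, x, 3);
    do_endif(&w);                           /* pop: active = 1100 */
    do_endif(&w);                           /* pop: active = 1111 */
}
```

After `run_example`, threads 0 and 1 hold 2 and 3 and threads 2 and 3 still hold 0, matching the source code run thread by thread. The uniform branch costs only one mask test, which is why avoiding predication on uniform branches matters.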