Lecture 20: Speeds and Feeds When Employing a GPU


ECE408 / CS483 Spring 2020
Applied Parallel Programming
Lecture 20: GPU as part of the PC Architecture
3/23/2020

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2018
ECE408/CS483/CSE408, ECE 498AL, University of Illinois, Urbana-Champaign

Objectives
• to understand the impact of data transfers on performance when using a GPU as a co-processor
  – speeds and feeds of traditional CPU
  – speeds and feeds when employing a GPU
• to develop a knowledge base for performance tuning for modern GPUs

Review: Canonical CUDA Program Structure
• Global variables declaration
• Kernel functions
  – __global__ void kernelOne(…)
• main() // host code
  – allocate memory space on the device – cudaMalloc(&d_GlblVarPtr, bytes)
  – transfer data from host to device – cudaMemcpy(d_GlblVarPtr, h_Gl…)
  – execution configuration setup
  – kernel call – kernelOne<<<execution configuration>>>( args… );
  – transfer results from device to host – cudaMemcpy(h_GlblVarPtr,…)
  – (repeat the transfer, kernel call, and copy-back steps as needed)
  – optional: compare against golden (host computed) solution

Bandwidth: The Gravity of Modern Computer Systems
• Bandwidth between key components ultimately dictates system performance,
  – especially for GPUs processing large amounts of data.
• Tricks like buffering, reordering, caching can temporarily defy the rules in some cases.
• Ultimately, performance falls back to what the “speeds and feeds” dictate.
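The canonical program structure reviewed above can be sketched as one small program. This is a minimal illustration and not code from the lecture: the kernel vecAddKernel, the helper check, and the problem size are hypothetical names and values chosen for the example. Only the CUDA runtime calls themselves (cudaMalloc, cudaMemcpy, cudaFree) are the real API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: element-wise vector addition.
__global__ void vecAddKernel(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main() {
    const int n = 1 << 20;                 // assumed problem size
    const size_t bytes = n * sizeof(float);

    // Host buffers (ordinary pageable memory).
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    if (!h_a || !h_b || !h_c) { fprintf(stderr, "host malloc failed\n"); return 1; }
    for (int i = 0; i < n; ++i) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    // 1. Allocate memory space on the device.
    float *d_a, *d_b, *d_c;
    check(cudaMalloc(&d_a, bytes), "cudaMalloc d_a");
    check(cudaMalloc(&d_b, bytes), "cudaMalloc d_b");
    check(cudaMalloc(&d_c, bytes), "cudaMalloc d_c");

    // 2. Transfer data from host to device (crosses the PCIe link).
    check(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice), "copy a");
    check(cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice), "copy b");

    // 3. Execution configuration setup.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // 4. Kernel call.
    vecAddKernel<<<blocks, threads>>>(d_a, d_b, d_c, n);
    check(cudaGetLastError(), "kernel launch");

    // 5. Transfer results from device to host (waits for the kernel first).
    check(cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost), "copy c");

    // 6. Optional: compare against the golden (host computed) solution.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (fabsf(h_c[i] - (h_a[i] + h_b[i])) > 1e-5f) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return mismatches == 0 ? 0 : 1;
}
```

Steps 2 and 5 are the ones this lecture is concerned with: both move data between host memory and the GPU, so their cost is set by the interconnect rather than by the kernel.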
Classic (Historical) PC Architecture
• Northbridge connects 3 components that must communicate at high speed
  – CPU, DRAM, video
  – Video needs first-class access to DRAM
  – Earlier NVIDIA cards connected through AGP, with up to 2 GB/s transfers
• Southbridge serves as a concentrator for slower I/O devices
[Figure labels: CPU, Core Logic Chipset]

(Original) PCI Bus Specification: A Humble Beginning
• Connected to the southbridge
  – Originally 33 MHz, 32-bit wide, 132 MB/second peak transfer rate
  – Later, 66 MHz, 64-bit, 528 MB/second peak
  – Upstream bandwidth remains slow for devices (~256 MB/s peak)
  – Shared bus with arbitration
    • Winner of arbitration becomes bus master and can connect to CPU or DRAM through the southbridge and northbridge

PCI as Memory Mapped I/O
• PCI device registers are mapped into the CPU’s physical address space
  – Accessed through loads/stores (kernel mode)
• Addresses are assigned to the PCI devices at boot time
  – All devices listen for their addresses

PCI Express (PCIe)
• switched, point-to-point connection
  – each card has a dedicated “link” to the central switch, with no arbitration
  – packet switches: messages form virtual channels
  – prioritized packets for QoS (such as for real-time video streaming)
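The PCI peak rates quoted above follow from clock rate times bus width. A small host-only sketch of that arithmetic is given here; the function name peakMBps is made up for the example, and the nominal 33 MHz and 66 MHz figures are taken from the slide as-is.

```cuda
#include <cstdio>

// Peak rate of a parallel bus in MB/s, assuming one transfer of
// `width_bits` bits on every clock cycle (no arbitration or protocol cost).
static double peakMBps(double clock_mhz, int width_bits) {
    return clock_mhz * width_bits / 8.0;
}

int main() {
    printf("PCI 33 MHz x 32-bit: %.0f MB/s\n", peakMBps(33.0, 32));  // 132
    printf("PCI 66 MHz x 64-bit: %.0f MB/s\n", peakMBps(66.0, 64));  // 528

    // Time to move a 1000 MB data set at each peak rate (best case).
    const double megabytes = 1000.0;
    printf("1000 MB at 132 MB/s: %.2f s\n", megabytes / peakMBps(33.0, 32));
    printf("1000 MB at 528 MB/s: %.2f s\n", megabytes / peakMBps(66.0, 64));
    return 0;
}
```

These are upper bounds. Because the bus is shared and arbitrated, a single device would normally see less than the peak.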
PCIe Generations
• Within a generation, number of lanes in a link can be scaled
  – using distinct physical channels (more bits / wider transfers)
  – ×1, ×2, ×4, ×8, ×16, ×32, …
• Each new generation aims to double the speed.

PCIe Gen 3 Links and Lanes
• Each link consists of one or more lanes
  – Each lane is 1-bit wide (4 wires, each 2-wire pair can transmit 8 Gb/s in one direction)
    • Upstream and downstream simultaneous and symmetric
  – Each link can combine 1, 2, 4, 8, 12, 16 lanes (x1, x2, etc.)
  – Each 128-bit block of data is 128b/130b encoded into 130 bits with a roughly equal number of 1s and 0s; net data rate is about 7.877 Gb/s per lane each way.
  – Thus, the net data rates are 985 MB/s (x1), 1.97 GB/s (x2), 3.94 GB/s (x4), 7.9 GB/s (x8), 15.8 GB/s (x16), each way

Foundation: 8/10 bit encoding
• Goal is to maintain DC balance while having sufficient state transitions for clock recovery
• The difference between the number of 1s and 0s in a 20-bit stream should be ≤ 2
• There should be no more than 5 consecutive 1s or 0s in any stream
• 00000000, 00000111, 11000001 bad
• 01010101, 11001100 good
• Find 256 good patterns among 1024 total patterns of 10 bits to encode an 8-bit datum
• a 20% overhead

Current: 128/130 bit encoding
• Same goal: maintain DC balance while having sufficient state transitions for clock recovery
• 1.5% overhead instead of 20%
• Scrambler function: long runs of 0s, 1s vanishingly small
• Instead of guaranteed run length of 8/10b
• At least one bit shift every 66 bits
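The Gen 3 figures and the two overhead percentages above can be reproduced with a few lines of host-only arithmetic. This is a sketch under the slide's own assumptions (8 Gb/s raw per lane per direction, 2 extra bits per code block); the function name netLaneGbps is hypothetical.

```cuda
#include <cstdio>

// Net (payload) rate of one lane in Gb/s, per direction, for a line code
// that carries `payload_bits` of data in every `coded_bits` transmitted.
static double netLaneGbps(double raw_gbps, int payload_bits, int coded_bits) {
    return raw_gbps * payload_bits / coded_bits;
}

int main() {
    const double raw = 8.0;  // Gen 3 raw rate per lane, each way (from the slide)
    const double lane = netLaneGbps(raw, 128, 130);
    printf("net lane rate: %.4f Gb/s\n", lane);  // 8 * 128 / 130 = 7.8769

    // Link widths listed on the slide; divide by 8 to convert bits to bytes.
    const int widths[] = {1, 2, 4, 8, 16};
    for (int i = 0; i < 5; ++i)
        printf("x%-2d: %.2f GB/s each way\n", widths[i], lane * widths[i] / 8.0);

    // Encoding overhead as a fraction of the bits on the wire.
    printf("8b/10b overhead:    %.1f%%\n", 100.0 * 2 / 10);   // 20.0
    printf("128b/130b overhead: %.1f%%\n", 100.0 * 2 / 130);  // 1.5
    return 0;
}
```

The x8 and x16 results print as 7.88 and 15.75 GB/s, which the slide rounds to 7.9 and 15.8 GB/s.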
Patterns Contain Many 0s and 1s
A question for fun:
  – if we need 2^128 code words
  – chosen from all 2^130 130-bit patterns
  – how many 0s/1s must we consider including?
Answer: 63-67 (of either type)
Thus 128b/130b code words are pretty well-balanced, and have lots of 0-1 transitions (for clock recovery).

Recent PCIe PC Architecture
• Today, PCIe forms the interconnect backbone within PC.
• Northbridge and Southbridge are PCIe switches.
Source: Jon Stokes, PCI Express: An Overview (http://arstechnica.com/articles/paedia/hardware/pcie.ars)

Recent PCIe PC Architecture
How is PCI supported?
• Need a PCI-PCIe bridge, which is
  – sometimes included as part of Southbridge, or
  – can add as a separate PCIe I/O card.
Current systems integrate PCIe controllers directly on chip with CPU.
Source: Jon Stokes, PCI Express: An Overview (http://arstechnica.com/articles/paedia/hardware/pcie.ars)

GeForce GTX 1080 (Pascal) GPU Consumer Card Details
• SLI Connector (NVIDIA to NVIDIA)
• DP Out, HDMI Out, DVI Out
• 16x PCI-Express
• 8GB/256-bit GDDR5X
  – 1.25 GHz mem clock
  – 2.5 GHz write clock w/ QDR = 10 Gb/s/pin
  – 256 bit bus = 320 GB/s
  – 8 pieces of 8Gb (16 Mb x 32 x 16 banks)
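The "question for fun" earlier in this part (how balanced must 2^128 code words drawn from all 130-bit patterns be?) can be checked numerically. The sketch below assumes one reading of the slide: take the most balanced patterns first, widening a symmetric window around 65 ones until at least 2^128 patterns are available. It is a thought experiment only; actual PCIe Gen 3 hardware relies on scrambling plus a 2-bit sync header rather than a code-word table.

```cuda
#include <cstdio>
#include <cmath>

int main() {
    const int n = 130;

    // binom[k] = C(130, k), computed in double precision. The values are not
    // exact integers, but the comparison below has a wide margin.
    double binom[n + 1];
    binom[0] = 1.0;
    for (int k = 1; k <= n; ++k)
        binom[k] = binom[k - 1] * (n - k + 1) / k;

    const double needed = ldexp(1.0, 128);  // 2^128 code words required

    // Start with perfectly balanced patterns (65 ones) and widen the window
    // symmetrically until enough patterns are available.
    int lo = n / 2, hi = n / 2;
    double available = binom[n / 2];
    while (available < needed) {
        --lo;
        ++hi;
        available += binom[lo] + binom[hi];
    }

    printf("ones per code word: %d to %d\n", lo, hi);  // 63 to 67
    printf("fraction of all 130-bit patterns used: %.3f\n",
           available / ldexp(1.0, n));
    return 0;
}
```

The window 64 to 66 covers only about a fifth of all patterns, short of the quarter that is needed, which is why the answer widens to 63 to 67.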
PCIe Data Transfer using DMA
DMA (Direct Memory Access) is used to fully utilize the bandwidth of an I/O bus
• DMA uses physical address for source and destination
• Transfers a number of bytes requested by OS
• Needs pinned memory
[Figure: DMA between main memory (DRAM) on the CPU side and global memory on the GPU card (or other I/O cards)]

Pinned Memory
• DMA uses physical addresses
• The OS could accidentally page out the data that is being read or written by a DMA and page in another virtual page into the same location
• Pinned memory cannot be paged out
• If a source or destination of a cudaMemcpy() in the host memory is not pinned, it needs to be first copied to a pinned memory – extra overhead
• cudaMemcpy is much faster with pinned host memory source or destination

Allocate/Free Pinned Memory (a.k.a. Page Locked Memory)
• cudaHostAlloc()
  – Three parameters
  – Address of pointer to the allocated memory
  – Size of the allocated memory in bytes
  – Option – use cudaHostAllocDefault for now
• cudaFreeHost()
  – One parameter
  – Pointer to the memory to be freed

Using Pinned Memory
• Use the allocated memory and its pointer the same way as those returned by malloc();
• The only difference is that the allocated memory cannot be paged by the OS
• The cudaMemcpy function should be about 2X faster with pinned memory
• Pinned memory is a limited resource whose over-subscription can have serious consequences
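A rough way to see the pinned-memory effect described above is to time the same host-to-device copy from a malloc() buffer and from a cudaHostAlloc() buffer. The following is a minimal sketch, not a benchmark from the lecture: the 256 MiB buffer size and the helper names check and timeCopyMs are assumptions made for the example, and the measured ratio depends on the machine, so no particular result is claimed.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical helper: time one host-to-device copy in milliseconds,
// using CUDA events recorded on the default stream.
static float timeCopyMs(void *dst, const void *src, size_t bytes) {
    cudaEvent_t start, stop;
    check(cudaEventCreate(&start), "event create");
    check(cudaEventCreate(&stop), "event create");
    check(cudaEventRecord(start), "event record");
    check(cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice), "cudaMemcpy");
    check(cudaEventRecord(stop), "event record");
    check(cudaEventSynchronize(stop), "event sync");
    float ms = 0.0f;
    check(cudaEventElapsedTime(&ms, start, stop), "elapsed time");
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const size_t bytes = (size_t)256 << 20;  // 256 MiB, an arbitrary test size

    void *d_buf = nullptr;
    check(cudaMalloc(&d_buf, bytes), "cudaMalloc");

    // Pageable host buffer from malloc().
    void *h_pageable = malloc(bytes);
    if (!h_pageable) { fprintf(stderr, "host malloc failed\n"); return 1; }

    // Pinned (page-locked) host buffer; three parameters as listed above.
    void *h_pinned = nullptr;
    check(cudaHostAlloc(&h_pinned, bytes, cudaHostAllocDefault), "cudaHostAlloc");

    // Touch both buffers so the pages exist before timing.
    memset(h_pageable, 1, bytes);
    memset(h_pinned, 1, bytes);

    timeCopyMs(d_buf, h_pinned, bytes);  // warm-up copy, result discarded
    const float msPageable = timeCopyMs(d_buf, h_pageable, bytes);
    const float msPinned = timeCopyMs(d_buf, h_pinned, bytes);

    // bytes / (ms * 1e6) gives GB/s with 1 GB = 1e9 bytes.
    printf("pageable: %.2f ms (%.2f GB/s)\n", msPageable, bytes / (msPageable * 1e6));
    printf("pinned:   %.2f ms (%.2f GB/s)\n", msPinned, bytes / (msPinned * 1e6));

    check(cudaFreeHost(h_pinned), "cudaFreeHost");
    free(h_pageable);
    check(cudaFree(d_buf), "cudaFree");
    return 0;
}
```

The pinned buffer is used exactly like the malloc() one, apart from being released with cudaFreeHost(). Comparing the pinned result against the x16 link rate for the installed PCIe generation shows how close the transfer gets to what the interconnect allows.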
Important Trends
• Knowing yesterday, today, and tomorrow
  – The PC world is becoming flatter
  – CPU and GPU are being fused together
  – Outsourcing of computation is becoming easier…

ANY MORE QUESTIONS?