Wide I/O DRAM Architecture Utilizing Proximity Communication

Total Page:16

File Type:pdf, Size:1020Kb

Wide I/O DRAM Architecture Utilizing Proximity Communication Wide I/O DRAM Architecture Utilizing Proximity Communication by Qawi Harvard Thesis Defense – October 8th, 2009 Introduction Bandwidth and power consumption of dynamic random access memory stifles computer performance scaling Background Status of Proximity Communication DRAM Market Analysis 4 Gb DRAM Architecture Wide I/O DRAM Architecture Utilizing Proximity Communication Qawi Harvard – Oct. 8th,2009 Thesis Defense 2 Background Memory Gap Main memory does not scale with processor performance Power Current consumption is rising Bandwidth increases power Voltage scaling masks the issue Density Memory channel loading Limits bandwidth Proximity Communication Proposed by Ivan Sutherland – US Patent #6,500,696 Promises to reduce power and increase bandwidth Qawi Harvard – Oct. 8th,2009 Thesis Defense 3 Proximity Communication Chip 1 Chip 2 Transmit Receive Chip 1 Chip 2 Receive Transmit Capacitive Coupled Proximity Communication Top metal forms the parallel plates Chip-to-chip communication through coupling capacitor Ref:[1] Qawi Harvard – Oct. 8th,2009 Thesis Defense 4 Proximity Communication Benefits Increased I/O density Avoids on/off chip wires Eases chip replacement at the system level Enhances system level testability Enables smaller chip sizes Removes the need for ESD protection Challenges Mechanical misalignment Applying power to the chips Thermal solution Ref:[1-5] Qawi Harvard – Oct. 8th,2009 Thesis Defense 5 Proximity Communication Parallel Plate Capacitance A 0 aF C 8.9 m d 0 2 10 pF/mm 4000 2 1000 Chip-to-chip separation Proximity Communication d = 1 µm Area Ball Bonding 100 One channel I/O Density per per mm Density I/O 50 fF 10 2 2003 2004 2005 2006 2007 2008 2009 2010 200 signals/mm [1] Ref:[1] Qawi Harvard – Oct. 8th,2009 Thesis Defense 6 Proximity Communication Mechanical Misalignment Six axis Multiple sources z θx θy Separation Tilt θz y x Translation Rotation Ref:[5] Qawi Harvard – Oct. 8th,2009 Thesis Defense 7 Proximity Communication Electronic Sensors Chip-to-chip separation sensors (0.2 µm resolution) Vernier scale incorporated on chip (1.0 µm resolution) 1.0 Electrical Re-Alignment 0.9 0.8 Receive array 0.7 0.6 0.5 Micro-transmit array 0.4 0.3 Electronic steering circuit 0.2 0.1 Normalized Received SignalReceived Normalized 0 0 20 40 60 80 100 120 Misalignment (µm) Transmit Chip Inactive Transmit Pad 1 0 1 0 1 0 1 0 1 0 1 Active Transmit Pad Receiver Pad Ref:[1-5] 1 0 1 0 1 ?? 0 1 0 1 Receive Chip Qawi Harvard – Oct. 8th,2009 Thesis Defense 8 DRAM Market Analysis Revisit the Memory Gap “Performance” becomes a relative term Dichotomy in scaling Why Density? Why Not Latency? 100000 100000 10000 10000 Processor IPS DRAM Density 1000 1000 100 100 Processor IPS 10 DRAM Latency 10 Relative Performance (%) Performance Relative Relative Performance (%) Performance Relative 1 1 1980 1985 1990 1995 2000 2005 2010 1980 1985 1990 1995 2000 2005 2010 Ref:[6] Qawi Harvard – Oct. 8th,2009 Thesis Defense 9 DRAM Market Analysis Moore’s Law 41% increase in transistor count per year Selling Price 36% historic decline per year Putting it into Perspective 1 0 0 0 . 0 0 0 0 100,000.00 1 0 0 . 0 0 0 0 1 9 71975 9 Historically the price per bit has 1 Gb 2009 → $2.00 1977 10,000.00 1974 1980 1979 declined by 9% every quarter 1976 1 0 . 0 0 0 0 1981 1981 1984 (1974 – 2008). 1978 1983 2 Gb 2011 → $1.64 1,000.00 1 9 8 2 19801 9 8 3 1988 1985 1987 19821985 1989 1 . 0 0 0 0 1984 1989 100.00 19901993 1 9 8 6 1 9 8 7 1 9 9 5 H i s t o r i c a l 1991 1993 Density or Bust!! 1986 19881991 1994 1995 price•per•bit decline has 1 9 9 6 0 . 1 0 0 0 1992 averaged 35.5% 1990 10.00 1992 19941 9 9 7 (1978 - 2002) 1997 2 0 0 0 Price per Bit (Millicents) 1998 0 . 0 1 0 0 1996 1999 1.00 1 9 9 9 2 0 0 3 Price per Bit (Milicents) 1998 2 0 0 1 20012 0 0 4 F 2000 22003 0 0 5 F 2002 2 02005 0 6 F 0 .0.10 0 0 1 0 20022 0 0 7 F 2 0 0 8 F 2004 0 .0.01 0 0 0 1 1 100 1 0 , 0 0 0 12 Cumulative Bit Volume (10 ) 1 , 0 0 0 , 0 0 0 Ref:[7-8] 100,000,000 Qawi Harvard – Oct. 8th,2009 Thesis Defense 10 DRAM Market Analysis Cost Low cost manufacturing process Generations o 3 metal layers Features • Increased usage of each level PM Simple RAS/CAS o Small chip size FPM Fast CAS Access • Limits I/O count Latched Output Moore’s Law EDO Programmable Burst & Synchronous w/Clock SDR Latency Multi-Bank 41% scaling per year LVTTL Interface o Wordline cross sectional area DDR Data Clocked on Both Clock Data Strobe Edges SSTL 2.5 Interface o Tight metal pitch DDR2 ODT Posted CAS o Contact resistance OCD SSTL 1.8 Interface DDR3 Standard Low Voltage Option Drive/ODT Calibration Physics of Scaling Dynamic ODT Write Leveling DDR4 Faster Latency must increase Lower Power Ref:[7-8,13] Qawi Harvard – Oct. 8th,2009 Thesis Defense 11 DRAM Market Analysis 700 ) Home PC 600 mA Plugged into a wall 500 400 ♦ DDR ● DDR4 ▲ DDR3 Mobile 300 Projections ■ DDR2 Battery life 200 100 Current Consumption ( Consumption Current Server 0 266 333 400 533 666 800 1066 1333 1600 2133 2666 3200 Power consumption Data Rate (MHz) Cooling Memory Mezzanine Tray (Sun SPARC Enterprise T5240 Server Only) Trending Up Poor Efficiency UltraSPARC UltraSPARC Bandwidth Driven Ref:[14-17] Qawi Harvard – Oct. 8th,2009 Thesis Defense 12 DRAM Market Analysis 4000 Interface versus Core 3500 SDR DDR DDR2 DDR3 DDR4 (1n) (2n) (4n) (8n) (16n) 3000 Interface bares the burden 2666 2666 Core cycles 2500 ■ Interface 2133 2000 ● Core 1600 1600 DRAM Pre-fetch 1500 1333 Frequency (MHz) Frequency 800 1000 1066 Doubles at each generation 667 667 400 533 333 500 267 100 200 200 Density limited by 67 67 100 133 167 167 200 133 167 133 167 167 bandwidth 0 133 167200 SSTL loading in memory channel Increase chip count per module Channel DIMMS or Devices per per Devices or DIMMS 400 800 1066 1333 1600 Ref:[14-17] Data Rate (MHz) Qawi Harvard – Oct. 8th,2009 Thesis Defense 13 4 Gb DRAM Architecture Possible Architecture Compared to ITRS 2012 Production Release 74 mm2 56 % Array Efficiency COLUMN 40 nm SPINE ROW Ref:[10,12,20-24] Qawi Harvard – Oct. 8th,2009 Thesis Defense 14 4 Gb DRAM Architecture 6F2 Memory Cell 6 F 2 Feature Size = 40 nm F Gate Isolation 2 Cell Area = 0.0096 µm UNITCELL 3F Pitch per Wordline Bitline Memory Wordline 2F Pitch per Bitline Capacitor 256 kb Array Macro Core Array 512 bitlines ≈ 43.6 µm 512 wordlines ≈ 65.4 µm UNITCELL UNITCELL Periphery Circuitry UNITCELL 4 µm space allocated Ref:[24] Qawi Harvard – Oct. 8th,2009 Thesis Defense 15 4 Gb DRAM Architecture Ref:[25] Qawi Harvard – Oct. 8th,2009 Thesis Defense 16 4 Gb DRAM Architecture 3.5 mm 256 Mb Array 1.9 mm 32 x 32 256 kb macros 1.6 mm 32 43 .6 m 4 m 1.532 mm 256M Array 256M Array 2.3 mm 2.3 ROW 1 Gb Array mm 2.5 Multiple implementations COLUMN COLUMN 4.9 mm 4.9 256M Array 256M Array ROW GLOBAL CORNER COLUMN Qawi Harvard – Oct. 8th,2009 Thesis Defense 17 4 Gb DRAM Architecture Chip Size = 71.4 mm2 ITRS Array Efficiency = 57.7% 74 mm2 7.0 mm 56% Array Efficiency 256M 256M 256M 256M Row Row Wide I/O Architecture Array Array Array Array Moving the pads Column Column Column Column Centralized Row 256M 256M 256M 256M Row Row Array Array Array Array Centralized Column SPINE 0.4 mm 10.2 mm 10.2 256M 256M 256M 256M Row Row Array Array Array Array Column Column Column Column 256M 256M 256M 256M Row Row Array Array Array Array Ref:[29] Qawi Harvard – Oct. 8th,2009 Thesis Defense 18 Wide I/O Chip Architecture 64 Bytes per Chip Chip Size = 67.0 mm2 6.2% Chip Size Reduction Array Efficiency = 61.5% 6.7 mm 6.6% Increase in Array RO 256M Array W 256M Array Efficiency 256M 256M 256M 256M Array Array RORow Array Array Challenges 256M Array W 256M Array RO W 256M Array mm 4.6 Routing from the edge 256M 256M 256M 256M Array Array RORow Array Array Array I/O route increase W 256M Array o 2.3 mm → 4.6 mm RO 256M Array W 256M Array 10.0 mm 10.0 256M 256M 256M 256M Additional row decode Array Array RORow Array Array 256M Array W 256M Array Create Eight Internal Banks 256M Array 256M 256M 256M 256M Array Array RORow Array Array W 256M Array Proximity Interface Qawi Harvard – Oct. 8th,2009 Thesis Defense 19 4 Gb DRAM Architecture 1 mm Allocation for Proximity Chip Size = 69.68 mm2 Array Efficiency = 59.2% Channel 6.7 mm Buffers at the Center 3.2 mm Increase global I/O metal usage 512M Bank 512M Bank ROW Array I/O Routing Reduced to mm 2.3 2.3 mm 512M Bank 512M Bank Architecture NOT Efficient for ROW Proximity Communication 10.4 mm 10.4 512M Bank 512M Bank ROW 6.7 mm versus 10.4 mm mm 7.0 Buffers required 512M Bank 512M Bank Large metal usage ROW Proximity Interface Qawi Harvard – Oct.
Recommended publications
  • Computational Science and Engineering İSTANBUL TECHNICAL
    İSTANBUL TECHNICAL UNIVERSITY INFORMATICS INSTITUTE A NEW PARALLEL PROGRAMING LANGUAGE FORTRESS: FEATURES AND APPLICATIONS M.Sc. Thesis by Erdem ÜNEY Department : Informatics Institute Programme : Computational Science and Engineering SEPTEMBER 2009 İ STANBUL TECHNICAL UNIVERSITY INFORMATICS INSTITUTE A NEW PARALLEL PROGRAMING LANGUAGE FORTRESS: FEATURES AND APPLICATIONS M.Sc. Thesis by Erdem ÜNEY (702051007) Date of submission : 28 August 2009 Date of defense examination: 11 September 2009 Supervisor (Chairman) : Prof. Dr. H. Nüzhet DALFES (İTU) Members of the Examining Committee : Prof. Dr. Serdar ÇELEBİ (İTU) Prof. Dr. Hasan DAĞ (KHAS) SEPTEMBER 2009 İSTANBUL TEKNİK ÜNİVERSİTESİ BİLİŞİM ENSTİTÜSÜ YENİ BİR PARALEL PROGRAMLAMA DİLİ FORTRESS: ÖZELLİKLERİ VE UYGULAMALARI YÜKSEK LİSANS TEZİ Erdem ÜNEY (702051007) Tezin Enstitüye Verildiği Tarih : 28 Ağustos 2009 Tezin Savunulduğu Tarih : 11 Eylül 2009 Tez Danışmanı : Prof. Dr. H. Nüzhet DALFES (ITU) Diğer Jüri Üyeleri : Prof. Dr. Serdar ÇELEBİ (ITU) Prof. Dr. Hasan DAĞ (KHAS) EYLÜL 2009 In the loving and constantly illuminating memory of my father, Tuncer Üney… FOREWORD I would like to express my deep appreciation and thanks for my advisor, Prof. Dalfes. Every student presents his regards to his advisor but without the support of my advisor from the days of my undergraduate thesis, I would not be able to seek the academic appretiation and complete the process of graduating. I also would like to thank my good friends İlker Kopan, Sayat Baronyan and lovely Pelin Çallı for their support and motivation during the course of my thesis. Last but not least I would like to thank my little brother and my mother, who is constantly pushing me to go forward.
    [Show full text]
  • Wide I O DRAM Architecture Utilizing Proximity Communication
    WIDE I/O DRAM ARCHITECTURE UTILIZING PROXIMITY COMMUNICATION by Qawi IbnZayd Harvard A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Engineering Boise State University October 2009 BOISE STATE UNIVERSITY GRADUATE COLLEGE DEFENSE COMMITTEE APPROVAL of the thesis submitted by Qawi IbnZayd Harvard We have read and discussed the thesis submitted by student Qawi IbnZayd Harvard, and we have also evaluated his presentation and response to questions during the final oral examination. We find that the student has passed the final oral examination, and that the thesis is satisfactory for a master’s degree and ready for any final modifications that we may explicitly require. ______________________ __________________________________________ Date R. Jacob Baker, Ph.D. Chair, Supervisory Committee ______________________ __________________________________________ Date Sin Ming Loo, Ph.D. Member, Supervisory Committee ______________________ __________________________________________ Date Thad Welch, Ph.D. Member, Supervisory Committee i BOISE STATE UNIVERSITY GRADUATE COLLEGE FINAL READING APPROVAL of the thesis submitted by Qawi IbnZayd Harvard To the Graduate College of Boise State University: I have read the thesis of Qawi IbnZayd Harvard in its final form and have found that (1) the modifications required by the defense committee are complete; (2) the format, citations, and bibliographic style are consistent and acceptable; (3) the illustrative materials including figures, tables, and charts are in place; and (4) the final manuscript is ready for submission to the Graduate College. ______________________ __________________________________________ Date R. Jacob Baker, Ph.D. Chair, Supervisory Committee Approved for the Graduate College: ______________________ __________________________________________ Date John R. Pelton, Ph.D.
    [Show full text]
  • Multi-Threading in Electrictm, a Java VLSI CAD Tool
    Multi-threading in ElectricTM, a Java VLSI CAD Tool Gilda Garretón Sun Microsystems Laboratories JUG.cl May 2008 Agenda •Who we are •Research activities •Electric, a Java VLSI framework •Multi-Threading 2 projects •Lessons learned Sun Microsystems Laboratories @ 2008 Copyrights 2 Sun Microsystems Laboratories •Center for innovation •Mission •Research Strategy •Collaboration Sun Microsystems Laboratories @ 2008 Copyrights 3 Our Technology Changes the World Fortress Sun SPOT Darkstar Digital Rights Tools for Search Solaris Honeycom Elliptic Demand on b Curve Forecasting Power Cryptogra PC phy UltraSPARC® Electric Sun Sun Ray V9 Cluster Developed in Sun Labs, transferred to Sun products and to the outside world Sun Microsystems Laboratories @ 2008 Copyrights 4 Sun Labs Project Focus System System System Network Hardware Software Science Clients Sun Microsystems Laboratories @ 2008 Copyrights 5 VLSI Research Group •Diverse group (~20) Moscow >Hardware (14) >Software (4) >Both (2) •Education >BS, MS, PhD •Rank >1 PE, 4 DEs and 1 Sun Fellow •Collaboration aims innovation >Research: circuit (HW) and CAD (SW) Sun Microsystems Laboratories @ 2008 Copyrights 6 VLSI Circuit Research •Design circuits to enable novel architectures •Low-power circuits •High-speed circuits •Asynchronous circuits •Communication Links >Proximity Communication Chip2 Chip1 1 Transmit Receive 2 Receive Transmit Sun Microsystems Laboratories @ 2008 Copyrights 7 VLSI CAD Research •Challenges > Size and performance > Hierarchical representation > Shrinking technologies •Research
    [Show full text]
  • Session the Use of Robots and Gamification in Education
    Int'l Conf. Frontiers in Education: CS and CE | FECS'16 | 1 SESSION THE USE OF ROBOTS AND GAMIFICATION IN EDUCATION Chair(s) TBA ISBN: 1-60132-435-9, CSREA Press © 2 Int'l Conf. Frontiers in Education: CS and CE | FECS'16 | ISBN: 1-60132-435-9, CSREA Press © Int'l Conf. Frontiers in Education: CS and CE | FECS'16 | 3 An Evaluation of Simulation in Lego Mindstorms Robot Programming Coursework Frank Klassner1,2 and Sandra Kearney1,2 1Department of Computing Sciences, Villanova University, Villanova, PA USA 2Center of Excellence in Enterprise Technology, Villanova University, Villanova, PA USA Abstract – Robotics programming has become a significant Combining this observation with the research cited at the element of undergraduate computer science curricula over the beginning of the section, it is natural to ask what role past decade. This paper presents an initial examination of the simulation can play in conjunction with physical robots in effectiveness of simulators in helping undergraduates in computer science education, particularly in cases where computer science courses produce moderately debugged code students might be distracted from programming issues by on simulated robots before further refining it on physical “messy” robot interactions with the real world. There is also robots (Mindstorms NXT). The examination included a study the question of how to educate computer science majors about of how students evaluated their experience programming the industrial practice of using simulation to design and test robots in Java with and without a simulator in novice through robot systems before they are construed. advanced level courses. The paper describes the simulator for the study, the robot projects studied by the students, and This paper reports on a research project that evaluates the the survey instrument used in the examination.
    [Show full text]
  • Tightly-Coupled and Fault-Tolerant Communication in Parallel Systems
    Tightly-Coupled and Fault-Tolerant Communication in Parallel Systems Inauguraldissertation zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften der Universität Mannheim vorgelegt von Dipl.-Inf. David Christoph Slogsnat aus Heidelberg Mannheim, 2008 Dekan: Prof. Dr. Matthias Krause, Universität Mannheim Referent: Prof. Dr. Ulrich Brüning, Universität Heidelberg Koreferent: Prof. Dr. Reinhard Männer, Universität Heidelberg Tag der mündlichen Prüfung: 4. August 2008 Abstract The demand for processing power is increasing steadily. In the past, single processor archi- tectures clearly dominated the markets. As instruction level parallelism is limited in most applications, significant performance can only be achieved in the future by exploiting par- allelism at the higher levels of thread or process parallelism. As a consequence, modern “processors” incorporate multiple processor cores that form a single shared memory multi- processor. In such systems, high performance devices like network interface controllers are connected to processors and memory like every other input/output device over a hierarchy of periph- eral interconnects. Thus, one target must be to couple coprocessors physically closer to main memory and to the processors of a computing node. This removes the overhead of today’s peripheral interconnect structures. Such a step is the direct connection of Hyper- Transport (HT) devices to Opteron processors, which is presented in this thesis. Also, this work analyzes how communication from a device to processors can be optimized on the protocol level. As today’s computing nodes are shared memory systems, the cache coherence protocol is the central protocol for data exchange between processors and devices. Consequently, the analysis extends to classes of devices that are cache coherence protocol aware.
    [Show full text]
  • Multi-Threading in VLSI CAD
    How Open Source and Collaboration aid Innovation in VLSI CAD Gilda Garretón Sun Microsystems Laboratories February 2010 Agenda • Collaboration in a research lab • VLSI circuit research • VLSI CAD research • VLSI design process • CAD projects in multithreading • Conclusions 2 Garretón Sun Microsystems Laboratories @ 2010 Copyrights A Research Lab • Applied research aligned with company business • Expert in n engineering fields, but knowledgeable in m (where m > n) • Communicate and collaborate with colleagues • Collaborate with universities • Contribute to open-source and standards initiatives • “Innovate, Demonstrate, Transfer” 3 Garretón Sun Microsystems Laboratories @ 2010 Copyrights Technology Changes the World Fortress Sun SPOT Darkstar Digital Rights Tools for Demand Search Solaris on Honeycomb Elliptic Curve Forecasting Power PC Cryptography UltraSPARC® V9 Electric Sun Cluster Sun Ray Developed in Sun Labs, transferred to Sun products and to the outside world 4 Garretón Sun Microsystems Laboratories @ 2010 Copyrights Projects Clustering System System Network System Science Hardware Software Clients 5 Garretón Sun Microsystems Laboratories @ 2010 Copyrights Projects Clustering System System Network System Science Hardware Software Clients 6 Garretón Sun Microsystems Laboratories @ 2010 Copyrights VLSI Research Research Cloud VLSI Circuit Optics/photonics Research Research VLSI CAD Datacenter Research Switch Research Proximity CAD Tools Packaging Communication Prototype Test Chips Switch Prototype 7 Garretón Sun Microsystems Laboratories
    [Show full text]
  • How Collaboration Aids Innovation
    Research in Industrial Labs: How Collaboration Aids Innovation Tarik Ono Gilda Garretón Sun Microsystems Laboratories October 6, 2006 Overview Research Lab VLSI Circuit Research Datacenter VLSI CAD Switch Research Research Proximity Communication CAD Tools Test Chips Switch Prototype 2 Sun Microsystems Laboratories Life in a Research Lab Research Lab VLSI Circuit Research Datacenter VLSI CAD Switch Research Research Proximity Communication CAD Tools Test Chips Switch Prototype 3 Sun Microsystems Laboratories Life in a Research Lab: Sun Microsystems Laboratories • Applied research aligned with company business • Expert in n engineering fields, but knowledgeable in m (where m > n) • Communicate and collaborate with colleagues • Collaborate with universities • Contribute to open-source and standards initiatives • “Innovate, Demonstrate, Transfer” 4 Sun Microsystems Laboratories VLSI Circuit and CAD Research Research Lab VLSI Circuit Research Datacenter VLSI CAD Switch Research Research Proximity Communication CAD Tools Test Chips Switch Prototype * VLSI = “Very Large Scale Integration” CAD = “Computer Aided Design” 5 Sun Microsystems Laboratories Usual Interaction Model Company A Company B Circuit Design CAD Development 6 Sun Microsystems Laboratories Usual Interaction Model Sales Company A Company B Circuit Design !#@! CAD Development 7 Sun Microsystems Laboratories VLSI Circuit and CAD Research Group 8 Sun Microsystems Laboratories VLSI Circuit Research Research Lab VLSI Circuit Research Datacenter VLSI CAD Switch Research Research Proximity
    [Show full text]
  • Final Year Project Thesis Mo S H a Di
    Final Year Project Thesis Mo S H A Di Shape-Shifting Human Addressable Digital Money by Chishala Matete Mpundu 29071011 Thesis submitted in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science at the Department of Computer Studies, under the School of Natural Sciences, University of Zambia. Supervisor Mr. David M. Zulu 2013 1 DECLARATION I, the undersigned here declare that the Shape-Shifting Human Addressable Digital Money System is my own work, that it has not been submitted for any degree or examination in any other university, and that all the sources I have used or quoted have been indicated and acknowledged by complete references. Name: Chishala Matete Mpundu Signature:…………. Supervisor: Mr. David M. Zulu Signature:.………… Date: September 2013 2 ACKNOWLEDGEMENT I would like to thank all the lecturers at the Department of Computer Studies for teaching me most of what I know and instilling a level of discipline. I thank my supervisor, Mr. David M. Zulu for the guidance and help he offered in this undertaking and I thank my family for being there for me in the good and bad times when it all seemed impossible. I thank God for bringing me this far because years back I never thought I‟d even have an opportunity of having tertiary education. 3 Contents Abstract ................................................................................................................................................................. 6 Chapter 1 .............................................................................................................................................................
    [Show full text]