TECHNOLOGY ROADMAP DOCUMENT FOR SKA SIGNAL PROCESSING
Document number ...... WP2‐040.030.011‐TD‐001 Revision ...... 1 Author ...... W.Turner Date ...... 2011‐02‐27 Status ...... Issued
Name Designation Affiliation Date Signature Additional Authors
Submitted by:
W. Turner Signal Processing SPDO 2011‐03‐26 Domain Specialist
Approved by:
P. Dewdney Project Engineer SPDO 2010‐03‐29
DOCUMENT HISTORY
Revision Date Of Issue Engineering Change Comments Number
1 ‐ ‐ First issue
DOCUMENT SOFTWARE
Package Version Filename
Wordprocessor MsWord Word 2007 02‐WP2‐040.030.011.TD‐001‐1_SKATechnologyRoadmap
Block diagrams
Other
ORGANISATION DETAILS
Name SKA Program Development Office Physical/Postal Jodrell Bank Centre for Astrophysics Address Alan Turing Building The University of Manchester Oxford Road Manchester, UK M13 9PL Fax. +44 (0)161 275 4049 Website www.skatelescope.org
TABLE OF CONTENTS
1 INTRODUCTION
  1.1 Purpose of the document
  1.2 Technology Readiness Levels
2 REFERENCES
3 PROCESSING
  3.1 General Purpose Processor
    3.1.1 Theoretical Processing Performance
    3.1.2 Cost
    3.1.3 Thermal Dissipation
    3.1.4 Scalability
  3.2 Graphics Processing Unit
    3.2.1 Intel
    3.2.2 ATI (AMD)
    3.2.3 NVIDIA
    3.2.4 Theoretical Processing Performance
    3.2.5 Cost
    3.2.6 Thermal Dissipation
  3.3 Field Programmable Gate Array
    3.3.1 Theoretical Processing Performance
    3.3.2 Cost
    3.3.3 Thermal Dissipation
    3.3.4 Hard Copy
  3.4 Application Specific Integrated Circuit ASIC
    3.4.1 Process Size
    3.4.2 Masking Costs
    3.4.3 Yield and Die Costs
    3.4.4 Prototyping
  3.5 Gap between FPGAs and ASICS
    3.5.1 Theoretical Processing Performance
    3.5.2 Cost
    3.5.3 Thermal Dissipation
  3.6 Network on Chip, NoC
4 STORAGE
  4.1 SRAM
    4.1.1 SRAM performance
    4.1.2 SRAM Thermal Dissipation
    4.1.3 SRAM Cost
  4.2 Dynamic Random Access Memory, DRAM
    4.2.1 DRAM Performance
    4.2.2 DRAM Cost
    4.2.3 DRAM Thermal Dissipation
  4.3 Flash Memory
    4.3.1 NAND Cost
    4.3.2 NAND Thermal Dissipation
  4.4 Storage Class Memory
    4.4.1 SCM Performance
    4.4.2 SCM Cost
    4.4.3 SCM Thermal Dissipation
5 DISK STORAGE
    5.1.1 Disk Performance
    5.1.2 Disk Thermal Dissipation
    5.1.3 Disk Cost
6 NETWORK
  6.1 Infiniband
    6.1.1 Infiniband Performance Roadmap
    6.1.2 Host Channel Adapters
    6.1.3 Infiniband switches
  6.2 Ethernet
    6.2.1 100 G bit/s Ethernet Switches
    6.2.2 Terabit Ethernet
    6.2.3 Ethernet Cost
    6.2.4 Ethernet Thermal Dissipation
  6.3 Optical Interconnect
    6.3.1 Performance
    6.3.2 Thermal Dissipation
    6.3.3 Cost
7 APPENDIX 1
  7.1 Moore’s Law
  7.2 Transistor Size
  7.3 Breaking Moore’s Law
  7.4 Moore’s Law and Processing Capability
8 APPENDIX 2
  8.1 Tilera
  8.2 Clearspeed
  8.3 PicoChip
  8.4 Other Technologies
LIST OF FIGURES
Figure 1 Computations per kilowatt hour over time
Figure 2 Intel’s Tick Tock Roadmap
Figure 3 Parallel speed up
Figure 4 Intel’s Science Computing Road-Map
Figure 5 Intel Roadmap
Figure 6 ATI Graphics accelerator with 8 GPU cards
Figure 7 NVIDIA GPU Historic Roadmap
Figure 8 NVIDIA Tesla S2050 unit plan view
Figure 9 Tesla S2050 Architecture
Figure 10 CUDA GPU Processing power per Watt Road-map
Figure 11 Gates per unit area of silicon as a function of process size
Figure 12 IBM ASIC Gate Delays
Figure 13 IBM ASIC Dynamic Power
Figure 14 IBM ASIC Static Power
Figure 15 Total chip dynamic and static power dissipation trends
Figure 16 Mask Tooling Costs
Figure 17 Example NoC and processing Tile
Figure 18 Silicon Implementation
Figure 19 Artist’s concept of 3D silicon processor chip with optical IO layer featuring on-chip nanophotonic network
Figure 20 Storage taxonomy
Figure 21 Storage Hierarchy
Figure 22 Samsung’s Memory Technology and Solutions Roadmap
Figure 23 Samsung’s DRAM Historic Roadmap
Figure 24 Samsung’s DRAM Historic Roadmap
Figure 25 Samsung DDR DRAM Performance Roadmap
Figure 26 DRAM Chip Selling Price December 2010
Figure 27 Samsung DRAM: Measured Thermal Dissipation
Figure 28 NAND and NOR Flash Memory Schematics and Cell layout
Figure 29 Intel Micron Historic Flash Roadmap
Figure 30 NAND Cost per M Byte Road Map
Figure 31 SCM Roadmap in relation to NAND, DRAM and Hard Disk (HDD)
Figure 32 Historic Roadmap for Disk Areal Density
Figure 33 Historic Roadmap for Disk Bandwidth
Figure 34 Infiniband Roadmap
Figure 35 Ethernet PHY standards
Figure 36 Alcatel Lucent Power Consumption Roadmap
Figure 37 CFP Hardware Specification Power Interlock
Figure 38 IBM Terra Bus Overview
Figure 39 IBM Terrabus Integrated Circuit Connectivity
Figure 40 IBM Terrabus Integrated Circuit and Printed Circuit board Optical Connectivity
Figure 41 Numbers of Transistors for Intel Processors
Figure 42 ITRS transistor cost predictions
Figure 43 Roadmap of Transistor Size
Figure 44 Physical Scaling of Parameters for a Semi-conductor gate
Figure 45 Tilera Tile Processor architecture
Figure 46 Clearspeed’s CSX 700
Figure 47 PicoChip’s Pico Array Architecture
LIST OF TABLES
Table 1 Technology readiness levels as risk likelihood indicators
Table 2 Technology Readiness Level Definitions
Table 3 Intel’s Tick Tock Time Line
Table 4 Xilinx Current Virtex 6 product range
Table 5 Xilinx Next Generation FPGA (Virtex 7)
Table 6 Xilinx pricing on 29th December 2010 for Virtex 6 Devices
Table 7 FPGA to ASIC Gap Summary
Table 8 NoC Packet transmission Energies
Table 9 Current Baseline and Prototypical Memory Technologies (ITRS 2007)
Table 10 Semiconductor parameter growth
Table 11 Device Scaling factors
LIST OF ABBREVIATIONS
AA ...... Aperture Array
Ant. ...... Antenna
API ...... Application Programming Interface
ASIC ...... Application Specific Integrated Circuit
BER ...... Bit Error Rate
CAD ...... Computer Aided Design
CAGR ...... Compound Annual Growth Rate
CoDR ...... Conceptual Design Review
COTS ...... Commercial off the Shelf
cm ...... centimetre
CPU ...... Central Processing Unit
DDR ...... Double Data Rate
DOD ...... Department of Defence
DRAM ...... Dynamic Random Access Memory
DRM ...... Design Reference Mission
DSP ...... Digital Signal Processor
EDA ...... Electronic Design Automation
EoR ...... Epoch of Reionisation
EX ...... Example
FFT ...... Fast Fourier Transform
FLOPS ...... Floating Point Operations per second
FoV ...... Field of View
FPGA ...... Field Programmable Gate Array
GPU ...... Graphics Processing Unit
HCA ...... Host Channel Adapter
HDD ...... Hard Disk Drive
HDL ...... Hardware Description Language
HDR ...... High Data Rate
Hz ...... Hertz
IDR ...... Internal Data Rate
IFFT ...... Inverse Fast Fourier Transform
I/O ...... input/output
IP ...... Intellectual Property
K ...... Kelvin
LNA ...... Low Noise Amplifier
MAC ...... Multiply Accumulate
MLM ...... Multi-Layer Mask
MMF ...... Multi Mode Fibre
MPW ...... Multi-Project Wafer
MW ...... megawatt
nm ...... nanometre
NoC ...... Network on Chip
NDA ...... Non Disclosure Agreement
NDR ...... Next Data Rate
NRE ...... Non Recurring Engineering
Ny ...... Nyquist
OH ...... Overhead
ONoC ...... Optical Network on Chip
OS ...... Operating System
OTPF ...... Observing Time Performance Factor
Ov ...... Oversampling
PAF ...... Phased Array Feed
PCI ...... Peripheral Component Interconnect
PCIe ...... PCI Express
PrepSKA ...... Preparatory Phase for the SKA
Rd ...... read
RFI ...... Radio Frequency Interference
rms ...... root mean square
RRAM ...... Resistive Random Access Memory
SCM ...... Storage Class Memory
SEFD ...... System Equivalent Flux Density
SER ...... Soft Error Rate
SKA ...... Square Kilometre Array
SKA1 ...... SKA Phase 1
SKA2 ...... SKA Phase 2
SKADS ...... SKA Design Studies
SMF ...... Single Mode Fibre
SPDO ...... SKA Program Development Office
SRAM ...... Static Random Access Memory
SSD ...... Solid State Drive
SSFoM ...... Survey Speed Figure of Merit
TBD ...... To be decided
TRL ...... Technology Readiness Level
Wr ...... write
Wrt ...... with respect to
1 Introduction
The aim of this document is to provide an overview of the technology that could potentially form the basis of the signal processing for the SKA telescope. It is intended that this document be reviewed and updated annually in the lead-up to phase 1 and phase 2 of the telescope, so that it provides an up-to-date perspective as input to the technology selection process. This is intended to be a complementary activity, abstracted from specific Concept Designs. Consequently, the document focuses on the technology options and their attributes rather than on design details.
The document is intended to provide wide coverage of technology; however, the level of detail provided on a specific technology is proportional to its perceived relevance at the time of writing.
One limitation of this document is that its scope is restricted to information available in the public domain. Commercial manufacturers tend to be guarded about their specific roadmaps and may release details only under Non Disclosure Agreements (NDAs). However, this is not considered a major limitation in providing a reasonable overview for a technology roadmap, particularly one that is to be updated annually.
This document is part of a series generated in support of the Signal Processing CoDR, which includes the following:
Signal Processing High Level Description
Technology Roadmap
Design Concept Descriptions
Signal Processing Requirements
Signal Processing Costs
Signal Processing Risk Register
Signal Processing Strategy to Proceed to the Next Phase
Signal Processing CoDR Review Plan
Software & Firmware Strategy
1.1 Purpose of the document
The overall purpose of this document is to identify the road map of processing and communication technology applicable to the SKA signal processing. This is to include:
Identify known potential technologies applicable to the SKA
Where possible project attributes of known technology to the time frame of the SKA in terms of:
o Performance
o Cost
o Thermal Dissipation
Provide an overview of potential future technologies that may be applicable to the SKA within the time frame of the SKA1 or SKA2.
List ‘also ran’ technologies that have been considered but judged unsuitable in their current form
1.2 Technology Readiness Levels
For a document detailing a technology roadmap, the issue of technology readiness needs to be raised. The Risk Management Plan MGT‐040.040.000‐MP‐001 issue 1 proposes that a condensed version of the United States Department of Defence (DOD) and NASA technology readiness levels (TRL) be used to estimate the likelihood of risk occurrence for the relevant technology; these are shown in Table 1.
Table 1 Technology readiness levels as risk likelihood indicators
It is important to note that the technology readiness may differ from one hierarchical level to the next. For example ‐ individual components may be freely available implying that the risk for procurement at the component level is low. However, if these components have not yet been integrated and shown to fulfil the required functions in the required environment at the next hierarchical level, the risk at this higher level will be high.
The definitions of the technology readiness levels are shown in Table 2. These definitions should be taken into account along with the risk likelihood level when using the roadmap to inform any concept implementation.
Table 2 Technology Readiness Level Definitions
2 References
[1] International Technology Roadmap for Semiconductors (ITRS), available at www.itrs.net.
[2] Terrabus: a Chip-to-Chip Parallel Optical Interconnect, J. A. Kash et al.
[3] Progress in Digital Integrated Electronics, G. Moore, Technical Digest, IEEE Int’l Electronic Devices Meeting, vol 21, 1975, pp 11-13.
[4] Establishing Moore’s Law, Ethan Mollick, IEEE Annals of the History of Computing, vol 28, no. 3, 2006, pp 62-75.
[5] Three Steps to the Thermal Noise Death of Moore’s Law, Jacek Izydorczyk, IEEE Trans. VLSI Systems, vol 18, no. 1, 2010, pp 161-165.
[6] Limits to Binary Logic Switch Scaling: A Gedanken Model, Victor V. Zhirnov et al., Proc. IEEE, vol 91, no. 11, 2003, pp 1934-1939.
[7] Limits on Silicon Nanoelectronics for Terascale Integration, J. D. Meindl, Science, vol 293.
[8] Microprocessor Scaling: What Limits Will Hold?, Jacek Izydorczyk, IEEE Computer, Aug 2010.
[9] Emerging Research Memory and Logic Technology, J. A. Hutchby et al., IEEE Circuits & Devices Magazine, vol 21, no. 3, 2005, pp 47-51.
[10] Future Trends in Microelectronics, S. Luryi, J. Xu & A. Zaslavsky, John Wiley & Sons.
[11] The High-K Solution, M. T. Bohr, R. Chau & K. Mistry, IEEE Spectrum, vol 44, no. 10, 2007, pp 29-35.
[12] Quantifying and Exploring the Gap Between FPGAs and ASICS, Ian Kuon & Jonathan Rose, Springer.
[13] Explaining the gap between ASIC and custom power: a custom perspective, A. Chang, W. J. Dally, DAC ’05 Proceedings of the 42nd annual conference on Design automation, pp 281-284, ACM, New York, 2005.
[14] Closing the Gap Between ASIC & Custom: Tools and Techniques for High-Performance ASIC Design, D. Chinnery, K. Keutzer, Kluwer, New York, 2002.
[15] Closing the Power Gap Between ASIC and Custom: an ASIC perspective, DAC ’05 Proceedings of the 42nd annual conference on Design automation, pp 275-280, ACM, New York, 2005.
[16] The role of custom design in ASIC chips, DAC ’00 Proceedings of the 37th annual conference on Design automation, pp 643-647, ACM, New York, 2005.
[17] Assessing Trends in the Electrical Efficiency of Computation Over Time, J. G. Koomey, report to Microsoft and Intel Corporations.
[18] Computer Architecture: a Quantitative Approach, Hennessy and Patterson.
[19] A 51mW 1.6 GHz on-chip network for low-power heterogeneous SoC platform, Kangmin Lee et al., IEEE Int. Solid-State Circuits Conference, Digest of Technical Papers, pp 152-512, Feb 2004.
[20] An 800MHz star-connected on-chip network for application to systems on a chip, Se-Joong Lee et al., IEEE Int. Solid-State Circuits Conference, Digest of Technical Papers, pp 468-469, Feb 2003.
[21] Low-Power NoC for High-Performance SoC Design, Hoi-Jun Yoo, Kangmin Lee, Jun Kyoung Kim, CRC Press, 2008.
An 80-Tile 1.28 TFLOPS Network-on-Chip in 65nm CMOS, Sriram Vangal, Jason Howard, Gregory Ruhl, Saurabh Dighe, Howard Wilson, James Tschanz, David Finan, Priya Iyer, Arvind Singh, Tiju Jacob, Shailendra Jain, Sriram Venkataraman, Yatin Hoskote, Nitin Borkar, ISSCC 2007, Session 5, Microprocessors.
3 Processing
The sheer scale of the SKA imposes onerous processing and signal transport requirements on the signal processing, whilst the design remains constrained by cost and thermal dissipation.
Of the potential solutions, four processing technologies are currently popular with the astronomy engineering community and potentially offer solutions within the timeframe of the SKA:
General Purpose Processor
GPU
FPGA
ASIC
However, there are other interesting developments outside the mainstream that could potentially pave the way to a solution. The Appendix details some of these options.
3.1 General Purpose Processor
The term general purpose processor is nominally used to identify x86 architecture processors manufactured by Intel and AMD, which are typically programmed in a high level language. Other processors also fall into this category, such as Motorola’s Vector processing and Sun’s Niagara. Each of these processors is aimed at providing a highly flexible programming platform coupled with a supporting Operating System (OS). One cost of providing this general purpose capability is the power efficiency of the platform, which requires extra hardware to support the inbuilt flexibility. For example, the processing unit will typically be 32 or 64 bit floating point irrespective of the data word length. A metric typically used to indicate the processing efficiency is processing capability per kilowatt, kW.
Figure 1 details the roadmap of the theoretical processing capability per kW hour of dissipation for general purpose computers over the period 1945 through to 2010. Projecting from this graph suggests 2.7 x 10^16 computations per kW hour by 2015 or, equivalently, 7.5 x 10^15 computations per second per megawatt of dissipation. An industry target of ~20 MW exists for Exascale computing by 2020. This can be shown to be consistent with projections from Figure 1.
(J G. Koomey Stanford)[17]
Figure 1 Computations per kilowatt hour over time
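The conversion between the two efficiency figures quoted from Figure 1, and the comparison with the Exascale target, can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, an Exascale machine is assumed to mean 10^18 computations per second, and "computations" are treated as interchangeable with the operations counted in that target, which is a simplification.

```python
import math

# Sketch only: names are illustrative; input figures are those quoted in the text.
JOULES_PER_KWH = 3.6e6  # 1 kW sustained for 3600 s


def per_kwh_to_per_second_per_mw(computations_per_kwh: float) -> float:
    """Convert computations per kWh into computations per second per MW."""
    computations_per_joule = computations_per_kwh / JOULES_PER_KWH
    # 1 MW delivers 1e6 J every second.
    return computations_per_joule * 1.0e6


# 2015 projection read from Figure 1.
projected_2015 = per_kwh_to_per_second_per_mw(2.7e16)

# Assumed Exascale target: 1e18 computations per second within ~20 MW by 2020.
exascale_target_2020 = 1.0e18 / 20.0

# Efficiency gain needed between 2015 and 2020, and the doubling time it implies.
required_gain = exascale_target_2020 / projected_2015
implied_doubling_years = (2020 - 2015) / math.log2(required_gain)

print(f"2015 projection : {projected_2015:.2e} computations/s per MW")
print(f"2020 target     : {exascale_target_2020:.2e} computations/s per MW")
print(f"Required gain   : {required_gain:.2f}x")
print(f"Implied doubling: {implied_doubling_years:.2f} years")
```

The first result reproduces the 7.5 x 10^15 figure. The implied doubling time, a little under two years, is only the rate needed to close the gap; whether the historical trend in Figure 1 sustains that rate is the judgement made in the text.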
At present (October 2010) Intel processor chips dominate the Top 500 supercomputers, with over 80% of processors being Intel. On this basis, the roadmap of Intel processors is presented as representative of the roadmap for x86 architecture general purpose processors. The information presented is in the public domain and has largely been gathered from the Internet, including Intel’s own web-site.
Intel’s strategy for processor developments is based on a time line known as ‘the Tick Tock roadmap’ and is detailed in Figure 2. The Tick of the time line represents a process change and the Tock represents a processor architecture change. The current technology is at a 45nm process with the Nehalem architecture. The top end performance of the 45nm technology is likely to be achieved with the ‘Beckton’ Xeon processor which should provide 8 processor cores running at up to 2.3 GHz for 130 Watts processor dissipation and at a unit price of $3.7k.
Figure 2 Intel’s Tick Tock Roadmap
Tick/Tock | Architecture Change | Fabrication Process | Release Date | Energy Scaling | Delay Scaling
Tick | Shrink/derivative (Penryn) | 45nm | 2008 | 0.5 | > 0.7
Tock | Microarchitecture (Nehalem) | | 2009 | |
Tick | Shrink/derivative (Westmere) | 32nm | 2010 | 0.5 | > 0.7
Tock | Microarchitecture (Sandy Bridge) | | 2011 | 0.5 | > 0.7
Tick | Shrink/derivative (Ivy Bridge) | 22nm | 2012 | 0.5 | > 0.7
Tock | Microarchitecture (Haswell) | | 2013 | 0.5 | > 0.7
Tick | Shrink/derivative (Rockwell) | 16nm | 2014 | 0.5 | ~1
Tock | Microarchitecture (TBD) | | 2015 | 0.5 | ~1
Table 3 Intel’s Tick Tock Time Line
Table 3 summarises Intel’s tick‐tock roadmap process through to the 16nm process. Intel also has some more speculative projections through to 4nm technology by 2022.
These figures suggest that there should be a factor of two improvement in thermal dissipation for the same processing capability for each die shrink. To achieve this presents some technical challenges as leakage current becomes more of a problem as feature size is reduced. A discussion of this issue is provided later in the document as it is applicable to other processing technologies too.
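To make the factor-of-two claim concrete, the sketch below compounds the 0.5 energy scaling factor across the process shrinks listed in Table 3. It assumes that the factor applies once per die shrink, as the paragraph above states, and that it holds unchanged down to 16nm; the names used are illustrative, and the leakage effects mentioned above may prevent this being realised in practice.

```python
# Sketch only: compounds the Table 3 energy scaling factor over successive shrinks.
# Assumption: the 0.5 factor applies once per process shrink (each Tick).
PROCESS_NODES_NM = [45, 32, 22, 16]
ENERGY_SCALING_PER_SHRINK = 0.5


def relative_energy(nodes, factor):
    """Energy for the same processing capability, relative to the first node."""
    return {node: factor ** shrinks for shrinks, node in enumerate(nodes)}


energy = relative_energy(PROCESS_NODES_NM, ENERGY_SCALING_PER_SHRINK)
for node, rel in energy.items():
    print(f"{node} nm: {rel:.3f} of the {PROCESS_NODES_NM[0]} nm energy")
```

Under these assumptions, three shrinks from 45nm to 16nm would reduce the dissipation for a fixed processing load to one eighth of its 45nm value.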
Another major architectural limitation is the thermal density achievable by the processor chip’s packaging, which is currently of the order of 140 W per cm^2 for a commercial two-dimensional device. It is this limitation that has recently brought a halt to ever increasing processor clock rates and driven the architecture down the path of multi-core processing. The use of three-dimensional packaging can provide a one-off step improvement in the achievable thermal density.
3.1.1 Theoretical Processing Performance
Typically, the theoretical maximum processing power, in G FLOPS, offered by a single general purpose (x86) processor is: