IntraChip Optical Networks for a Future Supercomputer-on-a-Chip
Jeffrey Kash, IBM Research
29 February, 2008
© 2008 IBM Corporation

Acknowledgements
IBM Research: Yurii Vlasov, Clint Schow, Will Green, Fengnian Xia, Jose Moreira, Eugen Schenfeld, Jose Tierno, Alexander Rylyakov
Columbia University: Keren Bergman, Luca Carloni, Rick Osgood
Cornell University: David Albonesi, Alyssa Apsel, Michal Lipson, Jose Martinez
UC Santa Barbara: Daniel Blumenthal, John Bowers

Outline
Optics in today's HPCs
Trends in microprocessor design
– Multi-core designs for power efficiency
Vision for future IntraChip Optical Networks (ICON)
– 3D stack of logic, memory, and global optical interconnects
Required devices and processes
– Low power and small footprint

Today's High Performance Server Clusters: Racks are mainly electrically connected, but going optical
Real systems are 10-100s of server racks and several racks of switches
[Photos: NEC Earth Simulator during installation (all copper; next generation goes optical). IBM Federation Switch for ASCI Purple at LLNL, backside of a switch rack, contrasting all-electrical with all-optical cabling: copper is bulky, hard to bend, heavy, and impedes air cooling; optical is very organized but more expensive.]
– Rack-to-rack interconnects (≤100m) now moving to optics
– Interconnects within racks (≤5m) now primarily copper
Over time, optics will increasingly replace copper at shorter and shorter distances
– Backplane and card interconnects (≤1m) after rack-to-rack
– Trend will accelerate as bitrates (in the media) increase and costs come down
• 2.5 Gb/s → 5 Gb/s → 10 Gb/s → 20 Gb/s(?)
• Target ~$1/Gb/s
[Photo: Snap 12 module, 12 Tx or Rx at 2.5 Gb/s, placed at the back of the rack]

Beyond bitrate, density is a major driver of optics
Connectors: HM-Zd 10 Gb/s connector, 40 differential pairs (25 mm wide) vs. MT fiber ferrule, 48 fibers, extendable to 72 or 96 (7 mm wide)
Cables: high-speed copper cabling vs. fiber ribbon
[Figure: electrical transmission lines vs. optical waveguides – 35 x 35 μm waveguide cores on a 62.5 μm pitch, alongside electrical lines at the ~400 μm scale]
But optics must be packaged deep within the system to achieve density improvements
(A short per-millimeter comparison of these connectors follows the next slide.)

Packaging of Optical Interconnects is Critical
Better to put optics close to logic rather than at the card edge
✓ Avoids distortion, power, & cost of an electrical link on each end of the optical link
✓ Breaks through pin-count limitation of multi-chip modules (MCMs)
[Diagram: Optics on-MCM – opto module (laser + driver IC) sits on the ceramic MCM next to the NIC, with only ~1-2 cm of electrical traces and a 1 cm flex before the fiber exits to an optical bulkhead connector; operation to >15 Gb/s, no equalization required. Optics on-card – opto module at the card edge, with >12.5 cm of organic-card traces (with or w/o via stubs); operation at 10 Gb/s, bandwidth limited by # of pins, equalization required.]
Colgan et al., "Direct integration of dense parallel optical interconnects on a first level package for high-end servers," Proc. 55th ECTC, vol. 1, pp. 228-233, 31 May - 3 June 2005.
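To put the connector comparison above on a per-millimeter footing, here is a minimal Python sketch using only the widths and lane counts quoted on the density slide. The 10 Gb/s per-lane rate is an assumption for illustration (it matches the HM-Zd label and the upper end of the bitrate roadmap on the earlier slide), not a figure from the talk.

# Back-of-envelope connector density comparison, using the lane counts and
# widths quoted on the density slide. Per-lane bitrate is an illustrative
# assumption (10 Gb/s for both media).

def density(lanes: int, width_mm: float, gbps_per_lane: float):
    """Return (lanes per mm, aggregate Gb/s per mm of connector width)."""
    return lanes / width_mm, lanes * gbps_per_lane / width_mm

# HM-Zd electrical connector: 40 differential pairs across 25 mm
elec_lanes, elec_bw = density(40, 25.0, 10.0)

# MT fiber ferrule: 48 fibers across 7 mm (extendable to 72 or 96)
opt_lanes, opt_bw = density(48, 7.0, 10.0)

print(f"electrical: {elec_lanes:.1f} lanes/mm, {elec_bw:.0f} Gb/s per mm of card edge")
print(f"optical:    {opt_lanes:.1f} lanes/mm, {opt_bw:.0f} Gb/s per mm of card edge")
print(f"density advantage of the fiber ferrule: ~{opt_lanes / elec_lanes:.1f}x")

Even before extending the ferrule to 72 or 96 fibers, the optical connector offers roughly four times the lane density per millimeter of card edge at the same per-lane rate.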
Current architecture: Electronic Packet Switching
Current architecture (electronic switch chips, interconnected by electrical or optical links, in multi-stage networks) works well now:
– Scalable BW & application-optimized cost
• Multiple switches in parallel
– Modular building blocks
• Many identical switch chips & links
...but is challenging in the future:
– Switch chip throughput stresses the hardest aspects of chip design
• I/O & packaging
– Multi-stage networks will require multiple E-O-E conversions
• An N-stage Exabyte/s network = N*Exabytes/s of cost and N*Exabytes/s of power
[Photos: central switch racks; Mare Nostrum, Barcelona Supercomputing Center]

Possible new architecture: Optical Circuit Switching
(Optics is not electronics; maybe a different architecture can use it better)
All-optical packet switches are hard
– e.g., IBM/Corning OSMOSIS project
• Expensive, and required a complex electrical control network
– No optical memory or optical logic
– Probably not cost-competitive against electronic packet switches, even in 2015-2020
But Optical Circuit Switches (~10 millisecond switching time) are available today
– Several technologies (MEMS, piezo-, thermo-, ...)
– Low power
• OCS power essentially zero, compared to an electronic switch
• No extra O-E-O conversion
– But require single-mode optics
– In ~2015, with silicon photonics, ~1 nsec switching time
• Does 6 orders of magnitude make the approach more suitable to general-purpose computing?
MEMS-based OCS hardware is commercially available (Calient, Glimmerglass, ...)
• 20 ms switching time
• <100 Watts
[Diagram: scalable Optical Circuit Switch (OCS) concept – input fiber steered to output fibers by a 2-axis MEMS mirror (one channel shown)]

Chip MultiProcessors (CMPs)
IBM Cell, Sun Niagara, Intel Montecito, … (note that the processors on the chip are not identical)
IBM Cell (a short sanity check of these figures follows the next slide):
  Technology process: 90 nm SOI with low-κ dielectrics and 8 metal layers of copper interconnect
  Chip area: 235 mm^2
  Number of transistors: ~234M
  Operating clock frequency: 4 GHz
  Power dissipation: ~100 W
  Power dissipation due to global interconnect: 30-50%
  Intra-chip, inter-core communication bandwidth: 1.024 Tbps at 2 Gb/sec/lane (four shared buses, 128 bits data + 64 bits address each)
  I/O communication bandwidth: 0.819 Tbps (includes external memory)

…but perhaps a hierarchical design of several cores grouped into a supercore will emerge ~2017
– Multiple "supercores" on a chip
– Electrical communication within a supercore
– Optical communication between supercores
(After Moray McLaren, HP Labs)
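As a quick sanity check on the Cell numbers in the CMP table above, the short Python sketch below recomputes the quoted intra-chip bandwidth from the bus parameters and translates the global-interconnect percentage into watts; it uses only figures that appear in that table.

# Recompute two of the IBM Cell figures quoted in the CMP table above,
# using only the parameters given there.

buses = 4                 # four shared buses
data_bits_per_bus = 128   # 128 data bits per bus (address bits excluded from the quoted bandwidth)
gbps_per_lane = 2.0       # 2 Gb/sec per lane

intra_chip_bw_tbps = buses * data_bits_per_bus * gbps_per_lane / 1000.0
print(f"intra-chip data bandwidth: {intra_chip_bw_tbps:.3f} Tb/s")  # -> 1.024 Tb/s, as quoted

chip_power_w = 100.0                  # ~100 W total power dissipation
interconnect_fraction = (0.30, 0.50)  # 30-50% attributed to global interconnect
low_w, high_w = (f * chip_power_w for f in interconnect_fraction)
print(f"power spent on global interconnect: ~{low_w:.0f}-{high_w:.0f} W")

The roughly 30-50 W burned in global wiring out of a ~100 W budget is the slice the rest of the deck proposes to attack by moving inter-supercore traffic onto an optical network.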
Theme: How to continue to get an exponential performance increase over time from silicon ICs (a Moore's Law extension) even though CMOS scaling by itself is no longer enough: Communications and Architecture
Can Si photonics provide this performance increase?
[Chart: performance (log scale) vs. time (linear) – curves for transistors, uniprocessor performance (where the original Moore's Law applies), and increased # of processors, with milestones at Tera-scale (today), Peta-scale (~2012), and Exa-scale (~2017)]
IBM Cell Processor: 9 processors, ~200 GFLOPs; on- and off-chip BW ~100 GB/sec (0.5 B/FLOP)
BW requirements must scale with system performance, ~1 Byte/FLOP

Inter-core communication trends – network on chip
Intel Polaris 2007 research chip: 100 million transistors
● 80 cores (tiles)
● 275 mm^2
i.e., 3D integration (why not go to an optical plane, too?)
Higher BW and lower power with optics?

Photonics in Multi-Core Processors: Intra-Chip Communications Network
Photonics changes the rules
OPTICS:
– Modulate/receive an ultra-high-bandwidth data stream once per communication event
– Broadband switch fabric uses very little power ○ highly scalable
– Off-chip and on-chip can use essentially the same technology ○ much more off-chip BW available
ELECTRONICS:
– Buffer, receive and re-transmit at every switch
– Off-chip is pin-limited and really power hungry
[Diagram: TX and RX ports on the cores attached to a shared intra-chip optical network]

Integration Concept: Processor System Stack
3D layer stacking will be prevalent in the 22nm timeframe
– Intra-chip optics can take advantage of this technology
– Photonics layer (with supporting electrical circuits) more easily integrated with high-performance logic and memory layers
– Layers can be separately optimized for performance and yield
[Diagram: 3D stack of a processor plane w/ local memory cache, three memory planes, and a photonic network interconnect plane (includes optical devices, electronic drivers & amplifiers, and an electronic control network), joined by BEOL vertical electrical interconnects, with optical off-chip interconnects leaving the photonic plane]

Vision for Silicon Photonics: Intra-Chip Optical Networks
Pack ~36 IBM Cell processor "supercores" on a single ~600 mm^2 die in 22nm CMOS
• In each Cell supercore, there are 9 cores (PPE + 8 SPEs)
• 324 processors in one chip
• Power and area dramatically lower than today at comparable clock speeds
• Each supercore is electrically interconnected
• Communication between supercores and off-chip is optical
• BW between supercores is similar to today's off-Cell BW (i.e., 1-2 Tbps per