
F2: Memory-Centric Computing from IoT to Artificial Intelligence and Machine Learning

Organizer: Fatih Hamzaoglu, Intel, Hillsboro, OR
Committee: Meng-Fan Chang, National Tsing Hua University, Hsinchu, Taiwan; Ki-Tae Park, Samsung, Hwaseong, Korea; Yasuhiko Taito, Renesas, Kodaira, Tokyo, Japan; Alicia Klinefelter, Nvidia, Durham, NC; Naveen Verma, Princeton University, Princeton, NJ

This forum will present state-of-the-art memory-centric architectures, as well as future innovative solutions to enable energy-efficient, high-performance AI/ML applications. It will also describe challenges and solutions spanning edge processors to cloud applications, such as algorithmic accuracy, cost, security, and practicality, covering areas where the technology is ready today as well as those where further development is needed. At the high-performance and machine-learning end, emerging and storage-class memories are going to change the memory hierarchy. Meanwhile, low-power, high-bandwidth DRAMs and SRAMs continue to be innovated so that they remain the workhorses of the latest process nodes (HBM, GDDR6, 7nm-FinFET SRAM, etc.). Furthermore, with the explosive growth of memory-intensive workloads like machine learning, video capture/playback, and language translation, there is tremendous interest in performing some compute near memory, either by placing logic inside the DRAM/NVM main-memory die (near-memory compute) or by performing the compute within the SRAM/STT-MRAM/RRAM array embedded in the compute die (in-memory compute). In either case, the motivation is to reduce the significant data movement between main/embedded memory and compute units, as well as to reduce latency by performing many operations in parallel inside the array. Many challenges remain before these ideas can be productized, including the area-cost trade-off of adding logic to the memory die for near-memory compute, or of augmenting embedded arrays with mixed-signal circuits to enable in-memory compute. Noticeable degradation of the signal-to-noise ratio, especially in the in-memory compute case, may restrict its use to specific applications.
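To make that trade-off concrete, the following minimal sketch models an analog in-memory matrix-vector multiply: every column of the array is evaluated in parallel in a single "read", but bitline noise and ADC quantization give the result a finite output SNR, unlike a digital MAC. The noise level, ADC resolution, and array dimensions are illustrative assumptions, not figures from the forum.

```python
import numpy as np

rng = np.random.default_rng(0)

def in_memory_matvec(x, W, adc_bits=6, noise_sigma=0.02):
    """One parallel analog evaluation: x drives all wordlines at once,
    each column of W is a set of stored cell conductances, and every
    bitline sums its currents simultaneously."""
    y = x @ W                                       # ideal analog column sums
    y = y + rng.normal(0.0, noise_sigma, y.shape)   # bitline/sense-amp noise
    full_scale = np.max(np.abs(y)) + 1e-12          # ADC full-scale range
    step = 2 * full_scale / 2 ** adc_bits           # uniform quantizer step
    return np.round(y / step) * step                # quantized read-out

x = rng.standard_normal(256) / 16                   # input activations
W = rng.standard_normal((256, 64)) / 16             # stored weights
exact = x @ W
approx = in_memory_matvec(x, W)
snr_db = 10 * np.log10(np.sum(exact**2) / np.sum((exact - approx)**2))
print(f"output SNR: {snr_db:.1f} dB")               # finite, unlike a digital MAC
```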

Hardware Enabled AI
Bill Dally, Nvidia, Santa Clara, CA

The current resurgence of artificial intelligence is due to advances in deep learning. Systems based on deep learning now exceed human capability in speech recognition, object classification, and playing games like Go. Deep learning has been enabled by powerful, efficient computing hardware. The algorithms used have been around since the 1980s, but only in the last few years, when powerful GPUs became available to train networks, has the technology become practical. This talk will describe current hardware for deep learning and research to make this hardware more efficient. Dedicated accelerators, special instructions, data representation, sparsity, and analog methods will be discussed.

Bill Dally is chief scientist at NVIDIA and senior vice president of NVIDIA Research, the company’s world-class research organization, which is chartered with developing the strategic technologies that will help drive the company’s future growth and success. Dally joined NVIDIA in 2009 after spending 12 years at Stanford University, where he was chairman of the computer science department and the Willard R. and Inez Kerr Bell Professor of Engineering. Dally and his Stanford team developed the system architecture, network architecture, signaling, routing, and synchronization technology that is found in most large parallel computers today. Dally was at the Massachusetts Institute of Technology from 1986 to 1997, where he and his team built the J-Machine and the M-Machine, experimental parallel computer systems that pioneered the separation of mechanism from programming models and demonstrated very-low-overhead synchronization and communication mechanisms. From 1983 to 1986, he was at the California Institute of Technology, where he designed the MOSSIM Simulation Engine and the Torus Routing Chip, which pioneered wormhole routing and virtual-channel flow control. Dally is a cofounder of Velio Communications and Stream Processors. He is a member of the National Academy of Engineering, a Fellow of the American Academy of Arts & Sciences, and a Fellow of the IEEE and the ACM. He received the 2015 Funai Achievement Award from the Information Processing Society of Japan, the 2010 Eckert-Mauchly Award, considered the highest prize in computer architecture, as well as the 2004 IEEE Computer Society Seymour Cray Computer Engineering Award and the 2000 ACM Maurice Wilkes Award. Dally has published more than 200 papers, holds more than 100 issued patents, and is the author of three textbooks: “Digital Systems Engineering,” “Principles and Practices of Interconnection Networks,” and “Digital Design: A Systems Approach.” Dally received a Bachelor of Science degree in electrical engineering from Virginia Tech, a Master of Science degree in electrical engineering from Stanford University, and a Ph.D. in computer science from the California Institute of Technology.


Embedded Memory Solutions for AI, ML and IoT
Masanori Hayashikoshi, Renesas Electronics, Tokyo, Japan

Extensive sensor nodes play a significant role in gathering large amounts of real-time information for IoT, and reducing the power consumed in accumulating this huge volume of data, without degrading processing performance, is becoming more important. This presentation will mainly describe non-volatile memory technologies for low-power operation and power-management schemes for low-power computing suitable for IoT. In addition, embedded non-volatile memory solutions for endpoint AI applications will be discussed.

Masanori Hayashikoshi received the B.S. and M.S. degrees in electronic engineering from Kobe University, Hyogo, Japan, in 1984 and 1986, respectively, and the Ph.D. degree in electrical engineering and computer science from Kanazawa University, Ishikawa, Japan, in 2018. He is a Senior Principal Specialist in Shared R&D Division 1 at Renesas Electronics Corporation. He has been engaged in the research and development of Flash memory, high-density DRAM, low-power SDRAM, embedded MRAM for MCUs, and normally-off computing architecture for further low-power solutions with NVRAMs. He is now engaged in a feasibility study of in-memory computing for future neural-network applications.

High-Bandwidth Memory (HBM) DRAM for Energy-Efficient Near-Memory Computing
Kyomin Sohn, Samsung Electronics, Hwaseong, Korea

HBM DRAM is the best memory solution for HPC, high-end graphics, and AI applications. It provides unparalleled bandwidth through a wide I/O interface to stacked DRAM dies connected with TSV technology. However, it also faces many challenges, such as power, thermal, and testability. As one of the new solutions for the data-intensive computing era, HBM DRAM offers additional opportunities for more efficient and powerful near-memory computing. In this talk, HBM as a near-data-processing platform will be presented from various aspects.

Kyomin Sohn received the B.S. and M.S. degrees in Electrical Engineering in 1994 and 1996, respectively, from Yonsei University, Seoul. From 1996 to 2003, he was with Samsung Electronics, Korea, in the SRAM Design Team. He received the Ph.D. degree in Electrical Engineering and Computer Science in 2007 from KAIST, Daejeon, Korea. He rejoined Samsung Electronics in 2007, where he has been involved in the DRAM Design Team. He is a Master (Technical VP) at Samsung, responsible for HBM DRAM design and the future technology of DRAM design. His interests include next-generation 3D DRAM, robust memory design, and processing-in-memory for artificial-intelligence applications.
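A quick back-of-the-envelope calculation shows where HBM's bandwidth advantage comes from: interface width rather than per-pin speed. The figures below are representative public HBM2/GDDR6 numbers chosen for illustration, not figures from the talk.

```python
def peak_bandwidth_gbs(pins: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s = (interface width * per-pin data rate) / 8."""
    return pins * gbps_per_pin / 8

# HBM2: a 1024-bit-wide stack at a modest 2.0 Gb/s per pin
print(peak_bandwidth_gbs(1024, 2.0))   # 256.0 GB/s per stack

# GDDR6: a 32-bit device must run 16 Gb/s per pin to reach 64 GB/s
print(peak_bandwidth_gbs(32, 16.0))    # 64.0 GB/s per device
```

The wide, slow interface is also what keeps HBM's I/O energy per bit low, which is the property near-memory computing builds on.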

Novel Memory/Storage Solutions for Memory-Centric Computing
Mohamed Arafa, Intel, Chandler, AZ

The exponential growth in connected devices is generating a staggering number of digital records. This era of big data is driving fundamental changes in both the memory and storage hierarchy; data and compute need to be brought closer together to avoid protocol inefficiencies. In this presentation, novel solutions for memory-centric architecture will be discussed, with a focus on their value, performance, and power.

Mohamed Arafa is a Sr. Principal Engineer with the Data Center Group at Intel Corporation. He has recently been focusing on the definition and evaluation of novel server memory architectures. He has more than 20 years of industry experience and seven years of academic/research experience. He has authored or co-authored more than 30 technical papers and holds 10 US patents. He has also been an adjunct professor in the Electrical Engineering department at Arizona State University. Dr. Arafa holds a Ph.D. in Electrical Engineering from the University of Illinois at Urbana-Champaign and an MBA from the W. P. Carey School of Business.

The Deep In-Memory Architecture for Energy-Efficient Machine Learning
Naresh Shanbhag, University of Illinois at Urbana-Champaign, Urbana, IL

The Deep In-Memory Architecture (DIMA) is a non-von Neumann approach that breaches the memory wall by reading functions of data, embedding analog operations around the bitcell array, and leveraging the learning capabilities of inference algorithms. DIMA reads generate an inference, leading to a 50-to-100× reduction in the energy-latency product. This talk will describe key DIMA principles, design challenges, and their solutions realized via IC prototypes, and discuss future prospects.

Naresh R. Shanbhag is the Jack S. Kilby Professor of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. He worked at AT&T Bell Laboratories at Murray Hill (1993-95) on VDSL chip-sets. Dr. Shanbhag’s research interests are in the design of energy-efficient systems, architectures, and integrated circuits for machine learning, signal processing, and communications. He is an IEEE Fellow and served as the Director of the Systems On Nanoscale Information fabriCs (SONIC) Center (2013-17). He was a co-founder and Chief Technology Officer of Intersymbol Communications, Inc. (now part of Finisar Corporation), a start-up providing DSP-enhanced ICs for dispersion compensation of OC-192 links.
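As a rough illustration of the "functional read" idea described above, the sketch below models one multi-row access in which wordline pulse widths encode weights, so the bitline discharge approximates a weighted sum of the stored bits in a single read cycle. All device values (cell current, bitline capacitance, pulse width, mismatch) are illustrative assumptions of mine, not the speaker's circuit parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

I_CELL = 20e-6   # per-cell discharge current (A) -- illustrative assumption
C_BL = 100e-15   # bitline capacitance (F) -- illustrative assumption
T_LSB = 50e-12   # wordline pulse width per weight LSB (s) -- illustrative
V_UNIT = I_CELL * T_LSB / C_BL   # bitline droop per unit (bit * weight) = 10 mV

def functional_read(bits, weights):
    """Multi-row access: pulse row i for weights[i]*T_LSB, so the bitline
    droop dV = sum_i bits[i] * I_CELL * T_i / C_BL approximates a weighted
    sum of the stored bits in a single read cycle."""
    dv = np.sum(bits * I_CELL * (weights * T_LSB)) / C_BL
    return dv * rng.normal(1.0, 0.05)   # ~5% lumped cell/driver mismatch

bits = rng.integers(0, 2, size=16)      # one column of stored bits
weights = rng.integers(0, 8, size=16)   # 3-bit weights, PWM-encoded on wordlines
print("ideal weighted sum:", np.sum(bits * weights))
print("analog read-back:  ", functional_read(bits, weights) / V_UNIT)
```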

Low-Power SRAM for IoT and ML Applications
YK Chong, ARM, Austin, TX

The Internet of Things (IoT) and Machine Learning (ML) involve a tremendous amount of data movement, and IoT and ML bandwidths are often limited by the high energy cost of data communication. Consequently, it is important to have low-power memories for moving the data. In this presentation, I discuss several low-power memory solutions for IoT and ML applications. I further examine memory reliability and safety techniques that are essential for designs in the rapidly growing automotive sector. Finally, I address the design and testability challenges of embedded MRAM and other non-volatile replacements for current ML, IoT, and automotive memories.

YK Chong received the B.S. and M.S. degrees from Mississippi State University, Starkville, Mississippi, in 1993 and 1995, respectively. He is currently a Senior Technical Director and Distinguished Engineer in Advanced Product Development, ARM Physical Design Group. He leads memory technology development for Client, Infrastructure, and Machine Learning applications at ARM. He is also responsible for leading Design Technology Co-Optimization (DTCO) with major foundries. He holds more than 30 U.S. patents in integrated circuit design and technology.


Advanced Memory, Logic and 3D Technologies for In-Memory Computing and Machine Learning
Stefan Cosemans, imec, Leuven, Belgium

Artificial neural network (ANN) inference is now key to many applications. Some run in the cloud, others on wireless IoT sensors, but they all require an unprecedented amount of computation and storage. This talk first discusses novel memory, logic and 3D technologies that can improve power, performance and cost of digital inference. Second, it examines the potential of analog in-memory computing for inference, with attention to the challenges and opportunities at device, circuit, architecture and algorithm levels.

Stefan Cosemans is a Principal Member of Technical Staff at imec, Leuven, Belgium. His current focus is on circuits and memory devices for machine-learning accelerators. He received his Ph.D. degree in 2009 from KU Leuven for his work on low-power SRAM circuits. From 2010 to 2014, he worked at imec on circuit design for RRAM, STT-MRAM and other novel memory devices. From 2015 to 2017, he led the development of single-rail near-threshold SRAM compilers at sureCore Ltd. In 2017, he joined imec’s Machine Learning program.

Deep-Learning Hardware Acceleration: Opportunities in Memory Design
Leland Chang, IBM T. J. Watson Research Center, Yorktown Heights, NY

Deep-learning hardware acceleration has become a critical need across many application domains. While reduced-precision arithmetic enables energy-efficient compute, feeding data to many parallel engines remains a key bottleneck, one that strongly depends on memory innovation to improve core, chip, and system architectures. This talk describes the on-chip memory, external memory, and disk requirements to support a diversity of neural networks, and considers the impact of coming developments in algorithms and technology.

Leland Chang is a Principal Research Staff Member and Senior Manager of AI Hardware Acceleration at the IBM T. J. Watson Research Center. His work has spanned technology, circuits, and systems, including contributions such as the development of FinFET devices, SRAM and register-file arrays, and integrated voltage regulators, as well as the management of teams that have driven CMOS technology roadmaps, IBM server and mainframe microprocessor products, and AI hardware accelerators. He received the B.S., M.S., and Ph.D. degrees in electrical engineering and computer sciences from UC Berkeley, has authored more than 80 papers, and holds more than 120 patents. He is a former program committee member and Memory subcommittee chair of ISSCC.
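To see why feeding the parallel engines dominates, a roofline-style estimate helps: at batch size 1, a fully connected layer performs only two operations per weight byte fetched, so throughput is bandwidth-bound long before the MAC array saturates. The accelerator figures below (100 TOPS peak, 256 GB/s of memory bandwidth) are hypothetical numbers chosen for illustration, not from the talk.

```python
def attainable_tops(ops, bytes_moved, peak_tops, bw_gbs):
    """Roofline: performance = min(peak compute, intensity * bandwidth)."""
    intensity = ops / bytes_moved                     # operations per byte
    return min(peak_tops, intensity * bw_gbs / 1e3)   # Gops/s -> TOPS

# 4096x4096 fully connected layer, INT8 weights, batch size 1
ops = 2 * 4096 * 4096       # one multiply and one add per weight
bytes_moved = 4096 * 4096   # each INT8 weight fetched once; activations negligible
print(attainable_tops(ops, bytes_moved, peak_tops=100.0, bw_gbs=256.0))
# -> 0.512 TOPS: ~0.5% of the hypothetical 100-TOPS peak; bandwidth-bound
```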
