
F2: Memory-Centric Computing from IoT to Artificial Intelligence and Machine Learning

Organizer: Fatih Hamzaoglu, Intel, Hillsboro, OR
Committee: Meng-Fan Chang, National Tsing Hua University, Hsinchu, Taiwan; Ki-Tae Park, Samsung, Hwaseong, Korea; Yasuhiko Taito, Renesas, Kodaira, Tokyo, Japan; Alicia Klinefelter, Nvidia, Durham, NC; Naveen Verma, Princeton University, Princeton, NJ

This forum will present state-of-the-art memory-centric architectures, as well as future innovative solutions to enable energy-efficient, high-performance AI/ML applications. It will also describe challenges and solutions spanning edge processors to cloud applications, such as algorithmic accuracy, cost, security, and practicality, covering areas where the technology is ready today as well as those where further development is needed. At the high-performance and machine-learning end, emerging and storage-class memories are going to change the memory hierarchy. Meanwhile, low-power, high-bandwidth DRAMs and SRAMs continue to be innovated so that they remain the workhorses of the latest process nodes (HBM, GDDR6, 7nm-FinFET SRAM, etc.). Furthermore, with the explosive growth of memory-intensive workloads like machine learning, video capture/playback, and language translation, there is tremendous interest in performing some compute near memory, either by placing logic inside the DRAM/NVM main-memory die (near-memory compute) or by performing the compute within the SRAM/STT-MRAM/RRAM array embedded in the compute die (in-memory compute). In either case, the motivation is to reduce the significant data movement between main/embedded memory and compute units, as well as to reduce latency by performing many operations in parallel inside the array. Many challenges remain before these ideas can be productized, including the area-cost trade-off of adding logic to the memory die for near-memory compute, or of augmenting embedded arrays with mixed-signal circuits to enable in-memory compute. Noticeable degradation of the signal-to-noise ratio, especially in the in-memory compute case, may restrict its use to specific applications.
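To make that trade-off concrete, the following minimal sketch models an analog in-memory matrix-vector multiply: every column of the array is evaluated in parallel in a single "read", but bitline noise and ADC quantization give the result a finite output SNR, unlike a digital MAC. The noise level, ADC resolution, and array dimensions are illustrative assumptions, not figures from the forum.

```python
import numpy as np

rng = np.random.default_rng(0)

def in_memory_matvec(x, W, adc_bits=6, noise_sigma=0.02):
    """One parallel analog evaluation: x drives all wordlines at once,
    each column of W is a set of stored cell conductances, and every
    bitline sums its currents simultaneously."""
    y = x @ W                                       # ideal analog column sums
    y = y + rng.normal(0.0, noise_sigma, y.shape)   # bitline/sense-amp noise
    full_scale = np.max(np.abs(y)) + 1e-12          # ADC full-scale range
    step = 2 * full_scale / 2 ** adc_bits           # uniform quantizer step
    return np.round(y / step) * step                # quantized read-out

x = rng.standard_normal(256) / 16                   # input activations
W = rng.standard_normal((256, 64)) / 16             # stored weights
exact = x @ W
approx = in_memory_matvec(x, W)
snr_db = 10 * np.log10(np.sum(exact**2) / np.sum((exact - approx)**2))
print(f"output SNR: {snr_db:.1f} dB")               # finite, unlike a digital MAC
```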

Hardware Enabled AI
Bill Dally, Nvidia, Santa Clara, CA

The current resurgence of artificial intelligence is due to advances in deep learning. Systems based on deep learning now exceed human capability in speech recognition, object classification, and playing games like Go. Deep learning has been enabled by powerful, efficient computing hardware. The algorithms used have been around since the 1980s, but only in the last few years, when powerful GPUs became available to train networks, has the technology become practical. This talk will describe current hardware for deep learning and research to make this hardware more efficient. Dedicated accelerators, special instructions, data representation, sparsity, and analog methods will be discussed.

Bill Dally is chief scientist at NVIDIA and senior vice president of NVIDIA Research, the company’s world-class research organization, which is chartered with developing the strategic technologies that will help drive the company’s future growth and success. Dally joined NVIDIA in 2009 after spending 12 years at Stanford University, where he was chairman of the computer science department and the Willard R. and Inez Kerr Bell Professor of Engineering. Dally and his Stanford team developed the system architecture, network architecture, signaling, routing, and synchronization technology that is found in most large parallel computers today. Dally was at the Massachusetts Institute of Technology from 1986 to 1997, where he and his team built the J-Machine and the M-Machine, experimental parallel computer systems that pioneered the separation of mechanism from programming models and demonstrated very-low-overhead synchronization and communication mechanisms. From 1983 to 1986, he was at the California Institute of Technology, where he designed the MOSSIM Simulation Engine and the Torus Routing Chip, which pioneered wormhole routing and virtual-channel flow control. Dally is a cofounder of Velio Communications and Stream Processors. He is a member of the National Academy of Engineering, a Fellow of the American Academy of Arts & Sciences, and a Fellow of the IEEE and the ACM. He received the 2015 Funai Achievement Award from the Information Processing Society of Japan, the 2010 Eckert-Mauchly Award, considered the highest prize in computer architecture, as well as the 2004 IEEE Computer Society Seymour Cray Computer Engineering Award and the 2000 ACM Maurice Wilkes Award. Dally has published more than 200 papers, holds more than 100 issued patents, and is the author of three textbooks: “Digital Systems Engineering,” “Principles and Practices of Interconnection Networks,” and “Digital Design: A Systems Approach.” Dally received a Bachelor of Science degree in electrical engineering from Virginia Tech, a Master of Science degree in electrical engineering from Stanford University, and a Ph.D. in computer science from the California Institute of Technology.


Embedded Memory Solutions for AI, ML and IoT
Masanori Hayashikoshi, Renesas Electronics, Tokyo, Japan

Extensive sensor nodes play a significant role in gathering large amounts of real-time information for IoT, and reducing the power consumed in accumulating this huge volume of data, without degrading processing performance, is becoming more important. This presentation will mainly describe non-volatile memory technologies for low-power operation and power-management schemes for low-power computing suitable for IoT. In addition, embedded non-volatile memory solutions for endpoint AI applications will be discussed.

Masanori Hayashikoshi received the B.S. and M.S. degrees in electronic engineering from Kobe University, Hyogo, Japan, in 1984 and 1986, respectively, and the Ph.D. degree in electrical engineering and computer science from Kanazawa University, Ishikawa, Japan, in 2018. He is a Senior Principal Specialist in Shared R&D Division 1 at Renesas Electronics Corporation. He has been engaged in the research and development of Flash memory, high-density DRAM, low-power SDRAM, embedded MRAM for MCUs, and normally-off computing architecture for further low-power solutions with NVRAMs. He is now engaged in a feasibility study of in-memory computing for future neural-network applications.

High-Bandwidth Memory (HBM) DRAM for Energy-Efficient Near-Memory Computing
Kyomin Sohn, Samsung Electronics, Hwaseong, Korea

HBM DRAM is the best memory solution for HPC, high-end graphics, and AI applications. It provides unparalleled bandwidth through a wide I/O interface to stacked DRAM dies connected with TSV technology. However, it also faces many challenges, such as power, thermal, and testability. As one of the new solutions for the data-intensive computing era, HBM DRAM offers additional opportunities for more efficient and powerful near-memory computing. In this talk, HBM as a near-data-processing platform will be presented from various aspects.

Kyomin Sohn received the B.S. and M.S. degrees in Electrical Engineering in 1994 and 1996, respectively, from Yonsei University, Seoul. From 1996 to 2003, he was with Samsung Electronics, Korea, in the SRAM Design Team. He received the Ph.D. degree in Electrical Engineering and Computer Science in 2007 from KAIST, Daejeon, Korea. He rejoined Samsung Electronics in 2007, where he has been involved in the DRAM Design Team. He is a Master (Technical VP) at Samsung, responsible for HBM DRAM design and the future technology of DRAM design. His interests include next-generation 3D DRAM, robust memory design, and processing-in-memory for artificial-intelligence applications.
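A quick back-of-the-envelope calculation shows where HBM's bandwidth advantage comes from: interface width rather than per-pin speed. The figures below are representative public HBM2/GDDR6 numbers chosen for illustration, not figures from the talk.

```python
def peak_bandwidth_gbs(pins: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s = (interface width * per-pin data rate) / 8."""
    return pins * gbps_per_pin / 8

# HBM2: a 1024-bit-wide stack at a modest 2.0 Gb/s per pin
print(peak_bandwidth_gbs(1024, 2.0))   # 256.0 GB/s per stack

# GDDR6: a 32-bit device must run 16 Gb/s per pin to reach 64 GB/s
print(peak_bandwidth_gbs(32, 16.0))    # 64.0 GB/s per device
```

The wide, slow interface is also what keeps HBM's I/O energy per bit low, which is the property near-memory computing builds on.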

Novel Memory/Storage Solutions for Memory-Centric Computing
Mohamed Arafa, Intel, Chandler, AZ

The exponential growth in connected devices is generating a staggering number of digital records. This era of big data is driving fundamental changes in both the memory and storage hierarchy; data and compute need to be brought closer together to avoid protocol inefficiencies. In this presentation, novel solutions for memory-centric architecture will be discussed, with a focus on their value, performance, and power.

Mohamed Arafa is a Sr. Principal Engineer with the Data Center Group at Intel Corporation. He has recently been focusing on the definition and evaluation of novel server memory architectures. He has more than 20 years of industry experience and seven years of academic/research experience. He has authored or co-authored more than 30 technical papers and holds 10 US patents. He has also been an adjunct professor in the Electrical Engineering department at Arizona State University. Dr. Arafa holds a Ph.D. in Electrical Engineering from the University of Illinois at Urbana-Champaign and an MBA from the W. P. Carey School of Business.

The Deep In-Memory Architecture for Energy-Efficient Machine Learning
Naresh Shanbhag, University of Illinois at Urbana-Champaign, Urbana, IL

The Deep In-Memory Architecture (DIMA) is a non-von Neumann approach that breaches the memory wall by reading functions of data, embedding analog operations around the bitcell array, and leveraging the learning capabilities of inference algorithms. DIMA reads generate an inference, leading to a 50-to-100× reduction in the energy-latency product. This talk will describe key DIMA principles, design challenges, and their solutions realized via IC prototypes, and discuss future prospects.

Naresh R. Shanbhag is the Jack S. Kilby Professor of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. He worked at AT&T Bell Laboratories at Murray Hill (1993-95) on VDSL chip-sets. Dr. Shanbhag’s research interests are in the design of energy-efficient systems, architectures, and integrated circuits for machine learning, signal processing, and communications. He is an IEEE Fellow and served as the Director of the Systems On Nanoscale Information fabriCs (SONIC) Center (2013-17). He was a co-founder and Chief Technology Officer of Intersymbol Communications, Inc. (now part of Finisar Corporation), a start-up providing DSP-enhanced ICs for dispersion compensation of OC-192 links.
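As a rough illustration of the "functional read" idea described above, the sketch below models one multi-row access in which wordline pulse widths encode weights, so the bitline discharge approximates a weighted sum of the stored bits in a single read cycle. All device values (cell current, bitline capacitance, pulse width, mismatch) are illustrative assumptions of mine, not the speaker's circuit parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

I_CELL = 20e-6   # per-cell discharge current (A) -- illustrative assumption
C_BL = 100e-15   # bitline capacitance (F) -- illustrative assumption
T_LSB = 50e-12   # wordline pulse width per weight LSB (s) -- illustrative
V_UNIT = I_CELL * T_LSB / C_BL   # bitline droop per unit (bit * weight) = 10 mV

def functional_read(bits, weights):
    """Multi-row access: pulse row i for weights[i]*T_LSB, so the bitline
    droop dV = sum_i bits[i] * I_CELL * T_i / C_BL approximates a weighted
    sum of the stored bits in a single read cycle."""
    dv = np.sum(bits * I_CELL * (weights * T_LSB)) / C_BL
    return dv * rng.normal(1.0, 0.05)   # ~5% lumped cell/driver mismatch

bits = rng.integers(0, 2, size=16)      # one column of stored bits
weights = rng.integers(0, 8, size=16)   # 3-bit weights, PWM-encoded on wordlines
print("ideal weighted sum:", np.sum(bits * weights))
print("analog read-back:  ", functional_read(bits, weights) / V_UNIT)
```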

Low-Power SRAM for IoT and ML Applications
YK Chong, ARM, Austin, TX

The Internet of Things (IoT) and Machine Learning (ML) involve a tremendous amount of data movement, and IoT and ML bandwidths are often limited by the high energy cost of data communication. Consequently, it is important to have low-power memories for moving the data. In this presentation, I discuss several low-power memory solutions for IoT and ML applications. I further examine memory reliability and safety techniques that are essential for designs in the rapidly growing automotive sector. Finally, I address the design and testability challenges of embedded MRAM and other non-volatile replacements for current ML, IoT, and automotive memories.

YK Chong received the B.S. and M.S. degrees from Mississippi State University, Starkville, Mississippi, in 1993 and 1995, respectively. He is currently a Senior Technical Director and Distinguished Engineer in Advanced Product Development, ARM Physical Design Group. He leads memory technology development for Client, Infrastructure, and Machine Learning applications at ARM. He is also responsible for leading Design Technology Co-Optimization (DTCO) with major foundries. He holds more than 30 U.S. patents in integrated circuit design and technology.


Advanced Memory, Logic and 3D Technologies for In-Memory Computing and Machine Learning
Stefan Cosemans, imec, Leuven, Belgium

Artificial neural network (ANN) inference is now key to many applications. Some run in the cloud, others on wireless IoT sensors, but they all require an unprecedented amount of computation and storage. This talk first discusses novel memory, logic and 3D technologies that can improve power, performance and cost of digital inference. Second, it examines the potential of analog in-memory computing for inference, with attention to the challenges and opportunities at device, circuit, architecture and algorithm levels.

Stefan Cosemans is a Principal Member of Technical Staff at imec, Leuven, Belgium. His current focus is on circuits and memory devices for machine-learning accelerators. He received his Ph.D. degree in 2009 from KU Leuven for his work on low-power SRAM circuits. From 2010 to 2014, he worked at imec on circuit design for RRAM, STT-MRAM and other novel memory devices. From 2015 to 2017, he led the development of single-rail near-threshold SRAM compilers at sureCore Ltd. In 2017, he joined imec’s Machine Learning program.

Deep-Learning Hardware Acceleration: Opportunities in Memory Design
Leland Chang, IBM T. J. Watson Research Center, Yorktown Heights, NY

Deep-learning hardware acceleration has become a critical need across many application domains. While reduced-precision arithmetic enables energy-efficient compute, feeding data to many parallel engines remains a key bottleneck, one that strongly depends on memory innovation to improve core, chip, and system architectures. This talk describes the on-chip memory, external memory, and disk requirements to support a diversity of neural networks, and considers the impact of coming developments in algorithms and technology.

Leland Chang is a Principal Research Staff Member and Senior Manager of AI Hardware Acceleration at the IBM T. J. Watson Research Center. His work has spanned technology, circuits, and systems, including contributions such as the development of FinFET devices, SRAM and register-file arrays, and integrated voltage regulators, as well as the management of teams that have driven CMOS technology roadmaps, IBM server and mainframe microprocessor products, and AI hardware accelerators. He received the B.S., M.S., and Ph.D. degrees in electrical engineering and computer sciences from UC Berkeley, has authored more than 80 papers, and holds more than 120 patents. He is a former program committee member and Memory subcommittee chair of ISSCC.
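To see why feeding the parallel engines dominates, a roofline-style estimate helps: at batch size 1, a fully connected layer performs only two operations per weight byte fetched, so throughput is bandwidth-bound long before the MAC array saturates. The accelerator figures below (100 TOPS peak, 256 GB/s of memory bandwidth) are hypothetical numbers chosen for illustration, not from the talk.

```python
def attainable_tops(ops, bytes_moved, peak_tops, bw_gbs):
    """Roofline: performance = min(peak compute, intensity * bandwidth)."""
    intensity = ops / bytes_moved                     # operations per byte
    return min(peak_tops, intensity * bw_gbs / 1e3)   # Gops/s -> TOPS

# 4096x4096 fully connected layer, INT8 weights, batch size 1
ops = 2 * 4096 * 4096       # one multiply and one add per weight
bytes_moved = 4096 * 4096   # each INT8 weight fetched once; activations negligible
print(attainable_tops(ops, bytes_moved, peak_tops=100.0, bw_gbs=256.0))
# -> 0.512 TOPS: ~0.5% of the hypothetical 100-TOPS peak; bandwidth-bound
```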
