Rise of Volumetric Data and Scale-Up Enterprise Computing
Total Page:16
File Type:pdf, Size:1020Kb
Rise of Volumetric Data and Scale-Up Enterprise Computing Sponsored by: Atos, Numascale and Intel Mise en Scene This paper captures the continued collaboration among Intel, Atos and Numascale to enable cost effective Scale-Up ecosystem on x86 server platform. Intel is the technology provider (QuickPath Interconnect, Ultra Path Interconnect), Atos is the server platform provider (BullSequana S), Numascale is the node controller architecture provider (xNC). The volumetric data is rising fast, how to process 50+ Zettabytes of today’s data through efficient computing is not a small task to say the least. This paper describes the plan, methodology and the roadmap to address this critical computing need. It will introduce Atos platform with external Node Controller (xNC) architecture, jointly developed with Numascale and creating customer value from one generation of Scale-Up server system to the next. 02 The Rise of Volumetric Data Processing: AI, Machine Learning and Natural Language Depth.Scale.Latency.Gravity.Volume As the planet hurls through the galaxy, satellites circle the globe, we humans continue to consume, build, model and render data at unprecedented levels. 200 175 150 129.5 101 100 79.5 64.5 Data volume in zetabytes 50.5 50 41 33 26 18 12.5 15.5 6.5 9 2 5 0 2010 2011 2012 2013 2014 2015 2016 2017 2018* 2019* 2020* 2021* 2022* 2023* 2024* 2025* The current data forecast for 2025 is almost One-dimensional (1D), two-dimensional (2D), more bits all help when using AI data models 175ZB (Zettabytes). The forecasts have three-dimensional (3D) and four-dimensional and creative Data Scientists to build next been traditionally very accurate within 2-5% (4D) data creation have changed the generation models to understand virology, over long-time horizons in my experience. compute architectures we have to build. magnetic resonance and telehealth services. Our species requires this data for so many Dynamic new math libraries and artificial The depth of the data matters, when it is functions from finance, healthcare, education, intelligence (AI) instructions will provide your life. research, communication, transportation our customers with capabilities we could and entertainment to name a few. We share only imagine a decade ago. A new era of Non-Uniform Memory Access (https:// our experiences, our health, wealth and design is required… en.wikipedia.org/wiki/Non-uniform_memory_ well-being. Our lives, biometry, births, deaths access) was originally designed to provide and legacy. We have become our own Why does this matter? Why should we single operating system instances to scale videographers through very capable globally care? 3D and 4D data has already begun to beyond single socket central processing sourced smartphones nearly 17 years after lead our fight for survival with Covid-19 and units (CPU). This work pioneered by Atos/ their first introduction. By 2025, nearly 3.7B healthcare professionals. These models can Bull, Sequent/IBM, DEC and Intel in the mid- people will use this technology to access then be shared, anonymized and rendered 1990’s has become a foundation of scaling the internet on a regular basis, according the to allow doctors, nurses, medical technicians compute architectures today. From CPU CNBC. We have developed the capability to and healthcare companies to closely to Rack Design, the principles of scaling communicate across the globe in near real examine the results. There are many active and Non-Uniform Memory Architectures time. In the last 20 years we have created teams across the globe working tirelessly (NUMA) can be found. Scale is critical more devices to consume, create, alter, and to continue to find a cure for years. The to provide greater results, larger databases re-imagine data than any industry analyst depth of the data matters. The more data and more compute, memory, interconnect or pundit could have possibly imagined. sets, more points of reference, more bytes, and network resources. Rise of Volumetric Data and Scale-Up Enterprise Computing 03 Intel Xeon System on a chip (SoC) architecture Continued emphasis on modularity & balanced performance scalability Intel Xeon Processor E7 (24 cores) 2nd Gen Intel Xeon Scalable Processor (28 cores) QPI QPI PCIE PCIE PCIE PCIE QPI Link Link x16 x16 x8 x4 (ESI) Link R3QPI R2PCI CB DMA R3QPI QPI Agent IIO IOA PIC QPI Agent PCle* x16 2x UPI x20 PCle* x16 DMI x4 On Pkg 1x UPI x20 PCle x16 CBDMA PCle x16 CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC Cache Cache Cache Cache Core Bo LLC LLC Bo Core Core Bo LLC LLC Bo Core Core Bo IDI IDI/QPII 2,5MB 2,5MB IDI/QPII IDI Bo Core Core Bo IDI IDI/QPII 2,5MB 2,5MB IDI/QPII IDI Bo Core SAD SAD SAD SAD CLX Core CLX Core CLX Core CLX Core CLX Core CLX Core CBO CBO CBO CBO Cache Cache Cache Cache Core Bo LLC LLC Bo Core Core Bo LLC LLC Bo Core Core Bo IDI IDI/QPII 2,5MB 2,5MB IDI/QPII IDI Bo Core Core Bo IDI IDI/QPII 2,5MB 2,5MB IDI/QPII IDI Bo Core DDR4 MC CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC MC DDR4 SAD SAD SAD SAD CBO CBO CBO CBO DDR4 DDR4 CLX Core CLX Core CLX Core CLX Core Cache Cache Cache Cache DDR4 DDR4 Core Bo LLC LLC Bo Core Core Bo LLC LLC Bo Core Core Bo IDI IDI/QPII 2,5MB 2,5MB IDI/QPII IDI Bo Core Core Bo IDI IDI/QPII 2,5MB 2,5MB IDI/QPII IDI Bo Core SAD SAD SAD SAD CBO CBO CBO CBO CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC Cache Cache Cache Cache Core Bo LLC LLC Bo Core Core Bo LLC LLC Bo Core Core Bo IDI IDI/QPII 2,5MB 2,5MB IDI/QPII IDI Bo Core Core Bo IDI IDI/QPII 2,5MB 2,5MB IDI/QPII IDI Bo Core SAD SAD SAD SAD CLX Core CLX Core CLX Core CLX Core CLX Core CLX Core CBO CBO CBO CBO Cache Cache Cache Cache Core Bo LLC LLC Bo Core Core Bo LLC LLC Bo Core Core Bo IDI IDI/QPII 2,5MB 2,5MB IDI/QPII IDI Bo Core Core Bo IDI IDI/QPII 2,5MB 2,5MB IDI/QPII IDI Bo Core SAD SAD SAD SAD CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC CBO CBO CBO CBO Cache Cache Cache Cache CLX Core CLX Core CLX Core CLX Core CLX Core CLX Core Core Bo LLC LLC Bo Core Core Bo LLC LLC Bo Core Core Bo IDI IDI/QPII 2,5MB 2,5MB IDI/QPII IDI Bo Core Core Bo IDI IDI/QPII 2,5MB 2,5MB IDI/QPII IDI Bo Core SAD SAD SAD SAD CBO CBO CBO CBO CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC CHA/SF/LLC CLX Core CLX Core CLX Core CLX Core CLX Core CLX Core Home Agent Home Agent SMI SMI SMI SMI Mem Ctlr Mem Ctlr SAD: Source Address Decoder CHA – Caching and Home Agent QPI: Intel QuickPath Interconnect SF – Snoop Filter IIO: Integrated I/O LLC – Last Level Cache PCU: Power Controller Unit CLX Core - 2nd Gen Intel Xeon Scalable Processor core Ubox: Processor Utility Box UPI – Intel UltraPath Interconnect Example of -2-socket Intel Xeon 6258R Memory and cache latencies (28 cores, 2.7GHz) SNC-OFF w/Intel Memory Latency Checker (MLC) L3 hit cache (same socket) 20.2 ns L3 hit cache (remote 2nd socket) 180 ns DRAM hit memory (same socket) 80 ns DRAM hit (remote 2nd socket) 138 ns A bumble bee flap has been recorded at socket Intel Xeon Scalable system up adoption of devices growing at 7.5% a year approximately 5 milliseconds. Most of us to 32-Sockets per physical server node with from a 3.7B users base, the “gravity”, scope today around the world experience our BullSequana S series platform customers all and scale of our “data opportunity” becomes broadband internet speeds at between over the world will enjoy one of “the industry clear. Architectures must be designed 30-100 milliseconds of latency. A bumble leaders” in scalability, NUMA and latency for all types of usage models, atmospheric bee is more latency-aware than most in our optimizations the world has ever seen. conditions, across a wide range of industries, species. Latency is critical, few places more enabling the broadest ecosystem of critical than system architecture and latency Theoretically, NUMA system and micro- applications insuring consistent performance more important than in the processing of architectures can scale almost infinitely over the platform life cycle. 28-Core volumetric, visual and AI data. Each part of with linear graph performance 2-32-Socket solutions with Intel Xeon Scalable the architecture must address latency from characteristics, in the lab. Time, research, processors, Intel Optane Persistent Memory, instructions, through memory, networking, failure, success, fact and re-investment and Intel SSD technology with current Atos interconnect and transport. Each leg of have all proven this theory to be false. BullSequana S platform are at the core a “Bit’s journey” must be as latency-free Gravity often brings many theoretical of our vision to build a more secured, and optimized for performance to insure mathematical and scientific discoveries scalable, high bandwidth, NUMA-optimized a balanced compute architecture. We back to earth. Gravity also provides us platforms for the next decade. have invested across Intel® Xeon® Scalable with insights how to defy and manipulate platform generations with instructions its principles to survive in space.