Eliminate Testing Bottlenecks - Develop and Test Code at Scale on Virtualized Production Systems in the Archanan Development Cloud
Archanan emulates complex high-performance and distributed clusters down to the network component level, with accurate run-time estimates of developers’ codes

Supercomputers and other large-scale, distributed computing systems are complex and expensive. They enable discovery and insight only when they are running as production machines, executing proven applications at scale. Yet researchers and developers targeting particular systems, such as MareNostrum4 at Barcelona Supercomputing Center (BSC), must be able to develop or port, optimize, and test their software (eventually at scale) for the target architecture before consuming production computing time.

Supercomputing center directors, heads of research departments, and IT managers who host these clusters must balance users’ production runs against developers’ requests for time on the machine. As a result, developers wait in queues for the supercomputer. Some coding issues appear only when the code runs on the full complement of cores for which it is intended, so wait times grow as core-count needs increase. Delaying important development extends time to insight and discovery, which weakens the competitive position of researchers, supercomputing centers, research departments, and companies alike.

What if we could use the cloud to virtualize and emulate an organization’s production system and give every developer on the team their own personal Integrated Development Environment? The Archanan Development Cloud uses emulation to recreate an organization’s production system in the cloud, eliminating test queues and enabling programmers to develop their code in real time, at scale.
Emulate a Cluster in Minutes

The Archanan Development Cloud is a fully customizable computing-system emulation engine. It lets developers avoid queues by working in personal Integrated Development Environments (IDEs) that emulate the organization’s production system (be it a supercomputer or a complex distributed network), virtualized in the cloud. Archanan either configures the equivalent of an existing supercomputer or lets developers specify their own cluster from a library of components. Developers can then complete their code, debug and test it at scale, and see accurate performance metrics.

With the Archanan Development Cloud:
- Developers can develop code at scale and reach production faster with accurate, known performance results, paying only for the time they use the cluster.
- Supercomputing center directors and corporate IT managers can maximize production utilization while accelerating research and production, by eliminating the inefficiencies of bare-metal test environments.
- System administrators can manage a single production environment, recommissioning test hardware into the production system and moving the development environment to the cloud.
- OEMs can prove the performance of a new design, down to the component level, to themselves and their customers before they ever stand up the hardware.
- Researchers can test their projects on different hardware and configurations before committing to a system, or let Archanan recommend an optimal cluster.

© 2019 Archanan. All rights reserved.

Archanan Emulation: Any System, Any Network, Any Scale…

While many Cloud Service Providers (CSPs) offer HPC and other forms of complex computing as a service, their configurations are limited to the architectures, topologies, and components the CSP has in-house.
Developers are left testing their code’s accuracy and performance on the CSP’s architecture, an environment that typically differs widely from the organization’s production system, where the code will actually run. With the Archanan Development Cloud, a developer can stand up, in minutes, a cluster that accurately emulates the exact architectures, fabric topologies, communication buses, and storage media of the system of their choice. Archanan runs its configurable emulation engine on the advanced infrastructure of high-performance cloud services from Amazon Web Services® (AWS), Microsoft Azure®, Google®, and others.

Archanan’s portfolio of emulated components includes:
- Intel®, ARM®, and IBM Power® CPU architectures
- Network topologies, such as Dragonfly and hypercube
- Host fabric interface technologies, including Ethernet®, InfiniBand® Architecture, and Intel® Omni-Path Architecture
- NVMe and SATA communication buses
- And more…

Because the Archanan portfolio emulates systems down to the individual component, results from code runs on the emulator accurately indicate what to expect in production on the machine for which the code is developed.

We Make Complex, Large-Scale Computing Easily Accessible

Through an intuitive online interface, users choose or configure their personal supercomputer, including the number of cores on which they want to run their code. The service includes a browser-based Integrated Development Environment (IDE) that can be configured to feel familiar. The IDE provides tuning and optimization tools, a debugger, reports, and visualizations to speed the developer’s path to finished code.
Supported Capabilities

IDE over the browser
- Syntax highlighting for C, C++, and Python
- Vi, Vim, and Emacs shortcuts

Programming languages, models, and environments commonly used in HPC
- Toolchain for C, C++, and Python
- MPI, OpenMP, and pthread programming models
- OpenHPC
- Beta supports the Slurm scheduler

Functional supercomputer emulation
- Beta scales up to 512 compute nodes
- Up to 36 cores and 64 GB memory per compute node
- GPGPU support (NVIDIA CUDA and OpenCL)
- Beta offers up to 100 Gbps network throughput

Supported Architectures
- x86_64, ARM, AMD EPYC, NVIDIA P100 & K80
- Upcoming: Power9, Power10, NEC Vector Engines, AMD CPU & GPU, NVIDIA DGX, etc.

Supported Software
- Open MPI, MPICH, Intel MPI, pthread, OpenMP, NVIDIA CUDA, OpenCL, OpenHPC, Slurm, etc.
- Upcoming: MVAPICH, OpenACC, Univa Grid Engine, PBS Pro, Torque

Supported Compilers
- GCC, Intel C/C++ Compiler

Developer Tools
- Parallel debugger, parallel profiling, memory map analysis, network virtualization, visualization tools

Code project health dashboard
- Number of passing tests
- Code coverage figures
- Largest stable scale-out tracking
- Coding standard compliance analysis

Code version control
- Git support
- GitHub integration

Continuous integration
- Jenkins and CircleCI support

Continuous delivery
- Deployment to AWS S3

Scalable code deployment for testing and validation
- Deployment up to 512 compute nodes
- Easy assessment of code scaling limits

Proprietary parallel debugger
- Tailor-made to support large numbers of distributed MPI processes

Automated unit test stub generation
- Templates for GoogleTest
- Templates for GNU Check

Code coverage measuring
- Gcov / lcov
- Codecov

Get Out of the Test Queue. Develop Code at Scale!

Developing, tuning, optimizing, and scaling code for large-scale production systems typically requires cycles on test clusters, and developers must compete for time on them.
Often those clusters are already in use by other developers, who have themselves waited in long queues. With the Archanan Development Cloud, waiting in a queue becomes a thing of the past. Organizations can streamline development workflows by letting developers stand up their own at-scale production system (whether a supercomputer or another complex distributed computing system) with the exact configuration of the target cluster they are developing for: number of cores, architectures, and topologies. Programmers can develop and port, test and debug, and confidently predict how their codes will later run in production, at scale. There is no waiting, which means faster development.

“One of the primary reasons why more organizations, especially in the commercial space, aren’t utilizing the power of modern supercomputers, is the considerable challenges of effective coding at these larger, more complex scales - there is a big gap between a laptop and that of a remote, giant collection of distributed, interconnected processors. There has long been talk of contemporary supercomputers being broadly utilized and reaching the industry masses, however this reality has been elusive. By combining hardware-level virtualization and cloud computing, Archanan has figured out how to bridge both the technical, but also economical gaps that have presented adoption challenges for computing at this level.
It’s exciting to see that we’re on the precipice of the democratization of high-performance computing across industries, at last.”
- John Gustafson, luminary computer scientist, inventor of Gustafson’s Law, and Visiting Scientist at A*STAR (Agency for Science, Technology and Research)

Choose the Cluster

Whether your development team is working on Artificial Intelligence (AI) algorithms for Tsubame 3 at the Tokyo Institute of Technology, testing I/O-bound applications targeted for Oakforest-PACS at the Joint Center for Advanced HPC (JCAHPC), or optimizing applications for an SAP HANA in-memory database on new memory architectures such as Intel® Optane™, Archanan is working with the world’s supercomputing centers, enterprise IT departments, and other providers of clusters to build emulation profiles of