Eliminate Testing Bottlenecks - Develop and Test Code at Scale on Virtualized Production Systems in the Archanan Development Cloud

Archanan emulates complex high performance and distributed clusters to the network component level with accurate run-time estimates of developers’ codes

Supercomputers and other forms of large-scale, distributed computing systems are complex and expensive systems that enable discovery and insight only when they are running as production machines, executing proven applications at scale. Yet researchers and developers targeting their codes for particular systems, such as MareNostrum4 at Barcelona Supercomputing Center (BSC), must be able to develop or port, optimize, and test their software—eventually at scale—for the desired architecture before consuming production computing time.

Supercomputing center directors, heads of research departments, and IT managers hosting these clusters have to balance the needs of the users’ production runs against the developers’ requests for time on the machine. The result is that developers wait in queues. Some coding issues only appear on the full complement of cores for which the code is intended, so the wait times can get longer as core count needs increase.

Delaying important development extends time to insight and discovery, impacting a researcher’s, a supercomputing center’s, a research department’s, and a company’s competitive position. What if we could use the cloud to virtualize and emulate an organization’s production system to provide every developer on the team with their own personal Integrated Development Environment? The Archanan Development Cloud uses the power of emulation to recreate an organization’s production system in the cloud, eliminating test queues and enabling programmers to develop their code in real time, at scale.

Emulate A Cluster in Minutes

The Archanan Development Cloud is a fully customizable computing system emulation engine that enables developers to eliminate queues by administering personal Integrated Development Environments (IDEs) that emulate the organization’s production system (be it a supercomputer, or a complex distributed network), virtualized in a cloud. Archanan configures the equivalent of an existing supercomputer or allows developers to specify their own cluster from a library of components, complete their code development, debug and test it at scale, and see accurate performance metrics.

With the Archanan Development Cloud:

• Developers can develop code at scale and get to production faster with accurate and known performance results, paying only for the time they use the cluster.
• Supercomputing center directors and corporate IT managers can maximize production utilization, while accelerating research and production, by eliminating inefficiencies that come with bare-metal test environments.
• System admins can manage a single production environment, recommissioning test hardware to the production system and moving the development environment to the cloud.
• OEMs can prove to themselves and their customers the performance of a new design, down to the component level, before they ever stand up the hardware.
• Researchers can test their projects using different hardware and configurations before they commit to a system, or let Archanan recommend an optimal cluster.

© 2019 Archanan. All rights reserved.

Archanan Emulation: Any System, Any Network, Any Scale…

While many Cloud Service Providers (CSPs) offer HPC and other forms of complex computing as a service, their configurations are limited to the architectures, topologies, and components the CSP has in-house. Developers are left using the CSP’s architecture to test their code’s accuracy and performance in an environment that is typically widely divergent from the organization’s production system where the code will actually be run. With the Archanan Development Cloud, a developer can stand up a cluster in minutes that accurately emulates the exact architectures, fabric topologies, communications buses, and storage media of the system of their choice.

Archanan leverages the advanced infrastructures of high-performance cloud services from Amazon Web Services® (AWS), Microsoft Azure®, Google®, and others to run its configurable emulation engine. Archanan’s portfolio of emulated components includes:

• Intel®, ARM®, and IBM Power® CPU architectures
• Network topologies, such as Dragonfly and hypercube
• Host fabric interface technologies, including InfiniBand® Architecture and Intel® Omni-Path Architecture
• NVMe and SATA communication buses
• And more…

With the granularity of components in the Archanan portfolio, results obtained from code runs on the emulator will accurately indicate what can be expected in production on the machine for which it’s developed.
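To make the idea of composing a cluster from a component library concrete, here is a minimal Python sketch. It is purely illustrative: `ClusterSpec`, `PORTFOLIO`, and every field name are hypothetical and do not describe Archanan’s actual interface. The component values are taken from the portfolio list above, and the node and core counts in the usage line are arbitrary.

```python
from dataclasses import dataclass

# Component choices copied from the portfolio list in the text.
# The dictionary layout and key names are assumptions made for this sketch.
PORTFOLIO = {
    "cpu": {"Intel", "ARM", "IBM Power"},
    "topology": {"Dragonfly", "hypercube"},
    "fabric": {"InfiniBand", "Intel Omni-Path"},
    "storage_bus": {"NVMe", "SATA"},
}


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical description of an emulated cluster (not a real API)."""

    cpu: str
    topology: str
    fabric: str
    storage_bus: str
    nodes: int
    cores_per_node: int

    def __post_init__(self) -> None:
        # Reject any component that is not in the illustrative portfolio.
        for field_name, allowed in PORTFOLIO.items():
            value = getattr(self, field_name)
            if value not in allowed:
                raise ValueError(f"{field_name}={value!r} is not in {sorted(allowed)}")
        if self.nodes < 1 or self.cores_per_node < 1:
            raise ValueError("nodes and cores_per_node must be positive")

    @property
    def total_cores(self) -> int:
        """Total core count the code under test would run on."""
        return self.nodes * self.cores_per_node


# Arbitrary example: 64 nodes with 36 cores each.
spec = ClusterSpec("Intel", "Dragonfly", "InfiniBand", "NVMe", nodes=64, cores_per_node=36)
print(spec.total_cores)  # 2304
```

Validating choices up front mirrors the point of the paragraph above: only components that have an emulation profile can yield a meaningful performance estimate.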

We Make Complex, Large-Scale Computing Easily Accessible

Users choose or configure their personal supercomputer—complete with the number of cores on which they want to run their code—through an intuitive online interface. The service includes a browser-based Integrated Development Environment (IDE), which can be configured for a familiar experience.

The IDE provides tuning and optimization tools, a debugger, reports, and visualizations to accelerate the developer’s journey to finished code.

Supported Capabilities

IDE over the browser
• Syntax highlighting for C, C++ and Python
• Vi, vim and emacs shortcuts

Programming languages, models and environments commonly used in HPC
• Toolchain for C, C++, Python
• MPI, OpenMP and pthread programming models
• OpenHPC
• Beta supports SLURM scheduler

Functional supercomputer emulation
• Beta scales up to 512 compute nodes
• Up to 36 cores and 64 GB memory per compute node
• GPGPU support (NVIDIA CUDA and OpenCL)
• Beta offers up to 100 Gbps network throughput

Scalable code deployment for testing and validation
• Deployment up to 512 compute nodes
• Allows easy assessment of code scaling limits

Proprietary parallel debugger
• Tailor made to support large number of distributed MPI processes

Automated unit test stub generation
• Templates for GoogleTest
• Templates for GNU Check

Code coverage measuring
• Gcov / lcov
• CodeCov

Code project health dashboard
• Number of passing tests
• Code coverage figures
• Largest stable scale out tracking
• Coding standard compliance analysis

Code version control
• Git support
• GitHub integration

Continuous integration
• Jenkins and CircleCI support

Continuous delivery
• Deployment to AWS S3

Supported Architectures
• x86_64, ARM, AMD EPYC, NVIDIA P100 & K80
• Upcoming: Power9, Power10, NEC Vector Engines, AMD CPU & GPU, NVIDIA DGX, etc.

Supported Software
• OpenMPI, mpich, Intel MPI, pthread, OpenMP, NVIDIA CUDA, OpenCL, OpenHPC, Slurm, etc.
• Upcoming: mvapich, OpenACC, Grid Engine, PBS Pro, Torque

Supported Compilers
• GCC, Intel C/C++ Compiler

Developer Tools
• Parallel Debugger, Parallel Profiling, Memory map analysis, network virtualization, visualization tools
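One way to read "code scaling limits" and "largest stable scale out" in the capabilities above is as a strong-scaling calculation: compare run times at increasing node counts against the smallest run, and find the largest count that still meets an efficiency target. The Python sketch below assumes that interpretation. The function names, the 70% threshold, and the timings are illustrative placeholders, not Archanan outputs.

```python
def scaling_report(timings):
    """Return (nodes, speedup, efficiency) rows from {node_count: seconds}.

    Speedup and efficiency are measured relative to the smallest node count.
    """
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    rows = []
    for nodes in sorted(timings):
        speedup = base_time / timings[nodes]
        # Ideal speedup equals the ratio of node counts.
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, speedup, efficiency))
    return rows


def largest_efficient_scale(timings, threshold=0.7):
    """Largest node count whose parallel efficiency meets the threshold."""
    passing = [n for n, _, eff in scaling_report(timings) if eff >= threshold]
    return max(passing)


# Made-up wall-clock times in seconds for a fixed problem size.
timings = {1: 1000.0, 8: 140.0, 64: 25.0, 512: 10.0}
for nodes, speedup, eff in scaling_report(timings):
    print(f"{nodes:4d} nodes  speedup {speedup:7.2f}  efficiency {eff:5.2f}")
print("largest efficient scale:", largest_efficient_scale(timings))
```

With these placeholder numbers the efficiency drops below 70% between 8 and 64 nodes, which is the kind of limit that only shows up when the code is actually run at the larger node counts.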

Get Out of the Test Queue. Develop Code at Scale!

Typically, when it comes to developing, tuning, optimizing, and scaling code for large-scale production systems, developers must compete for cycles on testing clusters. Often those clusters are being used by other developers, who themselves have been waiting in long queues.

With the Archanan Development Cloud, waiting in a queue becomes a thing of the past. Organizations can optimize development workflows by enabling their developers to stand up their own at-scale production system (whether it’s a supercomputer or another complex distributed computing system) with the exact configuration of the target cluster for which they’re developing code: number of cores, architectures, and topologies. Programmers can develop and port, test and debug, and confidently predict how their codes will run later in production at scale.

There’s no waiting, which means accelerated development.

“One of the primary reasons why more organizations, especially in the commercial space, aren’t utilizing the power of modern […], is the considerable challenges of effective coding at these larger, more complex scales - there is a big gap between a […] and that of a remote, giant collection of distributed, interconnected processors. There has long been talk of contemporary supercomputers being broadly utilized and reaching the industry masses, however this reality has been elusive. By combining hardware-level virtualization and […], Archanan has figured out how to bridge both the technical, but also economical gaps that have presented adoption challenges for computing at this level. It’s exciting to see that we’re on the precipice of the democratization of high-performance computing across industries, at last.”

- John Gustafson, luminary scientist, inventor of Gustafson’s Law; Visiting Scientist at A*STAR (Agency for Science, Technology and Research)

Choose the Cluster

Whether your development team is working on Artificial Intelligence (AI) algorithms for Tsubame 3 at the Tokyo Institute of Technology, testing IO-bound applications targeted for Oakforest-PACS at the Joint Center for Advanced HPC (JCAHPC), or optimizing applications for an SAP HANA in-memory database on new memory architectures, such as Intel® Optane™, Archanan is working with the world’s supercomputing centers, enterprise IT departments, and other providers of clusters to build emulation profiles of their systems down to the component level.

Build a Custom Cluster

Archanan provides flexibility. If the organization is not committed to a certain cluster, developers can choose a different cluster, or develop and virtualize their own test cluster. Programmers and system admins can experiment with their own configurations, using different network topologies, CPU/GPU architectures, buses, and libraries, to see on what kind of system their codes will run best. The range of system component choices allows users to:

• Compare how their MPI communicates across InfiniBand versus Intel® Omni-Path, while keeping the rest of the system the same.
• Evaluate performance on fat-tree, Dragonfly, and hypercube topologies.
• Optimize the balance of parallelization across cores and nodes with OpenMP and MPI.
• Simultaneously test performance delivered by different compilers and using different libraries on CPU architectures, such as x86, IBM Power, and vector engines.
• Optimize for NVMe or SATA buses.

Designed for the Developer’s Journey

The Archanan Development Cloud’s browser-based IDE lets coders work anywhere they have an Internet connection, with any device, whether desktop, laptop, or tablet. By extending their development through the power of cloud emulation, programmers can be more productive in more places. The IDE is configurable to give a rich, familiar experience. Its functionality includes debugging, tuning, and optimization tools, with visualizations that help quickly identify problem areas to accelerate development.

Discover Which Cluster is Right for the Workload

If developers are not sure where their code will run best, Archanan can evaluate the application and recommend choices of profiled systems (including well-known supercomputers and configurations) that are the best matches for executing the code.

With proven code at scale on Archanan, developers are ready for production runs with confidence in code completeness, efficiency, and time to solution.
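As one illustration of balancing parallelization across cores and nodes with OpenMP and MPI, the Python sketch below enumerates the hybrid layouts that exactly fill a node: each layout pairs a number of MPI ranks per node with a number of OpenMP threads per rank. The function name is hypothetical, and the 36-core node and 512-node count are borrowed from the beta limits listed under Supported Capabilities. The sketch only lists candidates; which layout performs best still has to be measured on the emulated or real system.

```python
def hybrid_layouts(cores_per_node):
    """List (mpi_ranks_per_node, omp_threads_per_rank) pairs that use every core."""
    return [
        (ranks, cores_per_node // ranks)
        for ranks in range(1, cores_per_node + 1)
        if cores_per_node % ranks == 0
    ]


nodes = 512
for ranks, threads in hybrid_layouts(36):
    # Total MPI ranks across the cluster; the thread count per rank would
    # typically be passed to the OpenMP runtime via OMP_NUM_THREADS.
    print(f"{ranks:2d} ranks/node x {threads:2d} threads/rank -> {nodes * ranks:5d} MPI ranks")
```

Each printed row is a candidate configuration to time on the same emulated cluster, so that only the MPI/OpenMP split changes between runs.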

Optimize Production Utilization

Achieve More Results

Computational scientists in academia and enterprise are starved for supercomputing capacity to develop and run their applications. Facility directors and IT managers are as dedicated to programmers and researchers during their development phase as in their production phase. Now it’s possible to give them the capacity they need for both development and production without the high cost of scaling out an institution’s system.

Archanan works with system operators to create an emulation profile of the production clusters down to the processors, GPUs, buses, storage technologies, network topologies and devices, software stack, and more. The virtualized system runs code as it would in the facility, with accurate estimates of time to result. Developers and researchers can build an emulated cluster in minutes and begin developing at scale. Production workloads gain precious computing capacity previously unavailable, while researchers accelerate their development timetables and schedule production runs sooner.

MareNostrum4 - Barcelona Supercomputing Center

With Archanan, institutions can achieve greater production utilization from their supercomputers while enabling developers to accelerate their development process at scale and be prepared for confident production runs of their codes on the facility’s systems. It’s the best of both worlds.

Emulate before you buy

As organizations plan for their next large-scale computing deployment, they can evaluate system designs and configurations using Archanan. Archanan works with OEMs to emulate virtually any design, with any mix of architectures, topologies, and software. System evaluators can run test and acceptance codes on various potential configurations to determine, before a system is built, which design will best serve their organization’s needs.

Archanan works with several system component manufacturers, such as Intel, Mellanox, NVIDIA, and others, to build exact profiles of the components that become crucial to the performance estimates of executed code. Those components become part of the OEM’s emulated system design, taking into consideration board architecture, network topology, communication buses, and more. The result is a metric that can be trusted for the completed design when it’s running production applications.

Realize the performance. Then build the performance.

Original Equipment Manufacturers (OEMs) have given science and industry the capability to prototype and simulate before assembly and in-lab experiments, resulting in countless savings in dollars and hours. With Archanan Development Cloud, these organizations can fully emulate a new design—at scale—before it is implemented. Why go to the time and expense of building out a limited test system for a potential customer’s tender, when it can be emulated at full scale in minutes?

Optimize Business Benefits

Archanan will work with an organization and its chosen component manufacturers to create a customized configuration of any core count, topology, technology, and architecture, so designs can be proven before they are committed to. That offers considerable flexibility to experiment with a range of components and fine-tune any design to leverage the best combination of performance, cost, and time to market, while better understanding any unknown risks in the configuration.

Prove Designs to Customers and Give Them Assurance

Provide customers with the ability to run their codes on emulated systems in Archanan—from test codes to acceptance applications—to see how the evaluated systems will perform for them and to accelerate the acceptance process. Use the emulated system to help a customer’s developers optimize their applications while the hardware for the new system is being built out, so they are ready for production on the first day following acceptance. Archanan’s emulation service can help accelerate system delivery and speed the customer’s research and development efforts.

Magnus - Pawsey Supercomputer Centre

Get Archanan: Develop at Scale Faster and at Less Cost

The Archanan Development Cloud gives scientists and researchers in academia, as well as developers in enterprise and industry, an innovative tool to help them accelerate insight. Archanan enables fast development of codes at scale with rich IDE tools, reports and visualizations, and accurate performance estimates based on the systems of choice.

Archanan enables supercomputing center directors and IT managers to easily and cost-effectively support faster application development at scale while releasing more CPU cycles for larger or more numerous production runs. Archanan will work with public and private facilities to exactly emulate their systems and help optimize their development workflows.

Additionally, Archanan gives OEMs fast and cost-effective prototyping and emulation capabilities that help them make better design and business decisions while optimizing solutions for their customers. Archanan will work closely with design teams to give them and their customers a competitive advantage.

Get Started with Archanan

Developers should register for the Archanan Development Cloud at www.archanan.io

Supercomputing facility directors should contact Archanan to add their clusters to Archanan’s portfolio and increase production capacity while accelerating researchers’ development. www.archanan.io

OEMs should contact Archanan to virtually build their next-generation supercomputer. [email protected]
