
WHITE PAPER

HPC on Kubernetes
A practical and comprehensive approach

Leo Reiter, CTO, Nimbix, Inc.

Introduction

In 2012 Nimbix began running HPC applications using containers, and in 2013 it launched the world's first container-native supercomputing cloud platform, JARVICE™. This platform was and is novel in various ways. First, it ran applications directly on bare metal rather than virtual machines, utilizing Linux containers for security and multi-tenant isolation, as well as workload mobility. Second, it provided both a ready-to-run service catalog of commercial and open source workflows, in a Software-as-a-Service style, and a Platform-as-a-Service interface for developers and ISVs to onboard their own custom applications and workflows. At the time, and largely to this day, most high performance cloud platforms leverage and provide mainly Infrastructure-as-a-Service interfaces – mechanisms appropriate for Information Technology professionals, but not for scientists and engineers looking to consume HPC directly. The Nimbix Cloud, powered by JARVICE, overcame both the performance penalties of virtualized processing and the ease-of-use challenges of IT-focused interfaces such as IaaS. The JARVICE software has since been released as an enterprise platform (called JARVICE XE) for use on premises or on 3rd-party infrastructure, but it retains all the usability and performance benefits (when run on bare metal and with computational accelerators and low latency interconnects) of the Nimbix Cloud itself.

At the time Nimbix began deploying workflows in containers, there was neither a standard, stable, enterprise-grade format for packaging applications nor an available orchestration mechanism for deploying said containers on machines. Fast-forwarding to the present, we now take both for granted, with standard container formats and orchestration technologies such as Kubernetes. But before this, Nimbix had to invent the mechanisms and the "art" in order to bring products to market. The Nimbix Cloud and JARVICE XE have since run millions of containerized workloads in a myriad of different forms, solving real-world HPC problems in just about every industry and vertical. In 2019 Nimbix released HyperHub™ as the marketplace for accelerated and containerized applications, delivered as turn-key workflows to the JARVICE platform regardless of what infrastructure powers it. Not only can scientists and engineers consume containerized HPC seamlessly thanks to JARVICE, but ISVs supporting these users can monetize and securely distribute their codes without having to reinvent the wheel to do so.

In a somewhat related context, various container web service platforms have emerged over the past few years, most notably Kubernetes. Google released the open-source Kubernetes platform to the world in 2014 as an evolution of tools it had used internally to scale its own web-based applications. From top to bottom, Kubernetes is designed to scale web services based on (mainly) Docker-style containers behind load balancers and web proxies known as "Ingress" controllers. The architecture is ideal for delivering stateless request/response-style services (e.g. RESTful and other web applications). It also greatly simplifies development of said applications by automating the deployment patterns along standardized practices.

Just as JARVICE is not designed to serve out stateless web applications, Kubernetes is not designed to run HPC and other tightly coupled workflows efficiently. Both platforms utilize containers as the runtime for applications, but the workloads and compute methods they support are drastically different. In a world of standardization to improve operational efficiency, however, it's not ideal to maintain different types of platforms for different types of applications. IT organizations increasingly look to consolidate using a "layer cake" approach – e.g. the "infrastructure layer" should be able to run any type of application, in order to reduce the need for expensive specialized practices and additional labor.

Higher-level "layers" are there for the specifics, but rely on the underlying layers to cover the basics. Single-purpose platforms are generally phased out in favor of general-purpose ones – look no further than the rise of Unix and Linux systems since the 1970s and how they increasingly displaced the mainframe. For all but the most sensitive government and research deployments, general-purpose platforms should likewise begin to displace traditional HPC. The missing link is the unifying technology that enables both commodity Enterprise applications and more specialized scientific and engineering ones to run on a common infrastructure management layer. As this paper will demonstrate, JARVICE XE combined with Kubernetes provides a practical approach to achieve just this.

Table of Contents

Linux Container Basics
The Container Ecosystem, Explained
An Alternative Approach to Linux Containers: Singularity
Container Native versus Containerized Applications with Docker
HPC on Kubernetes
JARVICE vs Kubernetes Application Pattern Support
HPC on basic Kubernetes
HPC on Kubernetes with JARVICE XE
High Scale versus High Throughput Job Scheduling
Converged IT Infrastructure with Kubernetes and JARVICE XE
Multi and Hybrid Cloud with JARVICE XE
Conclusions

Linux Container Basics

A Linux container, most commonly formatted as a Docker container, can be thought of as just a runtime context for application bits. Unlike a virtual machine, a container shares the host operating system kernel with other containers. This makes containers a much lighter mechanism to run applications, since they do not need to package an entire operating system stack with kernel and drivers. Instead, the focus is on the encapsulated application and the dependencies it needs to run (shared libraries, configuration files, etc.). Containers can be thought of as a type of application virtualization, where traditionally virtualization in a hypervisor context has been machine level (hence the term "virtual machine").

A container at rest provides the files an application needs in order to run (binaries, scripts, configuration files, and libraries), together with some metadata to describe how to assemble the container runtime environment itself. In Docker terms, containers are packaged as stacked layers that can be cached individually on target systems, but it's up to the runtime environment to assemble the filesystem that containerized applications operate from.

At runtime, a container provides 3 basic mechanisms:

1. The filesystem – typically presented as a fully assembled "jail" that a containerized application cannot "escape" from; this also means that everything the application needs must exist within this filesystem, since it can't easily access the underlying host to leverage existing files and directories (unless the runtime environment is explicitly told to "bind" files and/or directories from the host – a practice that should be executed carefully due to the inherent security concerns around it).

2. The "namespaces" – in addition to the filesystem "jail", it's important that containers cannot "see" processes, network interfaces, and other objects outside their own context. Using namespaces allows applications to run in isolation both from each other and from the host system. A secure containerized platform such as JARVICE or Kubernetes will allow running multiple containers per host, without those containerized applications even being aware of each other nor of those that may be running on the underlying host operating system directly.

3. Access controls and resource limits – in Linux the primary mechanism for achieving this is known as "cgroups".

One of the major benefits of containers versus virtual machines is the ease with which system devices can be "passed through" from the host to the container. Unlike with VMs, there is no need for complex paravirtualization nor bus-level emulation methods, since the applications share the same host kernel. It's therefore possible to connect devices such as computational accelerators (FPGAs, GPUs, etc.) to containerized applications easily and without overhead. But this must also be governed with extreme care, as it can lead to a containerized application accessing any resource on the host system without restriction. Cgroups allow container platforms to restrict access to devices as well as general system resources such as memory and CPU cycles with very fine-grained control and minimum overhead.
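To make these mechanisms concrete, the following minimal sketch uses the Docker SDK for Python to start a container with a filesystem assembled from image layers, its own namespaces, cgroup-style CPU and memory limits, an explicit host bind, and a passed-through device. The image name, device path, and bind mount are illustrative assumptions only, not part of any JARVICE configuration.

    import docker

    client = docker.from_env()

    container = client.containers.run(
        "ubuntu:22.04",                            # filesystem assembled from cached image layers
        command=["sh", "-c", "nproc && free -m"],  # runs inside its own namespaces
        mem_limit="4g",                            # cgroup memory limit
        nano_cpus=2_000_000_000,                   # cgroup CPU limit (2 CPUs)
        devices=["/dev/fpga0:/dev/fpga0"],         # explicit device pass-through (placeholder path)
        volumes={"/data": {"bind": "/data", "mode": "ro"}},  # explicit host bind; use sparingly
        detach=True,
    )
    container.wait()               # wait for the command to finish
    print(container.logs().decode())

The same limits and device grants are what a container platform applies automatically on the user's behalf; this sketch only shows them spelled out by hand.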

The Container Ecosystem, Explained

Container Registry
A remotely accessible object store for container images (or containers at rest); generally fronted with a RESTful API for easy access from clients. Popular container registries include Docker Hub and gcr.io. Additionally, the Nimbix HyperHub provides workflow-level metadata and authorization to various container registries as a unified interface for synchronizing applications across clusters automatically.

Container Format
The defacto standard for containerized application packaging is Docker, but other formats exist as well (e.g. Singularity, but see below). Formatted images are "pushed" to the Container Registry and "pulled" by Container Runtimes.

Container Runtime
Once again Docker is the most popular container runtime, which actually pulls and runs containerized applications. But the Open Container Initiative (OCI) allows other engines, such as "containerd", to run Docker-style containers without modification. What container engine runs on a platform is less relevant than whether or not it can support Docker-style containers, given the popularity and vast adoption of this format.

Container Platform
If containers are the application in a container-native context, then the container platform is the operating system on which things run. The container platform provides interfaces for end-users and APIs to run and manage containerized applications. JARVICE in the Nimbix Cloud is the defacto container-native platform for HPC, while Kubernetes is increasingly becoming the defacto container-native platform for web service applications. JARVICE XE actually interfaces with Kubernetes to run HPC applications on converged IT infrastructure.

An Alternative Approach to Linux Containers: Singularity

While Docker-style containers are increasingly ubiquitous and general-purpose, another format has made some inroads in HPC: Singularity. The main difference is in architectural philosophy. The Docker format (and Docker runtime) is intended to work in full isolation, providing its own network and system contexts to each container. It also allows a containerized application to gain administrative access (also known as "root") within its "jail" and namespaces (see above). The container platform must ensure that the applications are properly resource managed to avoid security issues, but this is widely understood. In the Singularity school of thought, containers actually use the host for networking and interconnects, and do not allow containerized applications to gain root privileges even in their isolated runtime contexts. The design philosophy is to better support workload portability within the context of monolithic traditional HPC environments where everything is already installed on the host, such as MPI libraries (a key HPC component), etc. In the Docker philosophy, the host provides nothing other than the runtime environment setup itself to the containerized applications. Therefore Docker-style containerized applications must bring all of their own dependencies. While this results in slightly "fatter" images, it does simplify running diverse codes on the same systems at the same time and is much better suited to multi-tenant environments such as JARVICE.
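Returning to the registry and runtime roles described above, the following hedged sketch uses the Docker SDK for Python to pull an image from a public registry, inspect its stacked layers, and push a retagged copy to a private registry. The private registry hostname is a placeholder assumption.

    import docker

    client = docker.from_env()

    # Container Runtime pulls an image (a container at rest) from a Container Registry
    image = client.images.pull("ubuntu", tag="22.04")

    # Docker images are stored as stacked, individually cacheable layers
    print(image.attrs["RootFS"]["Layers"])

    # Retag and push the same image to a private registry (placeholder hostname)
    image.tag("registry.example.com/demo/ubuntu", tag="22.04")
    for status in client.images.push("registry.example.com/demo/ubuntu",
                                     tag="22.04", stream=True, decode=True):
        print(status)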

Figure 1: Containerized HPC Application Comparison

While Singularity is making inroads in traditional HPC, it is unlikely to challenge Docker-style containers for general-purpose applications, and will therefore likely not be relevant in unifying HPC and commodity applications on converged IT infrastructure.

Container Native versus Containerized Applications with Docker

Ideally, applications running in containers are container-native. This generally means several things, but usually boils down to:

1. Minimized dependencies – only those libraries and configuration files needed for the specific purpose of the container are packaged; in fact, increasingly, statically linked binaries such as those produced from the Go language are replacing application stacks, further simplifying containers and making images leaner than ever.

2. Simplified operations – since containers are intended to be run on container platforms, it's not necessary to package large frameworks such as scalable HTTP(S) servers, firewall software, etc. The container platforms usually provide this functionality and simply proxy requests to containerized applications via standardized service ports. Scale-out is also handled by the container platforms, to further reduce complexity. In a Kubernetes environment, applications must follow simple rules for service discovery and the platform takes care of the rest. In a JARVICE environment, HPC applications are automatically configured in a runtime environment conducive to seamless MPI and other parallel distribution methods across nodes.

While container-native applications are of course a type of containerized application, the reverse is not true. For example, it is possible (using well-understood methods) to containerize traditional applications for distribution and mobility, be they open source or commercial bits. Over the years Nimbix developed a methodology to do just this, given the breadth and complexity of existing HPC applications and the need to containerize them without modification:

1. A container image is built using the application's installer. In Docker terms, this means running installation scripts in "silent" mode (without the user interface, since there is no opportunity for user input when building container images) in the Dockerfile itself. Years of research and evolution led to highly optimized layers and support for even the most complex and extensive application suites. Again, these ISV codes are not container native, cannot be broken up into microservices, and are simply not aware of any sort of container runtime.

2. Additional layers are added to extend the container functionality, such as the graphical user interface (e.g. web-based desktop or shell), 3rd-party integrations (e.g. "cosimulation" across different vendor codes), and convenience desktop applications that users typically run in conjunction with their HPC codes, such as text editors, etc. Nimbix has open sourced the graphical desktop environment for containers in a GitHub package called "image-common".

3. Workflow scripts, to automate running applications, plumbing license server connectivity, configuring parallel scale parameters automatically, etc. These scripts assume a JARVICE platform underneath but can easily be emulated locally for unit testing application workflows at low scale.

4. Metadata for workflow automation and "trade packaging" – simple declarative files are layered in, which help the JARVICE platform generate user interfaces and present high-level workflows to engineers and scientists. This helps "construct" commands and parameters to execute inside the container image at runtime. The "trade packaging" includes descriptive information, screen shots, and optionally EULA language, and is used in generating the service catalog for HyperHub. This metadata is isolated and does not interfere with non-JARVICE platforms in any way – maintainers can continue to run their codes on generic Docker runtimes without JARVICE, but also without the end-user benefits of workflow automation.

In effect, all that is needed to containerize traditional applications is to automate an installation script, perform necessary post-install fixups, and develop wrapper mechanisms (e.g. workflow scripts) to parameterize and dynamically edit configurations to adapt to dynamic environments at runtime. A major challenge is the ephemeral nature of containers – unlike on a workstation or server, there is no persistence unless files are stored in explicit volumes – so in some cases, considerable fixups are needed.

With the better part of a decade of experience doing this, Nimbix has both performed this feat on many popular application stacks and advised countless developers and ISVs on best practices for containerizing their own codes in a self-service fashion. The JARVICE platform even provides a CI/CD pipeline mechanism known as PushToCompute™, which can further help to enable and deploy traditional applications on heterogeneous platforms and architectures.
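As an illustration of step 1 of this methodology, the sketch below composes a Dockerfile that drives a hypothetical vendor installer in silent mode and builds the image with the Docker SDK for Python. The base image, installer name, and silent-mode flags are placeholders; they do not represent any specific ISV's packaging or Nimbix's actual build recipes.

    import pathlib
    import docker

    dockerfile_lines = [
        "FROM ubuntu:22.04",
        "# Copy the vendor installer into the build context (placeholder name)",
        "COPY acme-solver-installer.run /tmp/installer.run",
        "# Run the installer unattended; the flags below are placeholders",
        "RUN chmod +x /tmp/installer.run && "
        "/tmp/installer.run --silent --accept-eula --install-dir /opt/acme && "
        "rm -f /tmp/installer.run",
    ]
    dockerfile = "\n".join(dockerfile_lines) + "\n"

    # Write the Dockerfile into a build context; the installer binary must
    # already be present there for the build to succeed.
    context = pathlib.Path("build-context")
    context.mkdir(exist_ok=True)
    (context / "Dockerfile").write_text(dockerfile)

    client = docker.from_env()
    image, logs = client.images.build(path=str(context), tag="acme/solver:containerized")
    for entry in logs:
        print(entry.get("stream", ""), end="")

Additional layers (desktop environment, workflow scripts, and metadata) would then be stacked on top of this base application layer, as described in steps 2 through 4.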

Figure 2: Sample containerized (not container-native) application running on JARVICE platform

Examples and Reference Material

Nimbix provides various examples of containerized traditional and container-native workflows on GitHub. These sources can be used as patterns to produce Dockerfiles and metadata for various types of HPC applications. Additionally, the JARVICE Developer Documentation includes a complete reference. Note that while all of this assumes a container runtime environment that sets up and mimics a dynamic HPC cluster, it is not strictly mandatory to use the JARVICE platform itself.

HPC on Kubernetes

The following section will examine the options for running HPC workflows on Kubernetes.

JARVICE vs Kubernetes Application Pattern Support

For comparison, the following table illustrates the level of support for various common application patterns between Kubernetes and the original JARVICE platform powering the Nimbix Cloud. For reference, the original JARVICE platform predates Kubernetes.

Pattern: Scale-out service-oriented application (SOA)
Example: MVC (model/view/controller) application, e.g. a content management system
JARVICE support: Basic – can run single or multiple containerized application images at predetermined scale, but does not provide automatic load balancing or HA.
Kubernetes support: Full – can run as a multi-container image and scale services individually, as well as provide service discovery and load balancing.

Pattern: HPC – "embarrassingly parallel" (or "perfectly parallel") application
Example: Monte Carlo simulation
JARVICE support: Full – the container environment is set up automatically and spans multiple nodes with scale decided at launch; the "master" node can set up and shard data, etc.
Kubernetes support: Partial – a Kubernetes "Deployment" can indeed launch multiple containers in parallel, but the scale is "best effort" at launch time because "gang scheduling" is not possible; it is also not possible for the application to perform different functions on the "master" container than on the "slaves", so sharding and setup must be performed either in advance or manually. This is not architecturally compatible with most existing applications.

Pattern: HPC – tightly coupled parallel solver
Example: Computational Fluid Dynamics (CFD)
JARVICE support: Full – ready to run for MPI-initiated solvers, whether direct via mpirun or indirect via an application front end that sets up data and processing. JARVICE provides a dynamic cluster complete with fabric setup, SSH trust between nodes (parallel containers), generated machine files for MPI, etc.
Kubernetes support: None – a Kubernetes "Deployment" is not suitable for this mechanism, as it cannot guarantee scale either at launch or at runtime, does not support "gang scheduling" to queue jobs until all parallel resources are available, does not automatically elect and run code on a "master", and does not automatically configure the fabric for applications.

Pattern: AI – accelerated distributed parallel training
Example: Distributed Deep Learning (DDL)
JARVICE support: Full – supports HPC-style deep learning similarly to other parallel solvers, which is architecturally compatible with distributed training frameworks such as Horovod.
Kubernetes support: None – for the same reasons as for parallel solvers; alternative training workflows must be used (assuming framework support) when scale is needed.

Pattern: AI – accelerated real-time analytics
Example: Inference
JARVICE support: Full – when using technologies such as FPGAs and novel inference hardware, JARVICE can provision and host single- or multi-node services at predetermined scale.
Kubernetes support: Full – assuming Kubernetes plugins exist for accelerated hardware, Kubernetes can support these stacks much like it does SOAs (see above).
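To make the Deployment-based "Partial" support above concrete, the following sketch uses the official Kubernetes Python client to request 16 "standby mode" worker replicas for an embarrassingly parallel job. Kubernetes will bind as many of these pods as it can immediately rather than queuing the whole set until all 16 fit, which is the gang-scheduling gap noted in the table. The image name and namespace are placeholder assumptions.

    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() when run inside a pod
    apps = client.AppsV1Api()

    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="mc-workers"),
        spec=client.V1DeploymentSpec(
            replicas=16,  # desired scale; binding is best effort, not gang scheduled
            selector=client.V1LabelSelector(match_labels={"app": "mc-workers"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "mc-workers"}),
                spec=client.V1PodSpec(
                    containers=[
                        client.V1Container(
                            name="worker",
                            image="registry.example.com/montecarlo-worker:latest",  # placeholder
                            command=["/bin/sh", "-c", "sleep infinity"],  # "standby mode"
                        )
                    ]
                ),
            ),
        ),
    )

    apps.create_namespaced_deployment(namespace="default", body=deployment)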

HPC on basic Kubernetes

Since the platform itself does not provide adequate HPC support (as explained above), here are some options and workarounds.

Single Pod Solvers
If the need is multi-core/multi-thread rather than multi-node, an MPI-based solver can be provisioned as a single container in a single pod, and bound to a single node. This eliminates the need for fabric setup and "master/slave" type configurations. The solver can simply be launched to use the shared memory interconnect on the provisioned CPU cores and threads. Depending on the underlying node capacity, this may suffice for some forms of HPC solves, but it obviously restricts scale to whatever can fit on a single node.

External Control
For multiple pods, one possible workaround is to launch all containers in "standby mode" – for most applications this means simply performing an init or launching an SSH server explicitly. An external process can then discover what pods are actually available for a given deployment and what the container addresses are, and proceed to construct a machine file and launch the MPI-based solver to distribute the work. Obvious drawbacks are complexity, as this requires taking apart an application or running it in different stages, as well as managing individual containers explicitly. What's more, it still does not guarantee launch scale, as the Kubernetes scheduler does not support gang scheduling: rather than queue a set of pods until the full capacity is available, it will simply bind whatever it can and continue to do so until all pods are bound¹.

External control can also be performed manually, with a user watching the available pods and proceeding with the setup once the desired scale is bound.

Regardless, establishing SSH trust between containers will still be required – this can either be done with generic trust at build time (easy, but not secure), or explicitly after launch (more complex and difficult to automate in the application layer).

Embarrassingly Parallel Solvers
While a bit simpler to scale, embarrassingly (or "perfectly") parallel solvers still need setup and may require sharding data before launching. If the platform cannot coordinate a guaranteed set of containers and hand off control to a setup process automatically before starting work, similar problems exist, even though the solver architecture lends itself better to a stateless replica style of system.

If the algorithm never needs to coordinate (not even at setup time), this is likely not any sort of HPC solver to begin with, but it may lend itself well to running on a standard Kubernetes platform.

¹ In Kubernetes terms, binding a pod means placing a pod (and its encapsulated containers) on a worker node for processing; this is the equivalent of a job scheduler choosing a node to run work on, and in turn running the work.

HPC on Kubernetes with JARVICE XE

JARVICE XE bridges the gap to running HPC codes on Kubernetes with 2 major advances.

Two Level HPC Scheduler
The scheduler provides 2 levels: one that converts a traditional HPC job request into a set of Kubernetes pods, and a gang scheduler that binds pods to nodes, queuing entire jobs if the requested scale is not available. Additionally, the gang scheduler provides the following important functions:

1. "Best fit", ideal for heterogeneous deployments – the JARVICE XE pod scheduler will always try to place pods on nodes with the fewest total resources, including accelerators, etc. This is different than taking load into account and, assuming there is no attempt at genuine oversubscription, results in better resource utilization. For example, in a cluster where a fraction of the nodes have GPUs in them, it does not make sense to ever schedule CPU-only work on them unless all of the CPU-only nodes are in use. If only current load is taken into account, this can easily lead to wasted cycles on nodes with more scarce and novel devices. This scheduler implements lessons learned in resource management on the Nimbix Cloud, which is a multi-tenant heterogeneous deployment.

2. Configurable resource weighting – different service providers may have different economics when it comes to what is more valuable, e.g. large memory nodes versus GPU nodes. The JARVICE XE pod scheduler can be configured to weigh these appropriately.

3. Advisory limit support – in multi-tenant or even multi-user environments it is sometimes necessary to restrict the amount and type of resource certain users and teams can consume. JARVICE XE supports self-service and administrator-imposed limits even if more resource is available at the time of scheduling jobs. This is different than the namespace limits Kubernetes already supports, because in that case trying to exceed limits results in failed object creation. In the JARVICE XE case, jobs will queue when limits are reached, which is more along the lines of what end users expect. When it comes to scheduling batch HPC jobs, this is still appropriate for "queue and forget" operation, where users don't need to worry about resource management at the time they schedule work.

4. Tenant isolation – in multi-tenant environments it's often important for security or compliance reasons to prevent nodes from running work belonging to multiple tenants. Even in single tenant environments there can still be regulatory restrictions between teams of users. Labeling nodes and defining resource types that target them is an obvious way to achieve this, but that is very static and may not yield the best utilization in cases where the exact machines are not restricted. Instead, JARVICE XE can dynamically ensure that no two tenants or teams share the same nodes for work at the same time.

The upper level part of the scheduler also presents resource collections to users as "machine types", which is a much more natural way for end users to select scale and capability when running work. Machine types can of course request very granular resources, but this complexity can be abstracted from the end user, who for example simply decides to run a job on 32 16-core nodes for a total of 512 cores using MPI. JARVICE XE converts this request to the appropriate pod replica count and resource request before handing it off to the pod scheduler for binding. This entire mechanism is opaque to the end user, making system operation much simpler.

The lower level part of the scheduler (also known as the pod scheduler) can in fact run side by side with the default Kubernetes pod scheduler, but of course race conditions may exist if they compete for the same resources. Kubernetes does not use global critical sections when binding pods, so it's possible to oversubscribe resources (leading to pod eviction) if both the default and the JARVICE XE pod schedulers are trying to bind pods to the same nodes. The best practice is to use labels as well as taints to ensure JARVICE has exclusive control of scheduling on hardware used specifically for HPC. The JARVICE pod scheduler does support multiple namespaces, so it's in fact possible to have several deployments of JARVICE XE on the same cluster scheduling work at the same time without the risk of race conditions.

The scheduler is accessible via API or via a point-and-click web portal. As mentioned above, JARVICE XE uses metadata from the applications in the HyperHub™ catalog to define workflows for the end user, rather than requiring users to write PBS or Slurm scripts to launch work.
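The following simplified sketch illustrates the two scheduling ideas just described: "best fit" placement that prefers the least-equipped nodes which still satisfy a request, and gang scheduling that binds an entire job or queues it. It is an illustration only, not JARVICE XE source code; the resource scoring and data structures are assumptions made for the example.

    from __future__ import annotations
    from dataclasses import dataclass

    @dataclass
    class Node:
        name: str
        cpus: int
        gpus: int
        free: bool = True

        def total_resources(self) -> int:
            # Crude "total resources" score; scarce devices weigh heavily
            return self.cpus + 10 * self.gpus

    @dataclass
    class Job:
        name: str
        nodes_needed: int
        gpus_per_node: int = 0

    def schedule(job: Job, cluster: list[Node]) -> list[Node] | None:
        """Bind the whole job to nodes, or return None to queue it (gang scheduling)."""
        candidates = [n for n in cluster if n.free and n.gpus >= job.gpus_per_node]
        # "Best fit": prefer the least-equipped nodes so scarce GPU nodes stay available
        candidates.sort(key=lambda n: n.total_resources())
        if len(candidates) < job.nodes_needed:
            return None  # not enough capacity: queue the entire job, bind nothing
        chosen = candidates[:job.nodes_needed]
        for node in chosen:
            node.free = False
        return chosen

    cluster = [Node("n0", 32, 0), Node("n1", 32, 0), Node("n2", 32, 4)]
    print(schedule(Job("cpu-solve", nodes_needed=2), cluster))                   # binds n0, n1
    print(schedule(Job("dl-train", nodes_needed=1, gpus_per_node=4), cluster))   # binds n2
    print(schedule(Job("big-solve", nodes_needed=3), cluster))                   # None -> queued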

Figure 3: JARVICE HPC Scheduler Architecture on Kubernetes: submission flow for a sample 64 core batch simulation

HPC Runtime Environment
The second major advance JARVICE XE brings to Kubernetes is the HPC runtime environment, created dynamically when jobs start. This environment performs the following functions:

1. Configures either a batch run or an interactive interface based on parameters from the scheduler's workflow request.

2. Ensures finite completion of workflows, whether solvers succeed or fail (by default, using the Kubernetes "Job" API, failed solvers would restart automatically); since failures are a normal part of HPC job processing, it's important these are captured and treated as completions on their own, so that users have the opportunity to review workflow parameters and retry with the appropriate adjustments. Automatic restarts with the same parameters would result in an endless loop of failures that the end user may never detect.

3. Establishes a clear "master/slave" topology, with a "head node" design that initiates solvers to fan work out to worker nodes (slaves). This means traditional HPC codes can run unmodified on JARVICE, rather than implement complex master election logic before they can perform any work.

4. Automatically generates machine files that can be passed directly into mpirun command lines, for example, and ensures that slaves are ready to receive the request thanks to established SSH trust, fabric configuration, etc. Each job runs in its own dynamic cluster environment that presents itself quite naturally to traditional HPC software.

5. Supports any type of Kubernetes PersistentVolumeClaims, including ReadWriteOnce, and dynamically shares them across all nodes in a job. This means a user can launch a multi-node job with a block storage volume, and the solver will automatically get shared filesystem access from all nodes without the risk of data corruption or indefinite queuing due to storage contention. JARVICE XE also supports NFS and CephFS shared filesystems directly, without the need to manage Kubernetes volumes underneath, if so desired. Additionally, JARVICE XE's runtime environment can attach storage interfaces that are not natively supported in Kubernetes by defining host-level mount points as part of machine definitions. This enables parallel storage systems such as WekaIO².

6. Supports network-mounted home directories, based on user login, including automatically deriving user identity IDs (UID and GID) from the login name and files in the network-mounted home directories; this eliminates the need for complex configurations in containers in order to access ActiveDirectory, etc.

7. Manages secrets, configurations, services, and ingress automatically on a per-job basis. For example, if a user runs an interactive job, JARVICE XE will automatically configure an ingress for it so that the end user can access it remotely via a web browser. As mentioned above, Nimbix provides web-based desktop environments as open source layers for any container, so this is a very effective way to provide graphical user interfaces to traditional HPC applications. JARVICE XE uses secure tokens to ensure users do not gain access to each other's desktops unless they explicitly share links or impersonate each other (an optional feature of JARVICE XE).

² Host mounts bound into containers should be used with great care and only when necessary, such as when the storage technology does not lend itself to being managed with technologies such as the Container Storage Interface (CSI). The JARVICE platform provides appropriate controls for specific patterns only.

High Scale versus High Throughput Job Scheduling

Job scheduler requirements often conflate use cases and it's important to understand the difference. JARVICE XE, at the platform level, provides a high scale job scheduler, meaning that users can submit jobs that run on many machines at once. Jobs can however take several seconds to several minutes (depending on workflow complexity, container cache status, etc.) to start processing. The platform-level scheduler itself is appropriate for running large jobs that take a finite but considerable amount of compute time to execute. This is again inspired by serving on-demand HPC from the Nimbix Cloud for the better part of a decade. Most engineering/simulation or distributed deep learning jobs take several minutes, hours, or days to run, even at scale.

For high-throughput scheduling, it's best not to rely on the platform itself to run the individual jobs. This is where it's appropriate to schedule a single large job as a dynamic cluster, and then embed a job scheduler inside it. Nimbix provides a simple example pattern on HyperHub for this called "HPC Test Environment" (container source available on GitHub), which actually deploys a Slurm cluster dynamically that is ready to schedule work once it starts. While commonly used for testing or for providing on-demand academic-style clusters to end users, this base image also fits certain use cases very well, like many bioinformatic codes that tend to run array jobs on sequencer data, etc. Once the job starts and the embedded scheduler is fully configured, job submission latency within the dynamic cluster is no different than on physical static systems. It then becomes efficient to run many short-lived jobs without incurring the platform overhead of provisioning containers and other objects on the infrastructure underneath. The obvious drawback is that in an on-demand environment, a user consumes all of the resource for the duration of the dynamic cluster's lifetime, versus jobs ending and relinquishing resources automatically in the high-scale platform scheduling model. Either users or applications themselves must actually terminate the dynamic cluster in the embedded scheduler model in order to return resources to the system.
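As a hedged example of the high-throughput pattern, the snippet below submits a large array of short tasks to the embedded Slurm scheduler once the dynamic cluster is up (for instance, via the "HPC Test Environment" pattern mentioned above). It assumes sbatch is available on the head node; the task script name is a placeholder.

    import subprocess

    result = subprocess.run(
        [
            "sbatch",
            "--array=1-1000",       # 1000 short tasks, e.g. one per sequencer sample
            "--ntasks=1",
            "--cpus-per-task=2",
            "process_sample.sh",    # placeholder task script; reads $SLURM_ARRAY_TASK_ID
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    print(result.stdout.strip())    # e.g. "Submitted batch job 42"

Because all of these tasks run inside one platform-level job, the per-task submission latency is that of the embedded scheduler, not of provisioning new containers.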

Converged IT Infrastructure with Kubernetes and JARVICE XE

JARVICE XE typically runs as a set of services on a Kubernetes cluster, deployed via Helm chart. It's trivial for cluster administrators to apply updates and manage JARVICE XE itself as any other framework on the system. JARVICE XE handles the entire user interface above that, including user HPC applications, identity, authentication, authorization, job control, accounting, and auditing. It supports multiple tenants (or teams) without having to deploy separate instances. In short, it makes enabling HPC on a large Kubernetes deployment as straightforward as deploying any other type of application.

Converged deployments benefit organizations in the following ways:

1. Reduction of specialized skills – the same team that manages the IT infrastructure can now manage HPC with minimal training; since JARVICE XE delivers curated workflows from HyperHub, this includes HPC application engineering, as the workflows are already ready to run.

2. Improved utilization – while the best practice is to segregate HPC and commodity workloads on the same cluster, there is no need for redundant control planes, storage, or networking; this improves the economies of scale of a cluster, and makes it easier to plug in specialized hardware without "step functions" and complex administrative interfaces.

3. Data sharing between workloads – a common use case would be combining an SOA application with an HPC one on the same infrastructure. For example, a full deep learning pipeline could feature an SOA-based inference service that reads trained models from the same storage that an HPC-based DDL training workflow produces, either on demand or continuously. The SOA could trigger training via web service API when there is new data, or for reinforcement purposes, and JARVICE ensures that the DDL training architecture runs correctly.

4. Migration and recovery simplification – cluster administrators can rely on the same tools and techniques to handle commodity versus HPC workloads when there is a need to move, replicate, or recover a cluster.

Multi and Hybrid Cloud with JARVICE XE

Because JARVICE XE provides a processing API and a common, synchronized service catalog (from HyperHub), workload mobility is very seamless. The same workloads run on any cluster providing JARVICE XE, including the Nimbix Cloud, which retains compatibility.

Even in the case where different cloud infrastructures provide different types of resources, applications retain compatibility, and well-defined machine types and patterns for resource requests ensure portability. JARVICE XE does not currently manage data movement and migration across multiple clouds, but supports whatever mechanisms are employed underneath to achieve this. In the case where users do not ever run work on multiple clouds (but different teams of users do), this is generally not an issue, as the data will reside with the compute at all times.

JARVICE XE provides some additional features to ease multi-cloud deployments:

1. Helm chart deployment on Kubernetes clusters makes managing JARVICE XE on multiple infrastructures seamless, including deploying updates and changing deployment parameters.

2. Scripted support for managed Kubernetes endpoints on public clouds such as Amazon EKS and Google GKE; this is a higher level abstraction than the Helm chart itself, supporting single-command deployments of JARVICE XE in a matter of minutes.

3. Public and private application synchronization across clusters; not only are upstream HyperHub applications synced, but also select private and custom applications, further improving workload mobility.

4. Single sign-on with and without federation; JARVICE XE supports both Active Directory and SAML2, allowing user identity to follow across multiple deployments with consistency.

5. "Single pane of glass" – the same JARVICE XE interface operates for any cluster.
   a. Available Q4 2018: URL-based federation – users visit the cluster or zone they wish to run work on by targeting its user portal ingress URL.
   b. Available Q2 2020: administrator- and team-controlled federation from the same portal URL – users launch and manage jobs on multiple clusters from a single web portal URL.
   c. Available Q2 2020: bursting of jobs from one cluster to another based on site-configured policies and rules; in the JARVICE multi-cloud model, machine types are pinned to clusters, so burst policies define which machines to leverage (provided all other constraints are met) when the originally requested ones are busy.

Because multi-cloud deployments are often a technique for security and compliance, JARVICE XE can be used to restrict access to compute locations for tenants, teams, and individual users based on login credentials and group memberships. This can also include access to applications (e.g. to enforce export control), data sets, and even hardware.

In all cases, the underlying management, whether single or multi cluster, is completely transparent to the applications themselves. This truly enables a "publish once, run anywhere" model for HyperHub application containers.
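As a rough sketch of what scripted, repeatable deployment can look like, the following example drives Helm from Python to install or upgrade a JARVICE XE release on several clusters. The release name, chart reference, namespace, and values files are placeholder assumptions; the actual chart location and required values should be taken from the JARVICE XE documentation.

    import subprocess

    def deploy(kube_context: str, values_file: str) -> None:
        """Install or upgrade a JARVICE XE release on the cluster behind kube_context."""
        subprocess.run(
            [
                "helm", "upgrade", "--install",
                "jarvice",                 # placeholder release name
                "jarvice/jarvice",         # placeholder chart reference
                "--kube-context", kube_context,
                "--namespace", "jarvice-system",
                "--create-namespace",
                "-f", values_file,         # per-cluster deployment parameters
            ],
            check=True,
        )

    # The same script can target several clusters or clouds for multi-cloud rollouts
    for context, values in [("onprem-cluster", "values-onprem.yaml"),
                            ("eks-cluster", "values-eks.yaml")]:
        deploy(context, values)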

Conclusions

• Kubernetes is emerging as the standard Enterprise container deployment platform, but it does not natively support HPC
• JARVICE in the Nimbix Cloud is the leading on-demand HPC platform for containerized applications
• JARVICE XE brings HPC to Kubernetes-managed infrastructure, including a rich (and ever expanding) catalog of ready-to-run open source and commercial workflows
• JARVICE XE provides a runtime environment in which traditional applications can run unmodified, without needing to be converted to container-native architectures; this is especially valuable when deploying commercial stacks where the source code is not available and the architectures are not configurable enough to change dramatically
• JARVICE XE supports private, hybrid, public, and multi-cloud deployments
• JARVICE XE makes it easy for Enterprise IT departments to add HPC to their portfolio of services without having to build additional practices, tooling, or specialized skills

www.Nimbix.net

linkedin.com/company/nimbix facebook.com/nimbix twitter.com/nimbix

Copyright © 2019 Nimbix, Inc