GEMTC: GPU Enabled Many-Task Computing Scott Krieder, Ioan Raicu Department of Computer Science Illinois Institute of Technology Chicago, IL USA
[email protected],
[email protected] Abstract— Current software and hardware limitations prevent acceleration on HPC resources. However, a programmability gap still Many-Task Computing (MTC) workloads from leveraging exists between applications and accelerators. Researchers and hardware accelerators (NVIDIA GPUs, Intel Xeon Phi) boasting developers are forced to work within the constraints of closed Many-Core Computing architectures. Some broad application environments such as the CUDA GPU Programming Framework[3] classes that fit the MTC paradigm are workflows, MapReduce, (for NVIDIA GPUs). The goal of this work is to improve the high-throughput computing, and a subset of high-performance performance of MTC workloads running on GPUs through the use of computing. MTC emphasizes using many computing resources the GEMTC framework. The CUDA framework can only support over short periods of time to accomplish many computational 16 kernels running concurrently, one kernel per streaming tasks (i.e. including both dependent and independent tasks), multiprocessor (SM). One problem with this approach is that where the primary metrics are measured in seconds. MTC has all kernels must start and end at the same time, causing extreme already proven successful in Grid Computing and inefficiencies in heterogeneous workloads. By working at the Supercomputing on MIMD architectures, but the SIMD architectures of today’s accelerators pose many challenges in the warp level, (which sits between cores and SMs) I can trade efficient support of MTC workloads on accelerators. This work local memory for concurrency, and I am able to run up to 200 aims to address the programmability gap between MTC and concurrent kernels.