Novel Methodologies for Predictable CPU-To-GPU Command Offloading

Roberto Cavicchioli, Università di Modena e Reggio Emilia, Italy
[email protected]
Nicola Capodieci, Università di Modena e Reggio Emilia, Italy
[email protected]
Marco Solieri, Università di Modena e Reggio Emilia, Italy
[email protected]
Marko Bertogna, Università di Modena e Reggio Emilia, Italy
[email protected]

Abstract

There is an increasing industrial and academic interest towards a more predictable characterization of real-time tasks on high-performance heterogeneous embedded platforms, where a host system offloads parallel workloads to an integrated accelerator, such as General Purpose Graphic Processing Units (GP-GPUs). In this paper, we analyze an important aspect that has not yet been considered in the real-time literature, and that may significantly affect real-time performance if not properly treated, i.e., the time spent by the CPU to submit GP-GPU operations. We show that the impact of CPU-to-GPU kernel submissions may indeed be relevant for typical real-time workloads, and that it should be properly factored in when deriving an integrated schedulability analysis for the considered platforms. This is the case when an application is composed of many small and consecutive GPU compute/copy operations. While existing techniques mitigate this issue by batching kernel calls into a reduced number of persistent kernel invocations, in this work we present and evaluate three other approaches that are made possible by recently released versions of the NVIDIA CUDA GP-GPU API, and by Vulkan, a novel open standard GPU API that allows improved control over GPU command submissions.
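The following is a minimal CUDA sketch, not taken from the paper, of the submission pattern the abstract refers to: a periodic task that issues many small, consecutive copy and compute operations, each of which requires a separate CPU-side CUDA call whose submission time contributes to the task's response time. The kernel name, buffer sizes, and loop bounds are illustrative assumptions.

// Baseline submission pattern: many small copies/launches per job,
// each paying a CPU-to-GPU submission cost.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void small_kernel(float *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = buf[i] * 2.0f + 1.0f;   // tiny amount of work per launch
}

int main() {
    const int n = 1 << 10;                       // deliberately small buffers
    const int ops_per_period = 32;               // many consecutive GPU operations per job
    float *h_buf, *d_buf;
    cudaMallocHost(&h_buf, n * sizeof(float));   // pinned host memory for async copies
    cudaMalloc(&d_buf, n * sizeof(float));
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // One "job" of the periodic task: every iteration issues two copies and a
    // kernel launch, so the CPU submits 3 * ops_per_period commands per period.
    for (int k = 0; k < ops_per_period; ++k) {
        cudaMemcpyAsync(d_buf, h_buf, n * sizeof(float),
                        cudaMemcpyHostToDevice, stream);
        small_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d_buf, n);
        cudaMemcpyAsync(h_buf, d_buf, n * sizeof(float),
                        cudaMemcpyDeviceToHost, stream);
    }
    cudaStreamSynchronize(stream);
    printf("h_buf[0] = %f\n", h_buf[0]);

    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}

When the per-operation work is this small, the cumulative CPU time spent in the submission calls themselves can become a non-negligible part of the task's execution, which is the effect the paper quantifies and mitigates.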