LS-CAT: A Large-Scale CUDA AutoTuning Dataset

Lars Bjertnes, Jacob O. Tørring, Anne C. Elster
Department of Computer Science
Norwegian University of Science and Technology (NTNU)
Trondheim, Norway
[email protected], [email protected],
[email protected] Abstract—The effectiveness of Machine Learning (ML) meth- However, these methods are still reliant on compiling and ods depend on access to large suitable datasets. In this article, executing the program to do gradual adjustments, which still we present how we build the LS-CAT (Large-Scale CUDA takes a lot of time. A better alternative would be to have an AutoTuning) dataset sourced from GitHub for the purpose of training NLP-based ML models. Our dataset includes 19 683 autotuner that can find good parameters without compiling or CUDA kernels focused on linear algebra. In addition to the executing the program. CUDA codes, our LS-CAT dataset contains 5 028 536 associated A dataset consisting of the results for each legal combina- runtimes, with different combinations of kernels, block sizes and tion of hardware systems, and all other information that can matrix sizes. The runtime are GPU benchmarks on both Nvidia change the performance or the run-time, could be used to find GTX 980 and Nvidia T4 systems. This information creates a foundation upon which NLP-based models can find correlations the absolute optimal combination of parameters for any given between source-code features and optimal choice of thread block configuration. Unfortunately creating such a dataset would, as sizes. mentioned above, take incredibly long time, and the amount There are several results that can be drawn out of our LS-CAT of data required would make the dataset huge.