Q-CUDA
Qualitative and Quantitative Verification of CUDA Code
Collaborators
Overview
General-purpose programming on Graphics Processing Units (GPUs) has become prevalent in fields such as machine learning. To support programmers in using GPUs effectively, NVIDIA developed the CUDA (Compute Unified Device Architecture) framework, in which specialized functions, called kernels, are implemented in a dialect of C++. However, the unusual execution model of CUDA may result in performance anomalies that are difficult to predict, particularly for novice CUDA programmers. The objective of this project is to develop reasoning techniques and automated tools for predicting the resource usage of CUDA kernels and aiding programmers, including novices, in writing more efficient kernels.
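As a concrete illustration (a standard textbook-style sketch, not code from this project), a minimal CUDA kernel looks like the following; the names `saxpy`, `n`, `a`, `x`, and `y` are illustrative assumptions:

```cuda
// Minimal SAXPY kernel: each thread computes one element y[i] = a*x[i] + y[i].
__global__ void saxpy(int n, float a, const float *x, float *y) {
    // Each thread derives its own index from its block and thread IDs;
    // i is a local variable, so every thread holds an independent copy of it.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}
```

A host program would launch such a kernel over many threads at once, e.g. `saxpy<<<(n + 255) / 256, 256>>>(n, a, x, y);`, with each thread executing the same code on a different index.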
The difficulty in analyzing the performance of CUDA, as opposed to other imperative languages, is that the same code runs in parallel on many threads, each of which stores independent copies of local program variables. We are developing novel analyses that reason about multiple copies of program variables when necessary for precision but elide this information when possible to maintain scalability. Furthermore, the performance of CUDA code crucially depends on its ability to hide the latency of operations such as memory accesses by quickly switching among many threads. Reasoning precisely about the execution time of CUDA kernels therefore requires reasoning about the latency of such operations and the behavior of the GPU’s thread scheduler. The tools and analyses we develop in this project can open the emerging field of general-purpose GPU programming to a wider array of developers and improve the quality and efficiency of code in several important domains of computing.
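To illustrate the kind of performance anomaly at stake (again a generic sketch under assumed names, not an example from the project), consider a kernel that branches on the thread index. Because threads are scheduled in 32-thread warps, the two branches execute serially within each warp, a cost that is invisible in the source code:

```cuda
// Illustrative warp-divergence anomaly: even and odd lanes of each warp take
// different paths, so the hardware runs both branches in turn per warp,
// roughly halving arithmetic throughput for this conditional.
__global__ void divergent(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (threadIdx.x % 2 == 0)
        out[i] = in[i] * 2.0f;  // taken by even lanes only
    else
        out[i] = in[i] + 1.0f;  // taken by odd lanes only
}
```

Predicting such costs requires modeling both the per-thread state (each thread's copy of `i`) and the scheduler's warp-level execution, which is what motivates the analyses described above.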
Support
In part supported by NSF SHF Grant 2007784.
Publications
- Modeling and Analyzing Evaluation Cost of CUDA Kernels. In 48th Symposium on Principles of Programming Languages (POPL ’21), 2021. [pdf]