Kernel Auto-Tuning

  • It is the process of optimizing a deep-learning model by benchmarking candidate implementations of each operation and selecting the best-performing kernel.

  • "Kernel" refers to a function that runs on the GPU to compute an operation efficiently, such as:

    • Convolutions

    • Pooling

    • Activations

    • ....

  • The performance of these functions can vary depending on:

    • the model architecture;

    • the input size: a kernel optimized for large matrix multiplications might be inefficient for smaller matrices due to memory-access patterns;

    • the hardware architecture: different NVIDIA GPUs vary in memory bandwidth, compute units, and support for certain operations;

    • the precision: kernels tailored for FP16 or INT8 computation generally run faster and use less memory than those designed for FP32.

  • So, Kernel Auto-Tuning adapts each of these functions to the given model, input size, hardware, and precision by selecting the best kernel for each.
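The selection loop at the heart of auto-tuning can be sketched in plain Python. This is a toy, CPU-only illustration with hypothetical function names: two loop orderings of the same matrix multiply stand in for alternative GPU kernels, and the tuner simply times each candidate on the real inputs and keeps the fastest.

```python
import timeit

def matmul_ijk(a, b):
    """Naive triple loop: inner loop walks down columns of `b`."""
    n, m, p = len(a), len(b), len(b[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * b[k][j]
            out[i][j] = s
    return out

def matmul_ikj(a, b):
    """Reordered loops: inner loop streams along rows of `b` (friendlier memory access)."""
    n, m, p = len(a), len(b), len(b[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        row = out[i]
        for k in range(m):
            aik = a[i][k]
            bk = b[k]
            for j in range(p):
                row[j] += aik * bk[j]
    return out

def autotune(candidates, *args, repeats=3):
    """Benchmark each candidate kernel on the actual inputs; return the fastest."""
    best, best_time = None, float("inf")
    for fn in candidates:
        t = min(timeit.repeat(lambda: fn(*args), number=1, repeat=repeats))
        if t < best_time:
            best, best_time = fn, t
    return best

# Tune for one concrete input size; a different size could pick a different kernel.
a = [[float(i + j) for j in range(32)] for i in range(32)]
b = [[float(i - j) for j in range(32)] for i in range(32)]
best_kernel = autotune([matmul_ijk, matmul_ikj], a, b)
```

Real tuners do exactly this, but over GPU kernels, and they cache the winning choice per (operation, shape, precision, device) so the cost is paid once.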

Several tools support kernel auto-tuning: NVIDIA TensorRT, Apache TVM, OpenXLA, OpenVINO, ONNX Runtime, and cuDNN.
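As one concrete example, PyTorch exposes cuDNN's convolution auto-tuner through a single flag. This config-like fragment assumes a CUDA-capable PyTorch install; it pays a one-time benchmarking cost on the first forward pass for each new input shape.

```python
import torch

# Ask cuDNN to benchmark its available convolution kernels for the current
# input shapes and cache the fastest one. Most beneficial when input shapes
# stay fixed across iterations; with highly dynamic shapes, re-benchmarking
# on every new shape can hurt instead.
torch.backends.cudnn.benchmark = True
```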
