Kernel Auto-Tuning
Kernel auto-tuning is the process of optimizing a deep-learning model by selecting the best-performing kernel for each of its operations.
Here, "kernel" refers to a function that runs on the GPU to perform a computation efficiently, such as:
Convolutions
Pooling
Activations
....
The performance of these functions can vary depending on:
model architecture
input size: a kernel optimized for large matrix multiplications might be inefficient for smaller matrices due to memory access patterns.
hardware architecture: different NVIDIA GPUs have varying capabilities in terms of memory bandwidth, compute units, and support for certain operations.
precision: kernels tailored for FP16 or INT8 computations generally run faster and use less memory than those designed for FP32, provided the hardware supports the lower precision.
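Kernel auto-tuning therefore matches each of these functions to the kernel implementation that performs best for the given model, input size, hardware, and precision, typically by timing the available candidates and keeping the fastest.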
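The selection step can be illustrated with a minimal sketch. It is not how any particular tool implements tuning: the two matrix-multiplication "kernels" are plain Python functions standing in for GPU kernels, and the names `matmul_naive`, `matmul_transposed`, `autotune`, and `make_matrix` are hypothetical. The idea it shows is to time every candidate on the real input shape and remember the winner per shape.

```python
import time

def matmul_naive(a, b):
    """Row-by-column product; the inner loop walks down the columns of b."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def matmul_transposed(a, b):
    """Transposes b first so the inner loop reads both operands row-wise."""
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

# Hypothetical candidate set: every entry computes the same result.
CANDIDATES = {"naive": matmul_naive, "transposed": matmul_transposed}

def autotune(candidates, args, repeats=3):
    """Time each candidate on the given inputs; return (best_name, timings)."""
    timings = {}
    for name, kernel in candidates.items():
        kernel(*args)  # warm-up run, not timed
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(*args)
            best = min(best, time.perf_counter() - start)
        timings[name] = best  # keep the fastest of the repeats
    return min(timings, key=timings.get), timings

def make_matrix(rows, cols):
    """Deterministic test matrix with small float entries."""
    return [[float((i * cols + j) % 7) for j in range(cols)] for i in range(rows)]

# Tune once per input size and cache the winner for later calls.
tuned = {}
for size in (8, 64):
    a, b = make_matrix(size, size), make_matrix(size, size)
    tuned[size], _ = autotune(CANDIDATES, (a, b))
```

After the loop, `tuned` maps each input size to the name of the fastest candidate measured on this machine. The winner can differ between sizes and between machines, which is why the choice is measured rather than hard-coded.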
Several tools are available for this task: NVIDIA TensorRT, Apache TVM, OpenXLA, OpenVINO, ONNX Runtime, and cuDNN.