What is TensorRT?


https://developer.nvidia.com/tensorrt


Last time, we talked about ONNX.

This time, I want to talk about TensorRT, which is closely related to ONNX, especially if you are using NVIDIA GPUs and CUDA.



As we discussed last time, ONNX is an open standard for machine learning model interoperability. NVIDIA, however, has its own optimized way of loading and running models on CUDA GPUs: it built an inference SDK called TensorRT to run models as fast as possible on its hardware and GPUs.



TensorRT is built on CUDA, NVIDIA’s parallel programming model. Since TensorRT is highly optimized for NVIDIA GPUs, I believe it is the fastest tool for running inference if you are using NVIDIA GPUs. It can deliver around 4 to 5 times faster inference in many real-time services and embedded applications.


Before you use TensorRT, there is one thing you need to check.

Make sure your GPU has CUDA cores; otherwise, you won't be able to use TensorRT.


To check information about your NVIDIA GPU, open a command prompt and run "nvidia-smi".
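If you want to script this check, a minimal Python sketch might look like the following. It only confirms that the `nvidia-smi` driver utility is on your PATH and prints its report; it does not verify TensorRT compatibility itself.

```python
import shutil
import subprocess

# Look for the NVIDIA driver utility before trying to use TensorRT.
smi = shutil.which("nvidia-smi")

if smi is None:
    print("nvidia-smi not found: no NVIDIA driver on this machine")
else:
    # nvidia-smi prints the driver version, CUDA version, and each GPU's name.
    result = subprocess.run([smi], capture_output=True, text=True)
    print(result.stdout)
```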



Why Is TensorRT Fast?


https://medium.com/@abhaychaturvedi_72055/understanding-nvidias-tensorrt-for-deep-learning-model-optimization-dad3eb6b26d9


1. Weight and Activation Precision Calibration

  • Models use FP32 (32-bit floating point) precision for their weights during training. TensorRT converts them to FP16 or INT8 to reduce both model size and latency.
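The effect of precision reduction is easy to see with a small NumPy sketch. This only illustrates the idea; TensorRT's actual INT8 path additionally uses a calibration dataset to choose scaling factors.

```python
import numpy as np

# Simulated layer weights at training precision (FP32).
rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal(1024).astype(np.float32)

# Cast down to FP16, as TensorRT does when building an FP16 engine.
weights_fp16 = weights_fp32.astype(np.float16)

print(weights_fp32.nbytes)  # 4096 bytes
print(weights_fp16.nbytes)  # 2048 bytes: half the memory and bandwidth

# The precision loss is tiny relative to the weights themselves.
max_err = np.abs(weights_fp32 - weights_fp16.astype(np.float32)).max()
print(max_err)
```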

2. Layer and Tensor Fusion
  • TensorRT optimizes GPU memory and bandwidth use by fusing adjacent nodes into a single kernel. This lets the GPU avoid repeated kernel-launch overhead and the cost of reading and writing the tensor data for each layer.
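As a rough illustration of why fusion helps (using a hypothetical linear layer plus scale/shift pair, not TensorRT's actual conv/bias/activation fusions), folding two operations into one halves the number of passes over the data while producing the same result:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 4)).astype(np.float32)

# A linear layer followed by a batch-norm-style scale and shift:
# two separate ops, each reading and writing the full tensor.
W = rng.standard_normal((4, 4)).astype(np.float32)
b = rng.standard_normal(4).astype(np.float32)
gamma = rng.standard_normal(4).astype(np.float32)  # scale
beta = rng.standard_normal(4).astype(np.float32)   # shift

y_unfused = (x @ W + b) * gamma + beta

# Fused: fold the scale/shift into the layer's weights ahead of time,
# so inference makes a single pass over the data.
W_fused = W * gamma            # broadcasts gamma over output columns
b_fused = b * gamma + beta
y_fused = x @ W_fused + b_fused

# Same math, one op instead of two.
print(np.allclose(y_unfused, y_fused, atol=1e-5))
```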


3. Kernel Auto-Tuning
  • TensorRT automatically selects the best kernel implementations, algorithms, and batch size for each layer based on the target GPU platform.
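The idea behind auto-tuning can be sketched in plain Python. The "kernels" here are hypothetical NumPy variants rather than real CUDA kernels, but the principle is the same: benchmark each equivalent implementation on the target hardware and keep the fastest.

```python
import time
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((64, 64)).astype(np.float32)
B = rng.standard_normal((64, 64)).astype(np.float32)

# Two mathematically equivalent implementations of the same layer.
def kernel_blas(a, b):
    return a @ b

def kernel_einsum(a, b):
    return np.einsum("ik,kj->ij", a, b)

def bench(kernel, reps=10):
    start = time.perf_counter()
    for _ in range(reps):
        kernel(A, B)
    return time.perf_counter() - start

# Profile each candidate on this machine and keep the fastest,
# the way TensorRT profiles kernels per target GPU.
best = min([kernel_blas, kernel_einsum], key=bench)
print(best.__name__)
```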


With these techniques, TensorRT allows models to run inference faster on NVIDIA GPUs.

Next time, I will talk about how to run an ONNX model with TensorRT.





References:

  • https://github.com/NVIDIA/TensorRT
  • https://blog.roboflow.com/what-is-tensorrt/
  • https://medium.com/@abhaychaturvedi_72055/understanding-nvidias-tensorrt-for-deep-learning-model-optimization-dad3eb6b26d9
