TensorRT to improve Inference performance

Tags: machine learning, technique

Table of contents: How TensorRT improves inference performance; Trade-offs of TensorRT; Conversion workflow

How TensorRT improves model performance

1. Precision calibration. Convert weights and activations from FP32 to FP16 or INT8 to reduce the size of the weights. This can reduce accuracy, sometimes significantly. In real-time applications, … Continue reading TensorRT to improve Inference performance
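To make the size and accuracy trade-off of the FP32 to FP16 conversion concrete, here is a minimal sketch using only the Python standard library. It is not TensorRT code and does not reproduce TensorRT's calibration procedure: the `weights` values are made up for illustration, and `struct` is used only as a stand-in to show what casting to half precision does to storage size and to the stored values.

```python
import struct

# Hypothetical weight values, chosen only for illustration.
weights = [0.1, -1.2345678, 3.14159265, 1000.123]
n = len(weights)

# Serialize the same values as FP32 ("f", 4 bytes each)
# and as FP16 ("e", 2 bytes each).
fp32_bytes = struct.pack(f"<{n}f", *weights)
fp16_bytes = struct.pack(f"<{n}e", *weights)

# Read the FP16 values back to see how much precision was lost.
restored = struct.unpack(f"<{n}e", fp16_bytes)
max_abs_error = max(abs(a - b) for a, b in zip(weights, restored))

print("FP32 size in bytes:", len(fp32_bytes))
print("FP16 size in bytes:", len(fp16_bytes))
print("Largest rounding error after FP16 round trip:", max_abs_error)
```

The FP16 buffer is half the size of the FP32 one, and the rounding error grows with the magnitude of the value, which is one reason reduced precision can cost accuracy.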

The correct way to measure inference time of Deep Neural Networks

Tags: machine learning, technique

Mistakes when measuring the inference time of deep neural networks

1. Transferring data between host and device (CPU and GPU). The most common mistake is to include the time spent transferring data from the CPU to the GPU in the measurement. This transfer happens unintentionally when a tensor is created on the CPU and the computation is then run on the GPU. … Continue reading The correct way to measure inference time of Deep Neural Networks
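The transfer mistake described in the excerpt above can be sketched without a GPU. In the example below, `host_to_device` and `model` are hypothetical stand-ins that simulate a copy and a forward pass with `time.sleep`, so the numbers are illustrative only. On a real GPU the timed region would also need a device synchronization before each clock read, which this sketch does not model.

```python
import time

def host_to_device(batch):
    # Hypothetical stand-in for a CPU-to-GPU copy (simulated delay).
    time.sleep(0.05)
    return list(batch)

def model(device_batch):
    # Hypothetical stand-in for a forward pass (simulated delay).
    time.sleep(0.01)
    return [2 * x for x in device_batch]

def time_inference(fn, device_batch, repeats=5):
    # Average wall-clock time of fn over several runs,
    # with the input already on the "device".
    start = time.perf_counter()
    for _ in range(repeats):
        fn(device_batch)
    return (time.perf_counter() - start) / repeats

batch = [1.0, 2.0, 3.0]

# Mistake: the transfer happens inside the timed region.
start = time.perf_counter()
model(host_to_device(batch))
with_transfer = time.perf_counter() - start

# Better: move the data first, then time only the inference.
device_batch = host_to_device(batch)
inference_only = time_inference(model, device_batch)

print(f"timed with transfer: {with_transfer:.3f} s")
print(f"inference only:      {inference_only:.3f} s")
```

Moving the data before starting the clock keeps the copy cost out of the reported inference time.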