https://developer.nvidia.com/zh-cn/tensorrt
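NVIDIA TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and a runtime that deliver low latency and high throughput for inference applications.
During inference, TensorRT-based applications can run up to 40 times faster than CPU-only platforms. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate them for lower precision while keeping accuracy high, and deploy them to hyperscale data centers, embedded platforms, or automotive product platforms.
TensorRT is built on CUDA, NVIDIA's parallel programming model. It lets you use the libraries, development tools, and technologies in CUDA-X to optimize inference from any deep learning framework for artificial intelligence, autonomous machines, high-performance computing, and graphics.
TensorRT provides INT8 and FP16 optimizations for production deployments of deep learning inference applications such as video streaming, speech recognition, recommendation, and natural language processing. Reduced-precision inference significantly lowers application latency, which is a requirement for many real-time services and for automotive and embedded applications.
TensorRT significantly improves deep learning inference performance on NVIDIA GPUs; in other words, it targets NVIDIA chips.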
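Reference: Installing TensorRT on the Jetson platform
Official site: https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html
With CUDA 11.4 or later, you can download the TensorRT 8.4 package directly; it contains prebuilt .whl files.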
After installing, verify the installation:
# Start python3 and import tensorrt
python3
>>> import tensorrt
>>> print(tensorrt.__version__)
>>> assert tensorrt.Builder(tensorrt.Logger())
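The same check can be scripted so that it fails gracefully when the Python bindings are missing. This is only a minimal sketch; the helper name `module_version` is made up for illustration and is not part of TensorRT.

```python
import importlib


def module_version(name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Some modules do not expose __version__; report that explicitly.
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    version = module_version("tensorrt")
    if version is None:
        print("tensorrt is not importable from this interpreter")
    else:
        print("tensorrt version:", version)
```

Run it with the same python3 interpreter that the .whl was installed into; a different interpreter may not see the package.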
TensorRT 7.1.3 is installed here, so the notes below follow the documentation for that version.
Note: The TensorRT Python API is not available for all platforms. For more information, see TensorRT Support Matrix
Logger
Most other TensorRT classes use a logger to report errors, warnings and informative messages. TensorRT provides a basic tensorrt.Logger implementation, but you can write your own implementation by deriving from tensorrt.ILogger for more advanced functionality.
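As an illustration of deriving from tensorrt.ILogger, here is a small sketch of a logger that prints messages at or above a chosen severity. The names `format_record`, `make_logger`, and `PrintLogger` are hypothetical, and tensorrt is imported lazily (an assumption made so the file can be loaded on a machine without it).

```python
def format_record(severity_name, msg):
    """Build the line the custom logger prints (plain Python, no TensorRT needed)."""
    return "[TRT][{}] {}".format(severity_name, msg)


def make_logger(min_severity_name="WARNING"):
    """Create a logger derived from tensorrt.ILogger."""
    import tensorrt as trt  # lazy import: only needed when a logger is created

    # Severity members: INTERNAL_ERROR, ERROR, WARNING, INFO, VERBOSE.
    # A lower integer value means a more severe message.
    threshold = getattr(trt.ILogger.Severity, min_severity_name)

    class PrintLogger(trt.ILogger):
        def __init__(self):
            # The base class must be initialised explicitly.
            trt.ILogger.__init__(self)

        def log(self, severity, msg):
            # Print only messages at least as severe as the threshold.
            if int(severity) <= int(threshold):
                print(format_record(str(severity), msg))

    return PrintLogger()
```

The returned object can be passed wherever a logger is expected, for example `tensorrt.Builder(make_logger("INFO"))`.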
Parsers
Parsers are used to populate a tensorrt.INetworkDefinition from a model trained in a Deep Learning framework.
Network
The tensorrt.INetworkDefinition represents a computational graph. In order to populate the network, TensorRT provides a suite of parsers for a variety of Deep Learning frameworks. It is also possible to populate the network manually using the Network API.
Builder
The tensorrt.Builder is used to build a tensorrt.ICudaEngine. In order to do so, it must be provided a populated tensorrt.INetworkDefinition.
Engine and Context
The tensorrt.ICudaEngine is the output of the TensorRT optimizer. It is used to generate a tensorrt.IExecutionContext that can perform inference.
See the official documentation for details.
The official demo below shows how to run inference on a PyTorch model with TensorRT:
“Hello World” For TensorRT Using PyTorch And Python
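As this demo shows, converting a PyTorch model to TensorRT this way means rebuilding the corresponding neural network layer by layer with the TensorRT Python API and loading the trained weights into each layer.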
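To make "layer by layer" concrete, here is a rough sketch that adds a single convolution followed by a ReLU to an existing network. It is not the demo's code: the function name `add_conv_relu`, the weight keys `conv1.weight` / `conv1.bias`, and the 1x1x28x28 input shape are assumptions chosen for illustration, and the weights are expected as contiguous float32 NumPy arrays.

```python
REQUIRED_KEYS = ("conv1.weight", "conv1.bias")  # hypothetical state_dict names


def add_conv_relu(network, weights):
    """Add one convolution + ReLU to an existing tensorrt.INetworkDefinition.

    `weights` maps PyTorch state_dict names to NumPy arrays, e.g.
    {k: v.cpu().numpy() for k, v in model.state_dict().items()}.
    """
    missing = [key for key in REQUIRED_KEYS if key not in weights]
    if missing:
        raise KeyError("missing weights: {}".format(", ".join(missing)))

    import tensorrt as trt  # lazy import: only needed once the inputs are valid

    # Input tensor in NCHW layout (assumed shape: batch 1, 1 channel, 28x28).
    data = network.add_input(name="input", dtype=trt.float32, shape=(1, 1, 28, 28))

    conv_w = weights["conv1.weight"]  # PyTorch layout: (out, in, kH, kW)
    conv = network.add_convolution_nd(
        input=data,
        num_output_maps=conv_w.shape[0],
        kernel_shape=(conv_w.shape[2], conv_w.shape[3]),
        kernel=conv_w,
        bias=weights["conv1.bias"],
    )
    relu = network.add_activation(
        input=conv.get_output(0), type=trt.ActivationType.RELU
    )
    # Mark the ReLU output as the network output.
    network.mark_output(tensor=relu.get_output(0))
    return network
```

A real model repeats this pattern for every layer, which is why the manual route gets tedious for large networks.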
Another official demo loads an ONNX model instead:
Refitting An Engine Built From An ONNX Model In Python
import tensorrt as trt
import common  # helper module shipped alongside the TensorRT Python samples

TRT_LOGGER = trt.Logger()

# The excerpt starts at the docstring; this function header is reconstructed.
def build_engine(onnx_file_path):
    """Takes an ONNX file and creates a TensorRT engine to run inference with"""
    builder = trt.Builder(TRT_LOGGER)
    # Set max threads that can be used by builder.
    builder.max_threads = 10
    network = builder.create_network(common.EXPLICIT_BATCH)
    parser = trt.OnnxParser(network, TRT_LOGGER)
    runtime = trt.Runtime(TRT_LOGGER)
    # Set max threads that can be used by runtime.
    runtime.max_threads = 10
    # Parse model file
    print("Loading ONNX file from path {}...".format(onnx_file_path))
    with open(onnx_file_path, "rb") as model:
        print("Beginning ONNX file parsing")
        if not parser.parse(model.read()):
            print("ERROR: Failed to parse the ONNX file.")
            # Print every error the parser recorded before giving up.
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None
    print("Completed parsing of ONNX file")
    # (The excerpt ends here, before the engine is actually built.)
Conclusion: TensorRT ships with an ONNX parser, so it can read an ONNX model directly and use it for inference.
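Putting the pieces together, the path from an ONNX file to an execution context might look like the sketch below. It assumes a TensorRT 8.x Python API (`build_serialized_network` is not available in 7.x), and the function name `build_engine_from_onnx` and its `fp16` switch are made up for illustration. Allocating device buffers and actually running inference are left out.

```python
import os


def build_engine_from_onnx(onnx_file_path, fp16=False):
    """Parse an ONNX file and return (engine, context), or None on failure."""
    if not os.path.isfile(onnx_file_path):
        raise FileNotFoundError(onnx_file_path)

    import tensorrt as trt  # lazy import: only needed once the file exists

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # The ONNX parser works on explicit-batch networks.
    flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flag)
    parser = trt.OnnxParser(network, logger)

    with open(onnx_file_path, "rb") as model:
        if not parser.parse(model.read()):
            for index in range(parser.num_errors):
                print(parser.get_error(index))
            return None

    config = builder.create_builder_config()
    # Optionally allow FP16 kernels when the GPU supports them efficiently.
    if fp16 and builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        return None

    runtime = trt.Runtime(logger)
    engine = runtime.deserialize_cuda_engine(serialized)
    context = engine.create_execution_context()
    return engine, context
```

The serialized engine can also be written to disk and reloaded later with `deserialize_cuda_engine`, which avoids rebuilding on every start.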