• 深度学习模型试跑(十五):Real-ESRGAN(win10 trt推理部署)


    前言

    超分辨率修复、重建是采用低分辨率(LR)输入并将其提高到高分辨率的任务,具体原理可以参考paddlegan原理介绍。对于在linux上如何实现trt加速该网络,这里有一篇文章详细记录了过程。
    我的环境:

    • Visual Studio 2019
    • CUDA 11.6,cudnn 8.2
    • CMake 3.17.1
    • Tensorrt 8.4.1.5

    一.模型解读

    作者在ESRGAN(Real-ESRGAN同一作者)的基础上,又做了一些改进。这篇博文把这块改进写的很详细,想要了解原理的可以参看这篇文章解读作者的论文。
    网络设计的解读,可以参考ESRGAN官方代码解读,了解整个网络结构,我这里截取部分onnx的节点图。
    RDB块

    由于推理网络(就是生成器)是照搬了 ESRGAN 中的 Generator,即使用 Residual-in-residul Dense Block(RRDB),所以代码的复现还是基于ESRGAN 。

    二.模型训练

    我直接跳过了,试了下带不动。

    三.VS2019运行C++预测

    主要参照:
    tensorrtx的代码

    1. 生成二进制序列化权重文件.wts
      1)首先拷贝Real-ESRGAN官方pytorch(python)实现,我是直接下载了它的python代码,
      2)然后安装各种依赖,这两步具体可参考README_CN.md。
      请添加图片描述
    	pip install basicsr
    	pip install facexlib
    	pip install gfpgan
    	pip install -r requirements.txt
    	python setup.py develop
    
    • 1
    • 2
    • 3
    • 4
    • 5

    3)下载权重文件,在Real-ESRGAN官方pytorch(python)新建一个experiments/pretrained_models
    二级目录,并将权重文件拷贝到这里面;将tensorrtx/real-esrgan/目录下的gen_wts.py脚本拷贝到 Real-ESRGAN官方pytorch下并运行

    python gen_wts.py
    
    • 1

    运行成功后会生成一个real-esrgan.wts。

    1. 构建tensorrtx/real-esrgan工程并运行
      1. 转到tensorrtx/real-esrgan/目录下
      2. 修改CMakeLists.txt,主要将第4、5、6、7、8、9、10、11、12、16行改为相关库的目录,Tensorrt建议改用8开头的版本,31、67行根据我了解到的情况暂可不改。(修改 # 标记的地方)
    cmake_minimum_required(VERSION 3.0)
    
    project(real-esrgan) #1
    set(OpenCV_DIR "D:\\opencv\\build")  #2
    set(OpenCV_INCLUDE_DIRS ${OpenCV_DIR}\\include) #3
    set(OpenCV_LIB_DIRS ${OpenCV_DIR}\\x64\\vc15\\lib) #4
    set(OpenCV_Debug_LIBS "opencv_world450d.lib") #5
    set(OpenCV_Release_LIBS "opencv_world450.lib") #6
    set(TRT_DIR "D:\\lbq\\TensorRT-7.2.3.4")  #7
    set(TRT_INCLUDE_DIRS ${TRT_DIR}\\include) #8
    set(TRT_LIB_DIRS ${TRT_DIR}\\lib) #9
    set(Dirent_INCLUDE_DIRS "D:\\lbq\\dirent\\include") #10
    
    add_definitions(-std=c++14)
    
    set(CUDA_BIN_PATH C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.5)
    option(CUDA_USE_STATIC_CUDA_RUNTIME OFF)
    set(CMAKE_CXX_STANDARD 14)
    set(CMAKE_BUILD_TYPE Release)
    
    set(THREADS_PREFER_PTHREAD_FLAG ON)
    find_package(Threads)
    
    # setup CUDA
    find_package(CUDA REQUIRED)
    message(STATUS "    libraries: ${CUDA_LIBRARIES}")
    message(STATUS "    include path: ${CUDA_INCLUDE_DIRS}")
    
    include_directories(${CUDA_INCLUDE_DIRS})
    
    set(CUDA_NVCC_PLAGS ${CUDA_NVCC_PLAGS};-std=c++14; -g; -G;-gencode; arch=compute_86;code=sm_86)
    ####
    enable_language(CUDA)  # add this line, then no need to setup cuda path in vs
    ####
    include_directories(${PROJECT_SOURCE_DIR}/include) #14
    include_directories(${TRT_INCLUDE_DIRS}) #12
    link_directories(${TRT_LIB_DIRS}) #13
    include_directories(${OpenCV_INCLUDE_DIRS}) #14
    link_directories(${OpenCV_LIB_DIRS}) #15
    include_directories(${Dirent_INCLUDE_DIRS}) #16
    
    
    # -D_MWAITXINTRIN_H_INCLUDED for solving error: identifier "__builtin_ia32_mwaitx" is undefined
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++14 -Wall -Ofast -D_MWAITXINTRIN_H_INCLUDED")
    
    # setup opencv
    find_package(OpenCV QUIET
        NO_MODULE
        NO_DEFAULT_PATH
        NO_CMAKE_PATH
        NO_CMAKE_ENVIRONMENT_PATH
        NO_SYSTEM_ENVIRONMENT_PATH
        NO_CMAKE_PACKAGE_REGISTRY
        NO_CMAKE_BUILDS_PATH
        NO_CMAKE_SYSTEM_PATH
        NO_CMAKE_SYSTEM_PACKAGE_REGISTRY
    )
    
    message(STATUS "OpenCV library status:")
    message(STATUS "    version: ${OpenCV_VERSION}")
    message(STATUS "    lib path: ${OpenCV_LIB_DIRS}")
    message(STATUS "    Debug libraries: ${OpenCV_Debug_LIBS}")
    message(STATUS "    Release libraries: ${OpenCV_Release_LIBS}")
    message(STATUS "    include path: ${OpenCV_INCLUDE_DIRS}")
    
    if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
    set(CMAKE_CUDA_ARCHITECTURES 86)
    endif(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
    
    add_executable(real-esrgan ${PROJECT_SOURCE_DIR}/real-esrgan.cpp ${PROJECT_SOURCE_DIR}/common.hpp 
    	${PROJECT_SOURCE_DIR}/preprocess.cu ${PROJECT_SOURCE_DIR}/preprocess.hpp
    	${PROJECT_SOURCE_DIR}/postprocess.cu ${PROJECT_SOURCE_DIR}/postprocess.hpp
    	)   #17
    
    target_link_libraries(real-esrgan "nvinfer" "nvinfer_plugin") #18
    target_link_libraries(real-esrgan debug ${OpenCV_Debug_LIBS}) #19
    target_link_libraries(real-esrgan optimized ${OpenCV_Release_LIBS}) #20
    target_link_libraries(real-esrgan ${CUDA_LIBRARIES}) #21
    target_link_libraries(real-esrgan Threads::Threads)  
    
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55
    • 56
    • 57
    • 58
    • 59
    • 60
    • 61
    • 62
    • 63
    • 64
    • 65
    • 66
    • 67
    • 68
    • 69
    • 70
    • 71
    • 72
    • 73
    • 74
    • 75
    • 76
    • 77
    • 78
    • 79
    • 80
    1. 用cmake-gui打开代码,自己适配下版本,步骤跟yolov5类似。第一行填入Real_ESRGAN_TRT 目录,第三行填入Real_ESRGAN_TRT/buildnew目录。然后就是点击’Configure’、‘Generate’、‘Open Project’ 编译、生成、打开工程
      请添加图片描述
      请添加图片描述
      请添加图片描述

    2. 右击real-esrgan,点击“设为启动项目”,将调试模式全部调成”Release”
      请添加图片描述

    3. 接着点击“生成”-> “生成解决方案”,会生成对应的real-esrgan.exe
      请添加图片描述

    4. 将第一步中的real-esrgan.wts拷贝到这个Release目录下,将tensorrt中相关的dll文件如nvinfer.dll等也拷贝(软链接)到该目录下。

    5. 打开命令行窗口,执行real-esrgan.exe -s real-esrgan.wts real-esrgan_f32.engine ,生成对应的engine,这一步可能会花半个小时的时间,具体要看GPU。
      请添加图片描述

    6. 最后将 -d real-esrgan_f32.engine …/samples 写到VS的命令参数里,注意一定要索引到正确的目录,第一个是engine 文件所在的目录,第二个是图像所在的目录,就可以直接按F5执行程序了,图像涉密这里我就不贴出来了。请添加图片描述

    附一.VS2019运行C++预测

    • 输入尺寸INPUT_H、INPUT_W、INPUT_C,其中前两项是可以根据实际需求做调整
    • GPU id(DEVICE),在第9行,目前可能只支持单显卡
    • BATCH_SIZE,在第10行,每次推理多少张图片
    • PRECISION_MODE,推理精度,在第14行,精度越低效果越差速度越快
    • VISUALIZATION,推理可视化,在第15行.

    我发现对特定的图像修复还是要引用一个传统的图像修复算法做前处理,于是封装了四个传统图像增强方法在代码里,有兴趣的可以和我交流。
    ~~以下方式二选一,在代码257~273行;需要注意第262行是采用了传统的图像增强的方法,这一步对细胞的改善也是至关重要的;代码里一共集成了四种常用的传统的图像增强的方法均在utils.h文件里,里面的一些参数需要调节,目前采用的是伽马变换(gamma_transform)

    • 静态图输入,指的是固定了输入图像的尺度(C、W、H),不可以用其它尺度的图片来推理,
    • 动态图输入,即支持多种尺度的输入,目前输出结果有毛边(还未处理);~~

    附二.real-esrgan代码解读

    #include "cuda_utils.h"
    #include "common.hpp"		
    #include "preprocess.hpp"	// preprocess plugin 
    #include "postprocess.hpp"	// postprocess plugin 
    #include "logging.h"
    #include "utils.h"
    #include 	//access()
    
    #define DEVICE 0  // GPU id
    #define BATCH_SIZE 1
    #define MAX_IMAGE_INPUT_SIZE_THRESH 4096 * 4096 // ensure it exceed the maximum size in the input images !
    
    // stuff we know about the network and the input/output blobs
    static const int PRECISION_MODE = 32; // fp32 : 32, fp16 : 16
    static const bool VISUALIZATION = false;
    static const int INPUT_H = 1024;
    static const int INPUT_W = 1024;
    static const int INPUT_C = 3;
    static const int OUT_SCALE = 4;
    static const int OUTPUT_SIZE = INPUT_C * INPUT_H * OUT_SCALE * INPUT_W * OUT_SCALE;
    const char* INPUT_BLOB_NAME = "data";
    const char* OUTPUT_BLOB_NAME = "prob";
    static Logger gLogger;
    
    // Creat the engine using only the API and not any parser.	作者存手敲API的写法,没有用parser
    ICudaEngine* build_engine(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, std::string& wts_name) {
    	INetworkDefinition* network = builder->createNetworkV2(0U);//#定义网络
    
    	// Create input tensor of shape {INPUT_H, INPUT_W, INPUT_C} with name INPUT_BLOB_NAME
    	ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ INPUT_H, INPUT_W, INPUT_C });
    	assert(data);
    	// tensorrtx项目加载.wts权重图的通用函数
    	std::map<std::string, Weights> weightMap = loadWeights(wts_name);
    
    	// 前处理,Custom preprocess (NHWC->NCHW, BGR->RGB, [0, 255]->[0, 1](Normalize))
    	Preprocess preprocess{ maxBatchSize, INPUT_C, INPUT_H, INPUT_W };
    
    	// TensorRT Plugin:	https://zhuanlan.zhihu.com/p/448241566
    	// 注册PluginCreator, 前处理的plugin,tensorrtx项目里很多模型都用这个前处理plugin
    	IPluginCreator* preprocess_creator = getPluginRegistry()->getPluginCreator("preprocess", "1");
    	// 创建自定义层类型的对象并返回
    	IPluginV2* preprocess_plugin = preprocess_creator->createPlugin("preprocess_plugin", (PluginFieldCollection*)&preprocess);
    	// Add a plugin layer to the network using the IPluginV2 interface
    	IPluginV2Layer* preprocess_layer = network->addPluginV2(&data, 1, *preprocess_plugin);
    	// https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_network_definition.html#a0c670938a4aef867545f41b65c52cd93
    	preprocess_layer->setName("preprocess_layer");
    	ITensor* prep = preprocess_layer->getOutput(0);
    
    	// 以下是整个RRDBNet生成器/推理网络的结构,可以参照wts/onnx模型文件查看具体的网络节点
    	// conv_first, 第一个卷积层(特征图生成), 输入tensor是*prep, 输出64通道,kernel大小是3x3,后面是权重和偏置的值
    	IConvolutionLayer* conv_first = network->addConvolutionNd(*prep, 64, DimsHW{ 3, 3 }, weightMap["conv_first.weight"], weightMap["conv_first.bias"]);
    	conv_first->setStrideNd(DimsHW{ 1, 1 });
    	conv_first->setPaddingNd(DimsHW{ 1, 1 });
    	conv_first->setName("conv_first");
    	ITensor* feat = conv_first->getOutput(0);
    
    	// conv_body, https://www.cnblogs.com/carsonzhu/p/10967369.html
    	// inference_realesrgan.py 
    	ITensor* body_feat = RRDB(network, weightMap, feat, "body.0");
    	// https://blog.csdn.net/qq_39751446/article/details/119970924
    	for (int idx = 1; idx < 23; idx++) { //num_block=23
    		// RRDB的代码编写参考了
    		body_feat = RRDB(network, weightMap, body_feat, "body." + std::to_string(idx));
    	}
    	// ****************此处为RRDB结构截至处****************
    	IConvolutionLayer* conv_body = network->addConvolutionNd(*body_feat, 64, DimsHW{ 3, 3 }, weightMap["conv_body.weight"], weightMap["conv_body.bias"]);
    	conv_body->setStrideNd(DimsHW{ 1, 1 });
    	conv_body->setPaddingNd(DimsHW{ 1, 1 });
    	IElementWiseLayer* ew1 = network->addElementWise(*feat, *conv_body->getOutput(0), ElementWiseOperation::kSUM);
    	feat = ew1->getOutput(0);
    
    	//	upsample, onnx图从最后一个Add后的Resize算子开始
    	//	添加一个resize网络层,使用线性插值的方法,使得输出数据的尺寸是输入数据尺寸的2倍
    	IResizeLayer* interpolate_nearest = network->addResize(*feat);
    	float sclaes1[] = { 1, 2, 2 };	// 需要制定channel,heigh和widht三个通道的缩放比例
    	interpolate_nearest->setScales(sclaes1, 3);
    	interpolate_nearest->setResizeMode(ResizeMode::kNEAREST);//kLINEAR
    
    	IConvolutionLayer* conv_up1 = network->addConvolutionNd(*interpolate_nearest->getOutput(0), 64, DimsHW{ 3, 3 }, weightMap["conv_up1.weight"], weightMap["conv_up1.bias"]);
    	conv_up1->setStrideNd(DimsHW{ 1, 1 });
    	conv_up1->setPaddingNd(DimsHW{ 1, 1 });
    	IActivationLayer* leaky_relu_1 = network->addActivation(*conv_up1->getOutput(0), ActivationType::kLEAKY_RELU);
    	leaky_relu_1->setAlpha(0.2);
    
    	IResizeLayer* interpolate_nearest2 = network->addResize(*leaky_relu_1->getOutput(0));
    	float sclaes2[] = { 1, 2, 2 };
    	interpolate_nearest2->setScales(sclaes2, 3);
    	interpolate_nearest2->setResizeMode(ResizeMode::kNEAREST);
    	IConvolutionLayer* conv_up2 = network->addConvolutionNd(*interpolate_nearest2->getOutput(0), 64, DimsHW{ 3, 3 }, weightMap["conv_up2.weight"], weightMap["conv_up2.bias"]);
    	conv_up2->setStrideNd(DimsHW{ 1, 1 });
    	conv_up2->setPaddingNd(DimsHW{ 1, 1 });
    	IActivationLayer* leaky_relu_2 = network->addActivation(*conv_up2->getOutput(0), ActivationType::kLEAKY_RELU);
    	leaky_relu_2->setAlpha(0.2);
    
    	IConvolutionLayer* conv_hr = network->addConvolutionNd(*leaky_relu_2->getOutput(0), 64, DimsHW{ 3, 3 }, weightMap["conv_hr.weight"], weightMap["conv_hr.bias"]);
    	conv_hr->setStrideNd(DimsHW{ 1, 1 });
    	conv_hr->setPaddingNd(DimsHW{ 1, 1 });
    	IActivationLayer* leaky_relu_hr = network->addActivation(*conv_hr->getOutput(0), ActivationType::kLEAKY_RELU);
    	leaky_relu_hr->setAlpha(0.2);
    	IConvolutionLayer* conv_last = network->addConvolutionNd(*leaky_relu_hr->getOutput(0), 3, DimsHW{ 3, 3 }, weightMap["conv_last.weight"], weightMap["conv_last.bias"]);
    	conv_last->setStrideNd(DimsHW{ 1, 1 });
    	conv_last->setPaddingNd(DimsHW{ 1, 1 });
    	ITensor* out = conv_last->getOutput(0);
    
    	//	后处理,Custom postprocess (RGB -> BGR, NCHW->NHWC, *255, ROUND, uint8)
    	Postprocess postprocess{ maxBatchSize, out->getDimensions().d[0], out->getDimensions().d[1], out->getDimensions().d[2] };
    	IPluginCreator* postprocess_creator = getPluginRegistry()->getPluginCreator("postprocess", "1");
    	IPluginV2* postprocess_plugin = postprocess_creator->createPlugin("postprocess_plugin", (PluginFieldCollection*)&postprocess);
    	IPluginV2Layer* postprocess_layer = network->addPluginV2(&out, 1, *postprocess_plugin);
    	postprocess_layer->setName("postprocess_layer");
    
    	ITensor* final_tensor = postprocess_layer->getOutput(0);
    	final_tensor->setName(OUTPUT_BLOB_NAME);
    	network->markOutput(*final_tensor); //网络输出
    
    	// Build engine
    	builder->setMaxBatchSize(maxBatchSize);
    	config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB,左移20位 = 16 * 1 * (2^20)
    
    	if (PRECISION_MODE == 16) {
    		std::cout << "==== precision f16 ====" << std::endl << std::endl;
    		config->setFlag(BuilderFlag::kFP16);
    	}
    	else {
    		std::cout << "==== precision f32 ====" << std::endl << std::endl;
    	}
    
    	std::cout << "Building engine, please wait for a while..." << std::endl;
    	ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    	std::cout << "Build engine successfully!" << std::endl;
    
    	// Don't need the network any more
    	delete network;
    
    	// Release host memory
    	for (auto& mem : weightMap)
    	{
    		free((void*)(mem.second.values));
    	}
    
    	return engine;
    }
    
    void APIToModel(unsigned int maxBatchSize, IHostMemory** modelStream, std::string& wts_name) {
    	// Create builder,	#创建一个build初始化tensorRT的库
    	IBuilder* builder = createInferBuilder(gLogger);
    	// 构造 CudaEngine 的配置参数,可添加 IOptimizationProfile 配置,设置最大工作内存空间、最大Batch大小、最小可接受精度级别、半浮点精度运算等
    	IBuilderConfig* config = builder->createBuilderConfig();
    
    	// Create model to populate the network, then set the outputs and create an engine
    	ICudaEngine* engine = build_engine(maxBatchSize, builder, config, DataType::kFLOAT, wts_name);
    
    	assert(engine != nullptr);
    
    	// Serialize the engine
    	(*modelStream) = engine->serialize();
    
    	// Close everything down
    	delete engine;
    	delete builder;
    	delete config;
    }
    
    void doInference(IExecutionContext& context, cudaStream_t& stream, void** buffers, uint8_t* output, int batchSize) {
    	// infer on the batch asynchronously, and DMA output back to host
    	context.enqueue(batchSize, buffers, stream, nullptr);
    	CUDA_CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE * sizeof(uint8_t), cudaMemcpyDeviceToHost, stream));
    	cudaStreamSynchronize(stream);
    }
    
    bool parse_args(int argc, char** argv, std::string& wts, std::string& engine, std::string& img_dir) {
    	if (argc < 4) return false;
    	if (std::string(argv[1]) == "-s" && argc == 4) {
    		wts = std::string(argv[2]);
    		engine = std::string(argv[3]);
    	}
    	else if (std::string(argv[1]) == "-d" && argc == 4) {
    		engine = std::string(argv[2]);
    		img_dir = std::string(argv[3]);
    	}
    	else {
    		return false;
    	}
    	return true;
    }
    
    // ./real-esrgan -s ./real-esrgan.wts ./real-esrgan_f32.engine
    // ./real-esrgan -d ./real-esrgan_f32.engine ../samples
    
    int main(int argc, char** argv) {
    	std::string wts_name = "";
    	std::string engine_name = "";
    	std::string img_dir;
    	if (!parse_args(argc, argv, wts_name, engine_name, img_dir)) {
    		std::cerr << "arguments not right!" << std::endl;
    		std::cerr << "./real-esrgan -s [.wts] [.engine] // serialize model to plan file" << std::endl;
    		std::cerr << "./real-esrgan -d [.engine] ../samples  // deserialize plan file and run inference" << std::endl;
    		return -1;
    	}
    
    	// create a model using the API directly and serialize it to a stream
    	if (!wts_name.empty()) {
    		IHostMemory* modelStream{ nullptr };
    		APIToModel(BATCH_SIZE, &modelStream, wts_name);
    		assert(modelStream != nullptr);
    		std::ofstream p(engine_name, std::ios::binary);
    		if (!p) {
    			std::cerr << "could not open plan output file" << std::endl;
    			return -1;
    		}
    		p.write(reinterpret_cast<const char*>(modelStream->data()), modelStream->size());
    		delete modelStream;
    		return 0;
    	}
    
    	// deserialize the .engine and run inference
    	std::ifstream file(engine_name, std::ios::binary);
    	if (!file.good()) {
    		std::cerr << "read " << engine_name << " error!" << std::endl;
    		return -1;
    	}
    	char* trtModelStream = nullptr;
    	size_t size = 0;
    	file.seekg(0, file.end);
    	size = file.tellg();
    	file.seekg(0, file.beg);
    	trtModelStream = new char[size];
    	assert(trtModelStream);
    	file.read(trtModelStream, size);
    	file.close();
    
    	std::vector<std::string> file_names;
    	std::cout << "img_dir:" << img_dir.c_str() << std::endl;
    	if (read_files_in_dir(img_dir.c_str(), file_names) < 0) {
    		std::cerr << "read_files_in_dir failed." << std::endl;
    		return -1;
    	}
    
    	IRuntime* runtime = createInferRuntime(gLogger);
    	assert(runtime != nullptr);
    	ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size);
    	assert(engine != nullptr);
    	IExecutionContext* context = engine->createExecutionContext();
    	assert(context != nullptr);
    	delete[] trtModelStream;
    	assert(engine->getNbBindings() == 2);
    	void* buffers[2];
    	// In order to bind the buffers, we need to know the names of the input and output tensors.
    	// Note that indices are guaranteed to be less than IEngine::getNbBindings()
    	const int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
    	const int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
    	assert(inputIndex == 0);
    	assert(outputIndex == 1);
    
    	// Create GPU buffers on device	
    	CUDA_CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE * INPUT_C * INPUT_H * INPUT_W * sizeof(uint8_t)));
    	CUDA_CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(uint8_t)));
    
    	std::vector<uint8_t> input(BATCH_SIZE * INPUT_H * INPUT_W * INPUT_C);
    	std::vector<uint8_t> outputs(BATCH_SIZE * OUTPUT_SIZE);
    
    	// Create stream
    	cudaStream_t stream;
    	CUDA_CHECK(cudaStreamCreate(&stream));
    
    	std::vector<cv::Mat> imgs_buffer(BATCH_SIZE);
    	for (int f = 0; f < (int)file_names.size(); f++) {
    		cv::Mat re_img;
    		for (int b = 0; b < BATCH_SIZE; b++) {
    			cv::Mat img = cv::imread(img_dir + "/" + file_names[f]);
    			if (img.empty()) continue;
    
    			//  以下两种方法二选一
    			//  1.仅静态图输入,固定了输入图像可接受的C、W、H
    			//  memcpy(input.data() + b * INPUT_H * INPUT_W * INPUT_C, img.data, INPUT_H * INPUT_W * INPUT_C);
    
    			//  2.支持多种尺度的输入
    			cv::Mat traditional_enhance_img = gamma_transform(img);
    			cv::Mat pr_img;
    			std::pair<cv::Mat, cv::Mat> preprocess_rst;
    			if (img.cols != INPUT_W && img.cols != INPUT_H)
    			{
    				preprocess_rst = preprocess_img(traditional_enhance_img, INPUT_W, INPUT_H);//等比例填充
    				// std::cout << "img_dir:" << std::endl;
    				pr_img = preprocess_rst.first;
    				re_img = preprocess_rst.second;//对比图象
    			}
    			else
    			{
    				pr_img = traditional_enhance_img;
    			}
    
    			memcpy(input.data() + b * INPUT_H * INPUT_W * INPUT_C, pr_img.data, INPUT_H * INPUT_W * INPUT_C);
    		}
    
    		CUDA_CHECK(cudaMemcpyAsync(buffers[inputIndex], input.data(), BATCH_SIZE * INPUT_C * INPUT_H * INPUT_W * sizeof(uint8_t), cudaMemcpyHostToDevice, stream));
    
    		// Run inference
    		auto start = std::chrono::system_clock::now();
    		doInference(*context, stream, (void**)buffers, outputs.data(), BATCH_SIZE);
    		auto end = std::chrono::system_clock::now();
    		std::cout << "inference time: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
    		cv::Mat frame = cv::Mat(INPUT_H * OUT_SCALE, INPUT_W * OUT_SCALE, CV_8UC3, outputs.data());
    
    		//	去除非标准的图片产生的填充
    		int dif_h = 0;
    		int dif_w = 0;
    		if (re_img.cols == INPUT_H && re_img.rows != INPUT_W)
    		{
    			dif_w = (INPUT_W - re_img.rows) * 2;
    		}
    		else if (re_img.cols != INPUT_H && re_img.rows == INPUT_W)
    		{
    			dif_h = (INPUT_H - re_img.rows) * 2;
    		}
    		//	std::cout << dif_w << dif_h << std::endl;
    		cv::Mat result = frame(cv::Rect(dif_h, dif_w, INPUT_H * OUT_SCALE - 2 * dif_h, INPUT_W * OUT_SCALE - 2 * dif_w));
    		cv::imwrite("../_" + file_names[f], result);
    
    		if (VISUALIZATION) {
    			cv::imshow("result : " + file_names[0], frame);
    			cv::waitKey(0);
    		}
    	}
    
    	// Release stream and buffers
    	cudaStreamDestroy(stream);
    	CUDA_CHECK(cudaFree(buffers[inputIndex]));
    	CUDA_CHECK(cudaFree(buffers[outputIndex]));
    	// Destroy the engine
    	delete context;
    	delete engine;
    	delete runtime;
    }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55
    • 56
    • 57
    • 58
    • 59
    • 60
    • 61
    • 62
    • 63
    • 64
    • 65
    • 66
    • 67
    • 68
    • 69
    • 70
    • 71
    • 72
    • 73
    • 74
    • 75
    • 76
    • 77
    • 78
    • 79
    • 80
    • 81
    • 82
    • 83
    • 84
    • 85
    • 86
    • 87
    • 88
    • 89
    • 90
    • 91
    • 92
    • 93
    • 94
    • 95
    • 96
    • 97
    • 98
    • 99
    • 100
    • 101
    • 102
    • 103
    • 104
    • 105
    • 106
    • 107
    • 108
    • 109
    • 110
    • 111
    • 112
    • 113
    • 114
    • 115
    • 116
    • 117
    • 118
    • 119
    • 120
    • 121
    • 122
    • 123
    • 124
    • 125
    • 126
    • 127
    • 128
    • 129
    • 130
    • 131
    • 132
    • 133
    • 134
    • 135
    • 136
    • 137
    • 138
    • 139
    • 140
    • 141
    • 142
    • 143
    • 144
    • 145
    • 146
    • 147
    • 148
    • 149
    • 150
    • 151
    • 152
    • 153
    • 154
    • 155
    • 156
    • 157
    • 158
    • 159
    • 160
    • 161
    • 162
    • 163
    • 164
    • 165
    • 166
    • 167
    • 168
    • 169
    • 170
    • 171
    • 172
    • 173
    • 174
    • 175
    • 176
    • 177
    • 178
    • 179
    • 180
    • 181
    • 182
    • 183
    • 184
    • 185
    • 186
    • 187
    • 188
    • 189
    • 190
    • 191
    • 192
    • 193
    • 194
    • 195
    • 196
    • 197
    • 198
    • 199
    • 200
    • 201
    • 202
    • 203
    • 204
    • 205
    • 206
    • 207
    • 208
    • 209
    • 210
    • 211
    • 212
    • 213
    • 214
    • 215
    • 216
    • 217
    • 218
    • 219
    • 220
    • 221
    • 222
    • 223
    • 224
    • 225
    • 226
    • 227
    • 228
    • 229
    • 230
    • 231
    • 232
    • 233
    • 234
    • 235
    • 236
    • 237
    • 238
    • 239
    • 240
    • 241
    • 242
    • 243
    • 244
    • 245
    • 246
    • 247
    • 248
    • 249
    • 250
    • 251
    • 252
    • 253
    • 254
    • 255
    • 256
    • 257
    • 258
    • 259
    • 260
    • 261
    • 262
    • 263
    • 264
    • 265
    • 266
    • 267
    • 268
    • 269
    • 270
    • 271
    • 272
    • 273
    • 274
    • 275
    • 276
    • 277
    • 278
    • 279
    • 280
    • 281
    • 282
    • 283
    • 284
    • 285
    • 286
    • 287
    • 288
    • 289
    • 290
    • 291
    • 292
    • 293
    • 294
    • 295
    • 296
    • 297
    • 298
    • 299
    • 300
    • 301
    • 302
    • 303
    • 304
    • 305
    • 306
    • 307
    • 308
    • 309
    • 310
    • 311
    • 312
    • 313
    • 314
    • 315
    • 316
    • 317
    • 318
    • 319
    • 320
    • 321
    • 322
    • 323
    • 324
    • 325
    • 326
    • 327
    • 328
    • 329
    • 330
    • 331
    • 332
    • 333
    • 334
  • 相关阅读:
    Invisible Backdoor Attack with Sample-Specific Triggers 论文笔记
    专业的ADAS测试记录仪ETHOS
    DispatcherServlet是如何进行初始化的呢?
    382.链表随机结点 | 398.随机数索引
    3D开发工具HOOPS助力CAM软件优化制造流程
    利用idea新创建maven项目时的一些基本配置
    21天打卡挑战学习MySQL—Day
    性能测试 —— 生成html测试报告、参数化、jvm监控
    SpringBoot集成WebSocket
    并发编程之CompletableFuture全网最细最全用法(一)
  • 原文地址:https://blog.csdn.net/qq_33642342/article/details/125369196