This sample, sampleCudla, uses the TensorRT API to construct a network containing a single ElementWise layer and builds the engine. The engine runs in DLA standalone mode using the cuDLA runtime. To do this, the sample uses cuDLA APIs for engine conversion, cuDLA runtime preparation, and inference.
After the network is constructed, the cuDLA module is loaded from the network data. The input and output tensors are then allocated and registered with cuDLA. Once the input tensors have been copied from the CPU to the GPU, the cuDLA task can be submitted and executed. The sample then waits for the stream operations to finish and copies the output buffer back to the CPU, where it is verified for correctness.
Specifically:
- `cudlaCreateDevice` is called to create the DLA device.
- `cudlaModuleLoadFromMemory` is called to load the engine memory for DLA use.
- `cudaMalloc` and `cudlaMemRegister` are called to first allocate memory on the GPU, then register the CUDA pointer with the DLA.
- `cudlaModuleGetAttributes` is called to get module attributes from the loaded module.
- `cudlaSubmitTask` is called to submit the inference task.

In this sample, the ElementWise layer is used. For more information, see the TensorRT Developer Guide: Layers documentation.
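The sequence of calls above can be sketched roughly as follows. This is a simplified, untested outline rather than the sample's actual source: error handling is reduced to a hypothetical `CHECK` macro, the engine blob, buffer sizes, and single-input task wiring are placeholders, and real code should consult the cuDLA API headers for the exact structure fields.

```cpp
#include <cudla.h>
#include <cuda_runtime.h>

// Hypothetical helper: bail out on any non-success cuDLA status.
#define CHECK(call) do { if ((call) != cudlaSuccess) return -1; } while (0)

int runCudlaInference(const uint8_t* engineBlob, size_t engineSize,
                      void* inputGpu, void* outputGpu, size_t bufSize,
                      cudaStream_t stream)
{
    // 1. Create a handle to DLA core 0 in hybrid (CUDA + DLA) mode.
    cudlaDevHandle dev;
    CHECK(cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA));

    // 2. Load the DLA loadable (the serialized engine) onto the device.
    cudlaModule module;
    CHECK(cudlaModuleLoadFromMemory(dev, engineBlob, engineSize, &module, 0));

    // 3. Register the CUDA buffers with cuDLA so the DLA can access them.
    uint64_t *inputReg = nullptr, *outputReg = nullptr;
    CHECK(cudlaMemRegister(dev, (uint64_t*)inputGpu, bufSize, &inputReg, 0));
    CHECK(cudlaMemRegister(dev, (uint64_t*)outputGpu, bufSize, &outputReg, 0));

    // 4. Query module attributes, e.g. the number of input tensors.
    cudlaModuleAttribute attr;
    CHECK(cudlaModuleGetAttributes(module, CUDLA_NUM_INPUT_TENSORS, &attr));

    // 5. Describe the inference task and submit it on the CUDA stream.
    cudlaTask task = {};
    task.moduleHandle     = module;
    task.inputTensor      = &inputReg;
    task.numInputTensors  = 1;
    task.outputTensor     = &outputReg;
    task.numOutputTensors = 1;
    CHECK(cudlaSubmitTask(dev, &task, 1, stream, 0));

    // Wait for the task (and any copies queued after it) to finish.
    cudaStreamSynchronize(stream);

    // Tear down in reverse order of creation.
    cudlaMemUnregister(dev, inputReg);
    cudlaMemUnregister(dev, outputReg);
    cudlaModuleUnload(module, 0);
    cudlaDestroyDevice(dev);
    return 0;
}
```

Note that this sketch shows one input tensor for brevity; the sample's ElementWise layer takes two inputs, so the actual code registers and passes two input buffers.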
This sample must be compiled with the macro `ENABLE_DLA=1`; otherwise, it will print the following error message and quit:
```
Unsupported platform, please make sure it is running on aarch64, QNX or android.
```
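If your build setup supports it, the macro can be passed directly on the `make` command line. This is a hypothetical invocation; the exact make variables may differ between TensorRT releases and cross-compilation setups:

```shell
# Build the sample with DLA support enabled (assumed invocation;
# run from the sample's source directory).
make ENABLE_DLA=1
```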
Compile this sample by running `make` in the `` directory. The binary named `sample_cudla` will be created in the `` directory.
```shell
cd
```
Where `` is where you installed TensorRT.
Run the sample to perform inference on the DLA:
```shell
./sample_cudla
```
Verify that the sample ran successfully. If it did, you should see output similar to the following:
```
&&&& RUNNING TensorRT.sample_cudla # ./sample_cudla
[I] [TRT]
[I] [TRT] --------------- Layers running on DLA:
[I] [TRT] [DlaLayer] {ForeignNode[(Unnamed Layer* 0) [ElementWise]]},
[I] [TRT] --------------- Layers running on GPU:
[I] [TRT]
…(omit messages)
&&&& PASSED TensorRT.sample_cudla
```
This output shows that the sample ran successfully; `PASSED`.
To see the full list of available options and their descriptions, use the `-h` or `--help` command line option, for example `./sample_cudla -h`.
The following resources provide a deeper understanding of sampleCudla.
For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.
June 2022
This is the first release of this `README.md` file.
There are no known issues with this tool.