python - Adding multiple inferences on TensorRT (invalid resource handle error)

Question

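I am trying to run two inferences in a pipeline on a Jetson Nano. The first inference is object detection using MobileNet and TensorRT. My code for the first inference is copied almost verbatim from the AastaNV/TRT_Obj_Detection repository. The only difference is that I changed the code so that it resides inside a class, Inference1.

The second inference job uses the output of the first inference to run further analysis. For this inference I use a custom model with tensorflow (not TensorRT, but I assume it is called on the backend?). The model is loaded from a .pb file (frozen graph). Once loaded, inference is performed by calling tensorflow's session.run().

If I run ONLY Inference1 or ONLY Inference2, the code runs without any error. However, when I pipe them together, I get the error [TensorRT] ERROR: cuda/caskConvolutionLayer.cpp (355) - Cuda Error in execute: 33 (invalid resource handle)

From what I can see in the log, the TensorRT serialized graph is loaded without any problem. TensorFlow is also imported, and it recognizes my GPU. From searching the internet, I found that this issue may be related to the CUDA context? So the code below shows how I set up the CUDA context. create_cuda_context is called only once, during initialization of the Inference1 class. run_inference_for_single_image is called on every iteration.

Code: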

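# Imports assumed by this snippet (not shown in the original post)
import numpy as np
import pycuda.driver as cuda
import tensorrt as trt

def create_cuda_context(self):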
    self.host_inputs, self.host_outputs = [], []
    self.cuda_inputs, self.cuda_outputs = [], []
    self.bindings = []
    self.stream = cuda.Stream()

    for binding in self.engine:
        size = trt.volume(self.engine.get_binding_shape(binding)) * self.engine.max_batch_size
        host_mem = cuda.pagelocked_empty(size, np.float32)
        cuda_mem = cuda.mem_alloc(host_mem.nbytes)

        self.bindings.append(int(cuda_mem))
        if self.engine.binding_is_input(binding):
            self.host_inputs.append(host_mem)
            self.cuda_inputs.append(cuda_mem)
        else:
            self.host_outputs.append(host_mem)
            self.cuda_outputs.append(cuda_mem)
    self.context = self.engine.create_execution_context()

def run_inference_for_single_image(self, image):
    ''' Copies the image (already raveled) input into GPU memory, performs the forward pass
    and copies the result back to CPU memory
    '''
    np.copyto(self.host_inputs[0], image)
    cuda.memcpy_htod_async(self.cuda_inputs[0], self.host_inputs[0], self.stream)
    self.context.execute_async(bindings=self.bindings, stream_handle=self.stream.handle)
    cuda.memcpy_dtoh_async(self.host_outputs[1], self.cuda_outputs[1], self.stream)
    cuda.memcpy_dtoh_async(self.host_outputs[0], self.cuda_outputs[0], self.stream)
    self.stream.synchronize()
    return self.host_outputs[0]

Log:

WARNING:tensorflow:From /usr/lib/python3.6/dist-packages/graphsurgeon/DynamicGraph.py:4: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

[TensorRT] INFO: Glob Size is 14049908 bytes.
[TensorRT] INFO: Added linear block of size 5760000
[TensorRT] INFO: Added linear block of size 2880000
[TensorRT] INFO: Added linear block of size 409600
[TensorRT] INFO: Added linear block of size 218624
[TensorRT] INFO: Added linear block of size 61440
[TensorRT] INFO: Added linear block of size 57344
[TensorRT] INFO: Added linear block of size 30720
[TensorRT] INFO: Added linear block of size 20992
[TensorRT] INFO: Added linear block of size 9728
[TensorRT] INFO: Added linear block of size 9216
[TensorRT] INFO: Added linear block of size 2560
[TensorRT] INFO: Added linear block of size 2560
[TensorRT] INFO: Added linear block of size 1024
[TensorRT] INFO: Added linear block of size 512
[TensorRT] INFO: Found Creator FlattenConcat_TRT
[TensorRT] INFO: Found Creator GridAnchor_TRT
[TensorRT] INFO: Found Creator FlattenConcat_TRT
[TensorRT] INFO: Found Creator NMS_TRT
[TensorRT] INFO: Deserialize required 5159079 microseconds.
Infering on input.mp4
WARNING:tensorflow:From /home/user/Desktop/SVM_TensorRT/deep_sort/tools/generate_detections.py:75: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2018-01-29 02:01:38.254282: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2018-01-29 02:01:38.286962: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2018-01-29 02:01:38.287300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
2018-01-29 02:01:38.287552: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2018-01-29 02:01:38.287744: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2018-01-29 02:01:38.287983: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2018-01-29 02:01:38.288201: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2018-01-29 02:01:38.415478: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2018-01-29 02:01:38.484010: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2018-01-29 02:01:38.484668: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2018-01-29 02:01:38.485343: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2018-01-29 02:01:38.486009: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2018-01-29 02:01:38.486286: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2018-01-29 02:01:38.665379: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2018-01-29 02:01:38.682935: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x24f9ea50 executing computations on platform Host. Devices:
2018-01-29 02:01:38.683009: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2018-01-29 02:01:38.764975: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2018-01-29 02:01:38.765291: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x572614c0 executing computations on platform CUDA. Devices:
2018-01-29 02:01:38.765349: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2018-01-29 02:01:38.766014: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2018-01-29 02:01:38.766158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
2018-01-29 02:01:38.766716: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2018-01-29 02:01:38.766814: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2018-01-29 02:01:38.766879: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2018-01-29 02:01:38.767002: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2018-01-29 02:01:38.767174: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2018-01-29 02:01:38.767311: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2018-01-29 02:01:38.767423: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2018-01-29 02:01:38.767731: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2018-01-29 02:01:38.768049: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2018-01-29 02:01:38.768136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2018-01-29 02:01:38.783718: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2018-01-29 02:01:41.046094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-01-29 02:01:41.046260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2018-01-29 02:01:41.046311: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2018-01-29 02:01:41.054160: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2018-01-29 02:01:41.054730: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2018-01-29 02:01:41.112041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 85 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
WARNING:tensorflow:From /home/user/Desktop/SVM_TensorRT/deep_sort/tools/generate_detections.py:76: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

WARNING:tensorflow:From /home/user/Desktop/SVM_TensorRT/deep_sort/tools/generate_detections.py:80: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

[TensorRT] ERROR: CUDA cask failure at execution for trt_maxwell_scudnn_128x32_relu_small_nn_v1.
[TensorRT] ERROR: cuda/caskConvolutionLayer.cpp (355) - Cuda Error in execute: 33 (invalid resource handle)
[TensorRT] ERROR: cuda/caskConvolutionLayer.cpp (355) - Cuda Error in execute: 33 (invalid resource handle)

score 3 · Accepted Answer

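I believe both of the models you are trying to run attempt to create a CUDA context. The first one initializes a CUDA context directly from the TensorRT library, while the second one initializes a new CUDA context inside Tensorflow. When the first model then tries to execute inference, it uses the wrong CUDA context, which results in that error.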
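One workaround that is often tried for this kind of context mismatch is to give TensorRT an explicit context and make it current around every TensorRT call. The sketch below is a minimal illustration under assumptions, not a verified fix for the error in the question: `active_context` is a hypothetical helper name, and it only assumes that the object passed in exposes `push()` and `pop()`, as a pycuda context does.

```python
from contextlib import contextmanager


@contextmanager
def active_context(ctx):
    """Make `ctx` the current CUDA context for the duration of the block.

    `ctx` is assumed to expose push() and pop(), like a pycuda context.
    """
    ctx.push()      # make this context current on the calling thread
    try:
        yield ctx
    finally:
        ctx.pop()   # always restore the previous context, even on error


# Hypothetical usage inside Inference1 (assumes self.cuda_ctx was created
# once with pycuda, e.g. cuda.Device(0).make_context()):
#
#     with active_context(self.cuda_ctx):
#         output = self.run_inference_for_single_image(image)
```

Note that pycuda's `make_context()` leaves the new context current, so it is usually popped once after setup and then pushed again only around the TensorRT calls.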

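It is much easier to control the CUDA context if you use the same library (TensorRT, Tensorflow, or some other CUDA library) for both models. In my experience, Tensorflow and direct CUDA do not play well together.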

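I suggest you split the two models into different threads. That will ensure that TensorRT and Tensorflow each create and use their own distinct CUDA context... (assuming you don't hit an OOM problem. I once tried to use SSD+MobileNetV2 for object detection and another MobileNet for further classification of the detected objects on a Jetson Nano. I ran into OOM and ended up running the second model on the CPU).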
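The thread suggestion can be sketched as a worker that builds its model inside its own thread and receives inputs through a queue. This is an illustration under assumptions, not the answerer's code: `InferenceWorker` and `build_model` are hypothetical names, and `build_model` stands for whatever creates the TensorRT engine or the Tensorflow session and returns a callable that maps an input to an output.

```python
import queue
import threading


class InferenceWorker(threading.Thread):
    """Runs one model in a dedicated thread (hypothetical helper)."""

    def __init__(self, build_model):
        super().__init__(daemon=True)
        self._build_model = build_model
        self._requests = queue.Queue()

    def run(self):
        # Build the model here so its GPU resources are created
        # by the worker thread rather than by the main thread.
        try:
            model = self._build_model()
        except Exception as exc:
            error = exc

            def model(_data):
                # Report the build failure to every caller instead of hanging.
                raise error

        while True:
            item = self._requests.get()
            if item is None:        # sentinel sent by stop()
                break
            data, reply = item
            try:
                reply.put((True, model(data)))
            except Exception as exc:
                reply.put((False, exc))

    def infer(self, data):
        """Send `data` to the worker and block until the result arrives."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((data, reply))
        ok, result = reply.get()
        if not ok:
            raise result
        return result

    def stop(self):
        self._requests.put(None)
        self.join()
```

A pipeline would start one worker per model with `start()` and pass the result of the first worker's `infer(frame)` to the second worker's `infer`. Both workers still share the same GPU memory, so the OOM caveat above applies.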
