cuda - Nvidia Visual Profiler not showing cudaMalloc() after kernel launch

Question

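I am trying to write a program that runs almost entirely on the GPU, with very little interaction with the host. initKernel is the first kernel and is launched from the host. From initKernel I use dynamic parallelism to launch successive kernels, two of which are thrust::sort(thrust::device, ...).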

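Before launching initKernel, I do a cudaMalloc() in host code, and it shows up in the Runtime API row of the Visual Profiler. The cudaMalloc calls made in __device__ functions and in subsequent kernels (after initKernel has been launched) are not shown in the Visual Profiler's Runtime API row. Can someone help me understand why I can't see these cudaMallocs in the Visual Profiler?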

Thank you for your time.

score 2 · Accepted Answer

Can someone help me understand why I can't see these cudaMallocs in the Visual Profiler?

Because it is a documented limitation of the tool. From the documentation:

The Visual Profiler timeline does not display CUDA API calls invoked from within device-launched kernels.
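To make the distinction concrete, here is a minimal sketch of the kind of program the question describes. It is not the asker's code: only the name initKernel comes from the question, while childKernel, out, and scratch are hypothetical names chosen for illustration, and the build line in the first comment is an assumption about a typical dynamic-parallelism setup. The host-side cudaMalloc() is an ordinary Runtime API call. The cudaMalloc() inside initKernel goes through the device runtime instead, which is the kind of call the limitation quoted above is about.

```cuda
// Assumed build line: nvcc -rdc=true example.cu -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical child kernel, launched from the device via dynamic parallelism.
__global__ void childKernel(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] *= 2;
}

// First kernel, launched from the host (named after the one in the question).
__global__ void initKernel(int *out, int n)
{
    if (blockIdx.x != 0 || threadIdx.x != 0) return;

    // Device-side cudaMalloc: served by the device runtime from the device heap.
    int *scratch = nullptr;
    if (cudaMalloc((void **)&scratch, n * sizeof(int)) != cudaSuccess) {
        out[0] = -1;  // signal failure to the host
        return;
    }
    for (int i = 0; i < n; ++i) scratch[i] = i;
    for (int i = 0; i < n; ++i) out[i] = scratch[i];
    cudaFree(scratch);  // device-side free of the device-side allocation

    // Device-side launch; the child sees the writes made above.
    childKernel<<<(n + 127) / 128, 128>>>(out, n);
}

int main()
{
    const int n = 8;
    int *out = nullptr;

    // Host-side cudaMalloc: a regular Runtime API call made by the host.
    if (cudaMalloc((void **)&out, n * sizeof(int)) != cudaSuccess) return 1;

    initKernel<<<1, 1>>>(out, n);
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(out); return 1; }

    int host[n];
    cudaMemcpy(host, out, sizeof(host), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%d ", host[i]);
    printf("\n");

    cudaFree(out);
    return 0;
}
```

If device-side allocations need to be observed, one option under these assumptions is to record their results in a buffer that the host reads back afterwards, as the sketch does with out[0] on failure.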
