cuda - CUDA 的调用图

Question

我正在尝试为我在埃及的 CUDA 中拥有的代码生成调用图，但通常的方法似乎不起作用（因为 nvcc 没有任何标志可以做与 -fdump-rtl-expand 相同的事情） .

更多细节：

我有一个非常大的代码（我不是作者），它跨越多个 .cu 文件，如果我有一个调用图，我会更容易理解它在做什么。

我敢打赌，这个问题的答案也对其他人有用。

关于如何使用 cuda (.cu) 文件完成此操作的任何想法？

score 1 · Accepted Answer

您可以使用clang 3.8 的 CUDA 支持来做到这一点。

首先，编译您的 CUDA 代码以发出 llvm（安装 CUDA 7.5 的 Windows 上的示例）：

clang++ -c main.cu --cuda-gpu-arch=sm_35 -o main.ll -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include"

然后，使用生成的 ll 来构建调用图opt：

opt.exe main.ll -analyze -dot-callgraph

请注意，这opt不是默认二进制发行版的一部分，您可能需要自己构建它（我有一个 3.7.1 构建，它已经能够管理来自 3.8 的 ll）。

示例 main.cu 文件：

#include <cuda_runtime.h>
__device__ int f() { return 1; }
__device__ float g(float* a) { return a[f()] ; }
__device__ float h() { return 42.0f ; }
__global__ void kernel (int a, float* b)
{
        int c = a + f();
        g(b);
        b[c] = h();
}

生成的点文件：

digraph "Call graph" {
        label="Call graph";

        Node0x1e3d438 [shape=record,label="{external node}"];
        Node0x1e3d438 -> Node0x1e3cfb0;
        Node0x1e3d438 -> Node0x1e3ce48;
        Node0x1e3d438 -> Node0x1e3d0a0;
        Node0x1e3d438 -> Node0x1e3d258;
        Node0x1e3d438 -> Node0x1e3cfd8;
        Node0x1e3d438 -> Node0x1e3ce98;
        Node0x1e3d438 -> Node0x1e3d000;
        Node0x1e3d438 -> Node0x1e3cee8;
        Node0x1e3d438 -> Node0x1e3d078;
        Node0x1e3d000 [shape=record,label="{__cuda_module_ctor}"];
        Node0x1e3d000 -> Node0x1e3ce98;
        Node0x1e3d000 -> Node0x1e3d168;
        Node0x1e3d078 [shape=record,label="{__cuda_module_dtor}"];
        Node0x1e3d078 -> Node0x1e3cee8;
        Node0x1e3cfb0 [shape=record,label="{^A?f@@YAHXZ}"];
        Node0x1e3d0a0 [shape=record,label="{^A?h@@YAMXZ}"];
        Node0x1e3ce48 [shape=record,label="{^A?g@@YAMPEAM@Z}"];
        Node0x1e3ce48 -> Node0x1e3cfb0;
        Node0x1e3d258 [shape=record,label="{^A?kernel@@YAXHPEAM@Z}"];
        Node0x1e3d258 -> Node0x1e3cfb0;
        Node0x1e3d258 -> Node0x1e3ce48;
        Node0x1e3d258 -> Node0x1e3d0a0;
        Node0x1e3d168 [shape=record,label="{__cuda_register_kernels}"];
        Node0x1e3cee8 [shape=record,label="{__cudaUnregisterFatBinary}"];
        Node0x1e3cee8 -> Node0x1e3d528;
        Node0x1e3cfd8 [shape=record,label="{__cudaRegisterFunction}"];
        Node0x1e3cfd8 -> Node0x1e3d528;
        Node0x1e3ce98 [shape=record,label="{__cudaRegisterFatBinary}"];
        Node0x1e3ce98 -> Node0x1e3d528;
}

cuda - CUDA 的调用图

1 回答 1

Related

Reference