Suppose I profile my program with Nsight Systems and have it export an SQLite 3 database, like so:

nsys profile -o /path/to/db --export=sqlite /path/to/executable --arg1=val1 --arg2

What exactly do I do now to get the execution times of my various kernel calls?

1 Answer

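From the CUPTI documentation (for CUDA 11.2):

3.29. CUpti_ActivityKernel4 Struct Reference [CUPTI Activity API]

This activity record represents a kernel execution (CUPTI_ACTIVITY_KIND_KERNEL and CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL).

These are two table names in the SQLite 3 output database. Here is how to query them: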

  • If you only want the execution times:
    sqlite3 -csv /path/to/db.sqlite 'SELECT end - start AS duration FROM CUPTI_ACTIVITY_KIND_KERNEL;'

  • If you also want the (demangled) kernel names, you need a slightly more complex SQL query, joining against the StringIds table:
    sqlite3 -csv /path/to/db.sqlite 'SELECT names.value AS name, end - start AS duration FROM CUPTI_ACTIVITY_KIND_KERNEL AS k JOIN StringIds AS names ON k.demangledName = names.id;'
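If you would rather post-process the results in a script than on the command line, the same join can be issued from Python's standard sqlite3 module. This is only a minimal sketch: the function name kernel_durations is made up for illustration, the table and column names are the ones used in the queries above, and the durations are assumed to be in nanoseconds (check the export documentation for your nsys version).

```python
import sqlite3

# Same join as the command-line query: one row per kernel launch.
# "end" is double-quoted because END is also an SQL keyword.
DURATION_QUERY = """
SELECT names.value AS name, k."end" - k.start AS duration
FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
JOIN StringIds AS names ON k.demangledName = names.id
ORDER BY k.start
"""


def kernel_durations(conn):
    """Return (demangled kernel name, duration) tuples in launch order.

    `conn` is an open sqlite3.Connection to the exported database.
    The duration unit is whatever the export uses (assumed nanoseconds).
    """
    return [(name, duration) for name, duration in conn.execute(DURATION_QUERY)]
```

To use it, open the exported file with sqlite3.connect("/path/to/db.sqlite") and pass the connection to kernel_durations.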
    

It is also instructive to run:

sqlite3 /path/to/db.sqlite 

and then enter

.schema

to get the SQL CREATE commands for every table in the schema. The output typically looks like this (with CUDA 11.2 and nsys 2020.4.3):

sqlite> .schema
CREATE TABLE StringIds (id INTEGER NOT NULL PRIMARY KEY, value TEXT NOT NULL);
CREATE TABLE ProcessStreams (globalPid INTEGER NOT NULL, filenameId INTEGER NOT NULL, contentId INTEGER NOT NULL);
CREATE TABLE SCHED_EVENTS (start INTEGER NOT NULL, cpu INTEGER NOT NULL, isSchedIn INTEGER NOT NULL, globalTid INTEGER);
CREATE TABLE COMPOSITE_EVENTS (id INTEGER NOT NULL PRIMARY KEY, start INTEGER NOT NULL, cpu INTEGER, threadState INTEGER, globalTid INTEGER, cpuCycles INTEGER NOT NULL);
CREATE TABLE UnwindMethodType (number INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE SAMPLING_CALLCHAINS (id INTEGER NOT NULL REFERENCES COMPOSITE_EVENTS, symbol INTEGER NOT NULL, module INTEGER NOT NULL, kernelMode INTEGER, thumbCode INTEGER, unresolved INTEGER, specialEntry INTEGER, originalIP INTEGER, unwindMethod INTEGER REFERENCES UnwindMethodType(number), stackDepth INTEGER NOT NULL, PRIMARY KEY (id, stackDepth));
CREATE TABLE PROFILER_OVERHEAD (start INTEGER NOT NULL, end INTEGER NOT NULL, globalTid INTEGER, correlationId INTEGER, nameId INTEGER NOT NULL, returnValue INTEGER NOT NULL);
CREATE TABLE OSRT_API (start INTEGER NOT NULL, end INTEGER NOT NULL, eventClass INTEGER NOT NULL, globalTid INTEGER, correlationId INTEGER, nameId INTEGER NOT NULL, returnValue INTEGER NOT NULL, nestingLevel INTEGER, callchainId INTEGER NOT NULL);
CREATE TABLE OSRT_CALLCHAINS (id INTEGER NOT NULL, symbol INTEGER NOT NULL, module INTEGER NOT NULL, kernelMode INTEGER, thumbCode INTEGER, unresolved INTEGER, specialEntry INTEGER, originalIP INTEGER, unwindMethod INTEGER REFERENCES UnwindMethodType(number), stackDepth INTEGER NOT NULL, PRIMARY KEY (id, stackDepth));
CREATE TABLE CUPTI_ACTIVITY_KIND_RUNTIME (start INTEGER NOT NULL, end INTEGER NOT NULL, eventClass INTEGER NOT NULL, globalTid INTEGER, correlationId INTEGER, nameId INTEGER NOT NULL, returnValue INTEGER NOT NULL, callchainId INTEGER REFERENCES CUDA_CALLCHAINS(id));
CREATE TABLE CUPTI_ACTIVITY_KIND_MEMCPY (start INTEGER NOT NULL, end INTEGER NOT NULL, deviceId INTEGER NOT NULL, contextId INTEGER NOT NULL, streamId INTEGER NOT NULL, correlationId INTEGER, globalPid INTEGER, bytes INTEGER NOT NULL, copyKind INTEGER NOT NULL, deprecatedSrcId INTEGER, srcKind INTEGER, dstKind INTEGER, srcDeviceId INTEGER, srcContextId INTEGER, dstDeviceId INTEGER, dstContextId INTEGER, graphNodeId INTEGER);
CREATE TABLE CUPTI_ACTIVITY_KIND_SYNCHRONIZATION (start INTEGER NOT NULL, end INTEGER NOT NULL, deviceId INTEGER NOT NULL, contextId INTEGER NOT NULL, streamId INTEGER NOT NULL, correlationId INTEGER, globalPid INTEGER, syncType INTEGER NOT NULL, eventId INTEGER NOT NULL);
CREATE TABLE CUPTI_ACTIVITY_KIND_KERNEL (start INTEGER NOT NULL, end INTEGER NOT NULL, deviceId INTEGER NOT NULL, contextId INTEGER NOT NULL, streamId INTEGER NOT NULL, correlationId INTEGER, globalPid INTEGER, demangledName INTEGER NOT NULL, shortName INTEGER NOT NULL, launchType INTEGER, cacheConfig INTEGER, registersPerThread INTEGER NOT NULL, gridX INTEGER NOT NULL, gridY INTEGER NOT NULL, gridZ INTEGER NOT NULL, blockX INTEGER NOT NULL, blockY INTEGER NOT NULL, blockZ INTEGER NOT NULL, staticSharedMemory INTEGER NOT NULL, dynamicSharedMemory INTEGER NOT NULL, localMemoryPerThread INTEGER NOT NULL, localMemoryTotal INTEGER NOT NULL, gridId INTEGER NOT NULL, sharedMemoryExecuted INTEGER, graphNodeId INTEGER);
CREATE TABLE ThreadNames (nameId INTEGER NOT NULL, priority INTEGER, globalTid INTEGER);
CREATE TABLE TARGET_INFO_CUDA_GPU (vmId INTEGER NOT NULL, name TEXT NOT NULL, pciBusId TEXT, globalMemoryBandwidth INTEGER NOT NULL, globalMemorySize INTEGER NOT NULL, constantMemorySize INTEGER NOT NULL, l2CacheSize INTEGER NOT NULL, numThreadsPerWarp INTEGER NOT NULL, coreClockRate INTEGER NOT NULL, numMemcpyEngines INTEGER NOT NULL, numMultiprocessors INTEGER NOT NULL, maxIPC INTEGER NOT NULL, maxWarpsPerMultiprocessor INTEGER NOT NULL, maxBlocksPerMultiprocessor INTEGER NOT NULL, maxRegistersPerBlock INTEGER NOT NULL, maxSharedMemoryPerBlock INTEGER NOT NULL, maxThreadsPerBlock INTEGER NOT NULL, maxBlockDimX INTEGER NOT NULL, maxBlockDimY INTEGER NOT NULL, maxBlockDimZ INTEGER NOT NULL, maxGridDimX INTEGER NOT NULL, maxGridDimY INTEGER NOT NULL, maxGridDimZ INTEGER NOT NULL, computeCapabilityMajor INTEGER NOT NULL, computeCapabilityMinor INTEGER NOT NULL, deviceId INTEGER NOT NULL, pid INTEGER, maxSharedMemoryPerMultiprocessor INTEGER, maxRegistersPerMultiprocessor INTEGER);
CREATE TABLE TARGET_INFO_GPU (vmId INTEGER NOT NULL, deviceId INTEGER NOT NULL, name TEXT, busLocation TEXT, isDiscrete INTEGER);
CREATE TABLE TARGET_INFO_CUDA_NULL_STREAM (streamId INTEGER NOT NULL, hwId INTEGER NOT NULL, vmId INTEGER NOT NULL, processId INTEGER NOT NULL, deviceId INTEGER NOT NULL, contextId INTEGER NOT NULL);
CREATE TABLE TARGET_INFO_CUDA_STREAM (streamId INTEGER NOT NULL, hwId INTEGER NOT NULL, vmId INTEGER NOT NULL, processId INTEGER NOT NULL, contextId INTEGER NOT NULL, priority INTEGER NOT NULL, flag INTEGER NOT NULL);

You can run any SQL query against these tables (in SQLite's dialect, of course).
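As one example of such a query, per-kernel statistics can be computed with GROUP BY. The following is a sketch rather than anything nsys provides: kernel_summary and the result column names are invented for illustration, only the table and column names come from the schema above, and times are again assumed to be nanoseconds.

```python
import sqlite3

# Aggregate all launches of each kernel, slowest total first.
# "end" is double-quoted because END is also an SQL keyword.
SUMMARY_QUERY = """
SELECT names.value               AS name,
       COUNT(*)                  AS launches,
       SUM(k."end" - k.start)    AS total_time,
       AVG(k."end" - k.start)    AS mean_time,
       MIN(k."end" - k.start)    AS min_time,
       MAX(k."end" - k.start)    AS max_time
FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
JOIN StringIds AS names ON k.demangledName = names.id
GROUP BY names.value
ORDER BY total_time DESC
"""


def kernel_summary(conn):
    """Return one dict per distinct kernel name with launch count and timing stats."""
    cursor = conn.execute(SUMMARY_QUERY)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]
```

Grouping by the demangled name treats different template instantiations as different kernels; joining on k.shortName instead would merge them.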

Answered 2021-03-14T15:18:40.843