c++ - Simple example of the custom layer API (TensorRT 2.1)?

Question

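I am using TensorRT 2.1 and want to implement a simple custom layer. (The goal is to run Single Shot Detector with TensorRT on an embedded system.)

For practice, I want to make an Inc layer (it just adds 1.0 to the input tensor values and keeps the dimensions the same).

I implemented an Inc class by following the `class Reshape : public IPlugin` example in sampleFasterRCNN.cpp. I kept almost everything the same, except for getOutputDimensions(), which I changed to keep the dimensions the same. (This seems fine.)

Where should I implement the "add 1.0" part? I guess it should be in enqueue(). So, I tried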

int enqueue(int batchSize, const void*const *inputs, void** outputs, void*, cudaStream_t stream) override
{
  // the below is from the Reshape class; it seems to copy from input to output
  CHECK(cudaMemcpyAsync(outputs[0], inputs[0], mCopySize * batchSize, cudaMemcpyDeviceToDevice, stream));
  // add 1.0 to the first ten values
  float* foutputs = (float*) outputs[0];
  for (int i = 0; i < 10; i++) foutputs[i] += 1.0;
  return 0;
}

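However, this part causes a "segmentation fault" error.

My questions are:

Where and how can I implement some computation between the input and the output?
Can anyone provide a simple example?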

score 0 · Accepted Answer

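Refer to samples/samplePlugin/samplePlugin.cpp and look at the FCPlugin class. Your actual computation should go into the enqueue method. You will probably have to write a CUDA kernel that performs the increment.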
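As a rough illustration of that suggestion: the buffers passed to enqueue() are device pointers (the question's own cudaMemcpyDeviceToDevice call relies on this), so dereferencing outputs[0] on the host is the likely cause of the segmentation fault. The sketch below is not taken from the TensorRT samples. The names incValue, incReference, incKernel and launchInc are hypothetical, and it assumes float data. It is written so that the file also compiles as plain C++, where only the CPU reference is available; the kernel and launcher are enabled when compiled with nvcc.

```cpp
// inc_kernel.cu : sketch of an element-wise "add 1.0" for a plugin's enqueue().
// All names in this file are hypothetical, not part of the TensorRT API.
#include <cstddef>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#define INC_HOST_DEVICE __host__ __device__
#else
#define INC_HOST_DEVICE
#endif

// Per-element operation shared by the GPU kernel and the CPU reference.
INC_HOST_DEVICE inline float incValue(float x) { return x + 1.0f; }

// CPU reference; only valid for host memory (useful for checking results).
inline void incReference(const float* in, float* out, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) out[i] = incValue(in[i]);
}

#ifdef __CUDACC__
// One thread per element; the bounds check covers the last partial block.
__global__ void incKernel(const float* in, float* out, std::size_t count)
{
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < count) out[i] = incValue(in[i]);
}

// Launches the kernel on the stream that TensorRT passes to enqueue().
// `in` and `out` must be device pointers (e.g. inputs[0] and outputs[0]).
// Returns 0 on success and 1 if the launch failed.
inline int launchInc(const float* in, float* out, std::size_t count, cudaStream_t stream)
{
    if (count == 0) return 0;
    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((count + threads - 1) / threads);
    incKernel<<<blocks, threads, 0, stream>>>(in, out, count);
    return cudaGetLastError() == cudaSuccess ? 0 : 1;
}
#endif
```

Inside enqueue(), the cudaMemcpyAsync and the host loop could then be replaced by something like `return launchInc(static_cast<const float*>(inputs[0]), static_cast<float*>(outputs[0]), mCopySize * batchSize / sizeof(float), stream);`, assuming mCopySize is the per-sample size in bytes, as its use in the question's cudaMemcpyAsync call suggests. Because the kernel reads the input and writes the output directly, no separate copy is needed.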
