
I am using TensorRT FP16 precision mode to optimize my deep learning model, and I deploy the optimized model on a Jetson TX2. While testing the model, I observed that the TensorRT inference engine is not deterministic: for the same input image, the optimized model reports FPS values that vary between 40 and 120.

After seeing a comment about CUDA, I began to suspect that the source of the non-determinism is floating-point arithmetic:

"If your code uses floating-point atomics, results may differ from run to run because floating-point operations are generally not associative, and the order in which data enters a computation (e.g. a sum) is non-deterministic when atomics are used."
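
As a quick illustration of the non-associativity mentioned in that quote, here is a minimal, self-contained C++ snippet (the values are arbitrary, chosen only to make the rounding visible):

#include <cstdio>

int main()
{
  // Floating-point addition is not associative: summing the same
  // values in a different order can produce a different result.
  float a = 1e8f, b = -1e8f, c = 1.0f;

  float leftFirst  = (a + b) + c;  // (1e8 + -1e8) + 1  ->  1.0
  float rightFirst = a + (b + c);  // 1e8 + (-1e8 + 1)  ->  0.0 (the 1.0 is absorbed)

  std::printf("(a + b) + c = %f\n", leftFirst);
  std::printf("a + (b + c) = %f\n", rightFirst);
  return 0;
}

This is why the order in which atomic floating-point updates land (e.g. in a parallel reduction) can change the numerical result from run to run.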

Do precision types such as FP16, FP32, and INT8 affect the determinism of TensorRT? Or is it something else?

Do you have any ideas?

Best regards.

1 Answer

I solved the problem by changing the clock() function I was using to measure latencies. clock() measures CPU time, but what I actually want to measure is real (wall-clock) time. I now use std::chrono to measure the latencies, and the inference results are latency-deterministic.

This was the wrong approach (clock()):

#include <cstdio>
#include <ctime>

void inferenceEngine(); // forward declaration of the inference call

int main()
{
  // clock() measures CPU time consumed by the process, not wall-clock
  // time, so it is the wrong tool for measuring real latency.
  clock_t t = clock();
  inferenceEngine(); // Do the inference
  t = clock() - t;
  printf("It took me %ld clicks (%f seconds).\n", (long)t, ((float)t) / CLOCKS_PER_SEC);
  return 0;
}

Use CUDA events like this (cudaEvent); they measure the elapsed GPU time between the two recorded events:

cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start);  // enqueue the start event on the default stream
inferenceEngine();       // Do the inference
cudaEventRecord(stop);   // enqueue the stop event after the inference work

cudaEventSynchronize(stop);  // block the host until the stop event completes

float milliseconds = 0;
cudaEventElapsedTime(&milliseconds, start, stop);  // elapsed time between the events

cudaEventDestroy(start);
cudaEventDestroy(stop);

Use chrono like this (std::chrono):

#include <iostream>
#include <chrono>
#include <ctime>

void inferenceEngine(); // forward declaration of the inference call

int main()
{
  auto start = std::chrono::system_clock::now();
  inferenceEngine(); // Do the inference
  auto end = std::chrono::system_clock::now();

  std::chrono::duration<double> elapsed_seconds = end - start;
  std::time_t end_time = std::chrono::system_clock::to_time_t(end);

  std::cout << "finished computation at " << std::ctime(&end_time)
            << "elapsed time: " << elapsed_seconds.count() << "s\n";
}
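
One caveat: std::chrono::system_clock can be adjusted while the program runs (e.g. by NTP), so for pure interval measurements std::chrono::steady_clock is generally the safer choice. A minimal sketch, assuming the same inferenceEngine() call as above:

#include <chrono>
#include <iostream>

void inferenceEngine(); // forward declaration of the inference call

int main()
{
  // steady_clock is monotonic, so it is unaffected by system time changes.
  auto start = std::chrono::steady_clock::now();
  inferenceEngine(); // Do the inference
  auto end = std::chrono::steady_clock::now();

  auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
  std::cout << "elapsed time: " << ms << " ms\n";
}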
Answered 2019-08-02T15:09:23.927