f# - Is the C++ AMP library useful for F#?

Question

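I am experimenting with using the C++ AMP library from F# as a way to do work in parallel on the GPU. However, the results I am getting seem counterintuitive.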

In C++, I created a library with a single function that uses AMP to square all the numbers in an array:

#include <amp.h>
using namespace concurrency;

extern "C" __declspec(dllexport) void __stdcall square_array(double* arr, int n)
{
    // Create a view over the data on the CPU
    array_view<double,1> dataView(n, &arr[0]);

    // Run code on the GPU
    parallel_for_each(dataView.extent, [=] (index<1> idx) restrict(amp)
    {
        dataView[idx] = dataView[idx] * dataView[idx];
    });

    // Copy data from GPU to CPU
    dataView.synchronize();
}

(Code adapted from Igor Ostrovsky's blog on MSDN.)

I then wrote the following F# to compare the Task Parallel Library (TPL) with AMP:

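open System
open System.Diagnostics
open System.Runtime.InteropServices
open System.Threading.Tasks

// Print the time needed to run the given function
let time f =
    let s = new Stopwatch()
    s.Start()
    f ()
    s.Stop()
    printfn "elapsed: %d" s.ElapsedTicks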

module CInterop =
    // F# float is a 64-bit double, so float[] matches double* on the C++ side
    [<DllImport("CPlus", CallingConvention = CallingConvention.StdCall)>]
    extern void square_array(float[] array, int length)

let options = new ParallelOptions()
let size = 1000.0
let arr = [|1.0 .. size|]
// Square the number at the given index of the array
let sq i = arr.[i] <- arr.[i] * arr.[i]
// Square every number in the array using TPL
// (the upper bound of Parallel.For is exclusive, so pass arr.Length)
time (fun() -> Parallel.For(0, arr.Length, options, new Action<int>(sq)) |> ignore)

let arr2 = [|1.0 .. size|]
// Square every number in the array using AMP
time (fun() -> CInterop.square_array(arr2, arr2.Length))

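If I set the array size to a trivial number like 10, TPL takes ~22K ticks to finish and AMP takes ~10K ticks. That is what I expected. As I understand it, a GPU (and therefore AMP) should be better suited than TPL to this kind of situation, where the work is broken into very small pieces.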

However, if I increase the array size to 1000, TPL now takes ~30K ticks and AMP takes ~70K ticks. It only gets worse from there. For an array of size 1 million, AMP takes almost 1000x as long as TPL.

Since I expect the GPU (i.e. AMP) to be better at this sort of task, I am wondering what I am missing here.

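My graphics card is a GeForce 550 Ti with 1GB, which as far as I know is no slouch. I know there is overhead in using PInvoke to call into the AMP code, but I expected that to be a fixed cost that could be amortized over larger array sizes. I believe the array is passed by reference (though I may be wrong), so I did not expect any cost from copying it.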

Thank you, everyone, for your advice.

score 7 · Accepted Answer

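Transferring data back and forth between the GPU and the CPU takes time. You are most likely measuring your PCI Express bus bandwidth here. Squaring 1M floats should be a piece of cake for a GPU.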
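
To get a rough feel for the transfer cost, the size of the round trip can be worked out directly. The following is a minimal C++ sketch, not part of the original code; `round_trip_bytes` and `transfer_ms` are hypothetical helpers, and any bandwidth figure passed in is an assumption supplied by the caller, not a measurement of this card or bus.

```cpp
#include <cassert>
#include <cstddef>

// Bytes that cross the bus when an array of n doubles is copied to the GPU
// and then copied back (one transfer in each direction).
std::size_t round_trip_bytes(std::size_t n)
{
    return 2 * n * sizeof(double);
}

// Estimated transfer time in milliseconds for `bytes` at an assumed sustained
// bandwidth in bytes per second. Ignores per-call latency and driver overhead,
// so real transfers (especially small ones) will take longer.
double transfer_ms(std::size_t bytes, double assumed_bytes_per_second)
{
    assert(assumed_bytes_per_second > 0.0);
    return 1000.0 * static_cast<double>(bytes) / assumed_bytes_per_second;
}
```

For the 1M-element case in the question, `round_trip_bytes(1000000)` comes to 16,000,000 bytes with 8-byte doubles, while the kernel performs only one multiplication per element, so the copies rather than the arithmetic may well account for most of the time.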

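Using the Stopwatch class to measure AMP's performance is also not a good idea, because GPU calls can happen asynchronously. In your case it is fine, but if you measured only the compute part (the parallel_for_each), this would not work. I think you can use the D3D11 performance counters for that.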
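
The pitfall can be illustrated without a GPU. The following is a minimal sketch in standard C++ that uses `std::async` as a stand-in for an asynchronous kernel launch; it is an analogy, not C++ AMP code, and `square_async`, `Timing`, and `time_square` are hypothetical names. It records the elapsed time at two points: when the launch call returns, and after explicitly waiting for the work to finish.

```cpp
#include <chrono>
#include <future>
#include <vector>

using Clock = std::chrono::steady_clock;

// Squares every element in place on another thread. This stands in for an
// asynchronous GPU launch: the call returns before the work is finished.
std::future<void> square_async(std::vector<double>& data)
{
    return std::async(std::launch::async, [&data] {
        for (double& x : data) x *= x;
    });
}

struct Timing {
    Clock::duration launch_only; // elapsed when the launch call returned
    Clock::duration until_done;  // elapsed after waiting for completion
};

// Times the same operation at both points so the two readings can be compared.
Timing time_square(std::vector<double>& data)
{
    const auto start = Clock::now();
    std::future<void> pending = square_async(data);
    const auto launched = Clock::now();
    pending.wait(); // block until the work has actually completed
    const auto done = Clock::now();
    return Timing{launched - start, done - start};
}
```

Only `until_done` reflects the cost of the work; `launch_only` is what a stopwatch stopped right after an asynchronous call would report. In the question's code, `dataView.synchronize()` presumably plays the role of the wait, which would explain why timing the whole `square_array` call is fine there.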
