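#include <amp.h>
#include <vector>
using namespace concurrency;
using std::vector;

// Element-wise z = x + y over an m-by-n array, computed with C++ AMP.
void Addx(float *z, float *x, float *y, size_t m, size_t n)
{
    // Stage the inputs in host-side vectors (one extra copy of each array).
    vector<float> vx(x, x + m*n);
    vector<float> vy(y, y + m*n);
    vector<float> vz(m*n);
    pick_accelerator();  // helper defined elsewhere (not shown here)
    extent<2> e(static_cast<int>(m), static_cast<int>(n));
    array_view<const float, 2> xg(e, vx), yg(e, vy);
    array_view<float, 2> zg(e, vz);
    zg.discard_data();   // zg is output only, so its contents need not be copied to the GPU
    parallel_for_each(e, [=](index<2> idx) restrict(amp)
    {
        zg[idx] = xg[idx] + yg[idx];
    });
    zg.synchronize();    // copy the result back into vz
    // Copy the result into the caller's buffer.
    for (size_t count = 0; count < m*n; count++)
    {
        z[count] = vz[count];
    }
}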
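My GPU is an HD 7790, and the program is implemented as a MATLAB MEX function using C++ AMP. The GPU version runs slower than the same operation on the CPU, a Phenom II X6 (1055T) at 2.8 GHz.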
Array size: 1024x1024
GPU Elapsed time is 0.026684 seconds.
CPU Elapsed time is 0.004970 seconds.
The GPU version is also slower than the CPU when the Phenom II X6 (1055T) is clocked down to 800 MHz (4 times slower).
Array size: 1024x1024
GPU Elapsed time is 0.064891 seconds.
CPU Elapsed time is 0.009650 seconds.
I suspect the cause is the memory transfer between the CPU and the GPU. How can I speed up the GPU program?
CPU: 130 GFLOPS, AIDA64x FP (Phenom II X6 1055T)
GPU: 1820 GFLOPS, AIDA64x FP (HD 7790 OC)
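As a rough sanity check on the transfer question, here is a back-of-envelope sketch in plain C++ that compares the time to move the data with the time to do the additions. The function names are made up for this illustration, and the bus bandwidth you pass in is an assumption (I have not measured it); only the 1820 GFLOPS figure and the 1024x1024 size come from the numbers above.

```cpp
#include <cassert>
#include <cstddef>

// Bytes that must cross the bus for z = x + y:
// two input arrays go to the GPU, one output array comes back.
double transfer_bytes(std::size_t rows, std::size_t cols)
{
    const double elements = static_cast<double>(rows) * static_cast<double>(cols);
    return 3.0 * elements * sizeof(float);
}

// Time to move `bytes` at an effective bandwidth in bytes per second.
// The bandwidth is an assumed value, not a measurement.
double transfer_seconds(double bytes, double bandwidth_bytes_per_s)
{
    assert(bandwidth_bytes_per_s > 0.0);
    return bytes / bandwidth_bytes_per_s;
}

// Time for one addition per element at a peak rate in FLOP per second.
double compute_seconds(std::size_t rows, std::size_t cols, double flops_per_s)
{
    assert(flops_per_s > 0.0);
    return static_cast<double>(rows) * static_cast<double>(cols) / flops_per_s;
}
```

For a 1024x1024 array and an assumed effective bandwidth of 8e9 bytes/s, `transfer_seconds(transfer_bytes(1024, 1024), 8e9)` comes to about 1.57 ms, while `compute_seconds(1024, 1024, 1820e9)` is about 0.58 microseconds. Under that assumption the arithmetic is negligible next to the data movement: one addition per 12 bytes transferred gives the GPU almost nothing to do. This estimate covers only the bus transfer and does not account for the extra host-side copies in Addx or any runtime start-up cost, so it does not by itself explain the full measured time.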