I have the following loop in C++:

dword result = 0;
for ( int i = 0; i < 16; i++ ) {
    result |= ( value[i] << (unsigned int)( i << 1 ) );
}

And I would like to parallelize it with C++ AMP. I know it might run slower than the non-parallelized version above, but I want to do it to learn more about AMP.

My idea was to loop through the value array in parallel:

[Image: diagram of the parallel scheme described below]

And fill a new array with newarray[0] = value[0] << (unsigned int)(0 << 1), newarray[1] = value[1] << (unsigned int)(1 << 1), and so on. Then I would OR the values in that array in parallel in a tree structure (see image).
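The two phases described above can be sketched on the CPU first, which makes it easier to check the result before moving anything into AMP. This is only an illustration in plain C++, not AMP code: the names `pack_tree`, `pack_sequential` and `check_agreement` are hypothetical, `std::uint32_t` stands in for `dword` (assumed to be a 32-bit unsigned type), and each value is assumed to fit in 2 bits. The pairing used here (upper half into lower half) is one common way to arrange the tree and may differ from the pairing in the image.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: packs sixteen values into one 32-bit word using the
// two-phase scheme (shift every element, then OR them together as a tree).
std::uint32_t pack_tree(const std::array<std::uint32_t, 16>& value) {
    std::array<std::uint32_t, 16> shifted{};

    // Phase 1: every element is independent, so each iteration could be
    // handled by its own GPU thread.
    for (std::size_t i = 0; i < shifted.size(); ++i) {
        shifted[i] = value[i] << (2 * i);
    }

    // Phase 2: tree reduction. Each pass ORs the upper half of the active
    // range into the lower half: 16 -> 8 -> 4 -> 2 -> 1 elements.
    for (std::size_t half = shifted.size() / 2; half > 0; half /= 2) {
        for (std::size_t i = 0; i < half; ++i) {  // independent within a pass
            shifted[i] |= shifted[i + half];
        }
    }
    return shifted[0];
}

// Sequential reference, equivalent to the loop in the question.
std::uint32_t pack_sequential(const std::array<std::uint32_t, 16>& value) {
    std::uint32_t result = 0;
    for (std::size_t i = 0; i < value.size(); ++i) {
        result |= value[i] << (2 * i);
    }
    return result;
}

// Small self-check: both versions should agree on the same input.
void check_agreement(const std::array<std::uint32_t, 16>& value) {
    assert(pack_tree(value) == pack_sequential(value));
}
```

With 16 elements the reduction takes four passes, and the iterations inside one pass never touch the same element, which is the property a parallel version would rely on.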

I have tried to put this idea into some simple C++ AMP code, but I haven't succeeded, so any help would be appreciated.

Thank you for your consideration of this matter, I look forward to a response.

1 Answer

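The following code is part of what I think you need. It takes a number of elements as input, prepares the vector on the CPU, and then performs the bit shift in parallel on the GPU. I then set av[elements] back to 0, because I am using that element to store your final result. This is crude, but AMP places heavy restrictions on the data types that can be processed on the GPU, so I simply use one extra element of the existing array.

After the shift is done, I run another parallel for-each for the bitwise OR. This also happens on the GPU, but it is less satisfying: every operation ORs some element of the array into the av[elements] element, which creates a bottleneck. Your tree structure would make this part run faster, but I couldn't figure out how to do that part easily. As it stands, this program can process 100 million elements in a couple of seconds on a fairly old computer. Apologies in advance for any best practices violated in the code; I'm new to this too. The code follows: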

#include <conio.h>   // _getch (Windows only)
#include <amp.h>
#include <iostream>
#include <vector>

using namespace concurrency;
using namespace std;

unsigned int doParallel(unsigned int);

unsigned int elements;

int main()
{
    cout << "\nHow many elements to populate: ";
    cin >> elements;
    cout << "The result is: " << doParallel(elements);
    cout << "\nPress 'X' to exit.";
    int ch = 0;
    do
    {
        ch = _getch();  // wait for a key press without echoing it
    } while (ch != 'X' && ch != 'x');
    return 0;
}

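unsigned int doParallel(unsigned int elements)
{
    // One extra slot at index `elements` is reserved for the final result.
    vector<unsigned int> v(elements + 1);

    for (unsigned int i = 0; i < elements + 1; i++)
    {
        v[i] = i;
    }
    array_view<unsigned int, 1> av(elements + 1, v);

    // Pass 1 (GPU): shift every element left by one bit.
    parallel_for_each(av.extent, [=](index<1> idx) restrict(amp)
    {
        av[idx] = av[idx] << 1;
    });

    // Clear the result slot on the CPU before accumulating into it.
    av[elements] = 0;

    // Pass 2 (GPU): OR every element into the result slot. All threads
    // update the same location, so the update has to be atomic.
    parallel_for_each(av.extent, [=](index<1> idx) restrict(amp)
    {
        atomic_fetch_or(&av[elements], av[idx]);
    });

    return av[elements];
}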
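
For the tree-shaped OR mentioned above, a CPU-only sketch for an arbitrary number of elements might look like the following. It is plain C++ rather than AMP, the name `or_reduce_tree` is hypothetical, and it assumes the input is non-empty. In an AMP version, each pass of the outer loop would roughly correspond to one parallel_for_each over the indices that are still active; that mapping is an assumption, not something tested here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: ORs all elements of `data` together using a
// stride-doubling tree. Works for any non-empty length, not only powers
// of two. Takes the vector by value so the caller's data is left intact.
unsigned int or_reduce_tree(std::vector<unsigned int> data) {
    assert(!data.empty() && "or_reduce_tree needs at least one element");

    for (std::size_t stride = 1; stride < data.size(); stride *= 2) {
        // Within one pass, each iteration writes data[i] and reads
        // data[i + stride]; no two iterations touch the same element,
        // so there is no shared accumulator to contend on.
        for (std::size_t i = 0; i + stride < data.size(); i += 2 * stride) {
            data[i] |= data[i + stride];
        }
    }
    return data[0];
}
```

For n elements this needs about log2(n) passes instead of funnelling n updates through a single slot.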
Answered 2013-10-21T20:12:19.647