
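Hi everyone, I hope someone can help me. I have code that parallelizes Prim's algorithm with OpenMP, and I need to get it running on a Xeon Phi. Please help; I can't really figure out how to do it. Here is my OpenMP code: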

void ParallelPrim(double *pMatrix, TTreeNode** pMinSpanningTree, int Size)
{
    int LastAdded;
    TGraphNode NearestNode;
    TGraphNode **NotInMinSpanningTree = new TGraphNode* [Size-1];
    LastAdded = 0;

    for(int i = 0; i < Size-1; i++)
    {
        NotInMinSpanningTree[i] = new TGraphNode;
        NotInMinSpanningTree[i]->NodeNum = i+1;
        NotInMinSpanningTree[i]->Distance = -1.0f;
        NotInMinSpanningTree[i]->ParentNodeNum = -1;
    }

    for(int Iter = 1; Iter < Size; Iter++)
    {
        // Update each remaining vertex's distance through the vertex added last
        #pragma omp parallel for
        for(int i = 0; i < Size-1; i++)
            if(NotInMinSpanningTree[i] != NULL)
            {
                double t1 = NotInMinSpanningTree[i]->Distance;
                double t2 = pMatrix[(NotInMinSpanningTree[i]->NodeNum) * Size + LastAdded];
                if(((t1 < 0) && (t2 > 0)) || ((t1 > 0) && (t2 > 0) && (t1 > t2)))
                {
                    NotInMinSpanningTree[i]->Distance = t2;
                    NotInMinSpanningTree[i]->ParentNodeNum = LastAdded;
                }
            }

        // Find the nearest remaining vertex: each thread finds a local minimum,
        // then the thread minima are merged inside a critical section
        NearestNode.NodeNum = -1;
        NearestNode.Distance = 3000;
        #pragma omp parallel
        {
            TGraphNode ThreadNearestNode;
            ThreadNearestNode.NodeNum = -1;
            ThreadNearestNode.Distance = 3000;
            #pragma omp for
            for(int i = 0; i < Size-1; i++)
            {
                if(NotInMinSpanningTree[i] != NULL)
                {
                    double t1 = NotInMinSpanningTree[i]->Distance;
                    double t2 = ThreadNearestNode.Distance;
                    if((t1 > 0) && (t1 < t2))
                    {
                        ThreadNearestNode.Distance = t1;
                        ThreadNearestNode.NodeNum = NotInMinSpanningTree[i]->NodeNum;
                    }
                }
            }
            #pragma omp critical
            {
                if(ThreadNearestNode.Distance < NearestNode.Distance)
                {
                    NearestNode.Distance = ThreadNearestNode.Distance;
                    NearestNode.NodeNum = ThreadNearestNode.NodeNum;
                }
            }
        }

        // Add the nearest vertex and its edge to the tree
        pMinSpanningTree[NearestNode.NodeNum] = new TTreeNode;

        pMinSpanningTree[NearestNode.NodeNum]->NodeNum = NotInMinSpanningTree[NearestNode.NodeNum-1]->ParentNodeNum;
        pMinSpanningTree[NearestNode.NodeNum]->Distance = NearestNode.Distance;

        int Parent = NotInMinSpanningTree[NearestNode.NodeNum-1]->ParentNodeNum;
        if(pMinSpanningTree[Parent] != NULL)
        {
            // NOTE: tmp is never stored anywhere, so it leaks; presumably it was
            // meant to be linked to the parent's existing node (TTreeNode's
            // definition is not shown)
            TTreeNode *tmp = new TTreeNode;
            tmp->Distance = NearestNode.Distance;
            tmp->NodeNum = NearestNode.NodeNum;
        }
        else
        {
            pMinSpanningTree[Parent] = new TTreeNode;
            pMinSpanningTree[Parent]->Distance = NearestNode.Distance;
            pMinSpanningTree[Parent]->NodeNum = NearestNode.NodeNum;
        }
        LastAdded = NearestNode.NodeNum;
        delete NotInMinSpanningTree[NearestNode.NodeNum - 1];
        NotInMinSpanningTree[NearestNode.NodeNum - 1] = NULL;
    }
    delete[] NotInMinSpanningTree;
}

1 Answer


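There are two basic options for running code on an Intel Xeon Phi coprocessor. You can compile the entire program with the -mmic and -qopenmp flags and then run it (native mode), either with micnativeloadex or by copying the executable and the libraries it needs to the coprocessor with scp. Alternatively, you can omit -mmic and modify your code so that the parts you want to run on the coprocessor are in offload sections (offload mode): only those sections are sent to the coprocessor, and the rest of the code runs on the host.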
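To make the offload option concrete, here is a minimal sketch using the Intel compiler's offload pragmas (`#pragma offload` and `#pragma offload_attribute`). It assumes a dense, symmetric, row-major matrix in which a weight <= 0 means "no edge" (the same convention as the question's code), and it uses flat arrays instead of `TGraphNode**`, because the offload clauses copy flat buffers and do not follow pointers stored inside them. `PrimKernel` and `PrimOffload` are hypothetical names. The whole algorithm runs inside one offload region so the matrix crosses PCIe only once. Compilers other than Intel's ignore these pragmas, and the code then runs on the host; please check the clause syntax against your compiler version's documentation.

```cpp
#include <cassert>
#include <vector>

// Functions between push and pop are compiled for both the host and the
// coprocessor (Intel compiler offload extension). Other compilers ignore
// these pragmas, and the sketch then simply runs on the host.
#pragma offload_attribute(push, target(mic))
// Hypothetical kernel: O(V^2) Prim's algorithm on a dense, symmetric,
// row-major matrix `w` (n*n); a weight <= 0 means "no edge". Writes parent[v]
// for each vertex; the root (vertex 0) and unreachable vertices get -1.
void PrimKernel(const double* w, int n, int* parent)
{
    double* dist = new double[n];
    char* done = new char[n];
    for (int i = 0; i < n; ++i) { dist[i] = 0.0; parent[i] = -1; done[i] = 0; }
    done[0] = 1;
    int last = 0;
    for (int iter = 1; iter < n; ++iter) {
        const double* row = w + static_cast<long>(last) * n;
        // Relax distances through the vertex added last, spread over the cores.
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            if (!done[i] && row[i] > 0 && (parent[i] < 0 || row[i] < dist[i])) {
                dist[i] = row[i];
                parent[i] = last;
            }
        // Nearest reachable vertex not yet in the tree (serial, for brevity).
        int next = -1;
        for (int i = 0; i < n; ++i)
            if (!done[i] && parent[i] >= 0 && (next < 0 || dist[i] < dist[next]))
                next = i;
        if (next < 0) break;             // the rest of the graph is unreachable
        done[next] = 1;
        last = next;
    }
    delete[] dist;
    delete[] done;
}
#pragma offload_attribute(pop)

// Host side: send the matrix to the card once, run the whole algorithm there,
// and copy only parent[] back. Offloading each inner loop separately (without
// persistent buffers on the card) would move data across PCIe every iteration.
std::vector<int> PrimOffload(const double* w, int n)
{
    assert(n > 0);
    std::vector<int> parent(n);
    int* p = parent.data();
    long nn = static_cast<long>(n) * n;
    #pragma offload target(mic) in(w : length(nn)) out(p : length(n))
    PrimKernel(w, n, p);
    return parent;
}
```

Usage: `std::vector<int> parent = PrimOffload(pMatrix, Size);` gives, for every vertex `v`, its neighbour `parent[v]` in the spanning tree.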

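The presentation Avi sent you is a great overview of coprocessor programming. You can also find basic information about compiling and optimizing for the coprocessor at https://software.intel.com/en-us/articles/programming-and-compiling-for-intel-many-integrated-core-architecture.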

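However, the bigger issue is that your code is not vectorized and has significant serial sections. To get the best performance on the coprocessor, your code must be both vectorized and parallelized.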
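As one way to address both points, here is a sketch of the same O(V^2) Prim's algorithm restructured for SIMD: the array of pointers to `TGraphNode` becomes contiguous arrays (struct-of-arrays), the `NULL` checks become a flag array, and the per-thread minimum plus `omp critical` becomes an OpenMP `reduction(min:...)`. `PrimSoA` is a hypothetical name. The sketch assumes an undirected graph (symmetric matrix), so it reads row `last` instead of the question's column `LastAdded`, which turns a stride-`Size` access into a unit-stride one. The `simd` clause needs OpenMP 4.0; without -qopenmp (or -fopenmp) the pragmas are ignored and the code runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch: O(V^2) Prim's algorithm on a dense, symmetric,
// row-major matrix `w` (n*n). A weight <= 0 means "no edge", as in the
// question's code. Returns parent[v] for every vertex; the root (vertex 0)
// and vertices unreachable from it keep -1.
std::vector<int> PrimSoA(const double* w, int n)
{
    assert(n > 0);
    const double INF = std::numeric_limits<double>::infinity();

    // Struct-of-arrays: contiguous data the compiler can load into vector
    // registers, instead of an array of pointers to separately allocated nodes.
    std::vector<double> dist(n, INF);
    std::vector<int> parent(n, -1);
    std::vector<char> inTree(n, 0);      // replaces the NULL checks
    double* d = dist.data();
    int* p = parent.data();
    const char* in = inTree.data();

    int last = 0;
    inTree[0] = 1;
    for (int iter = 1; iter < n; ++iter) {
        // Row `last` equals column `last` for a symmetric matrix, and a row
        // is unit-stride, which suits SIMD loads.
        const double* row = w + static_cast<std::size_t>(last) * n;

        // Relax distances through the vertex added last. Threads split the
        // index range; the compiler can turn the branch into a masked update.
        #pragma omp parallel for simd
        for (int i = 0; i < n; ++i) {
            double t = row[i];
            if (!in[i] && t > 0 && t < d[i]) {
                d[i] = t;
                p[i] = last;
            }
        }

        // Smallest remaining distance via a min reduction, replacing the
        // per-thread minimum plus omp critical.
        double best = INF;
        #pragma omp parallel for simd reduction(min:best)
        for (int i = 0; i < n; ++i)
            if (!in[i] && d[i] < best) best = d[i];
        if (best == INF) break;          // the rest of the graph is unreachable

        // Cheap serial scan for the first vertex that attains the minimum.
        int next = 0;
        while (in[next] || d[next] != best) ++next;
        inTree[next] = 1;
        last = next;
    }
    return parent;
}
```

The index scan after the reduction is still serial, but it is a single O(V) pass of the same order as the reduction itself; the two loops that touch every vertex are both parallel and vectorizable.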

answered 2015-06-01T23:21:16.337