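This may be off-topic here, but I can't help asking. Suppose I have a cluster of 100 nodes, each with 16 cores. I have an MPI application whose communication pattern is known, and I also know the cluster topology (i.e. the hop distance between nodes). From these I have worked out a process-to-node mapping that reduces network contention, for example 10->20, 30->90. How do I map the process with rank 10 to node-20? Please help.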
2 Answers
2
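If you are not constrained by any kind of queueing system, you can control the mapping of ranks to nodes by creating your own machinefile.
For example, if the file my_machine_file has the following 1600 lines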
node001
node002
node003
....
node100
node001
node002
node003
....
node100
...
[repeat 13 more times]
...
node001
node002
node003
....
node100
then it corresponds to the mapping
0 -> node001, 1 -> node002, ..., 99 -> node100, 100 -> node001, ...
and you should run your application with
mpirun -machinefile my_machine_file -n 1600 my_app
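If your application needs fewer than 1600 processes, edit the machine file accordingly.
Keep in mind that the cluster administrators may already have numbered the nodes according to the topology of the interconnect. Even so, there are reports that carefully exploiting the cluster topology can improve performance noticeably (on the order of 10%-20%). (Reference to follow.)
Note: launching an MPI program with mpirun is neither standardized nor portable, so how a launcher treats the order of lines in a machinefile can differ between implementations. However, the question here is clearly tied to a specific compute cluster and a specific implementation (OpenMPI), and does not call for a portable solution.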
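To place specific ranks on specific nodes, such as the 10->20 and 30->90 pairs from the question, the machinefile can be generated rather than edited by hand. The following is a minimal sketch, not part of the original answer. The names machinefileLines and nodeName are hypothetical, the node%03d host-name pattern is assumed from the listing above, and the sketch assumes the launcher assigns rank i to the host on line i+1, which should be checked for your MPI implementation. It does not check that a node receives at most 16 ranks.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Format a node index as a host name such as "node020".
// Assumption: hosts follow the "nodeNNN" pattern shown in the listing.
std::string nodeName(int node) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "node%03d", node);
    return buf;
}

// Build one machinefile line per rank (index = rank).
// Ranks found in `pinned` (rank -> node index) go to the requested node;
// all other ranks fall back to round-robin placement over nodes 1..numNodes.
std::vector<std::string> machinefileLines(int numRanks, int numNodes,
                                          const std::map<int, int>& pinned) {
    assert(numNodes > 0);
    std::vector<std::string> lines;
    lines.reserve(numRanks);
    for (int rank = 0; rank < numRanks; ++rank) {
        auto it = pinned.find(rank);
        int node = (it != pinned.end()) ? it->second : (rank % numNodes) + 1;
        lines.push_back(nodeName(node));
    }
    return lines;
}
```

Calling machinefileLines(1600, 100, pinned) with pinned holding the pairs (10, 20) and (30, 90) gives a list whose entry for rank 10 is node020. Writing the entries one per line to my_machine_file produces a file that can be passed to the mpirun command above.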
answered 2013-01-19T07:51:55.530
1
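A bit late to the party, but here is a C++ subroutine that gives you a node communicator and a master communicator (containing only the master of each node), along with the size and rank within each. It is clunky, but unfortunately I haven't found a better way to do it. Fortunately, it adds only about 0.1 s of wall time. Maybe you or someone else will get some use out of it.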
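// Note: the MPI:: C++ bindings used below were deprecated in MPI-2.2 and
// removed in MPI-3.0, so this needs an MPI library that still provides them.
#include <mpi.h>
#include <iostream>
#include <sstream>
#include <string>

#define MASTER 0
using namespace std;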
/*
* Make a communicator for each node and another for just
* the masters of the nodes. Upon completion, everyone is
* in a new node communicator, knows its size and their rank,
* and the rank of their master in the master communicator,
* which can be useful to use for indexing.
*/
bool CommByNode(MPI::Intracomm &NodeComm,
MPI::Intracomm &MasterComm,
int &NodeRank, int &MasterRank,
int &NodeSize, int &MasterSize,
string &NodeNameStr)
{
bool IsOk = true;
int Rank = MPI::COMM_WORLD.Get_rank();
int Size = MPI::COMM_WORLD.Get_size();
/*
* ======================================================================
* What follows is my best attempt at creating a communicator
* for each node in a job such that only the cores on that
* node are in the node's communicator, and each core groups
* itself and the node communicator is made using the Split() function.
* The end of this (lengthy) process is indicated by another comment.
* ======================================================================
*/
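// Pointers that are only allocated on the master start out null, so the
// other ranks pass a well-defined (and ignored) value to Gather/Gatherv.
char *NodeName, *NodeNameList = NULL;
NodeName = new char [MPI_MAX_PROCESSOR_NAME]; // size required by Get_processor_name
int NodeNameLen,
*NodeNameCountVect = NULL,
*NodeNameOffsetVect = NULL,
NodeNameTotalLen = 0;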
// Get the name and name character count of each core's node
MPI::Get_processor_name(NodeName, NodeNameLen);
// Prepare a vector for character counts of node names
if (Rank == MASTER)
NodeNameCountVect = new int [Size];
// Gather node name lengths to master to prepare c-array
MPI::COMM_WORLD.Gather(&NodeNameLen, 1, MPI::INT, NodeNameCountVect, 1, MPI::INT, MASTER);
if (Rank == MASTER){
// Need character count information for navigating node name c-array
NodeNameOffsetVect = new int [Size];
NodeNameOffsetVect[0] = 0;
NodeNameTotalLen = NodeNameCountVect[0];
// build offset vector and total char count for all node names
for (int i = 1 ; i < Size ; ++i){
NodeNameOffsetVect[i] = NodeNameCountVect[i-1] + NodeNameOffsetVect[i-1];
NodeNameTotalLen += NodeNameCountVect[i];
}
// char-array for all node names
NodeNameList = new char [NodeNameTotalLen];
}
// Gatherv node names to char-array in master
MPI::COMM_WORLD.Gatherv(NodeName, NodeNameLen, MPI::CHAR, NodeNameList, NodeNameCountVect, NodeNameOffsetVect, MPI::CHAR, MASTER);
string *FullStrList = NULL, *NodeStrList = NULL;
// Each core keeps its node's name in a str for later comparison
stringstream ss;
ss << NodeName;
ss >> NodeNameStr;
delete[] NodeName; // node name in str, so delete c-array (new[] pairs with delete[])
int *NodeListLenVect = NULL, NumUniqueNodes = 0, NodeListCharLen = 0;
string NodeListStr;
if (Rank == MASTER){
/*
* Need to prepare a list of all unique node names, so first
* need all node names (incl duplicates) as strings, then
* can make a list of all unique node names.
*/
FullStrList = new string [Size]; // full list of node names, each will be checked
NodeStrList = new string [Size]; // list of unique node names, used for checking above list
// i loops over node names, j loops over characters for each node name.
for (int i = 0 ; i < Size ; ++i){
stringstream ss;
for (int j = 0 ; j < NodeNameCountVect[i] ; ++j)
ss << NodeNameList[NodeNameOffsetVect[i] + j]; // each char into the stringstream
ss >> FullStrList[i]; // stringstream into string for each node name
ss.str(""); // This and below clear the contents of the stringstream,
ss.clear(); // since the >> operator doesn't clear as it extracts
//cout << FullStrList[i] << endl; // for testing
}
delete[] NodeNameList; // master is done with full c-array
bool IsUnique; // flag for breaking from for loop
stringstream ss; // used for a full c-array of unique node names
for (int i = 0 ; i < Size ; ++i){ // Loop over EVERY name
IsUnique = true;
for (int j = 0 ; j < NumUniqueNodes ; ++j)
if (FullStrList[i].compare(NodeStrList[j]) == 0){ // check against list of uniques
IsUnique = false;
break;
}
if (IsUnique){
NodeStrList[NumUniqueNodes] = FullStrList[i]; // add unique names so others can be checked against them
ss << NodeStrList[NumUniqueNodes].c_str(); // build up a string of all unique names back-to-back
++NumUniqueNodes; // keep a tally of number of unique nodes
}
}
ss >> NodeListStr; // make a string of all unique node names
NodeListCharLen = NodeListStr.size(); // char length of all unique node names
NodeListLenVect = new int [NumUniqueNodes]; // list of unique node name lengths
/*
* Because Bcast simply duplicates the buffer of the Bcaster to all cores,
* the buffer needs to be a char* so that the other cores can have a similar
* buffer prepared to receive. The master broadcasts straight from the
* storage owned by NodeListStr, which stays valid until this function
* returns, so no separate allocation is needed here; the other cores
* allocate their own receive buffer below.
*/
NodeNameList = const_cast<char*>(NodeListStr.c_str()); // c_str() returns const char*, so need to recast; Bcast only reads it on the master
for (int i = 0 ; i < NumUniqueNodes ; ++i) // fill list of unique node name char lengths
NodeListLenVect[i] = NodeStrList[i].size();
/*for (int i = 0 ; i < NumUniqueNodes ; ++i)
cout << NodeStrList[i] << endl;
MPI::COMM_WORLD.Abort(1);*/
delete[] NodeStrList; // arrays from new[] must be released with delete[];
delete[] FullStrList; // NodeStrList is rebuilt on every core below
delete[] NodeNameCountVect;
delete[] NodeNameOffsetVect;
}
/*
* Now we send the list of node names back to all cores
* so they can group themselves appropriately.
*/
// Bcast the number of nodes in use
MPI::COMM_WORLD.Bcast(&NumUniqueNodes, 1, MPI::INT, MASTER);
// Bcast the full length of all node names
MPI::COMM_WORLD.Bcast(&NodeListCharLen, 1, MPI::INT, MASTER);
// prepare buffers for node name Bcast's
if (Rank > MASTER){
NodeListLenVect = new int [NumUniqueNodes];
NodeNameList = new char [NodeListCharLen];
}
// Lengths of node names for navigating c-string
MPI::COMM_WORLD.Bcast(NodeListLenVect, NumUniqueNodes, MPI::INT, MASTER);
// The actual full list of unique node names
MPI::COMM_WORLD.Bcast(NodeNameList, NodeListCharLen, MPI::CHAR, MASTER);
/*
* Similar to what master did before, each core (incl master)
* needs to build an actual list of node names as strings so they
* can compare the c++ way.
*/
int Offset = 0;
NodeStrList = new string[NumUniqueNodes];
for (int i = 0 ; i < NumUniqueNodes ; ++i){
stringstream ss;
for (int j = 0 ; j < NodeListLenVect[i] ; ++j)
ss << NodeNameList[Offset + j];
ss >> NodeStrList[i];
ss.str("");
ss.clear();
Offset += NodeListLenVect[i];
//cout << NodeStrList[i] << endl; // for testing
}
// Now since everyone has the same list, just check your node and find your group.
int CommGroup = -1;
for (int i = 0 ; i < NumUniqueNodes ; ++i)
if (NodeNameStr.compare(NodeStrList[i]) == 0){
CommGroup = i;
break;
}
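delete[] NodeStrList; // every core is done with the list of unique names
delete[] NodeListLenVect; // allocated on every core, master included
if (Rank > MASTER)
delete[] NodeNameList; // on the master this points into NodeListStr, which frees itself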
// If a rank cannot find its node in the list, report it and return false so the caller can abort.
if (CommGroup < 0){
cout << "**ERROR** Rank " << Rank << " didn't identify comm group correctly." << endl;
IsOk = false;
}
/*
* ======================================================================
* The above method uses c++ strings wherever possible so that things
* like node name comparisons can be done the c++ way. I'm sure there's
* a better way to do this because that was way too many lines of code...
* ======================================================================
*/
// Create node communicators
NodeComm = MPI::COMM_WORLD.Split(CommGroup, 0);
NodeSize = NodeComm.Get_size();
NodeRank = NodeComm.Get_rank();
// Group for master communicator
int MasterGroup;
if (NodeRank == MASTER)
MasterGroup = 0;
else
MasterGroup = MPI_UNDEFINED;
// Create master communicator
MasterComm = MPI::COMM_WORLD.Split(MasterGroup, 0);
MasterRank = -1;
MasterSize = -1;
if (MasterComm != MPI::COMM_NULL){
MasterRank = MasterComm.Get_rank();
MasterSize = MasterComm.Get_size();
}
MPI::COMM_WORLD.Bcast(&MasterSize, 1, MPI::INT, MASTER);
NodeComm.Bcast(&MasterRank, 1, MPI::INT, MASTER);
return IsOk;
}
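The core of the routine above is the step that turns a list of node names into a group number for Split(). A minimal sketch of that step alone, without any MPI calls, might look like the following. The helper colorsByNode is hypothetical, and it assumes the node names have already been gathered into one vector indexed by rank. It uses std::map in place of the nested comparison loops, which is a swapped-in technique rather than the method used above.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Given the node name reported by each rank (index = rank), return one
// color per rank. Ranks on the same node share a color, and colors are
// numbered 0, 1, 2, ... in order of each node's first appearance.
std::vector<int> colorsByNode(const std::vector<std::string>& nodeNameOfRank) {
    std::map<std::string, int> colorOfNode;  // node name -> assigned color
    std::vector<int> colors;
    colors.reserve(nodeNameOfRank.size());
    for (std::size_t rank = 0; rank < nodeNameOfRank.size(); ++rank) {
        const std::string& name = nodeNameOfRank[rank];
        std::map<std::string, int>::iterator it = colorOfNode.find(name);
        if (it == colorOfNode.end()) {
            // First time this node is seen: give it the next free color.
            int next = static_cast<int>(colorOfNode.size());
            it = colorOfNode.insert(std::make_pair(name, next)).first;
        }
        colors.push_back(it->second);
    }
    return colors;
}
```

Each rank's entry in the result plays the role of CommGroup in the call MPI::COMM_WORLD.Split(CommGroup, 0).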
answered 2013-12-12T20:34:50.677