I have Visual Studio 2012 with Intel Parallel Studio 2013 installed, so I have Intel TBB.
Suppose I have the following code:
const int cardsCount = 12; // will be READ by all threads
// the required number of cards of each colour to complete its set:
// NOTE that the required number of cards of each colour is not the same as the total number of cards of this colour available
int required[] = {2,3,4}; // will be READ by all threads
Card cards[cardsCount]; // will be READ by all threads
int cardsIndices[cardsCount];// this will be permuted, permutations need to be split among threads !
// set "cards" to 4 cards of each colour (3 colours total = 12 cards)
// set cardsIndices to {0,1,2,3...,11}
// this variable will be written to by all threads, maybe have one for each thread and combine them later?? or can I use concurrent_vector<int> instead !?
int logColours[] = {0,0,0};
int permutationsCount = fact(cardsCount);
for (int pNum=0; pNum<permutationsCount; pNum++) // I want to make this loop parallel !!
{
    int countColours[3] = {0,0,0}; // local loop variable, no problem with multithreading
    for (int i=0; i<cardsCount; i++)
    {
        Card c = cards[cardsIndices[i]]; // accessed "cards"
        countColours[c.Colour]++; // local loop variable, np.
        // we got the required number of cards of this colour to complete it
        if (countColours[c.Colour] == required[c.Colour]) // read global variable "required" !
        {
            // log that we completed this colour and go to next permutation
            logColours[c.Colour] ++; // should I use a concurrent_vector<int> for this shared variable?
            break;
        }
    }
    std::next_permutation(cardsIndices, cardsIndices+cardsCount); // !! this is my main issue
}
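What I'm computing is how many times each colour would be completed if we picked cards at random from those available. I do this exhaustively by going through every possible permutation and picking the cards in order; as soon as a colour is "completed", we break and move on to the next permutation. Note that we have 4 cards of each colour, but the number of cards required to complete each colour is {2,3,4} (red, green, blue). Two red cards are enough to complete red and we have 4 available, so red is more likely to be completed than blue, which needs all 4 of its cards to be picked.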
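Because only the colours matter for this count, one possible shortcut (an optimisation that is not part of the code above) is to run std::next_permutation over the multiset of colours instead of over the card indices. Starting from a sorted array, it visits each distinct colour order exactly once: 12!/(4!·4!·4!) = 34,650 orders for the 4/4/4 deck, and each of them stands for 4!·4!·4! = 13,824 card permutations, so every tally is weighted by that factor. A minimal sketch, assuming a C++11 compiler; the function name countCompletions is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper. available[c] = cards of colour c in the deck,
// required[c] = cards of colour c needed to complete it. For every ordering
// of the deck, the first colour to reach its required count is tallied.
// Orderings that only swap cards of the same colour give the same result,
// so distinct colour sequences are enumerated and weighted instead.
std::vector<std::uint64_t> countCompletions(const std::vector<int>& available,
                                            const std::vector<int>& required)
{
    const std::size_t colours = available.size();
    std::vector<int> deck;        // colour sequence, built in ascending order
    std::uint64_t weight = 1;     // product of available[c]!
    for (std::size_t c = 0; c < colours; ++c) {
        for (int k = 1; k <= available[c]; ++k) {
            deck.push_back(static_cast<int>(c));
            weight *= static_cast<std::uint64_t>(k);
        }
    }

    std::vector<std::uint64_t> tally(colours, 0);
    do {
        std::vector<int> seen(colours, 0);   // per-order local counters
        for (int colour : deck) {
            if (++seen[colour] == required[colour]) {
                tally[colour] += weight;     // this colour was completed first
                break;
            }
        }
    } while (std::next_permutation(deck.begin(), deck.end()));
    return tally;
}
```

For the deck in the question, countCompletions({4,4,4}, {2,3,4}) loops 34,650 times, and its three tallies add up to 12! because some colour is always completed.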
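I want to make this for loop parallel, but my main problem is how to handle the permutations of the cards. There are about 479 million permutations here (12! = 479,001,600). If I have 4 threads, how can I split them into 4 different parts and have each thread go through its own part?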
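One way to split the permutations, sketched here under the assumption that each thread gets a contiguous range [begin, end) of lexicographic ranks: convert begin into the corresponding permutation with the factorial number system (Lehmer code), then let that thread advance with std::next_permutation for the rest of its range. Each thread must work on its own copy of the index array, because std::next_permutation modifies it in place. The helper names nthPermutation and visitRange are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the permutation of {0, 1, ..., n-1} with the
// given lexicographic rank (0 <= rank < n!), using the factorial number system.
std::vector<int> nthPermutation(int n, std::uint64_t rank)
{
    std::vector<int> pool;                    // unused elements, ascending
    for (int i = 0; i < n; ++i) pool.push_back(i);

    std::uint64_t fact = 1;                   // starts as (n-1)!
    for (int i = 2; i < n; ++i) fact *= static_cast<std::uint64_t>(i);

    std::vector<int> result;
    for (int pos = n - 1; pos >= 0; --pos) {
        std::uint64_t digit = rank / fact;    // which unused element comes next
        rank %= fact;
        result.push_back(pool[digit]);
        pool.erase(pool.begin() + static_cast<std::ptrdiff_t>(digit));
        if (pos > 0) fact /= static_cast<std::uint64_t>(pos);  // pos! -> (pos-1)!
    }
    return result;
}

// Hypothetical worker body: visits the permutations with ranks [begin, end)
// in lexicographic order, calling visit(perm) for each one. The permutation
// is a private copy, so several threads can run this on disjoint ranges.
template <typename Visit>
void visitRange(int n, std::uint64_t begin, std::uint64_t end, Visit visit)
{
    if (begin >= end) return;
    std::vector<int> perm = nthPermutation(n, begin);
    for (std::uint64_t r = begin; r < end; ++r) {
        visit(perm);
        std::next_permutation(perm.begin(), perm.end());
    }
}
```

For 12 cards and 4 threads, thread t would call visitRange(12, t * F / 4, (t + 1) * F / 4, ...) with F = 12!, and its visit callback would do the colour counting from the loop body above using its own local counters.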
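And what if I don't know how many cores the machine has? I'd like the program to pick the right number of concurrent threads automatically. Surely there is a way to do this with the Intel or Microsoft tools?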
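In standard C++11, std::thread::hardware_concurrency() reports the number of hardware threads, but it may return 0 when that number cannot be determined, so a fallback is needed. The sketch below splits a range [0, total) into one contiguous chunk per hardware thread, lets each thread accumulate into a local variable, writes that result into its own slot, and combines the slots after join(). The same per-thread pattern would fit the logColours counters instead of sharing one array. The function name parallelSum, the fallback of 4 threads, and the toy workload are assumptions for illustration. With TBB, tbb::parallel_for over a tbb::blocked_range sizes its thread pool to the hardware by default, and tbb::combinable can hold per-thread counters, so this manual sizing would not be needed there.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical example: sums body(i) for i in [0, total) using one thread per
// hardware thread. body must be safe to call concurrently from several threads.
template <typename Body>
std::uint64_t parallelSum(std::uint64_t total, Body body)
{
    unsigned threads = std::thread::hardware_concurrency();
    if (threads == 0) threads = 4;            // count unknown: assumed fallback

    std::vector<std::uint64_t> partial(threads, 0);  // one result slot per thread
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        std::uint64_t begin = total * t / threads;        // contiguous chunk
        std::uint64_t end = total * (t + 1) / threads;
        workers.emplace_back([&partial, &body, t, begin, end] {
            std::uint64_t local = 0;          // thread-local accumulator
            for (std::uint64_t i = begin; i < end; ++i)
                local += body(i);
            partial[t] = local;               // single write to this thread's slot
        });
    }
    for (std::thread& w : workers) w.join();

    std::uint64_t sum = 0;                    // combine after all threads finish
    for (std::uint64_t p : partial) sum += p;
    return sum;
}
```

Writing to partial[t] only once per thread avoids both a data race and repeated writes to neighbouring slots in the hot loop.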
Here is my Card struct, just in case:
struct Card
{
public:
    int Colour;
    int Symbol;
};