I'm just starting to use OpenMP and am writing a function which divides an array into numBlocks
blocks and computes a histogram on every block (i.e., one histogram per block) by inspecting the blockSize
elements of each block (in the code I'm providing, the histogram is recording the divisibility of the elements in the block by the integers 1
to numBuckets
).
In the first omp for
loop I create a thread for each block using:
#pragma omp for schedule(static)
for(uint blockNum = 0; blockNum < numBlocks; blockNum++){
for(uint blockSubIdx = 0; blockSubIdx < blockSize; blockSubIdx++){
uint idx = blockNum * blockSize + blockSubIdx;
// Compute histogram here by examining array[idx]
}
}
Implementing it another way, I request threads to each operate on blockSize
elements, where I have previously asserted that array.size() == numBlocks * blockSize
:
#pragma omp for schedule(static, blockSize)
for(uint idx = 0; idx < array.size(); idx++){
uint blockNum = idx / blockSize;
// Compute histogram here by examining array[idx]
}
This second method does not work correctly if I increase the number of threads (using export OMP_NUM_THREADS
) above some threshold (78 on my compute node)--the resulting histogram values do not match those from a serial computation.
According to https://computing.llnl.gov/tutorials/openMP/, it looks like the blocks of size blockSize
in the second method are contiguous, so I'm not clear as to why it's failing. It seems like multiple threads are writing to the same index in the histogram, though. Is there a subtlety that I'm missing?
Here is a gist of the whole source code: https://gist.github.com/anonymous/5391777.
UPDATE: The thread threshold does show some dependence on array.size()
. It's not the particular value of the threshold that I'm trying to understand, but rather the existence of the threshold.