
Background

I have a piece of code that is highly parallelizable, yet I found that most of the time only one core runs at 100% while the rest sit idle. To tackle this I tinkered with multithreading, implementing semaphores and whatnot, only to realize that Parallel.For() is fine grained and more efficient than any of my solutions.

The Code

To simplify, I will only show the structurally important pieces of code.

int sharedResource = 0;

for (int i = 0; i < someMax; i++)
{
    for (int j = 0; j <= i; j++)
    {
        if (someCondition(i, j))
            sharedResource += someFunction(i, j);
        else break;
    }
}

All of the ambiguously named functions are more or less just mathematical equations with time complexity O(1).

Important Details

Pay attention to the inner loop, which has the variable i as its upper bound, and to the summation variable named sharedResource. The execution order does not matter here, since addition is commutative, and I see no obvious reason for Amdahl's Law to apply: every index pair (i, j) of the two loops can be computed independently.

Question

Is it smart to use a nested Parallel.For() loop in this scenario, or should I use it only for the outer loop (or only for the inner one, respectively)?

The only thing that concerns me is sharedResource, as the documentation doesn't give me much in-depth insight into how Parallel.For() works. Another important point: if I do use two Parallel.For() loops, some iterations will finish almost instantly because of the break, while others will take much longer. Will it be able to balance this?


2 Answers


Whether to use nested parallel loops, or to parallelize only the inner or only the outer loop, depends a lot on the nature of your data. Nested parallel loops are designed to work reasonably well together. For example, if the outer and the inner loop each have a degree of parallelism of 8, that does not mean that, when nested, they will process items on 8 x 8 = 64 threads, as one might naively expect.

You should measure the performance of all options on your specific data set and figure out what works best for you.

Note that a Parallel.For loop partitions the interval into a certain number of ranges (depending on the degree of parallelism), and those ranges are then executed in parallel on separate threads. This means that if the processing time of your items is distributed unevenly, some ranges may complete much faster than others. Say you run with a degree of parallelism of 4 and process 100 items, of which the first 75 return false for someCondition and so take no time to execute, while the last 25 return true. As a result, the first 3 ranges will complete immediately, and the last range, containing all the real work, will execute on one thread, essentially making the whole thing sequential.
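
To make the range behaviour concrete, here is a minimal sketch that prints the fixed blocks a range partitioner hands out (the rangeSize of 25 is an illustrative choice, not necessarily what Parallel.For picks internally):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class RangeDemo
{
    static void Main()
    {
        // Partitioner.Create(from, to, rangeSize) yields fixed [from, to) blocks.
        // With uneven work, a whole block of slow items can land on one thread.
        var ranges = Partitioner.Create(0, 100, rangeSize: 25);

        Parallel.ForEach(ranges, range =>
            Console.WriteLine($"range [{range.Item1}, {range.Item2})"));
    }
}
```

Each printed block is processed as a unit by a single thread, which is exactly why a slow tail of items can serialize the whole run.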

If an uneven distribution is expected, you might use Parallel.ForEach with a "real" IEnumerable instead (by real I mean a lazy IEnumerable, not an array or a list):

Parallel.ForEach(Enumerable.Range(0, i + 1), j => {...})

But note that on evenly distributed data this will be slower than the pre-partitioned version.

A nested Parallel.For might also help if the run time is unevenly distributed, but again: you have to measure each option on your real data and choose the best one.

As for thread safety: of course, this

sharedResource += someFunction(i, j);

is not thread safe inside parallel loops. Using a lock here could degrade performance a lot if someFunction is fast, and it isn't necessary anyway. Either just use

Interlocked.Add(ref sharedResource, someFunction(i, j))
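
Applied to the question's loops (parallelizing only the outer loop), that looks like the sketch below; SomeCondition and SomeFunction are illustrative stand-ins for the question's O(1) functions, and someMax is fixed at 100 for the demo:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class InterlockedSketch
{
    // Stand-ins for the question's functions; both O(1), purely illustrative.
    static bool SomeCondition(int i, int j) => j <= i / 2;
    static int SomeFunction(int i, int j) => i + j;

    static void Main()
    {
        const int someMax = 100;
        int sharedResource = 0;

        // Outer loop parallel, inner loop sequential, keeping the break semantics.
        Parallel.For(0, someMax, i =>
        {
            for (int j = 0; j <= i; j++)
            {
                if (SomeCondition(i, j))
                    Interlocked.Add(ref sharedResource, SomeFunction(i, j)); // atomic add
                else break;
            }
        });

        Console.WriteLine(sharedResource);
    }
}
```

Because addition is commutative and Interlocked.Add is atomic, the result is deterministic regardless of scheduling.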

Or you can use the overloads of Parallel.For/Parallel.ForEach that allow accumulating values per running thread and then aggregating the results. For example:

Parallel.For(0, 100, (i, outerState) =>
{
    // localInit (() => 0) gives each thread its own subTotal, so the inner
    // body needs no synchronization at all.
    Parallel.ForEach(Enumerable.Range(0, i + 1), () => 0, (j, innerState, subTotal) =>
    {
        if (someCondition(i, j))
            return subTotal + someFunction(i, j);
        else
        {
            innerState.Break();
            return subTotal;
        }
    },
    // localFinally: merge each thread's subtotal into the shared result atomically.
    subTotalOfThread => Interlocked.Add(ref sharedResource, subTotalOfThread));
});
answered 2018-03-24T18:43:00.247

You can use a custom partitioner with load balancing enabled and use it in a Parallel.ForEach loop. Load balancing ensures that every core stays busy until the end of execution. For example:

int sharedResource = 0;
var iterations = Enumerable.Range(0, someMax);

// This creates a partitioner with load balancing enabled.
// (The bool overload of Partitioner.Create requires an array or IList,
// hence the ToArray(); partitioners built from a plain IEnumerable are
// load balancing by default anyway.)
var customPartitioner = Partitioner.Create(iterations.ToArray(), loadBalance: true);

Parallel.ForEach(customPartitioner, i =>
{
    for (int j = 0; j <= i; j++)
    {
        if (someCondition(i, j))
            Interlocked.Add(ref sharedResource, someFunction(i, j)); 
        else break;
    }
});

In your example the compound assignment += is not thread safe, so I used Interlocked.Add instead.

You can also write functional-style code with LINQ, which is parallelizable by design. Note that there is no shared resource and no thread synchronization, because there is no mutable state in FP.

var result = customPartitioner
    .AsParallel()
    .Select(i => Enumerable.Range(0, i + 1)
        .AsParallel()
        .TakeWhile(j => someCondition(i, j))
        .Sum(j => someFunction(i, j)))
    .Sum();

One thing you also need to take into account is thread-management cost. The more threads you create, the more processor time is wasted on managing them instead of doing actual work. Parallel.ForEach also adds overhead in deciding on which thread each iteration should run. So sometimes it is better to keep the inner loop single-threaded. In the LINQ example, the inner AsParallel may in some cases add more overhead than it saves.
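
A single-threaded-inner variant of the LINQ query can be sketched as follows; SomeCondition and SomeFunction are illustrative stand-ins for the question's O(1) functions, and someMax is fixed at 100 for the demo:

```csharp
using System;
using System.Linq;

class PlinqSketch
{
    // Stand-ins for the question's functions; both O(1), purely illustrative.
    static bool SomeCondition(int i, int j) => j <= i / 2;
    static int SomeFunction(int i, int j) => i + j;

    static void Main()
    {
        const int someMax = 100;

        // Parallel outer query only; the sequential inner TakeWhile/Sum
        // reproduces the question's break semantics exactly.
        var result = Enumerable.Range(0, someMax)
            .AsParallel()
            .Select(i => Enumerable.Range(0, i + 1)
                .TakeWhile(j => SomeCondition(i, j))
                .Sum(j => SomeFunction(i, j)))
            .Sum();

        Console.WriteLine(result);
    }
}
```

Here only the cheap outer iterations are scheduled in parallel, so the per-item scheduling overhead of a nested AsParallel is avoided.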

answered 2018-03-24T18:35:50.843