Given the following foreach statement, which has some LINQ in it, I need to iterate through the list with duplicates removed. There are a number of places where I could add the Distinct() call.

Can someone please explain which solution I should use and why? I'm very new to using AsParallel(), so I'm not sure which one to go with.

Existing code (which is missing the Distinct() call it needs):

foreach (var phrase in (something != null 
    ? ListOne.AsParallel() 
    : ListTwo.AsParallel()))
{
 ... // irrelevant for this question 
}

Option 1: Distincting the entire result.

foreach (var phrase in (something != null 
    ? ListOne.AsParallel() 
    : ListTwo.AsParallel()).Distinct())
{
 ... // irrelevant for this question 
}

I feel that this would return too much info back (initially).

Option 2: Distinct each list, before AsParallel()

foreach (var phrase in (something != null 
    ? ListOne.Distinct().AsParallel() 
    : ListTwo.Distinct().AsParallel()))
{
 ... // irrelevant for this question 
}

Option 3: Distinct each list, after AsParallel()

foreach (var phrase in (something != null 
    ? ListOne.AsParallel().Distinct() 
    : ListTwo.AsParallel().Distinct() ))
{
 ... // irrelevant for this question 
}

Yes, I could create my own test code with stopwatches, etc., but I'm not so much after the metrics as the theory (what I should do, because of XXXXX).

** Before this turns into a subjective question and gets closed down, please consider your answers. ** Secondly, I understand the perf difference here is tiny, so just to reiterate: I'm not so much worried about the actual perf difference as about the theoretical difference.

1 Answer

Options 1 and 3 result in the same execution path: in both cases Distinct is called on the result of AsParallel(). The question therefore boils down to whether it is better to call Distinct before or after AsParallel().
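As a minimal sketch of that point, assume `ListOne` and `ListTwo` are `List<string>` and `something` is some non-null object (the question shows neither, so the values here are hypothetical stand-ins). The conditional expression already has type `ParallelQuery<string>`, so in Options 1 and 3 `Distinct()` binds to `ParallelEnumerable.Distinct`, while in Option 2 it binds to `Enumerable.Distinct` and only the result is handed to PLINQ. All three produce the same set of elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctPlacement
{
    // Hypothetical stand-ins for the question's ListOne, ListTwo and something.
    static readonly List<string> ListOne = new List<string> { "a", "b", "a", "c" };
    static readonly List<string> ListTwo = new List<string> { "x", "x", "y" };
    static readonly object something = new object();

    // Option 1: Distinct applied to the conditional's result (a ParallelQuery<string>).
    public static ParallelQuery<string> Option1() =>
        (something != null ? ListOne.AsParallel() : ListTwo.AsParallel()).Distinct();

    // Option 2: Enumerable.Distinct on the list, then AsParallel on the de-duplicated sequence.
    public static ParallelQuery<string> Option2() =>
        something != null ? ListOne.Distinct().AsParallel() : ListTwo.Distinct().AsParallel();

    // Option 3: Distinct applied to each ParallelQuery<string> inside the conditional.
    public static ParallelQuery<string> Option3() =>
        something != null ? ListOne.AsParallel().Distinct() : ListTwo.AsParallel().Distinct();

    public static void Main()
    {
        // PLINQ does not preserve source order by default, so sort before printing.
        foreach (var option in new[] { Option1(), Option2(), Option3() })
            Console.WriteLine(string.Join(",", option.OrderBy(s => s, StringComparer.Ordinal)));
    }
}
```

Each of the three lines printed by `Main` should read `a,b,c`; the sketch only shows that the results agree, not how the options compare in speed.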

Based on the actual implementation of ParallelEnumerable.Distinct(), which simply uses the non-parallel Enumerable.Distinct, I would say that it makes no difference at all in your case, because you don't actually have a query here: you are just removing duplicates from an existing list. You probably shouldn't be using AsParallel here at all.

If you actually do have a query, it is potentially better to call Distinct last: it won't be executed in parallel, so it may be faster when it runs on a smaller set that has already been filtered down in parallel. This part of the answer would need actual benchmarking to be verified.
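A sketch of the "Distinct last" arrangement, assuming a hypothetical list of phrases and a hypothetical length filter (neither appears in the question). The `Where` clause runs under PLINQ and `Distinct` then only sees the elements that survived the filter. Whether this ordering is actually faster is untested here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctLast
{
    // Filters in parallel first, then removes duplicates from the smaller set.
    public static List<string> FilterThenDistinct(IEnumerable<string> phrases)
    {
        return phrases
            .AsParallel()
            .Where(p => p.Length > 3)                 // hypothetical filter, evaluated by PLINQ
            .Distinct()                               // applied last, to the filtered elements only
            .OrderBy(p => p, StringComparer.Ordinal)  // PLINQ is unordered by default; sort for stable output
            .ToList();
    }

    public static void Main()
    {
        var phrases = new List<string> { "alpha", "beta", "alpha", "pi", "gamma", "beta" };
        Console.WriteLine(string.Join(",", FilterThenDistinct(phrases)));
        // prints "alpha,beta,gamma"
    }
}
```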

Answered 2013-04-22T05:12:04.337