I have some code that uses Parallel.For with thread-local variables. It's essentially a summation over a large array, except that each element is computed inside the loop body at the same time it is being summed.
The problem is that my thread-local variables are very heavy objects; it's not uncommon for one of them to take up 200 MB of memory. My program's memory usage would climb to 2 GB, then the GC would bring it back down to 200 MB, and it kept going up and down like that, which suggested that a lot of temporary objects were being allocated.

Since I need several thread-local variables, I've wrapped them in a struct. That let me add a Console.WriteLine to its constructor, and it showed many more constructions than I expected: I assumed there would be only one per core on my machine.

How can I force it to use exactly (numberOfCores) threads and keep only those around until the loop finishes?
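A minimal sketch that reproduces the observation without the heavy matrices: it replaces ThreadLocalData with a plain `long` subtotal and counts how often the localInit delegate runs. The names `LocalInitCounter` and `SumAndCountInits` are made up for illustration. The .NET documentation for this Parallel.For overload says localInit is invoked once for each task that participates in the loop, and tasks are not the same thing as cores or threads, so the count can exceed Environment.ProcessorCount. The exact number depends on the runtime version, machine load and loop length, so no particular output is claimed here.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LocalInitCounter
{
    // Sums 0..count-1 with Parallel.For and reports how many times localInit ran.
    public static (long Sum, int InitCalls) SumAndCountInits(int count)
    {
        int initCalls = 0;
        long total = 0;
        object gate = new object();

        Parallel.For<long>(0, count,
            // localInit: runs once per participating task, not once per core.
            () => { Interlocked.Increment(ref initCalls); return 0L; },
            // body: accumulate into the task-local subtotal.
            (i, state, subtotal) => subtotal + i,
            // localFinally: merge each task's subtotal into the shared total.
            subtotal => { lock (gate) total += subtotal; });

        return (total, initCalls);
    }
}
```

Calling `LocalInitCounter.SumAndCountInits(10_000_000)` and printing `InitCalls` next to `Environment.ProcessorCount` shows how many thread-local objects a loop like the one below would construct for the same range.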
I tried limiting the degree of parallelism:

```csharp
ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 2;
```

That helped only slightly; I still see too many struct constructions. It looks like options.TaskScheduler might be relevant, but I can't work out how much control it actually gives me. It seems I could write my own scheduler, which is a little daunting, and I'd rather avoid that if possible.
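One way to get exactly N heavy objects without writing a custom TaskScheduler, sketched under assumptions: instead of letting Parallel.For decide how many tasks to create for the m rows, loop over N worker indices and let each worker process a contiguous block of rows with a single state object it allocates itself. `FixedWorkerSum`, `Worker` and the `computeRow` delegate are hypothetical stand-ins for ThreadLocalData and the per-row Kronecker computation; a `double[]` buffer replaces the DenseMatrix fields so the example has no Math.NET dependency.

```csharp
using System;
using System.Threading.Tasks;

public static class FixedWorkerSum
{
    // Hypothetical stand-in for the heavy ThreadLocalData:
    // a reusable scratch buffer plus a running subtotal.
    private sealed class Worker
    {
        public readonly double[] Scratch;
        public double Subtotal;
        public Worker(int scratchSize) { Scratch = new double[scratchSize]; }
    }

    // Splits [0, m) into exactly `workers` contiguous chunks; each chunk
    // allocates one Worker, so exactly `workers` Worker objects are created.
    public static double Sum(int m, int workers, Func<int, double[], double> computeRow, int scratchSize)
    {
        var states = new Worker[workers];
        var options = new ParallelOptions { MaxDegreeOfParallelism = workers };

        Parallel.For(0, workers, options, w =>
        {
            var state = new Worker(scratchSize);            // one allocation per chunk
            int start = (int)((long)m * w / workers);       // inclusive
            int end = (int)((long)m * (w + 1) / workers);   // exclusive
            for (int row = start; row < end; ++row)
                state.Subtotal += computeRow(row, state.Scratch);
            states[w] = state;
        });

        double total = 0;
        foreach (var s in states) total += s.Subtotal;      // merge on the calling thread
        return total;
    }
}
```

Passing `Environment.ProcessorCount` as `workers` gives one state object per core. The trade-off is static load balancing: if some rows are much more expensive than others, a worker with a cheap block sits idle while the others finish.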
Here is the relevant section of code in my program.
```csharp
object mutex = new object();   // lock object guarding the shared result A

ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 2;
Parallel.For<ThreadLocalData>(0, m, options,
    // Thread-local variable initialization
    () => new ThreadLocalData(new DenseMatrix(r * r, r * r, 0),
                              new DenseMatrix(r * r, r * r, 0),
                              new DenseMatrix(r, r, 0)),
    // Loop body: runs once per row and adds that row's contribution to the subtotal
    (row, loop, threadLocalData) =>
    {
        threadLocalData.kronProductRight.Clear();
        for (int column = 0; column < n; ++column)
        {
            if ((int)E[row, column] == 1)
                threadLocalData.kronProductRight.Add(Yblocks[column], threadLocalData.kronProductRight);
        }
        MathNetAdditions.KroneckerProduct(Xblocks[row], threadLocalData.kronProductRight, threadLocalData.kronProduct);
        threadLocalData.subtotal.Add(threadLocalData.kronProduct, threadLocalData.subtotal);
        return threadLocalData;
    },
    // Local finally: merge each subtotal into the shared matrix A
    (threadLocalData) =>
    {
        lock (mutex)
            A.Add(threadLocalData.subtotal, A);
    }
);
```
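Another option, shown as a hedged sketch that keeps the localInit/localFinally structure above: keep the heavy objects in a `ConcurrentBag<T>` pool so that localInit reuses an object released by a finished task instead of allocating a new one. `PooledLocalSum`, `Scratch` and `computeRow` are hypothetical names, and a `double` subtotal plus a `double[]` buffer stand in for the DenseMatrix fields. Because a recycled object still carries the previous task's subtotal, localInit must reset it (with real matrices that would be a `Clear()` call).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class PooledLocalSum
{
    // Hypothetical heavy per-task state (stands in for ThreadLocalData).
    public sealed class Scratch
    {
        public static int Created;          // counts real allocations across all calls
        public readonly double[] Buffer;
        public double Subtotal;

        public Scratch(int size)
        {
            Buffer = new double[size];
            Interlocked.Increment(ref Created);
        }
    }

    public static double Sum(int m, int bufferSize, Func<int, double[], double> computeRow)
    {
        var pool = new ConcurrentBag<Scratch>();
        object gate = new object();
        double total = 0;

        Parallel.For<Scratch>(0, m,
            // localInit: reuse a pooled object if one is free, otherwise allocate.
            () =>
            {
                if (!pool.TryTake(out var s)) s = new Scratch(bufferSize);
                s.Subtotal = 0;             // clear state left over from a previous task
                return s;
            },
            // body: accumulate this row's contribution.
            (row, loop, s) =>
            {
                s.Subtotal += computeRow(row, s.Buffer);
                return s;
            },
            // localFinally: merge the subtotal, then hand the object back to the pool.
            s =>
            {
                lock (gate) total += s.Subtotal;
                pool.Add(s);
            });

        return total;
    }
}
```

localInit can still run many times, but an allocation only happens when the pool is empty, so the number of real allocations should stay roughly bounded by how many tasks hold an object at the same moment rather than by the total number of tasks; the static `Scratch.Created` counter lets you check that. Making the pool a long-lived field instead of a local would also carry the buffers across separate calls.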