
I have some code written to use Parallel.For with thread local variables. It's basically a summation of a large array, but the elements of the array are calculated explicitly in the for loop at the same time they are being summed.

The problem is that my thread-local variables are very heavy objects; it's not uncommon for them to take up 200 MB of memory. I noticed my program's memory usage would spike to 2 GB, then the GC would drop it back down to 200 MB, and it kept cycling up and down like that, which indicated a lot of temporaries were being allocated. Since I need several thread-local variables, I've wrapped them in a struct. This let me add a Console.WriteLine to the constructor, and I saw many more of these objects being created than the one construction per core I expected. How can I force it to create exactly (numberOfCores) threads and keep only those around until the end?

I added

ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 2;

which helped only slightly. I still get too many struct constructions. It looks like there is something I can do with options.TaskScheduler, but I can't seem to understand what the extent of its power is. It looks like I can roll my own, which is almost scary. I don't want to do that if possible.

Here is the relevant section of code in my program.

ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 2;

Parallel.For<ThreadLocalData>(0, m, options,
    // Thread local variable initialization
    () => new ThreadLocalData(new DenseMatrix(r * r, r * r, 0),
                              new DenseMatrix(r * r, r * r, 0),
                              new DenseMatrix(r, r, 0)),
    // Loop body: runs once per row, using the local state of the current task
    (row, loop, threadLocalData) =>
    {
        threadLocalData.kronProductRight.Clear();
        for (int column = 0; column < n; ++column)
        {
            if ((int)E[row, column] == 1)
                threadLocalData.kronProductRight.Add(Yblocks[column], threadLocalData.kronProductRight);
        }
        MathNetAdditions.KroneckerProduct(Xblocks[row], threadLocalData.kronProductRight, threadLocalData.kronProduct);
        threadLocalData.subtotal.Add(threadLocalData.kronProduct, threadLocalData.subtotal);
        return threadLocalData;
    },
    // Final step: runs once per task, folding its subtotal into the shared result
    (threadLocalData) =>
    {
        lock (mutex)
        A.Add(threadLocalData.subtotal, A);
    }
);

2 Answers


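Take a look at this article: http://blogs.msdn.com/b/pfxteam/archive/2010/10/21/10079121.aspx, in particular the part about Parallel.For running into performance problems when the initialization delegate is expensive.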

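It's hard to tell from the code above, but it looks like you should be able to separate the computation/data part of ThreadLocalData from its state/mutation aspects? Ideally, you would pass a reference to an immutable version of ThreadLocalData to whatever is crunching your numbers. That way you are only ever dealing with a single instance.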
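If the heavy state cannot be made immutable and shared, one possible workaround is to recycle instances through a pool, so that a newly started task picks up an object a finished task handed back instead of constructing a fresh one. The following is only a minimal sketch of that idea, not necessarily what the linked article recommends: `HeavyScratch` and `PooledLocals` are hypothetical names standing in for `ThreadLocalData` and the surrounding code, and the loop body just sums row indices. Because the subtotals live in the pooled objects, the reduction moves out of the final delegate and runs once after the loop.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the heavy ThreadLocalData from the question.
sealed class HeavyScratch
{
    public readonly double[] Buffer = new double[1 << 16];
    public double Subtotal;
}

static class PooledLocals
{
    // Sums 0 + 1 + ... + (m - 1) and reports how many HeavyScratch objects were built.
    public static (double Total, int Created) Sum(int m, int maxDegree)
    {
        var pool = new ConcurrentBag<HeavyScratch>();
        int created = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };

        Parallel.For<HeavyScratch>(0, m, options,
            // Local init: reuse a pooled instance if one is free, construct otherwise.
            () =>
            {
                if (pool.TryTake(out HeavyScratch pooled)) return pooled;
                Interlocked.Increment(ref created);
                return new HeavyScratch();
            },
            // Loop body: placeholder work that touches the buffer and accumulates.
            (row, loop, scratch) =>
            {
                scratch.Buffer[row % scratch.Buffer.Length] = row;
                scratch.Subtotal += scratch.Buffer[row % scratch.Buffer.Length];
                return scratch;
            },
            // Final step: return the instance to the pool instead of dropping it.
            scratch => pool.Add(scratch));

        // Every instance is back in the pool once the loop has returned.
        double total = 0;
        foreach (HeavyScratch scratch in pool) total += scratch.Subtotal;
        return (total, created);
    }
}
```

Calling `PooledLocals.Sum(1000, 2)` should construct no more instances than the degree of parallelism, since an instance is only built when every existing one is currently held by a running task.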

Answered 2012-05-05T09:07:07.927

I haven't dug deeply into your problem (and it seems you are asking the wrong question, as phoog pointed out), but to answer your specific question:

How can I force it to create exactly (numberOfCores) threads and keep only those around until the end?

There is a scheduler that can do this:

http://blog.abodit.com/2010/11/task-parallel-library-a-scheduler-with-priority-apartment-state-and-maximum-degree-of-parallelism/
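Note that the local-init delegate of Parallel.For runs once per task rather than once per thread, so limiting the thread count may not by itself limit the number of constructions. As an alternative that avoids a custom scheduler entirely (this is a different technique from the scheduler linked above), the outer loop can run over worker indices, with each worker building its heavy state once and walking its own contiguous slice of rows. The sketch below assumes a hypothetical `FixedWorkers` helper and uses a plain array as a stand-in for the heavy object; the per-row work is a placeholder that sums row indices.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class FixedWorkers
{
    // Sums 0 + 1 + ... + (m - 1) using exactly `workers` scratch allocations.
    public static (double Total, int Allocations) Sum(int m, int workers)
    {
        var partials = new double[workers];
        int allocations = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = workers };

        // The loop runs over worker indices, so the body executes once per worker.
        Parallel.For(0, workers, options, w =>
        {
            // Hypothetical stand-in for the heavy per-worker state, built once here.
            var scratch = new double[1 << 16];
            Interlocked.Increment(ref allocations);

            // Contiguous slice [start, end) of the row range owned by this worker.
            int start = (int)((long)m * w / workers);
            int end = (int)((long)m * (w + 1) / workers);

            double subtotal = 0;
            for (int row = start; row < end; ++row)
            {
                scratch[row % scratch.Length] = row;   // placeholder per-row work
                subtotal += scratch[row % scratch.Length];
            }
            partials[w] = subtotal;                    // each worker writes only its own slot
        });

        // Sequential reduction of the per-worker subtotals; no lock needed.
        double total = 0;
        foreach (double p in partials) total += p;
        return (total, allocations);
    }
}
```

The trade-off is that static slices give up the load balancing Parallel.For normally provides, so this fits best when rows cost roughly the same; `FixedWorkers.Sum(m, Environment.ProcessorCount)` would give one allocation per core.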

Answered 2012-05-05T00:09:32.717