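I have a large list to loop through (1,500,000 items), and for each item I have to do a very small check. The whole run takes about 30 seconds.

CPU utilization when running sequentially is around 10%, so a lot of resources go unused.

My first thought was to use Parallel, but because each item takes so little time, the Parallel version takes longer than the sequential foreach. The reason is given in "Why was the parallel version slower than the sequential version in this example?", which explains that creating each task costs time.

So I had another idea: split the list into 4 (or more) equal pieces and create one thread per piece to loop through its items, to make it faster.

Before I write my own class for this: is this a good approach? Do you have any other ideas for speeding this up, or do you know a better way to handle it?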

Code

The code I wrote for an alternative parallel approach (used in my own static class):

public static void ForEach<T>(IEnumerable<T> list, Action<T> body, int listDevide)
{
    // Number of items
    int items = list.Count();
    // Divided (in int, so floored)
    int listPart = items / listDevide;
    // Get numbers extra for last run (not used yet, see the note below)
    int rest = items % listDevide;

    // List to save the actions
    var actions = new List<Action>();
    for(var x = 0; x < listDevide; x++)
    {
        // Copy the loop variable. The delegates only run after the loop has
        // finished, so capturing x directly would make every action see
        // x == listDevide and skip past its own part of the list.
        int part = x;

        // Create the actions
        actions.Add(delegate {
            foreach(var item in list.Skip(part * listPart).Take(listPart))
            {
                body.Invoke(item);
            }
        });
    }

    // Run the actions parallel
    Parallel.Invoke(actions.ToArray());
}

Note: the "rest" variable is not used in this example yet, so the last (items % listDevide) items are not processed.
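
As a rough sketch of how the remainder could be folded in, the variant below works on index ranges of an IList<T> instead of Skip/Take, so no chunk has to re-enumerate the list from the start. The names ChunkedParallel, ChunkedDemo and AllVisitedOnce are hypothetical and only serve the illustration; this is one possible arrangement, not the only one.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ChunkedParallel
{
    // Hypothetical helper: splits [0, list.Count) into `parts` contiguous
    // index ranges and runs one action per range.
    public static void ForEach<T>(IList<T> list, Action<T> body, int parts)
    {
        if (parts < 1) throw new ArgumentOutOfRangeException("parts");

        int chunk = list.Count / parts; // floored chunk size
        int rest = list.Count % parts;  // items left over after even division

        var actions = new Action[parts];
        int next = 0;
        for (int p = 0; p < parts; p++)
        {
            // The first `rest` chunks take one extra item each, so chunk
            // sizes differ by at most one and nothing is dropped.
            // begin/end are declared inside the loop body, so every
            // delegate captures its own pair.
            int begin = next;
            int end = begin + chunk + (p < rest ? 1 : 0);
            next = end;

            actions[p] = () =>
            {
                for (int i = begin; i < end; i++)
                {
                    body(list[i]);
                }
            };
        }

        Parallel.Invoke(actions);
    }
}

public static class ChunkedDemo
{
    // Returns true when every item was passed to the body exactly once.
    public static bool AllVisitedOnce(int count, int parts)
    {
        var data = new int[count];
        for (int i = 0; i < count; i++) data[i] = i;

        // Each value is unique, so each slot is written by one thread only.
        var seen = new int[count];
        ChunkedParallel.ForEach(data, v => seen[v]++, parts);

        foreach (int hits in seen)
        {
            if (hits != 1) return false;
        }
        return true;
    }
}
```

Calling ChunkedDemo.AllVisitedOnce(1500000, 4) exercises the helper on a list of the size mentioned above. Parallel.Invoke schedules the actions on the thread pool, so it does not guarantee one dedicated thread per chunk.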

Solution below; more information: http://msdn.microsoft.com/en-us/library/dd997411.aspx

1 Answer

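Yes, partitioning the input array is a good approach.

In fact, Microsoft provides a Partitioner class to help with exactly this approach.

Here is an example showing how to do it: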

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

namespace Demo
{
    class Program
    {
        private void run()
        {
            double sum = 0;
            Func<double, double> func = x => Math.Sqrt(Math.Sin(x));
            object locker = new object();

            double[] data = testData();

            // For each double in data[] we are going to calculate Math.Sqrt(Math.Sin(x)) and
            // add all the results together.
            //
            // To do this, we use class Partitioner to split the input array into just a few partitions,
            // (the Partitioner will use knowledge about the number of processor cores to optimize this)
            // and then add up all the values using a separate thread for each partition.
            //
            // We use threadLocalState to compute the total for each partition, and then we have to
            // add all these together to get the final sum. We must lock the addition because it isn't
            // threadsafe, and several threads could be doing it at the same time.

            Parallel.ForEach
            (
                Partitioner.Create(0, data.Length),

                () => 0.0,

                (subRange, loopState, threadLocalState) =>
                {
                    for (int i = subRange.Item1; i < subRange.Item2; i++)
                    {
                        threadLocalState += func(data[i]);
                    }

                    return threadLocalState;
                },

                finalThreadLocalState =>
                {
                    lock (locker)
                    {
                        sum += finalThreadLocalState;
                    }
                }
            );

            Console.WriteLine("Sum = " + sum);
        }

        private static double[] testData()
        {
            double[] array = new double[1000003]; // Test with an odd number of values.

            Random rng = new Random(12345);

            for (int i = 0; i < array.Length; ++i)
                array[i] = rng.Next() & 3; // Don't want large values for this simple test.

            return array;
        }

        static void Main()
        {
            new Program().run();
        }
    }
}
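
Applied to the scenario in the question (a very small check per item over a large list), the same pattern might look like the sketch below. RangeParallel and CountWhere are hypothetical names introduced for illustration, and counting the items that pass the check is an assumption about what the check is used for. Because the total is an int, Interlocked.Add can stand in for the lock here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class RangeParallel
{
    // Hypothetical helper: counts the items for which `check` returns true,
    // handing each worker a contiguous index range instead of single items.
    public static int CountWhere<T>(IList<T> list, Func<T, bool> check)
    {
        // Partitioner.Create(from, to) rejects an empty range, so bail out early.
        if (list.Count == 0) return 0;

        int total = 0;

        Parallel.ForEach
        (
            Partitioner.Create(0, list.Count),

            // Per-task starting value.
            () => 0,

            // Runs once per range: the delegate call overhead is paid per
            // range rather than per item.
            (range, loopState, localCount) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    if (check(list[i]))
                    {
                        localCount++;
                    }
                }

                return localCount;
            },

            // Runs once per task: merge the per-task count into the total.
            localCount => Interlocked.Add(ref total, localCount)
        );

        return total;
    }
}
```

A call such as RangeParallel.CountWhere(items, x => x % 3 == 0) would then replace the sequential loop. Partitioner.Create also has an overload that takes an explicit range size, which may be worth trying if the default ranges turn out too small or too large for the workload.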
answered 2013-07-04T11:58:09.957