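I am running into severe performance problems when running compute-intensive multiprocessing code on Mono. The simple snippet below, which estimates the value of pi using the Monte Carlo method, demonstrates the problem.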
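The program spawns as many threads as there are logical cores on the current machine and performs an identical computation on each of them. When run on an Intel Core i7 laptop under Windows 7 with .NET Framework 4.5, the whole process takes 4.2 s, and the relative standard deviation between the threads' individual execution times is 2%.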
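However, when run on the same machine (and operating system) with Mono 2.10.9, the total execution time shoots up to 18 s. Performance varies enormously between threads: the fastest completes in just 5.6 s, while the slowest takes 18 s. The average is 14 s, and the relative standard deviation is 28%.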
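For reference, the relative standard deviation quoted here is presumably the standard deviation of the per-thread durations divided by their mean. The post does not say which estimator was used; assuming the population form (dividing by n), the per-thread durations in the outputs listed at the end of the post reproduce both figures:

```latex
\[
\bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(t_i - \bar{t}\right)^2}, \qquad
\mathrm{RSD} = \frac{\sigma}{\bar{t}}
\]
\[
\text{.NET } (n = 8):\quad \bar{t} = 4044\ \text{ms},\quad \sigma \approx 83\ \text{ms},\quad \mathrm{RSD} \approx 2.0\%
\]
\[
\text{Mono } (n = 8):\quad \bar{t} \approx 13730\ \text{ms},\quad \sigma \approx 3827\ \text{ms},\quad \mathrm{RSD} \approx 27.9\%
\]
```

With the sample form (dividing by n - 1) the Mono figure would come out closer to 30%, so the population form seems the more likely choice.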
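The cause does not seem to be thread scheduling. Pinning each thread to a different core (by calling BeginThreadAffinity and SetThreadAffinityMask) has no significant effect on the threads' durations or their variance.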
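The post does not show the pinning code, so the following is only a minimal sketch of how such pinning might look on Windows. The class AffinityDemo and the helpers MaskForCore and PinCurrentThread are hypothetical names introduced for illustration; Thread.BeginThreadAffinity is the .NET method named above, and GetCurrentThread and SetThreadAffinityMask are the Win32 functions from kernel32.dll, called through P/Invoke.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

static class AffinityDemo
{
    // Win32 imports (Windows only).
    [DllImport("kernel32.dll")]
    static extern IntPtr GetCurrentThread();

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern UIntPtr SetThreadAffinityMask(IntPtr hThread, UIntPtr dwThreadAffinityMask);

    // Hypothetical helper: bit mask that selects exactly one logical core.
    public static UIntPtr MaskForCore(int core)
    {
        return new UIntPtr(1UL << core);
    }

    // Hypothetical helper: pins the calling thread to one logical core.
    public static void PinCurrentThread(int core)
    {
        // Tell the runtime that the code that follows depends on the
        // identity of the underlying OS thread.
        Thread.BeginThreadAffinity();

        // SetThreadAffinityMask returns the previous mask, or zero on failure.
        UIntPtr previous = SetThreadAffinityMask(GetCurrentThread(), MaskForCore(core));
        if (previous == UIntPtr.Zero)
            throw new InvalidOperationException(
                "SetThreadAffinityMask failed, error " + Marshal.GetLastWin32Error());
    }
}
```

Under these assumptions, calling AffinityDemo.PinCurrentThread(processorID) at the top of ComputeMonteCarloPi would give each thread its own core, and passing the same index (for example 0) from every thread would put them all on one processor. A matching Thread.EndThreadAffinity() call belongs at the end of the pinned section.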
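Similarly, running the computation several times on each thread (and timing each run individually) also gives seemingly arbitrary durations. Thus the issue does not appear to be caused by per-processor warm-up time either.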
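The repeated-timing experiment is not shown in the post either. One way it could be set up is sketched here; RepeatTiming and TimeRuns are hypothetical names, and the number of runs is an arbitrary choice.

```csharp
using System;
using System.Diagnostics;

static class RepeatTiming
{
    // Hypothetical helper: runs `work` `runs` times on the calling thread
    // and records the duration of each run separately.
    public static TimeSpan[] TimeRuns(Action work, int runs)
    {
        TimeSpan[] durations = new TimeSpan[runs];
        for (int r = 0; r < runs; ++r)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            work();
            stopwatch.Stop();
            durations[r] = stopwatch.Elapsed;
        }
        return durations;
    }
}
```

Inside ComputeMonteCarloPi this could be called as RepeatTiming.TimeRuns(() => SamplePoints(new Random(0)), 4). If warm-up were the cause, one would expect the first run on each thread to be slow and the later runs to be fast and consistent, which is not what was observed.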
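What I did find to make a difference was pinning all 8 threads to the same processor. In this case the overall execution time was 25 s, which is only about 1% slower than executing 8 times the work on a single thread. Furthermore, the relative standard deviation dropped to below 1%. Thus the problem lies not in Mono's multithreading per se, but in its multiprocessing.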
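The "about 1%" figure can be checked against the two Mono timings listed at the end of the post: 25,260 ms for 8 threads pinned to one processor, and 25,090 ms for a single thread sampling 8 × 67,108,864 = 536,870,912 points.

```latex
\[
\frac{25260 - 25090}{25090} = \frac{170}{25090} \approx 0.0068 \approx 0.7\%
\]
```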
Does anyone have a solution for this performance problem?
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

// Class name assumed; the post shows only the members.
static class Program
{
    // Number of random points sampled by each thread (2^26 = 67,108,864).
    static long limit = 1L << 26;
    static long[] results;          // hits inside the circle, per thread
    static TimeSpan[] timesTaken;   // wall-clock duration, per thread

    internal static void Main(string[] args)
    {
        int processorCount = Environment.ProcessorCount;
        Console.WriteLine("Thread count: " + processorCount);
        Console.WriteLine("Number of points per thread: " + limit.ToString("N0"));

        Thread[] threads = new Thread[processorCount];
        results = new long[processorCount];
        timesTaken = new TimeSpan[processorCount];

        for (int i = 0; i < processorCount; ++i)
            threads[i] = new Thread(ComputeMonteCarloPi);

        // Time the whole run: start every thread, then wait for all of them.
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < processorCount; ++i)
            threads[i].Start(i);
        for (int i = 0; i < processorCount; ++i)
            threads[i].Join();
        stopwatch.Stop();

        // The fraction of points inside the circle approximates pi / 4.
        double average = results.Average();
        double ratio = average / limit;
        double pi = ratio * 4;

        Console.WriteLine("Pi: " + pi);
        Console.WriteLine("Overall duration: " + FormatTime(stopwatch.Elapsed));
        Console.WriteLine();
        for (int i = 0; i < processorCount; ++i)
            Console.WriteLine("Thread " + i.ToString().PadLeft(2, '0') + " duration: " + FormatTime(timesTaken[i]));
        Console.ReadKey();
    }

    // Thread entry point; the argument is the index of the thread.
    static void ComputeMonteCarloPi(object o)
    {
        int processorID = (int)o;

        // Same seed on every thread, so all threads do identical work.
        Random random = new Random(0);

        Stopwatch stopwatch = Stopwatch.StartNew();
        long hits = SamplePoints(random);
        stopwatch.Stop();

        timesTaken[processorID] = stopwatch.Elapsed;
        results[processorID] = hits;
    }

    // Counts how many random points in the unit square centred on the
    // origin fall inside the inscribed circle of radius 0.5.
    private static long SamplePoints(Random random)
    {
        long hits = 0;
        for (long i = 0; i < limit; ++i)
        {
            double x = random.NextDouble() - 0.5;
            double y = random.NextDouble() - 0.5;
            if (x * x + y * y <= 0.25)
                hits++;
        }
        return hits;
    }

    // Formats a duration as whole milliseconds, right-aligned.
    static string FormatTime(TimeSpan time, int padLeft = 7)
    {
        return time.TotalMilliseconds.ToString("N0").PadLeft(padLeft);
    }
}
Output on .NET:
Thread count: 8
Number of points per thread: 67,108,864
Pi: 3.14145541191101
Overall duration: 4,234
Thread 00 duration: 4,199
Thread 01 duration: 3,987
Thread 02 duration: 4,002
Thread 03 duration: 4,032
Thread 04 duration: 3,956
Thread 05 duration: 3,980
Thread 06 duration: 4,036
Thread 07 duration: 4,160
Output on Mono:
Thread count: 8
Number of points per thread: 67,108,864
Pi: 3.14139330387115
Overall duration: 17,890
Thread 00 duration: 10,023
Thread 01 duration: 13,203
Thread 02 duration: 14,776
Thread 03 duration: 15,564
Thread 04 duration: 17,888
Thread 05 duration: 16,776
Thread 06 duration: 16,050
Thread 07 duration: 5,561
Output on Mono, with all threads pinned to the same processor:
Thread count: 8
Number of points per thread: 67,108,864
Pi: 3.14139330387115
Overall duration: 25,260
Thread 00 duration: 24,704
Thread 01 duration: 25,191
Thread 02 duration: 24,689
Thread 03 duration: 24,697
Thread 04 duration: 24,716
Thread 05 duration: 24,725
Thread 06 duration: 24,707
Thread 07 duration: 24,720
Output on Mono, on a single thread:
Thread count: 1
Number of points per thread: 536,870,912
Pi: 3.14153660088778
Overall duration: 25,090