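I have some code that executes many tasks concurrently and suffers from a performance problem. To investigate, I created a simplified version in which the same problem occurs.

In this simplified code, two tasks are executed simultaneously using Parallel.ForEach. Each task iterates over a long for loop and changes an integer variable on every iteration. If both tasks change a local integer variable, or if one changes a local integer variable and the other changes a global variable, the parallel code needs only about half the time of the serial code (about 4.5 seconds serial versus about 2.5 seconds parallel). But if both tasks change different global integer variables on every iteration, or if one task changes a global variable while the other accesses it, the parallel code performs worse than the serial code (about 5.0 seconds serial versus about 7.5 seconds parallel). Note that the two tasks change different variables (which are even atomic data types).

I would like to know what is happening here. Is the solution to change the algorithm (in this simple code, the algorithm is the for loop that changes the variable) so that the global variables are not modified so often, or is there a trick, or something I have overlooked, that solves the problem without changing the algorithm?

Here is the code: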

using System.Diagnostics;
using System.Threading.Tasks;
using System;

public class Program
{
    static void Main()
    {
        Program prog = new Program();
    }

    int intField1;
    int intField2;

    public Program()
    {
        this.intField1 = 0;
        this.intField2 = 0;
        Stopwatch watch = new Stopwatch();

        //Here we evaluate a task, 
        //normal serial Evaluation
        Console.WriteLine("serial evaluation");
        watch.Start();
        for (int j = 0; j < 2; j++)
        {
            this.TaskThatTakesFewSeconds(j);
        }
        watch.Stop();
        Console.WriteLine("Elapsed milliseconds: " + watch.ElapsedMilliseconds);

        this.intField1 = 0;
        this.intField2 = 0;

        watch = new Stopwatch();
        Console.WriteLine("parallel evaluation");
        watch.Start();
        //parallel Evaluation
        int[] loops = new int[2] { 0, 1 };
        Parallel.ForEach(loops, x =>
            this.TaskThatTakesFewSeconds(x)
        );
        watch.Stop();
        Console.WriteLine("Elapsed milliseconds: " + watch.ElapsedMilliseconds);
    }

    public void TaskThatTakesFewSeconds(int k)
    {
        int localVariable = 0;
        if (k == 0)
        {
            for (ulong j = 0; j < 1000000000; j++)
            {
                //leave one of the next two lines commented
                //localVariable++;
                this.intField1++;
            }
        }
        else
        {
            for (ulong j = 0; j < 1000000000; j++)
            {
                //leave one of the next two lines commented
                //localVariable++;
                this.intField2++;
            }
        }
    }
}

1 Answer

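I strongly advise against using Stopwatch for performance measurements.
I encourage you to use BenchmarkDotNet instead.

  • It is an easy-to-use yet powerful micro-benchmarking tool.

Let me show you how to set up the test environment.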

ITest

This interface defines the common surface of every test case

public interface ITest
{
    void Execute();
}

Computation

This class contains the common logic

public class Computation
{
    private int intField1;
    private int intField2;
    public void TaskThatTakesFewSeconds(int k)
    {
        if (k == 0)
        {
            for (ulong j = 0; j < 1000000000; j++)
            {
                intField1++;
            }
        }
        else
        {
            for (ulong j = 0; j < 1000000000; j++)
            {
                intField2++;
            }
        }
    }
}
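The question reports that the parallel version scales well when each task increments a local variable instead of a field. Below is a minimal sketch of that idea, assuming it is acceptable to publish the count only once per task. `LocalAccumulationComputation` and the `iterations` parameter are hypothetical names and are not part of the benchmark in this answer, and this variant was not measured here.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical variant of Computation: each task counts in a local
// variable and touches its field only once, after the loop has finished.
public class LocalAccumulationComputation
{
    public int IntField1;
    public int IntField2;

    // 'iterations' is an assumed parameter so that short runs are possible.
    public void TaskThatTakesFewSeconds(int k, ulong iterations = 1000000000)
    {
        int local = 0;
        for (ulong j = 0; j < iterations; j++)
        {
            local++;
        }

        // Single write per task instead of one write per iteration.
        if (k == 0)
        {
            IntField1 += local;
        }
        else
        {
            IntField2 += local;
        }
    }
}
```

To compare both variants under the same conditions, this class could be used in place of `Computation` in an additional test class.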

SequentialTest

This class contains an implementation variant that executes the two operations sequentially

public class SequentialTest: ITest
{
    private readonly Computation _comp;
    public SequentialTest()
    {
        this._comp = new Computation();
    }
    public void Execute()
    {
        for (int j = 0; j < 2; j++)
        {
            this._comp.TaskThatTakesFewSeconds(j);
        }
    }
}

ParallelForeachTest, ParallelInvokeTest

These classes contain different implementation variants.
In both classes the operations are executed concurrently

using System.Threading.Tasks;

public class ParallelForeachTest: ITest
{
    private readonly Computation _comp;

    public ParallelForeachTest()
    {
        _comp = new Computation();
    }

    public void Execute()
    {
        var loops = new [] { 0, 1 };
        Parallel.ForEach(loops, this._comp.TaskThatTakesFewSeconds);
    }
}
public class ParallelInvokeTest: ITest
{
    private readonly Computation _comp;

    public ParallelInvokeTest()
    {
        _comp = new Computation();
    }

    public void Execute()
    {
        Parallel.Invoke(
            () => this._comp.TaskThatTakesFewSeconds(0), 
            () => this._comp.TaskThatTakesFewSeconds(1));
    }
}

TestCase

This class is responsible for setting up the experiment

using BenchmarkDotNet.Attributes;

[HtmlExporter]
[MemoryDiagnoser]
[SimpleJob(BenchmarkDotNet.Engines.RunStrategy.ColdStart, targetCount: 5)]
public class TestCase
{
    [Benchmark(Baseline = true)]
    public void RunBaseLine() => RunExperiment<SequentialTest>();

    [Benchmark]
    public void RunParallelForEach() => RunExperiment<ParallelForeachTest>();

    [Benchmark]
    public void RunParallelInvoke() => RunExperiment<ParallelInvokeTest>();

    internal void RunExperiment<T>() where T : ITest, new()
    {
        new T().Execute();
    }
}
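  • MemoryDiagnoser: the benchmark also monitors memory usage
  • SimpleJob: here we define the iterations
    • RunStrategy.ColdStart: no pilot or warm-up iterations are run, so the first (cold) iteration is part of the measured results, as the log further down shows.
    • 5 iterations are performed during the experiment.
  • Benchmark(Baseline = true): the sequential variant is used as the baseline.
    • Every other implementation is reported relative to it (Ratio)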

Program

The entry point of the console application from which we start the experiment

using System;
using BenchmarkDotNet.Running;

class Program
{
    static void Main(string[] args)
    {
        BenchmarkRunner.Run<TestCase>();
        Console.ReadLine();
    }
}

Note: make sure to run this experiment with the application built in Release mode.
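For ad-hoc measurements outside of BenchmarkDotNet, a small guard can catch an accidental Debug build. This is only a sketch: `BuildCheck` is a hypothetical helper, and it assumes the compiler emitted a `DebuggableAttribute` on the assembly, which Debug builds typically mark with `IsJITOptimizerDisabled = true`.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

// Hypothetical helper, not part of BenchmarkDotNet.
public static class BuildCheck
{
    // Returns true when the assembly carries a DebuggableAttribute that
    // disables JIT optimizations (the usual situation for Debug builds).
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        DebuggableAttribute attribute = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attribute != null && attribute.IsJITOptimizerDisabled;
    }

    // Prints a warning if timings taken from this assembly are likely to be misleading.
    public static void WarnIfNotOptimized(Assembly assembly)
    {
        if (IsJitOptimizerDisabled(assembly) || Debugger.IsAttached)
        {
            Console.WriteLine("Warning: JIT optimizations are disabled or a debugger is attached.");
        }
    }
}
```

A call such as `BuildCheck.WarnIfNotOptimized(typeof(Program).Assembly);` at the start of `Main` would be one way to use it.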

The results on my laptop:

TL;DR

|             Method |    Mean |    Error |   StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------- |--------:|---------:|---------:|------:|------:|------:|------:|----------:|
|        RunBaseLine | 2.711 s | 0.0307 s | 0.0080 s |  1.00 |     - |     - |     - |     432 B |
| RunParallelForEach | 1.944 s | 0.1432 s | 0.0372 s |  0.72 |     - |     - |     - |    2696 B |
|  RunParallelInvoke | 1.975 s | 0.1283 s | 0.0333 s |  0.73 |     - |     - |     - |     856 B |
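As a rough cross-check of the Ratio column, the means can be divided directly. BenchmarkDotNet reports the mean of the ratio distribution rather than the ratio of the means, so this is only an approximation. `RatioSketch` is a hypothetical helper used for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper that approximates the Ratio column from the Mean column.
public static class RatioSketch
{
    public static double Ratio(double mean, double baselineMean) => mean / baselineMean;

    public static void PrintRatios()
    {
        const double baseline = 2.711; // RunBaseLine mean in seconds

        // RunParallelForEach: 1.944 / 2.711, prints 0.72
        Console.WriteLine(Ratio(1.944, baseline).ToString("F2", CultureInfo.InvariantCulture));

        // RunParallelInvoke: 1.975 / 2.711, prints 0.73
        Console.WriteLine(Ratio(1.975, baseline).ToString("F2", CultureInfo.InvariantCulture));
    }
}
```

In other words, both parallel variants take roughly 72 to 73 percent of the sequential time on this machine, a speedup of about 1.4x for the two tasks.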

Full output

// Validating benchmarks:
// ***** BenchmarkRunner: Start   *****
// ***** Found 3 benchmark(s) in total *****
// ***** Building 1 exe(s) in Parallel: Start   *****
// start dotnet restore  /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 in C:\...\MySimpleBenchmark\bin\Release\netcoreapp3.1\7dd97576-8d82-459f-8018-efbdb1d641bc
// command took 1.35s and exited with 0
// start dotnet build -c Release  --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 in C:\...\MySimpleBenchmark\bin\Release\netcoreapp3.1\7dd97576-8d82-459f-8018-efbdb1d641bc
// command took 2.23s and exited with 0
// ***** Done, took 00:00:03 (3.7 sec)   *****
// Found 3 benchmarks:
//   TestCase.RunBaseLine: Job-RWBPOP(IterationCount=5, RunStrategy=ColdStart)
//   TestCase.RunParallelForEach: Job-RWBPOP(IterationCount=5, RunStrategy=ColdStart)
//   TestCase.RunParallelInvoke: Job-RWBPOP(IterationCount=5, RunStrategy=ColdStart)

// **************************
// Benchmark: TestCase.RunBaseLine: Job-RWBPOP(IterationCount=5, RunStrategy=ColdStart)
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet "7dd97576-8d82-459f-8018-efbdb1d641bc.dll" --benchmarkName "MySimpleBenchmark.TestCase.RunBaseLine" --job "IterationCount=5, RunStrategy=ColdStart" --benchmarkId 0 in C:\...\MySimpleBenchmark\bin\Release\netcoreapp3.1\7dd97576-8d82-459f-8018-efbdb1d641bc\bin\Release\netcoreapp3.1
// BeforeAnythingElse

// Benchmark Process Environment Information:
// Runtime=.NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT
// GC=Concurrent Workstation
// Job: Job-MCAVLE(IterationCount=5, RunStrategy=ColdStart)

// BeforeActualRun
WorkloadActual   1: 1 op, 2703012600.00 ns, 2.7030 s/op
WorkloadActual   2: 1 op, 2722115500.00 ns, 2.7221 s/op
WorkloadActual   3: 1 op, 2714919500.00 ns, 2.7149 s/op
WorkloadActual   4: 1 op, 2704378800.00 ns, 2.7044 s/op
WorkloadActual   5: 1 op, 2708101600.00 ns, 2.7081 s/op

// AfterActualRun
WorkloadResult   1: 1 op, 2703012600.00 ns, 2.7030 s/op
WorkloadResult   2: 1 op, 2722115500.00 ns, 2.7221 s/op
WorkloadResult   3: 1 op, 2714919500.00 ns, 2.7149 s/op
WorkloadResult   4: 1 op, 2704378800.00 ns, 2.7044 s/op
WorkloadResult   5: 1 op, 2708101600.00 ns, 2.7081 s/op
GC:  0 0 0 432 1
Threading:  2 0 1

// AfterAll
// Benchmark Process 30236 has exited with code 0

Mean = 2.711 s, StdErr = 0.004 s (0.13%), N = 5, StdDev = 0.008 s
Min = 2.703 s, Q1 = 2.704 s, Median = 2.708 s, Q3 = 2.715 s, Max = 2.722 s
IQR = 0.011 s, LowerFence = 2.689 s, UpperFence = 2.731 s
ConfidenceInterval = [2.680 s; 2.741 s] (CI 99.9%), Margin = 0.031 s (1.13% of Mean)
Skewness = 0.39, Kurtosis = 1.15, MValue = 2

// **************************
// Benchmark: TestCase.RunParallelForEach: Job-RWBPOP(IterationCount=5, RunStrategy=ColdStart)
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet "7dd97576-8d82-459f-8018-efbdb1d641bc.dll" --benchmarkName "MySimpleBenchmark.TestCase.RunParallelForEach" --job "IterationCount=5, RunStrategy=ColdStart" --benchmarkId 1 in C:\...\MySimpleBenchmark\bin\Release\netcoreapp3.1\7dd97576-8d82-459f-8018-efbdb1d641bc\bin\Release\netcoreapp3.1
// BeforeAnythingElse

// Benchmark Process Environment Information:
// Runtime=.NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT
// GC=Concurrent Workstation
// Job: Job-OWEIXV(IterationCount=5, RunStrategy=ColdStart)

// BeforeActualRun
WorkloadActual   1: 1 op, 1885435300.00 ns, 1.8854 s/op
WorkloadActual   2: 1 op, 1951180900.00 ns, 1.9512 s/op
WorkloadActual   3: 1 op, 1989053900.00 ns, 1.9891 s/op
WorkloadActual   4: 1 op, 1944026900.00 ns, 1.9440 s/op
WorkloadActual   5: 1 op, 1948992000.00 ns, 1.9490 s/op

// AfterActualRun
WorkloadResult   1: 1 op, 1885435300.00 ns, 1.8854 s/op
WorkloadResult   2: 1 op, 1951180900.00 ns, 1.9512 s/op
WorkloadResult   3: 1 op, 1989053900.00 ns, 1.9891 s/op
WorkloadResult   4: 1 op, 1944026900.00 ns, 1.9440 s/op
WorkloadResult   5: 1 op, 1948992000.00 ns, 1.9490 s/op
GC:  0 0 0 2696 1
Threading:  6 0 1

// AfterAll
// Benchmark Process 21660 has exited with code 0

Mean = 1.944 s, StdErr = 0.017 s (0.86%), N = 5, StdDev = 0.037 s
Min = 1.885 s, Q1 = 1.944 s, Median = 1.949 s, Q3 = 1.951 s, Max = 1.989 s
IQR = 0.007 s, LowerFence = 1.933 s, UpperFence = 1.962 s
ConfidenceInterval = [1.800 s; 2.087 s] (CI 99.9%), Margin = 0.143 s (7.37% of Mean)
Skewness = -0.41, Kurtosis = 1.65, MValue = 2

// **************************
// Benchmark: TestCase.RunParallelInvoke: Job-RWBPOP(IterationCount=5, RunStrategy=ColdStart)
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet "7dd97576-8d82-459f-8018-efbdb1d641bc.dll" --benchmarkName "MySimpleBenchmark.TestCase.RunParallelInvoke" --job "IterationCount=5, RunStrategy=ColdStart" --benchmarkId 2 in C:\...\MySimpleBenchmark\bin\Release\netcoreapp3.1\7dd97576-8d82-459f-8018-efbdb1d641bc\bin\Release\netcoreapp3.1
// BeforeAnythingElse

// Benchmark Process Environment Information:
// Runtime=.NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT
// GC=Concurrent Workstation
// Job: Job-XWLZBM(IterationCount=5, RunStrategy=ColdStart)

// BeforeActualRun
WorkloadActual   1: 1 op, 1976197800.00 ns, 1.9762 s/op
WorkloadActual   2: 1 op, 1972453200.00 ns, 1.9725 s/op
WorkloadActual   3: 1 op, 1967741600.00 ns, 1.9677 s/op
WorkloadActual   4: 1 op, 2026004200.00 ns, 2.0260 s/op
WorkloadActual   5: 1 op, 1932835200.00 ns, 1.9328 s/op

// AfterActualRun
WorkloadResult   1: 1 op, 1976197800.00 ns, 1.9762 s/op
WorkloadResult   2: 1 op, 1972453200.00 ns, 1.9725 s/op
WorkloadResult   3: 1 op, 1967741600.00 ns, 1.9677 s/op
WorkloadResult   4: 1 op, 2026004200.00 ns, 2.0260 s/op
WorkloadResult   5: 1 op, 1932835200.00 ns, 1.9328 s/op
GC:  0 0 0 856 1
Threading:  3 0 1

// AfterAll
// Benchmark Process 11348 has exited with code 0

Mean = 1.975 s, StdErr = 0.015 s (0.75%), N = 5, StdDev = 0.033 s
Min = 1.933 s, Q1 = 1.968 s, Median = 1.972 s, Q3 = 1.976 s, Max = 2.026 s
IQR = 0.008 s, LowerFence = 1.955 s, UpperFence = 1.989 s
ConfidenceInterval = [1.847 s; 2.103 s] (CI 99.9%), Margin = 0.128 s (6.50% of Mean)
Skewness = 0.31, Kurtosis = 1.61, MValue = 2

// ***** BenchmarkRunner: Finish  *****

// * Export *
  BenchmarkDotNet.Artifacts\results\MySimpleBenchmark.TestCase-report.csv
  BenchmarkDotNet.Artifacts\results\MySimpleBenchmark.TestCase-report-github.md
  BenchmarkDotNet.Artifacts\results\MySimpleBenchmark.TestCase-report.html

// * Detailed results *
TestCase.RunBaseLine: Job-RWBPOP(IterationCount=5, RunStrategy=ColdStart)
Runtime = .NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT; GC = Concurrent Workstation
Mean = 2.711 s, StdErr = 0.004 s (0.13%), N = 5, StdDev = 0.008 s
Min = 2.703 s, Q1 = 2.704 s, Median = 2.708 s, Q3 = 2.715 s, Max = 2.722 s
IQR = 0.011 s, LowerFence = 2.689 s, UpperFence = 2.731 s
ConfidenceInterval = [2.680 s; 2.741 s] (CI 99.9%), Margin = 0.031 s (1.13% of Mean)
Skewness = 0.39, Kurtosis = 1.15, MValue = 2
-------------------- Histogram --------------------
[2.697 s ; 2.728 s) | @@@@@
---------------------------------------------------

TestCase.RunParallelForEach: Job-RWBPOP(IterationCount=5, RunStrategy=ColdStart)
Runtime = .NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT; GC = Concurrent Workstation
Mean = 1.944 s, StdErr = 0.017 s (0.86%), N = 5, StdDev = 0.037 s
Min = 1.885 s, Q1 = 1.944 s, Median = 1.949 s, Q3 = 1.951 s, Max = 1.989 s
IQR = 0.007 s, LowerFence = 1.933 s, UpperFence = 1.962 s
ConfidenceInterval = [1.800 s; 2.087 s] (CI 99.9%), Margin = 0.143 s (7.37% of Mean)
Skewness = -0.41, Kurtosis = 1.65, MValue = 2
-------------------- Histogram --------------------
[1.857 s ; 1.914 s) | @
[1.914 s ; 1.995 s) | @@@@
---------------------------------------------------

TestCase.RunParallelInvoke: Job-RWBPOP(IterationCount=5, RunStrategy=ColdStart)
Runtime = .NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT; GC = Concurrent Workstation
Mean = 1.975 s, StdErr = 0.015 s (0.75%), N = 5, StdDev = 0.033 s
Min = 1.933 s, Q1 = 1.968 s, Median = 1.972 s, Q3 = 1.976 s, Max = 2.026 s
IQR = 0.008 s, LowerFence = 1.955 s, UpperFence = 1.989 s
ConfidenceInterval = [1.847 s; 2.103 s] (CI 99.9%), Margin = 0.128 s (6.50% of Mean)
Skewness = 0.31, Kurtosis = 1.61, MValue = 2
-------------------- Histogram --------------------
[1.929 s ; 2.000 s) | @@@@
[2.000 s ; 2.052 s) | @
---------------------------------------------------

// * Summary *

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1379 (1909/November2018Update/19H2)
Intel Core i7-8665U CPU 1.90GHz (Coffee Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.103
  [Host]     : .NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT
  Job-RWBPOP : .NET Core 3.1.12 (CoreCLR 4.700.21.6504, CoreFX 4.700.21.6905), X64 RyuJIT

IterationCount=5  RunStrategy=ColdStart

|             Method |    Mean |    Error |   StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------- |--------:|---------:|---------:|------:|------:|------:|------:|----------:|
|        RunBaseLine | 2.711 s | 0.0307 s | 0.0080 s |  1.00 |     - |     - |     - |     432 B |
| RunParallelForEach | 1.944 s | 0.1432 s | 0.0372 s |  0.72 |     - |     - |     - |    2696 B |
|  RunParallelInvoke | 1.975 s | 0.1283 s | 0.0333 s |  0.73 |     - |     - |     - |     856 B |

// * Hints *
Outliers
  TestCase.RunParallelForEach: IterationCount=5, RunStrategy=ColdStart -> 2 outliers were detected (1.89 s, 1.99 s)
  TestCase.RunParallelInvoke: IterationCount=5, RunStrategy=ColdStart  -> 2 outliers were detected (1.93 s, 2.03 s)

// * Legends *
  Mean      : Arithmetic mean of all measurements
  Error     : Half of 99.9% confidence interval
  StdDev    : Standard deviation of all measurements
  Ratio     : Mean of the ratio distribution ([Current]/[Baseline])
  Gen 0     : GC Generation 0 collects per 1000 operations
  Gen 1     : GC Generation 1 collects per 1000 operations
  Gen 2     : GC Generation 2 collects per 1000 operations
  Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
  1 s       : 1 Second (1 sec)

// * Diagnostic Output - MemoryDiagnoser *


// ***** BenchmarkRunner: End *****
// ** Remained 0 benchmark(s) to run **
Run time: 00:00:40 (40.77 sec), executed benchmarks: 3

Global total time: 00:00:44 (44.48 sec), executed benchmarks: 3
// * Artifacts cleanup *
answered 2021-03-04T11:10:42.610