
I am writing a thread-safe object that basically represents a double and uses a lock to ensure safe reading and writing. I use many of these objects (20-30) in a piece of code that reads and writes all of them 100 times per second, and I am measuring the average computation time of each of these time steps. I tried a few implementations of my getter, and after running many tests and collecting many samples to average out my measurement of computation time, I find that certain implementations perform consistently better than others, but not the ones I would expect.

Implementation 1) Computation time average = 0.607ms:

protected override double GetValue()
{
    lock(_sync)
    {
        return _value;
    }
}

Implementation 2) Computation time average = 0.615ms:

protected override double GetValue()
{
    double result;
    lock(_sync)
    {
        result = _value;
    }
    return result;
}

Implementation 3) Computation time average = 0.560ms:

protected override double GetValue()
{
    double result = 0;
    lock(_sync)
    {
        result = _value;
    }
    return result;
}

What I expected: I expected implementation 3 to be the worst of the three (this was actually my original code, so it was by chance, or lazy coding, that I wrote it this way), but surprisingly it is consistently the best. I expected implementation 1 to be the fastest. I also expected implementation 2 to be at least as fast as implementation 3, if not faster, since it just removes an assignment to result that is overwritten anyway and is therefore unnecessary.

My question is: can anyone explain why these 3 implementations have the relative performance that I have measured? It seems counter-intuitive to me and I would really like to know why.

I realize that these differences are not major, but their relative ordering is consistent every time I run the test, and each test collects thousands of samples to average out the computation time. Please also keep in mind that I am doing these tests because my application requires very high performance, or at least as good as I can reasonably get. My test case is just a small one, and my code's performance will matter when running in release.
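The measurement described above (the average computation time per 100 Hz time step, over thousands of samples) could be sketched with Stopwatch roughly as follows. StepTimer and AverageMilliseconds are hypothetical names, and the single untimed warm-up call is an assumption about how one might exclude first-call costs; this is not the asker's actual harness.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not code from the question: runs one "time step"
// many times and reports the mean duration in milliseconds.
public static class StepTimer
{
    public static double AverageMilliseconds(Action step, int samples)
    {
        if (step == null) throw new ArgumentNullException("step");
        if (samples <= 0) throw new ArgumentOutOfRangeException("samples");

        step(); // one untimed warm-up call so one-time first-call costs are excluded

        long totalTicks = 0;
        for (int i = 0; i < samples; i++)
        {
            long start = Stopwatch.GetTimestamp();
            step();
            totalTicks += Stopwatch.GetTimestamp() - start;
        }

        // Stopwatch.Frequency is the number of timestamp ticks per second,
        // so ticks * 1000 / Frequency converts to milliseconds.
        return totalTicks * 1000.0 / Stopwatch.Frequency / samples;
    }
}
```

Timing each step separately includes the cost of the two GetTimestamp calls in every sample, which matters when the differences being compared are around 0.05ms.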

EDIT: Note that I am using MonoTouch and running the code on an iPad Mini, so perhaps this has nothing to do with C# and more to do with MonoTouch's cross-compiler.


Frankly, there are other, better ways to do this. Here is the output (ignoring the x1 run used to warm up the JIT):

x5000000
Example1        128ms
Example2        136ms
Example3        129ms
CompareExchange 53ms
ReadUnsafe      54ms
UntypedBox      23ms
TypedBox        12ms

x5000000
Example1        129ms
Example2        129ms
Example3        129ms
CompareExchange 52ms
ReadUnsafe      53ms
UntypedBox      23ms
TypedBox        12ms

x5000000
Example1        129ms
Example2        161ms
Example3        129ms
CompareExchange 52ms
ReadUnsafe      53ms
UntypedBox      23ms
TypedBox        12ms

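All of these are thread-safe implementations. As you can see, the fastest is the typed box, followed by the untyped (object) box. Next, at about the same speed, come Interlocked.CompareExchange and Interlocked.Read; note that the latter only supports long, so we need to do some bit-bashing to treat the value as a double.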
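The ReadUnsafe class in the code below does the long/double bit-bashing with pointer casts, which requires compiling with /unsafe. A minimal sketch of the same Interlocked.Read approach using BitConverter.DoubleToInt64Bits and BitConverter.Int64BitsToDouble instead is shown here; the class name InterlockedDouble is hypothetical, and its speed relative to the pointer version has not been measured.

```csharp
using System;
using System.Threading;

// Sketch: an atomic double stored as its 64-bit pattern in a long field.
// BitConverter does the bit-bashing without unsafe pointer casts.
public sealed class InterlockedDouble
{
    private long _bits;

    public InterlockedDouble(double initial)
    {
        _bits = BitConverter.DoubleToInt64Bits(initial);
    }

    public double GetValue()
    {
        // Interlocked.Read gives an untorn 64-bit read, even on 32-bit platforms.
        return BitConverter.Int64BitsToDouble(Interlocked.Read(ref _bits));
    }

    public void SetValue(double value)
    {
        // Interlocked.Exchange performs an atomic 64-bit write; the old value is discarded.
        Interlocked.Exchange(ref _bits, BitConverter.DoubleToInt64Bits(value));
    }
}
```

The main design difference from ReadUnsafe is that no unsafe context is needed; whether the BitConverter calls cost anything extra on a given runtime is something to measure on the target device.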

Obviously, test on your target framework.

For fun, I also tested a Mutex; at the same test scale it took about 3300ms.
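The Mutex code behind that 3300ms figure is not included in the answer. Purely as an assumed reconstruction, a double guarded by System.Threading.Mutex might look like the sketch below; the class name MutexDouble is hypothetical. Each WaitOne/ReleaseMutex pair goes through the OS wait-handle machinery rather than the lightweight Monitor path that lock uses, which plausibly explains the large gap.

```csharp
using System;
using System.Threading;

// Hypothetical reconstruction (not the answerer's code): a double guarded
// by a Mutex instead of lock/Monitor. Mutex is thread-affine, so the same
// thread must call WaitOne and ReleaseMutex.
public sealed class MutexDouble : IDisposable
{
    private readonly Mutex _mutex = new Mutex();
    private double _value;

    public MutexDouble(double initial) { _value = initial; }

    public double GetValue()
    {
        _mutex.WaitOne();
        try { return _value; }
        finally { _mutex.ReleaseMutex(); }
    }

    public void SetValue(double value)
    {
        _mutex.WaitOne();
        try { _value = value; }
        finally { _mutex.ReleaseMutex(); }
    }

    // Mutex wraps an OS handle, so release it deterministically.
    public void Dispose() { _mutex.Dispose(); }
}
```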

// Compile with /unsafe (the ReadUnsafe experiment uses pointer casts).
using System;
using System.Diagnostics;
using System.Threading;
abstract class Experiment
{
    public abstract double GetValue();
}
class Example1 : Experiment
{
    private readonly object _sync = new object();
    private double _value = 3;
    public override double GetValue()
    {
        lock (_sync)
        {
            return _value;
        }
    }
}
class Example2 : Experiment
{
    private readonly object _sync = new object();
    private double _value = 3;
    public override double GetValue()
    {
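        // Matches Implementation 2 from the question: result has no initializer
        double result;
        lock (_sync)
        {
            result = _value;
        }
        return result;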
    }
}

class Example3 : Experiment
{
    private readonly object _sync = new object();
    private double _value = 3;
    public override double GetValue()
    {
        double result = 0;
        lock (_sync)
        {
            result = _value;
        }
        return result;
    }
}

class CompareExchange : Experiment
{
    private double _value = 3;
    public override double GetValue()
    {
        return Interlocked.CompareExchange(ref _value, 0, 0);
    }
}
class ReadUnsafe : Experiment
{
    private long _value = DoubleToInt64(3);
    static unsafe long DoubleToInt64(double val)
    {   // I'm mainly including this for the field initializer
        // in real use this would be manually inlined
        return *(long*)(&val);
    }
    public override unsafe double GetValue()
    {
        long val = Interlocked.Read(ref _value);
        return *(double*)(&val);
    }
}
class UntypedBox : Experiment
{
    // references are always atomic
    private volatile object _value = 3.0;
    public override double GetValue()
    {
        return (double)_value;
    }
}
class TypedBox : Experiment
{
    private sealed class Box
    {
        public readonly double Value;
        public Box(double value) { Value = value; }

    }
    // references are always atomic
    private volatile Box _value = new Box(3);
    public override double GetValue()
    {
        return _value.Value;
    }
}
static class Program
{
    static void Main()
    {
        // once for JIT
        RunExperiments(1);
        // three times for real
        RunExperiments(5000000);
        RunExperiments(5000000);
        RunExperiments(5000000);
    }
    static void RunExperiments(int loop)
    {
        Console.WriteLine("x{0}", loop);
        RunExperiment(new Example1(), loop);
        RunExperiment(new Example2(), loop);
        RunExperiment(new Example3(), loop);
        RunExperiment(new CompareExchange(), loop);
        RunExperiment(new ReadUnsafe(), loop);
        RunExperiment(new UntypedBox(), loop);
        RunExperiment(new TypedBox(), loop);
        Console.WriteLine();
    }
    static void RunExperiment(Experiment test, int loop)
    {
        // avoid any GC interruptions
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
        GC.WaitForPendingFinalizers();

        double val = 0;
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < loop; i++)
            val = test.GetValue();
        watch.Stop();
        if (val != 3.0) Console.WriteLine("FAIL!");
        Console.WriteLine("{0}\t{1}ms", test.GetType().Name,
            watch.ElapsedMilliseconds);

    }

}
answered 2013-04-12T08:54:14.843

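Measuring only concurrent reads is misleading: your cache will give you results orders of magnitude better than your real use case would. So I added SetValue to Marc's example: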

// Compile with /unsafe (the ReadUnsafe experiment uses pointer casts).
using System;
using System.Diagnostics;
using System.Threading;

abstract class Experiment
{
    public abstract double GetValue();
    public abstract void SetValue(double value);
}

class Example1 : Experiment
{
    private readonly object _sync = new object();
    private double _value = 3;
    public override double GetValue()
    {
        lock (_sync)
        {
            return _value;
        }
    }

    public override void SetValue(double value)
    {
        lock (_sync)
        {
            _value = value;
        }

    }

}
class Example2 : Experiment
{
    private readonly object _sync = new object();
    private double _value = 3;
    public override double GetValue()
    {
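        // Matches Implementation 2 from the question: result has no initializer
        double result;
        lock (_sync)
        {
            result = _value;
        }
        return result;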
    }

    public override void SetValue(double value)
    {
        lock (_sync)
        {
            _value = value;
        }
    }

}



class Example3 : Experiment
{
    private readonly object _sync = new object();
    private double _value = 3;
    public override double GetValue()
    {
        double result = 0;
        lock (_sync)
        {
            result = _value;
        }
        return result;
    }

    public override void SetValue(double value)
    {
        lock (_sync)
        {
            _value = value;
        }
    }
}

class CompareExchange : Experiment
{
    private double _value = 3;
    public override double GetValue()
    {
        return Interlocked.CompareExchange(ref _value, 0, 0);
    }

    public override void SetValue(double value)
    {
        Interlocked.Exchange(ref _value, value);
    }
}
class ReadUnsafe : Experiment
{
    private long _value = DoubleToInt64(3);
    static unsafe long DoubleToInt64(double val)
    {   // I'm mainly including this for the field initializer
        // in real use this would be manually inlined
        return *(long*)(&val);
    }
    public override unsafe double GetValue()
    {
        long val = Interlocked.Read(ref _value);
        return *(double*)(&val);
    }

    public override void SetValue(double value)
    {
        long intValue = DoubleToInt64(value);
        Interlocked.Exchange(ref _value, intValue);
    }
}
class UntypedBox : Experiment
{
    // references are always atomic
    private volatile object _value = 3.0;
    public override double GetValue()
    {
        return (double)_value;
    }

    public override void SetValue(double value)
    {
        object valueObject = value;
        _value = valueObject;
    }
}
class TypedBox : Experiment
{
    private sealed class Box
    {
        public readonly double Value;
        public Box(double value) { Value = value; }

    }
    // references are always atomic
    private volatile Box _value = new Box(3);
    public override double GetValue()
    {
        Box value = _value;
        return value.Value;
    }

    public override void SetValue(double value)
    {
        Box boxValue = new Box(value);
        _value = boxValue;
    }
}
static class Program
{
    static void Main()
    {
        // once for JIT
        RunExperiments(1);
        // three times for real
        RunExperiments(5000000);
        RunExperiments(5000000);
        RunExperiments(5000000);
    }
    static void RunExperiments(int loop)
    {
        Console.WriteLine("x{0}", loop);
        RunExperiment(new Example1(), loop);
        RunExperiment(new Example2(), loop);
        RunExperiment(new Example3(), loop);
        RunExperiment(new CompareExchange(), loop);
        RunExperiment(new ReadUnsafe(), loop);
        RunExperiment(new UntypedBox(), loop);
        RunExperiment(new TypedBox(), loop);
        Console.WriteLine();
    }
    static void RunExperiment(Experiment test, int loop)
    {
        // avoid any GC interruptions
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
        GC.WaitForPendingFinalizers();

        int threads = Environment.ProcessorCount;

        ManualResetEvent done = new ManualResetEvent(false);

        // Since we use threads, divide the original workload
        //
        int workerLoop = Math.Max(1, loop / Environment.ProcessorCount);
        int writeRatio = 1000;
        int writes = Math.Max(workerLoop / writeRatio, 1);
        int reads = workerLoop / writes;

        var watch = Stopwatch.StartNew();

        for (int t = 0; t < Environment.ProcessorCount; ++t)
        {
            ThreadPool.QueueUserWorkItem((state) =>
                {
                    try
                    {
                        double val = 0;

                        // Two loops to avoid comparison for % in the inner loop
                        //
                        for (int j = 0; j < writes; ++j)
                        {
                            test.SetValue(j);
                            for (int i = 0; i < reads; i++)
                            {
                                val = test.GetValue();
                            }
                        }
                    }
                    finally
                    {
                        if (0 == Interlocked.Decrement(ref threads))
                        {
                            done.Set();
                        }
                    }
                });
        }
        done.WaitOne();
        watch.Stop();
        Console.WriteLine("{0}\t{1}ms", test.GetType().Name,
            watch.ElapsedMilliseconds);

    }
}

Results at a 1000:1 read:write ratio:

x5000000
Example1        353ms
Example2        395ms
Example3        369ms
CompareExchange 150ms
ReadUnsafe      161ms
UntypedBox      11ms
TypedBox        9ms

100:1 (read:write)

x5000000
Example1        356ms
Example2        360ms
Example3        356ms
CompareExchange 161ms
ReadUnsafe      172ms
UntypedBox      14ms
TypedBox        13ms

10:1 (read:write)

x5000000
Example1        383ms
Example2        394ms
Example3        414ms
CompareExchange 169ms
ReadUnsafe      176ms
UntypedBox      41ms
TypedBox        43ms

2:1 (read:write)

x5000000
Example1        550ms
Example2        581ms
Example3        560ms
CompareExchange 257ms
ReadUnsafe      292ms
UntypedBox      101ms
TypedBox        122ms

1:1 (read:write)

x5000000
Example1        718ms
Example2        745ms
Example3        730ms
CompareExchange 381ms
ReadUnsafe      376ms
UntypedBox      161ms
TypedBox        200ms

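*Updated the code to remove an unnecessary Interlocked.CompareExchange (ICX) operation on writes, since the value is always overwritten. Also fixed the formula that divides the number of reads among the threads (so the total work is the same).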
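To make the per-thread workload in RunExperiment concrete, the same arithmetic is pulled out below into a hypothetical helper, Workload.Split (not part of the answer). Assuming a 4-core machine (the answer does not say how many cores it used), loop = 5,000,000 and writeRatio = 1000 give workerLoop = 1,250,000, so each thread performs 1,250 SetValue calls, each followed by 1,000 GetValue calls. With writeRatio = 1 each thread instead does 1,250,000 writes with one read after each, which is the 1:1 case.

```csharp
using System;

// Hypothetical helper mirroring the workload split in RunExperiment.
public static class Workload
{
    public static void Split(int loop, int threads, int writeRatio,
                             out int workerLoop, out int writes, out int reads)
    {
        // Total operations assigned to each thread.
        workerLoop = Math.Max(1, loop / threads);
        // Outer loop count: one SetValue per iteration.
        writes = Math.Max(workerLoop / writeRatio, 1);
        // Inner loop count: GetValue calls after each SetValue.
        // Integer division means writes * reads can fall slightly short of workerLoop.
        reads = workerLoop / writes;
    }
}
```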

answered 2013-04-12T12:22:51.707