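I'm generating an expression tree that maps properties from a source object to a destination object. The tree is then compiled to a Func<TSource, TDestination, TDestination> and executed.
This is the debug view of the resulting LambdaExpression: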
.Lambda #Lambda1<System.Func`3[MemberMapper.Benchmarks.Program+ComplexSourceType,MemberMapper.Benchmarks.Program+ComplexDestinationType,MemberMapper.Benchmarks.Program+ComplexDestinationType]>(
    MemberMapper.Benchmarks.Program+ComplexSourceType $right,
    MemberMapper.Benchmarks.Program+ComplexDestinationType $left) {
    .Block(
        MemberMapper.Benchmarks.Program+NestedSourceType $Complex$955332131,
        MemberMapper.Benchmarks.Program+NestedDestinationType $Complex$2105709326) {
        $left.ID = $right.ID;
        $Complex$955332131 = $right.Complex;
        $Complex$2105709326 = .New MemberMapper.Benchmarks.Program+NestedDestinationType();
        $Complex$2105709326.ID = $Complex$955332131.ID;
        $Complex$2105709326.Name = $Complex$955332131.Name;
        $left.Complex = $Complex$2105709326;
        $left
    }
}
Cleaned up, it would be:
(right, left) =>
{
    left.ID = right.ID;
    var complexSource = right.Complex;
    var complexDestination = new NestedDestinationType();
    complexDestination.ID = complexSource.ID;
    complexDestination.Name = complexSource.Name;
    left.Complex = complexDestination;
    return left;
}
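For reference, a tree of this shape can be built by hand with the System.Linq.Expressions factory methods. The following is only a minimal sketch, not the code MemberMapper actually uses: the MapBuilder class and its Build method are hypothetical names, and the four mapped types are repeated so the snippet compiles on its own.

```csharp
using System;
using System.Linq.Expressions;

public class NestedSourceType { public int ID { get; set; } public string Name { get; set; } }
public class ComplexSourceType { public int ID { get; set; } public NestedSourceType Complex { get; set; } }
public class NestedDestinationType { public int ID { get; set; } public string Name { get; set; } }
public class ComplexDestinationType { public int ID { get; set; } public NestedDestinationType Complex { get; set; } }

// Hypothetical helper, for illustration only.
public static class MapBuilder
{
    public static Func<ComplexSourceType, ComplexDestinationType, ComplexDestinationType> Build()
    {
        // Lambda parameters: right is the source, left is the destination.
        var right = Expression.Parameter(typeof(ComplexSourceType), "right");
        var left = Expression.Parameter(typeof(ComplexDestinationType), "left");

        // Block-scoped locals for the nested objects.
        var complexSource = Expression.Variable(typeof(NestedSourceType), "complexSource");
        var complexDestination = Expression.Variable(typeof(NestedDestinationType), "complexDestination");

        var body = Expression.Block(
            new[] { complexSource, complexDestination },
            // left.ID = right.ID;
            Expression.Assign(Expression.Property(left, "ID"), Expression.Property(right, "ID")),
            // complexSource = right.Complex;
            Expression.Assign(complexSource, Expression.Property(right, "Complex")),
            // complexDestination = new NestedDestinationType();
            Expression.Assign(complexDestination, Expression.New(typeof(NestedDestinationType))),
            // complexDestination.ID = complexSource.ID;
            Expression.Assign(Expression.Property(complexDestination, "ID"),
                              Expression.Property(complexSource, "ID")),
            // complexDestination.Name = complexSource.Name;
            Expression.Assign(Expression.Property(complexDestination, "Name"),
                              Expression.Property(complexSource, "Name")),
            // left.Complex = complexDestination;
            Expression.Assign(Expression.Property(left, "Complex"), complexDestination),
            // The last expression in a block is its result: return left.
            left);

        var lambda = Expression.Lambda<Func<ComplexSourceType, ComplexDestinationType, ComplexDestinationType>>(
            body, right, left);

        // Compile() emits a DynamicMethod and returns a delegate bound to it.
        return lambda.Compile();
    }
}
```

Calling MapBuilder.Build() once and reusing the returned delegate keeps the compilation cost out of any timed loop. Like the cleaned-up lambda, this sketch has no null check on right.Complex.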
This is the code that maps the properties of these types:
public class NestedSourceType
{
    public int ID { get; set; }
    public string Name { get; set; }
}
public class ComplexSourceType
{
    public int ID { get; set; }
    public NestedSourceType Complex { get; set; }
}
public class NestedDestinationType
{
    public int ID { get; set; }
    public string Name { get; set; }
}
public class ComplexDestinationType
{
    public int ID { get; set; }
    public NestedDestinationType Complex { get; set; }
}
The manual code to do this would be:
var destination = new ComplexDestinationType
{
    ID = source.ID,
    Complex = new NestedDestinationType
    {
        ID = source.Complex.ID,
        Name = source.Complex.Name
    }
};
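The problem is that when I compile the LambdaExpression and benchmark the resulting delegate, it is about 10x slower than the manual version. I have no idea why that is. The whole idea of this is maximum performance without the tedium of manual mapping.
When I take the code from Bart de Smet's blog post on this topic and benchmark the manual version of calculating prime numbers against the compiled expression tree, the two perform identically.
What can cause this huge difference when the debug view of the LambdaExpression looks like what you would expect?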
EDIT
As requested, I've added the benchmark I used:
public static ComplexDestinationType Foo;
static void Benchmark()
{
    var mapper = new DefaultMemberMapper();
    var map = mapper.CreateMap(typeof(ComplexSourceType),
        typeof(ComplexDestinationType)).FinalizeMap();
    var source = new ComplexSourceType
    {
        ID = 5,
        Complex = new NestedSourceType
        {
            ID = 10,
            Name = "test"
        }
    };
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < 1000000; i++)
    {
        Foo = new ComplexDestinationType
        {
            ID = source.ID + i,
            Complex = new NestedDestinationType
            {
                ID = source.Complex.ID + i,
                Name = source.Complex.Name
            }
        };
    }
    sw.Stop();
    Console.WriteLine(sw.Elapsed);
    sw.Restart();
    for (int i = 0; i < 1000000; i++)
    {
        Foo = mapper.Map<ComplexSourceType, ComplexDestinationType>(source);
    }
    sw.Stop();
    Console.WriteLine(sw.Elapsed);
    var func = (Func<ComplexSourceType, ComplexDestinationType, ComplexDestinationType>)
        map.MappingFunction;
    var destination = new ComplexDestinationType();
    sw.Restart();
    for (int i = 0; i < 1000000; i++)
    {
        Foo = func(source, new ComplexDestinationType());
    }
    sw.Stop();
    Console.WriteLine(sw.Elapsed);
}
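It is understandable that the second one is slower than doing it manually, as it involves a dictionary lookup and a few object instantiations. The third one, however, should be just as fast, because it invokes the raw delegate directly and the cast from Delegate to Func happens outside the loop.
I also tried wrapping the manual code in a function, but as far as I recall it made no noticeable difference. Either way, a function call shouldn't add an order of magnitude of overhead.
I also ran the benchmark twice to make sure the JIT wasn't interfering.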
EDIT
You can get the code for this project here:
https://github.com/JulianR/MemberMapper/
I used the Son of Strike (SOS) debugger extension, as described by Bart de Smet in that blog post, to dump the generated IL of the dynamic method:
IL_0000: ldarg.2
IL_0001: ldarg.1
IL_0002: callvirt 6000003 ComplexSourceType.get_ID()
IL_0007: callvirt 6000004 ComplexDestinationType.set_ID(Int32)
IL_000c: ldarg.1
IL_000d: callvirt 6000005 ComplexSourceType.get_Complex()
IL_0012: brfalse IL_0043
IL_0017: ldarg.1
IL_0018: callvirt 6000006 ComplexSourceType.get_Complex()
IL_001d: stloc.0
IL_001e: newobj 6000007 NestedDestinationType..ctor()
IL_0023: stloc.1
IL_0024: ldloc.1
IL_0025: ldloc.0
IL_0026: callvirt 6000008 NestedSourceType.get_ID()
IL_002b: callvirt 6000009 NestedDestinationType.set_ID(Int32)
IL_0030: ldloc.1
IL_0031: ldloc.0
IL_0032: callvirt 600000a NestedSourceType.get_Name()
IL_0037: callvirt 600000b NestedDestinationType.set_Name(System.String)
IL_003c: ldarg.2
IL_003d: ldloc.1
IL_003e: callvirt 600000c ComplexDestinationType.set_Complex(NestedDestinationType)
IL_0043: ldarg.2
IL_0044: ret
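I'm no expert on IL, but this looks very straightforward and exactly what you would expect, doesn't it? Then why is it so slow? There are no weird boxing operations, no hidden instantiations, nothing. It is not exactly the same as the expression tree above, because there is now also a null check on right.Complex.
This is the code for the manual version (obtained through Reflector):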
L_0000: ldarg.1
L_0001: ldarg.0
L_0002: callvirt instance int32 ComplexSourceType::get_ID()
L_0007: callvirt instance void ComplexDestinationType::set_ID(int32)
L_000c: ldarg.0
L_000d: callvirt instance class NestedSourceType ComplexSourceType::get_Complex()
L_0012: brfalse.s L_0040
L_0014: ldarg.0
L_0015: callvirt instance class NestedSourceType ComplexSourceType::get_Complex()
L_001a: stloc.0
L_001b: newobj instance void NestedDestinationType::.ctor()
L_0020: stloc.1
L_0021: ldloc.1
L_0022: ldloc.0
L_0023: callvirt instance int32 NestedSourceType::get_ID()
L_0028: callvirt instance void NestedDestinationType::set_ID(int32)
L_002d: ldloc.1
L_002e: ldloc.0
L_002f: callvirt instance string NestedSourceType::get_Name()
L_0034: callvirt instance void NestedDestinationType::set_Name(string)
L_0039: ldarg.1
L_003a: ldloc.1
L_003b: callvirt instance void ComplexDestinationType::set_Complex(class NestedDestinationType)
L_0040: ldarg.1
L_0041: ret
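Looks exactly the same to me..
EDIT
I followed the link in Michael B's answer on this topic. I tried implementing the trick from the accepted answer there, and it worked! To summarize the trick: it creates a dynamic assembly and compiles the expression tree into a static method in that assembly, and for some reason that is 10x faster. One downside is that my benchmark classes were internal (actually, public classes nested in an internal one), and it threw an exception when it tried to access them because they weren't accessible. There doesn't seem to be a workaround for that, but I can simply detect whether the referenced types are internal and decide which compilation approach to use.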
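A minimal sketch of that trick, assuming .NET Framework 4.x, where LambdaExpression.CompileToMethod(MethodBuilder) and AppDomain.DefineDynamicAssembly are available (CompileToMethod is not available on .NET Core and later). The class name StaticMethodCompiler, the generated type name "Mapper" and the method name "Map" are hypothetical, chosen only for illustration; this is not necessarily the exact code from the linked answer.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical helper, for illustration only (.NET Framework 4.x).
public static class StaticMethodCompiler
{
    public static TDelegate CompileToStaticMethod<TDelegate>(Expression<TDelegate> lambda)
        where TDelegate : class
    {
        // A fresh in-memory assembly per call; a real mapper would probably reuse one.
        var assemblyName = new AssemblyName("DynamicMaps_" + Guid.NewGuid().ToString("N"));
        AssemblyBuilder assembly = AppDomain.CurrentDomain.DefineDynamicAssembly(
            assemblyName, AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("Module");
        TypeBuilder type = module.DefineType("Mapper", TypeAttributes.Public);

        // CompileToMethod fills in the signature and body of this static method.
        MethodBuilder method = type.DefineMethod(
            "Map", MethodAttributes.Public | MethodAttributes.Static);
        lambda.CompileToMethod(method);

        // Bake the type and bind a delegate to the finished static method.
        Type created = type.CreateType();
        return (TDelegate)(object)Delegate.CreateDelegate(
            typeof(TDelegate), created.GetMethod("Map"));
    }
}
```

The method is emitted into an ordinary dynamic assembly, so it is subject to normal visibility checks. That matches the exception described above when the mapped types are internal. The lambda must also not close over live objects, since CompileToMethod cannot embed them as constants.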
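What still bugs me, though, is why the prime numbers method performs identically to the compiled expression tree.
And again, I welcome anyone to run the code in that GitHub repository to confirm my measurements and to make sure I'm not crazy :)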