58

假设有以下代码:

private static int DoSwitch(string arg)
{
    switch (arg)
    {
        case "a": return 0;
        case "b": return 1;
        case "c": return 2;
        case "d": return 3;
    }
    return -1;
}

private static Dictionary<string, Func<int>> dict = new Dictionary<string, Func<int>>
    {
        {"a", () => 0 },
        {"b", () => 1 },
        {"c", () => 2 },
        {"d", () => 3 },
    };

private static int DoDictionary(string arg)
{
    return dict[arg]();
}

通过迭代这两种方法并进行比较,我发现字典稍微快了一点,即使“a”、“b”、“c”、“d”被扩展为包含更多键。为什么会这样?

这与圈复杂度有关吗?是因为jitter只将字典中的return语句编译为本机代码一次吗?是不是因为字典的查找是 O(1),而switch 语句可能不是这种情况?(这些只是猜测)

4

4 回答 4

57

简短的回答是 switch 语句以线性方式执行,而字典以对数方式执行。

在 IL 级别,一个小的 switch 语句通常被实现为一系列 if-elseif 语句,比较切换变量和每个 case 的相等性。因此,该语句将在与 myVar 的有效选项数量成线性比例的时间内执行;这些案例将按照它们出现的顺序进行比较,最坏的情况是所有比较都已尝试,最后一个匹配或不匹配。因此,对于 32 个选项,最坏的情况是它们都不是,代码将进行 32 次比较来确定这一点。

另一方面,字典使用索引优化的集合来存储值。在 .NET 中,字典基于 Hashtable,它具有有效的恒定访问时间(缺点是空间效率极差)。通常用于“映射”集合(如字典)的其他选项包括平衡树结构,如红黑树,它提供对数访问(和线性空间效率)。任何这些都将允许代码在集合中找到与正确“案例”相对应的键(或确定它不存在),这比 switch 语句可以做的快得多。

编辑:其他答案和评论者已经谈到了这一点,所以为了完整起见,我也会这样做。正如我最初推断的那样,Microsoft 编译器并不总是将 switch 编译为 if/elseif。它通常在少量案例和/或“稀疏”案例(非增量值,如 1、200、4000)的情况下这样做。对于较大的相邻案例集,编译器将使用 CIL 语句将开关转换为“跳转表”。对于大量稀疏情况,编译器可以实现二进制搜索以缩小范围,然后“遍历”少量稀疏情况或为相邻情况实现跳转表。

但是,编译器通常会选择性能和空间效率最佳折衷的实现,因此它只会在大量密集打包的情况下使用跳转表。这是因为跳转表在内存中需要一个它必须覆盖的案例范围的空间,这对于稀疏案例来说在内存方面是非常低效的。通过在源代码中使用 Dictionary,您基本上可以强制编译器使用;它会按照你的方式去做,而不是为了提高内存效率而牺牲性能。

因此,我希望在大多数情况下,可以在源代码中使用 switch 语句或 Dictionary 以在使用 Dictionary 时表现更好。无论如何,要避免 switch 语句中的大量情况,因为它们的可维护性较差。

于 2012-07-23T17:29:10.200 回答
51

这是一个很好的例子,说明了为什么微基准可能会产生误导。C# 编译器会根据 switch/case 的大小生成不同的 IL。所以打开这样的字符串

switch (text) 
{
     case "a": Console.WriteLine("A"); break;
     case "b": Console.WriteLine("B"); break;
     case "c": Console.WriteLine("C"); break;
     case "d": Console.WriteLine("D"); break;
     default: Console.WriteLine("def"); break;
}

生成对每种情况基本上执行以下操作的 IL:

L_0009: ldloc.1 
L_000a: ldstr "a"
L_000f: call bool [mscorlib]System.String::op_Equality(string, string)
L_0014: brtrue.s L_003f

然后

L_003f: ldstr "A"
L_0044: call void [mscorlib]System.Console::WriteLine(string)
L_0049: ret 

即它是一系列的比较。所以运行时间是线性的。

但是,添加额外的情况,例如包括所有来自 az 的字母,会更改为每个生成的 IL:

L_0020: ldstr "a"
L_0025: ldc.i4.0 
L_0026: call instance void [mscorlib]System.Collections.Generic.Dictionary`2<string, int32>::Add(!0, !1)

L_0176: ldloc.1 
L_0177: ldloca.s CS$0$0001
L_0179: call instance bool [mscorlib]System.Collections.Generic.Dictionary`2<string, int32>::TryGetValue(!0, !1&)
L_017e: brfalse L_0314

最后

L_01f6: ldstr "A"
L_01fb: call void [mscorlib]System.Console::WriteLine(string)
L_0200: ret 

即它现在使用字典而不是一系列字符串比较,从而获得字典的性能。

换句话说,为这些生成的 IL 代码是不同的,这只是在 IL 级别。JIT 编译器可能会进一步优化。

TL;DR:所以这个故事的精神是查看真实数据和配置文件,而不是尝试基于微基准进行优化。

于 2012-07-23T17:57:47.383 回答
4

As with many questions involving compiler codegen decisions, the answer is "it depends".

Building your own hash table will probably run faster than compiler generated code in many cases because the compiler has other cost metrics it is trying to balance that you are not: primarily, memory consumption.

A hash table will use more memory than a handful of if-then-else IL instructions. If the compiler spit out a hash table for every switch statement in a program, memory use would explode.

As the number of case blocks in the switch statement grows, you will probably see the compiler produce different code. With more cases, there is greater justification for the compiler to abandon small and simple if-then-else patterns in favor of faster but fatter alternatives.

I don't know offhand if the C# or JIT compilers perform this particular optimization, but a common compiler trick for switch statements when the case selectors are many and mostly sequential is to compute a jump vector. This requires more memory (in the form of compiler generated jump tables embedded in the code stream) but executes in constant time. Subtract arg - "a", use result as index into jump table to jump to appropriate case block. Boom, yer done, regardless of whether there are 20 or 2000 cases.

A compiler is more likely to shift into jump table mode when the switch selector type is char or int or enum and the values of the case selectors are mostly sequential ("dense"), since these types can be easily subtracted to create an offset or index. String selectors are a little harder.

String selectors are "interned" by the C# compiler, meaning the compiler adds the string selectors values to an internal pool of unique strings. The address or token of an interned string can be used as its identity, allowing for int-like optimizations when comparing intern'd strings for identity / byte-wise equality. With sufficient case selectors, the C# compiler will produce IL code that looks up the intern'd equivalent of the arg string (hash table lookup), then compares (or jump tables) the interned token with the precomputed case selector tokens.

If you can coax the compiler into producing jump-table code in the char/int/enum selector case, this can execute faster than using your own hash table.

For the string selector case, the IL code still has to do a hash lookup so any performance difference from using your own hash table is probably a wash.

In general, though, you shouldn't dwell on these compiler nuances too much when writing application code. Switch statements are generally much easier to read and understand than a hash table of function pointers. Switch statements that are big enough to push the compiler into jump table mode are often too big to be humanly readable.

If you find a switch statement is in a performance hotspot of your code, and you have measured with a profiler that it has tangible performance impact, then changing your code to use your own dictionary is a reasonable tradeoff for a performance gain.

Writing your code to use a hash table from the start, with no performance measurements to justify this choice, is over-engineering that will lead to unfathomable code with an unnecessarily higher maintenance cost. Keep It Simple.

于 2012-07-23T18:26:30.550 回答
2

默认情况下,字符串上的开关的实现类似于 if / else / if / else 构造。正如 Brian 所建议的,当它变大时,编译器会将开关转换为哈希表。Bart de Smet在这个 channel9 视频中展示了这一点,(开关在 13:50 讨论)

编译器没有针对 4 个项目执行此操作,因为它是保守的,以防止优化的成本超过收益。构建哈希表会花费一些时间和内存。

于 2012-07-23T17:58:15.000 回答