c# - IL and stack implementation in .NET?

Question

I wrote a simple program to examine how IL works:

void Main()
{

 int a=5;
 int b=6;
 if (a<b) Console.Write("333");
 Console.ReadLine();
}

IL:

IL_0000:  ldc.i4.5    
IL_0001:  stloc.0     
IL_0002:  ldc.i4.6    
IL_0003:  stloc.1     
IL_0004:  ldloc.0     
IL_0005:  ldloc.1     
IL_0006:  bge.s       IL_0012
IL_0008:  ldstr       "333"
IL_000D:  call        System.Console.Write
IL_0012:  call        System.Console.ReadLine

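I'm trying to understand how efficient this implementation is:

At line #1 (of the IL code), it pushes the value 5 onto the stack (4 bytes, which is an int32).
At line #2 (of the IL code), it pops from the stack into a local variable.

The same goes for the next 2 lines.

Then it loads those local variables onto the stack, and only then does it evaluate bge.s.

Question #1

Why does it load the local variables onto the stack? The values were already on the stack, but it popped them in order to store them in local variables. Isn't that a waste?

I mean, why couldn't the code be something like this: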

IL_0000:  ldc.i4.5
IL_0001:  ldc.i4.6    
IL_0002:  bge.s       IL_0004
IL_0003:  ldstr       "333"
IL_0004:  call        System.Console.Write
IL_0005:  call        System.Console.ReadLine

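My sample is only 5 lines of code. What about 50,000,000 lines of code? A lot of extra IL would be emitted.

Question #2

Looking at the code addresses in the IL listing above:

Where is address IL_0009? Shouldn't the addresses be sequential?

P.S. I'm compiling with the optimize flag in release mode.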

score 10 · Accepted Answer

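I can easily answer the second question. Instructions are variable-length. For example, ldstr "333" consists of the opcode for ldstr (at address 8) followed by data representing the string (a reference to the string in the user string table).

The same goes for the call instruction that follows: you need the call opcode itself plus information about the function to call.

The reason the instructions that push small values like 4 or 6 onto the stack have no additional data is that those values are encoded in the opcode itself.

For the instructions and their encodings, see here.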
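To make the variable-length point concrete, here is a minimal C# sketch that recomputes the offsets in the question's listing from each instruction's encoded size. The size table is written out by hand for just these opcodes (one-byte opcodes, a one-byte branch displacement for bge.s, and a four-byte metadata token for ldstr and call). It is an illustration rather than a general IL decoder, and the names IlOffsets, Listing, and Compute are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class IlOffsets
{
    // (mnemonic, total encoded size in bytes) for the question's listing.
    // Each size is the opcode byte plus its operand bytes, filled in by hand.
    public static readonly (string Name, int Size)[] Listing =
    {
        ("ldc.i4.5", 1), // the constant is part of the opcode
        ("stloc.0", 1),
        ("ldc.i4.6", 1),
        ("stloc.1", 1),
        ("ldloc.0", 1),
        ("ldloc.1", 1),
        ("bge.s", 2),    // opcode + 1-byte branch displacement
        ("ldstr", 5),    // opcode + 4-byte metadata token
        ("call", 5),     // opcode + 4-byte metadata token
        ("call", 5),
    };

    // Returns the starting offset of each instruction in the listing.
    public static List<int> Compute((string Name, int Size)[] listing)
    {
        var offsets = new List<int>();
        int offset = 0;
        foreach (var (_, size) in listing)
        {
            offsets.Add(offset);
            offset += size;
        }
        return offsets;
    }

    public static void Main()
    {
        var offsets = Compute(Listing);
        for (int i = 0; i < Listing.Length; i++)
            Console.WriteLine($"IL_{offsets[i]:X4}:  {Listing[i].Name}");
    }
}
```

The computed offsets are 0 through 6, then 8, 0xD, and 0x12, which match the listing in the question: the "missing" addresses are simply the operand bytes of the preceding instruction.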

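As for the first question, you may want to look at this blog post by Eric Lippert, one of the C# developers, which states:

The /optimize flag does not change a huge amount of our emitting and generation logic. We try to always generate straightforward, verifiable code and then rely upon the jitter to do the heavy lifting of optimizations when it generates the real machine code.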

score 7 · Accepted Answer

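Why does it load the local variables onto the stack? The values were already on the stack, but it popped them in order to store them in local variables. Isn't that a waste?

A waste of what? You have to remember that IL is (usually) not executed as is; it is compiled again by the JIT compiler, which performs most of the optimizations. One of the points of using an "intermediate language" is that optimizations can be implemented in one place, the JIT compiler, so that each language (C#, VB.NET, F#, ...) does not have to reimplement them. Eric Lippert explains this in his article Why IL?

Where is address IL_0009? Shouldn't the addresses be sequential?

Let's look at the specification of the ldstr instruction (from ECMA-335):

III.4.16 ldstr – load a literal string

Format: 72 <T> […]

The ldstr instruction pushes a new string object representing the literal stored in the metadata as string (which is a string literal).

The mention of metadata and <T> above means that the 72 byte of the instruction is followed by a metadata token, which points into a table containing strings. How big is such a token? From section III.1.9 of the same document:

Many CIL instructions are followed by a "metadata token". This is a 4-byte value, that specifies a row in a metadata table […]

So, in your case, the 72 byte of the instruction is at address 0008, and the token (0x70000001 in this case, where the 0x70 byte denotes the user string table) occupies addresses 0009 to 000C.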
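As a rough illustration of that layout, the following C# sketch takes the five bytes that would sit at addresses 0008 to 000C and splits them into the opcode and the token. The byte array is reconstructed from the values named above (opcode 72, token 0x70000001) rather than read from a real assembly, it assumes the little-endian operand encoding that CIL uses, and the names LdstrDecode and SplitToken are hypothetical.

```csharp
using System;

public static class LdstrDecode
{
    // Splits a metadata token into its top byte (which table or heap it
    // refers to) and the remaining 24-bit value.
    public static (byte Table, int Value) SplitToken(uint token) =>
        ((byte)(token >> 24), (int)(token & 0x00FFFFFF));

    public static void Main()
    {
        // Bytes at IL_0008..IL_000C: opcode 0x72, then the token 0x70000001
        // stored least-significant byte first (assumed little-endian).
        byte[] il = { 0x72, 0x01, 0x00, 0x00, 0x70 };

        uint token = (uint)(il[1] | il[2] << 8 | il[3] << 16 | il[4] << 24);
        var (table, value) = SplitToken(token);

        Console.WriteLine($"opcode = 0x{il[0]:X2}");
        Console.WriteLine($"token  = 0x{token:X8}");
        Console.WriteLine($"table  = 0x{table:X2}, value = {value}");
    }
}
```

One opcode byte plus four token bytes gives the five-byte instruction, so the next instruction starts at 000D.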

score 6 · Accepted Answer

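It is pointless to reason about the efficiency of IL at this level.

The JIT will eliminate the stack entirely, translating all stack operations into an intermediate three-address code (and further down to SSA). Since IL is never interpreted, the stack operations do not need to be efficient or optimised.

See, for example, the open source Mono implementation.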
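The idea of turning stack operations into three-address code can be sketched with a toy translator. The C# below is not how Mono or any real JIT is implemented; it is a hypothetical, minimal model (the names StackToTac and Translate are made up) that handles only the opcodes from the first seven lines of the question's listing and simulates the evaluation stack symbolically, so that pushes and pops disappear from the output.

```csharp
using System;
using System.Collections.Generic;

public static class StackToTac
{
    // Translates a tiny, hand-picked subset of IL into three-address-style
    // statements by simulating the evaluation stack symbolically.
    public static List<string> Translate(string[] il)
    {
        var stack = new Stack<string>();  // holds expressions, not runtime values
        var output = new List<string>();

        foreach (var line in il)
        {
            var parts = line.Split(new[] { ' ' }, 2);
            string op = parts[0];
            string arg = parts.Length > 1 ? parts[1] : "";

            if (op.StartsWith("ldc.i4."))
            {
                // Push the constant itself; no statement is emitted.
                stack.Push(op.Substring("ldc.i4.".Length));
            }
            else if (op.StartsWith("ldloc."))
            {
                // Push the local's name; no statement is emitted.
                stack.Push("loc" + op.Substring("ldloc.".Length));
            }
            else if (op.StartsWith("stloc."))
            {
                // Pop an expression and assign it to the local.
                output.Add($"loc{op.Substring("stloc.".Length)} = {stack.Pop()}");
            }
            else if (op == "bge.s")
            {
                // The right operand was pushed last, so it is popped first.
                string right = stack.Pop();
                string left = stack.Pop();
                output.Add($"if {left} >= {right} goto {arg}");
            }
            else
            {
                throw new NotSupportedException(op);
            }
        }
        return output;
    }

    public static void Main()
    {
        string[] il =
        {
            "ldc.i4.5", "stloc.0", "ldc.i4.6", "stloc.1",
            "ldloc.0", "ldloc.1", "bge.s IL_0012",
        };
        foreach (var statement in Translate(il))
            Console.WriteLine(statement);
    }
}
```

The seven stack instructions collapse into three statements (two assignments and one conditional jump) with no stack traffic left. A later pass such as constant propagation could then simplify the comparison further, which is the kind of work this answer says belongs to the JIT.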

score 0 · Accepted Answer

To give a final answer to all this discussion about "extra code".

The C# compiler reads int a=5; and translates that to:

ldc.i4.5
stloc.0

Then it goes to the next line and reads int b=6; and that is translated to:

ldc.i4.6
stloc.1

And then it reads the next line with the if statement and so on.

When compiling from C# to IL, the compiler works through the code statement by statement and translates each statement to IL on its own, without taking the neighboring statements into account.

To optimize the IL and remove what you call "extra code" at this stage, the C# compiler would have to go over all the generated IL, build a tree representation of it, remove the unneeded nodes, and then write it out as IL again. The C# compiler has no need to do this, because the JIT compiler does it when going from IL to machine code.

So the code that you see as extra is not really extra: it is part of the statements that the C# compiler read from your C# code, and it will be removed when the JIT compiler compiles the IL to native code.

This was a high level explanation of how the C# code is translated since I don't think that you have taken any classes in compiler construction or anything like that. If you want to know more there are books and pages on the internet to read.
