c++ - Need help understanding the compilation of C++ programs

Question

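I don't properly understand how C++ programs are compiled and linked. Is there a way to view, in a human-readable format, the object files generated by compiling a C++ program? This should help me understand the format of object files, how C++ classes are compiled, and what information the compiler needs to generate object files. It should also help me understand the following statement: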

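If a class is used only as an input parameter or return type, we don't need to include the whole class header file; a forward declaration is enough. But if a derived class derives from a base class, we need to include the file containing the definition of the base class (taken from "Exceptional C++").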

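I am reading the book "Linkers and Loaders" to understand the format of object files, but I would prefer something tailored specifically to C++ source code.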

Thanks,

Jagrati

Edit:

I know that with nm I can view the symbols present in an object file, but I am interested in learning more about object files.

score 1 · Accepted Answer

First things first. Disassembling the compiler output will most probably not help you understand any of the issues you have. The output of the compiler is no longer a C++ program but plain assembly, and that is really hard to read if you do not know what the memory model is.

On the particular question of why the definition of base is required when you declare it as a base class of derived, there are a few different reasons (and probably more that I am forgetting):

When an object of type derived is created, the compiler must reserve memory for the full instance, including all base subobjects: it must know the size of base.
When you access a member attribute, the compiler must know its offset from the implicit this pointer, and that offset requires knowledge of the size taken by the base subobject.
When an identifier is parsed in the context of derived and is not found in the derived class, the compiler must know whether it is defined in base before looking for it in the enclosing namespaces. Without the definition of base, the compiler cannot know whether foo(); is a valid call inside derived::function() if foo() is declared in the base class.
The number and signatures of all virtual functions defined in base must be known when the compiler defines the derived class. It needs that information to build the dynamic dispatch mechanism (usually a vtable), and even to know whether a member function in derived is bound for dynamic dispatch: if base::f() is virtual, then a derived::f() with the same signature will be virtual regardless of whether its declaration in derived has the virtual keyword.
Multiple inheritance adds a few other requirements, like relative offsets from each baseX that must be adjusted before the final overriders for the methods are called. A pointer of type base2 that points to an object of multiplyderived does not point to the beginning of the instance but to the beginning of the base2 subobject in the instance, which might be offset by other bases declared before base2 in the inheritance list.
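
The reasons above can be checked with a small sketch. The class names base, base2, derived and multiplyderived follow the wording of this answer, but their members and the helper functions are invented here for illustration, and the actual sizes and offsets are implementation-defined.

```cpp
#include <cassert>

// Hypothetical classes; only the names echo the answer above.
struct base {
    virtual ~base() = default;
    virtual int f() const { return 1; }  // virtual in base...
    int x = 10;
};

struct base2 {
    virtual ~base2() = default;
    int y = 20;
};

struct derived : base {
    int f() const { return 2; }          // ...so this overrides it and is virtual, keyword or not
    int z = 30;
    int sum() const { return x + z; }    // x is found in base during name lookup
};

struct multiplyderived : base, base2 {
    int w = 40;
};

// A derived object contains a complete base subobject, so it cannot be smaller.
static_assert(sizeof(derived) >= sizeof(base), "derived embeds a base subobject");

// Calls f() through a base reference; dynamic dispatch selects the final overrider.
inline int call_f(const base& b) { return b.f(); }

// Returns true when the base2 subobject does not start at the object's own address.
inline bool base2_is_offset(const multiplyderived& m) {
    const base2* as_base2 = &m;          // the implicit conversion adjusts the pointer
    return static_cast<const void*>(as_base2) != static_cast<const void*>(&m);
}

// Small self-check; the offset result is what mainstream ABIs produce, not a language guarantee.
inline void check_all() {
    derived d;
    multiplyderived m;
    assert(call_f(d) == 2);
    assert(base2_is_offset(m));
}
```

None of these lines could be compiled if base were only forward-declared: the size, the member offsets, the lookup of x and the virtual-ness of f() all come from its definition.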

To the last question in the comments:

So can't the instantiation of objects (except for global ones) wait until runtime, so that the size, offsets, etc. could wait until link time and we wouldn't necessarily have to deal with them while generating object files?

void f() {
   derived d;
   //...
}

The previous code allocates an object of type derived on the stack. The compiler will add assembler instructions to reserve some amount of memory for the object on the stack. After the compiler has parsed the code and generated the assembly, there is no trace of the object. In particular (assuming a trivial constructor for a POD type, i.e. nothing is initialized), that code and void f() { char array[ sizeof(derived) ]; } will produce exactly the same assembler. When the compiler generates the instruction that reserves the space, it needs to know how much.
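
As a rough illustration of why the size has to be known while f() is being compiled, the sketch below uses a hypothetical trivial derived (its members, the opaque type and the helper functions are invented here) and compares it with a raw byte buffer of the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct base { int a; };
struct derived : base { int b; };    // hypothetical trivial layout

struct opaque;                       // forward declaration only: an incomplete type
// opaque o;                         // would not compile: the size is unknown
// sizeof(opaque);                   // would not compile either

// derived holds at least the two ints declared above.
static_assert(sizeof(derived) >= 2 * sizeof(int), "derived holds both ints");

// Number of bytes the compiler has to reserve for a local derived object.
inline std::size_t bytes_for_object() {
    derived d;
    d.a = 1;
    d.b = 2;
    return sizeof d;
}

// Number of bytes reserved for a plain buffer sized with sizeof(derived).
inline std::size_t bytes_for_buffer() {
    alignas(derived) char array[sizeof(derived)];
    std::memset(array, 0, sizeof array);
    assert(sizeof array == sizeof(derived));
    return sizeof array;
}
```

Both functions need the same number of bytes, and that number is fixed when the translation unit is compiled; the commented-out lines show that a forward declaration alone gives the compiler nothing to reserve.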

score 0 · Accepted Answer

Have you tried inspecting your binaries with readelf (provided you are on a Linux platform)? It provides very comprehensive information about ELF object files.

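Honestly, though, I am not sure how much this would help with understanding compilation and linking. I think the right approach is probably to get a grasp of how C++ code maps to assembly, both before and after linking.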

score 0 · Accepted Answer

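I am reading "An Introduction to GCC" ( http://www.network-theory.co.uk/docs/gccintro/ ). It has given me a very good understanding of linking and compilation. It is at a beginner level, but I don't mind.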

score 0 · Accepted Answer

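You normally don't need to know the internal format of Obj files in detail, since they are generated for you. What you need to know is that for each source file you compile (typically one per class), the compiler generates an Obj file containing the compiled machine code for your class, suited to the platform you are compiling for. The next step, linking, then puts the object files of all the classes your program needs together into a single EXE or DLL (or whatever other format a non-Windows OS uses). It can also be an EXE plus several DLLs, depending on what you want.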

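The most important thing is that you separate the interface (declaration) of your classes from the implementation (definition).

Always put only the interface declaration of your class in the header file. Nothing else: no implementation there. Also avoid member variables of custom types that are not pointers, because for those a forward declaration is not enough and you need to include other headers in your header. If you have includes in your header, the design smells and the build process slows down as well.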

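All implementations of class methods or other functions should go in the CPP file. This guarantees that the Obj file generated by the compiler is not needed when someone includes your header, and that you can include other people's headers only in the CPP file.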
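
A minimal single-file sketch of that separation follows. The classes Car and Engine and the file names in the comments are hypothetical; in a real project each marked section would live in its own file.

```cpp
#include <cassert>

// ---- car.h (hypothetical header): interface only ----
class Engine;                             // forward declaration instead of #include "engine.h"

class Car {
public:
    Car();
    ~Car();
    Car(const Car&) = delete;             // the raw owning pointer makes copying unsafe
    Car& operator=(const Car&) = delete;
    int horsepower() const;
    void swap_engine(Engine* replacement);  // pointer parameter: a declaration suffices
    Engine engine_copy() const;           // by-value return type: still only a declaration needed here
private:
    Engine* engine_;                      // pointer member: its size is known without Engine's definition
};

// ---- engine.h (hypothetical header) ----
class Engine {
public:
    explicit Engine(int hp) : hp_(hp) {}
    int hp() const { return hp_; }
private:
    int hp_;
};

// ---- car.cpp (hypothetical): this is where "car.h" and "engine.h" would both be included ----
Car::Car() : engine_(new Engine(90)) {}
Car::~Car() { delete engine_; }
int Car::horsepower() const { return engine_->hp(); }
Engine Car::engine_copy() const { return *engine_; }

void Car::swap_engine(Engine* replacement) {
    assert(replacement != nullptr);
    delete engine_;
    engine_ = replacement;                // Car takes ownership of the new engine
}
```

Code that only includes the hypothetical car.h never sees the Engine definition, so a change to Engine's private members would force a rebuild of car.cpp but not of every user of Car. Making engine_ a plain Engine member instead of a pointer would require the full definition in the header.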

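But why bother? The answer is that with this separation, linking is faster, because each of your Obj files is used once per class. Also, if you change your class, only a small number of other object files change during the next build.

If you have includes in your header, it means that when the compiler generates the Obj file for your class, it first has to process the headers of the other classes included in your header, which may in turn pull in further headers, and so on. There may even be a circular dependency, and then you can't compile! Or, if you change something in your class, the compiler will need to regenerate many other Obj files, because without this separation they become very tightly dependent after a while.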

score 0 · Accepted Answer

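nm is a Unix tool that shows you the names of the symbols in an object file.

objdump is a GNU tool that shows you more information.

Both tools, however, show you the very raw information used by the linker, which is not designed for humans to read. This probably won't help you better understand what is happening at the C++ level.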
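
For anyone who still wants to look, here is a small translation unit to experiment with. The file name demo.cpp and every identifier in it are made up for this sketch; the commands in the trailing comment assume a Linux system with g++ and GNU binutils installed, and the mangled name shown is what the Itanium C++ ABI used by GCC and Clang produces.

```cpp
// demo.cpp (hypothetical file name)
#include <cassert>

// Ordinary C++ function: its symbol name is mangled to encode the parameter types.
int add(int a, int b) { return a + b; }

// extern "C" switches mangling off, so the symbol is plain "c_add".
extern "C" int c_add(int a, int b) { return a + b; }

class Counter {
public:
    void increment();                  // defined out of line below
    int value() const { return n_; }   // defined in-class, therefore implicitly inline
private:
    int n_ = 0;
};

void Counter::increment() { ++n_; }

// Tiny check that both functions agree.
void sanity_check() { assert(add(2, 3) == c_add(2, 3)); }

// Commands one might try:
//   g++ -c demo.cpp -o demo.o    compile only, no linking
//   nm demo.o                    raw symbol names, e.g. _Z3addii for add(int, int)
//   nm -C demo.o                 the same list with C++ names demangled
//   objdump -d -C demo.o         disassembly of the code sections
//   readelf -S -s demo.o         ELF section headers and the symbol table
```

Comparing the plain nm listing with the nm -C listing is probably the quickest way to see how C++ names map to linker symbols.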
