c - C memory management for a cross-platform VM

Question

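I asked a question about C type sizes and got a pretty good answer, but I realized that I may not have formulated the question well enough for it to be useful for my purpose.

My background was in computer engineering before I moved to software engineering, so I like computer architecture and have always thought about making a VM. I have just finished an interesting project making a VM in Java, which I am quite proud of. But because of some legal problems I can't open-source it now, and I currently have some free time. So I want to see if I can make another VM in C (with better speed), just for fun and education.

The thing is that I am not a C programmer; the last time I wrote a non-trivial C program was more than 10 years ago. I was a Pascal and Delphi programmer, and am now a Java and PHP programmer.

There are a number of obstacles I can foresee, and I am trying to tackle one of them: accessing existing libraries (in Java, reflection solves this problem).

I plan to solve this with a buffer of data (similar to a stack). The client of my VM can put data into this stack before giving me a pointer to the native function.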

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    // Prepare stack
    int   aStackSize = 1024*4;
    char *aStackData = malloc(aStackSize);

    // Initialise stack
    VMStack aStack;
    VMStack_Initialize(&aStack, (char *)aStackData, aStackSize);

    // Push in the parameters
    char *Params = VMStack_CurrentPointer(&aStack);
    VMStack_Push_int   (&aStack, 10  ); // Push an int
    VMStack_Push_double(&aStack, 15.3); // Push a double

    // Prepare space for the expected return
    char *Result = VMStack_CurrentPointer(&aStack);
    VMStack_Push_double(&aStack, 0.0); // Push an empty double for result

    // Execute
    void (*NativeFunction)(char*, char*) = &Plus;
    NativeFunction(Params, Result); // Call the function

    // Show the result
    double ResultValue = VMStack_Pull_double(&aStack); // Get the result
    printf("Result:  %5.2f\n", ResultValue);               // Print the result

    // Remove the previous parameters
    VMStack_Pull_double(&aStack); // Pull to clear space of the parameter
    VMStack_Pull_int   (&aStack); // Pull to clear space of the parameter

    // Just to be sure, print out the pointer and see if it is `0`
    printf("Pointer: %d\n", aStack.Pointer);

    free(aStackData);
    return EXIT_SUCCESS;
}

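Pushing, pulling, and invoking the native function can all be triggered by bytecode (which is how the VM will be built later).

For the sake of completeness (so that you can try it on your machine), here is the code for the stack: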

typedef struct {
    int  Pointer;
    int  Size;
    char *Data;
} VMStack;

inline void   VMStack_Initialize(VMStack *pStack, char *pData, int pSize) __attribute__((always_inline));
inline char   *VMStack_CurrentPointer(VMStack *pStack)                    __attribute__((always_inline));
inline void   VMStack_Push_int(VMStack *pStack, int pData)                __attribute__((always_inline));
inline void   VMStack_Push_double(VMStack *pStack, double pData)          __attribute__((always_inline));
inline int    VMStack_Pull_int(VMStack *pStack)                           __attribute__((always_inline));
inline double VMStack_Pull_double(VMStack *pStack)                        __attribute__((always_inline));

inline void VMStack_Initialize(VMStack *pStack, char *pData, int pSize) {
    pStack->Pointer = 0;
    pStack->Data    = pData;
    pStack->Size    = pSize;
}

inline char *VMStack_CurrentPointer(VMStack *pStack) {
    return (char *)(pStack->Pointer + pStack->Data);
}

inline void VMStack_Push_int(VMStack *pStack, int pData) {
    *(int *)(pStack->Data + pStack->Pointer) = pData;
    pStack->Pointer += sizeof pData; // Should check the overflow
}
inline void VMStack_Push_double(VMStack *pStack, double pData) {
    *(double *)(pStack->Data + pStack->Pointer) = pData;
    pStack->Pointer += sizeof pData; // Should check the overflow
}

inline int VMStack_Pull_int(VMStack *pStack) {
    pStack->Pointer -= sizeof(int);// Should check the underflow
    return *((int *)(pStack->Data + pStack->Pointer));
}
inline double VMStack_Pull_double(VMStack *pStack) {
    pStack->Pointer -= sizeof(double);// Should check the underflow
    return *((double *)(pStack->Data + pStack->Pointer));
}

On the native function side, I created the following for testing purposes:

// These two structures are there so that Plus will not need to access its parameter using
//    arithmetic-pointer operation (to reduce mistake and hopefully for better speed).
typedef struct {
    int    A;
    double B;
} Data;
typedef struct {
    double D;
} DDouble;

// Here is a helper function for displaying
void PrintData(Data *pData, DDouble *pResult) {
    printf("%5.2f + %5.2f = %5.2f\n", pData->A*1.0, pData->B, pResult->D);
}

// Some native function
void Plus(char* pParams, char* pResult) {
    Data    *D  = (Data    *)pParams; // Access data without arithmetic-pointer operation
    DDouble *DD = (DDouble *)pResult; // Same for return
    DD->D = D->A + D->B;
    PrintData(D, DD);
}

When executed, the code above produces:

10.00 + 15.30 = 25.30
Result:  25.30
Pointer: 0

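This works well on my machine (Linux x86 32-bit, GCC, C99). It would be great if it also worked on other OSes and architectures. But there are at least three memory-related issues we have to be aware of.

1). Data size - If I compile both the VM and the native functions with the same compiler on the same architecture, the sizes of the types should be the same.

2). Endianness - Same as data size.

3). Memory alignment - This is the problem, because padding bytes may be added to a struct, and it is hard to keep the parameter stack in sync with that when preparing it (there is no way to know how the padding is added, other than hard-coding it).

My questions are:

1). If I know the sizes of the types, is there a way to modify the push and pull functions to synchronize exactly with struct padding? (Modify them so the compiler takes care of it, as it does for the data size and endianness problems.)

2). If I pack the structure by one (using #pragma pack(1)): (2.1) Will the performance penalty be acceptable? (2.2) Will the program stability be at risk?

3). How about padding by 2, 4, or 8? Which should be good for general 32-bit or 64-bit systems?

4). Can you guide me to documentation for the exact padding algorithm, say for GCC on x86?

5). Is there a better way?

NOTE: Cross-platform is not my ultimate goal, but I can't resist. Also, performance is not my target as long as it is not too ugly. All of this is for fun and learning.

Sorry for my English and the very long post.

Thanks, everyone, in advance.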

score 2 · Accepted Answer

Tangential Comments

These first items are tangential to the questions you asked, but...

// Execute
void (*NativeFunction)(char*, char*) = &Plus;
NativeFunction(Params, Result); // Call the function

I think you should probably be using 'void *' instead of 'char *' here. I would also have a typedef for the function pointer type:

typedef void (*Operator)(void *params, void *result);

Then you can write:

Operator NativeFunction = Plus;

The actual function would be modified too - but only very slightly:

void Plus(void *pParams, void *pResult)

Also, you have a minor naming problem - this function is 'IntPlusDoubleGivesDouble()', rather than a general purpose 'add any two types' function.

Direct answers to the questions

1). If I know the sizes of the types, is there a way to modify the push and pull functions to synchronize exactly with struct padding? (Modify them so the compiler takes care of it, as it does for the data size and endianness problems.)

There isn't an easy way to do that. For example, consider:

struct Type1
{
     unsigned char byte;
     int           number;
};
struct Type2
{
     unsigned char byte;
     double        number;
};

On some architectures (32-bit or 64-bit SPARC, for example), the Type1 structure will have 'number' aligned at a 4-byte boundary, but the Type2 structure will have 'number' aligned on an 8-byte boundary (and might have a 'long double' on a 16-byte boundary). Your 'push individual elements' strategy would bump the stack pointer by 1 after pushing the 'byte' value - so you would want to move the stack pointer by 3 or 7 before pushing the 'number', if the stack pointer is not already appropriately aligned. Part of your VM description will be the required alignments for any given type; the corresponding push code will need to ensure the correct alignment before pushing.
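
As a rough illustration of "ensure the correct alignment before pushing", here is a minimal sketch, not code from the question. The names `AlignedStack`, `align_up`, `push_bytes`, `push_uchar` and `push_double` are hypothetical, `_Alignof` assumes a C11 compiler, and the backing buffer is assumed to start at a suitably aligned address (as memory from malloc does). Values are copied with memcpy, so the push itself never performs a misaligned store.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of the question's VMStack, using size_t offsets. */
typedef struct {
    size_t pointer;  /* offset of the next free byte */
    size_t size;     /* capacity of data, in bytes */
    char  *data;     /* backing buffer, assumed suitably aligned */
} AlignedStack;

/* Round offset up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

/* Skip padding up to the required alignment, copy the value in,
   and return the offset at which it was stored. */
static size_t push_bytes(AlignedStack *s, const void *value,
                         size_t size, size_t align) {
    size_t at = align_up(s->pointer, align);
    assert(at + size <= s->size);  /* overflow check */
    memcpy(s->data + at, value, size);
    s->pointer = at + size;
    return at;
}

static size_t push_uchar(AlignedStack *s, unsigned char v) {
    return push_bytes(s, &v, sizeof v, _Alignof(unsigned char));
}

static size_t push_double(AlignedStack *s, double v) {
    return push_bytes(s, &v, sizeof v, _Alignof(double));
}
```

Pushing an unsigned char and then a double with these helpers leaves the double at the first offset after byte 0 that satisfies the alignment of double, which is the skipping of 3 or 7 bytes described above. A matching pull would need to undo the same padding, so the VM would have to record either the offsets or the pushed types.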

2). If I pack the structure by one (using #pragma pack(1)): (2.1) Will the performance penalty be acceptable? (2.2) Will the program stability be at risk?

On x86 and x86_64 machines, if you pack the data, you will incur a performance penalty for the misaligned data access. On machines such as SPARC (an earlier version of this answer also listed PowerPC; that was withdrawn per mecki), you will get a bus error or something similar instead - you must access the data at its proper alignment. You might save some memory space - at a cost in performance. You'd do better to ensure performance (which here includes 'performing correctly instead of crashing') at the marginal cost in space.
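
If values do end up at unaligned offsets (in a packed layout, for example), one portable way to access them is to copy the bytes with memcpy instead of dereferencing a cast pointer. The sketch below shows that idea only; `store_double_at` and `load_double_at` are hypothetical helper names, not part of the question's code. This avoids performing a misaligned load or store through a `double *`, though it may still be slower than an aligned access.

```c
#include <assert.h>
#include <string.h>

/* Copy a double into buffer at any byte offset, aligned or not.
   size is the capacity of buffer in bytes. */
static void store_double_at(char *buffer, size_t size,
                            size_t offset, double value) {
    assert(offset <= size && size - offset >= sizeof value);  /* bounds check */
    memcpy(buffer + offset, &value, sizeof value);
}

/* Copy a double out of buffer from any byte offset, aligned or not. */
static double load_double_at(const char *buffer, size_t size, size_t offset) {
    double value;
    assert(offset <= size && size - offset >= sizeof value);  /* bounds check */
    memcpy(&value, buffer + offset, sizeof value);
    return value;
}
```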

3). How about padding by 2, 4, or 8? Which should be good for general 32-bit or 64-bit systems?

On SPARC, you need to pad an N-byte basic type to an N-byte boundary. On x86, you will get best performance if you do the same.

4). Can you guide me to documentation for the exact padding algorithm, say for GCC on x86?

You would have to read the manual.

5). Is there a better way?

Note that the 'Type1' trick with a single character followed by a type gives you the alignment requirement - possibly using the 'offsetof()' macro from <stddef.h>:

offsetof(struct Type1, number)
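
A small sketch of how that trick could feed the VM's alignment table, assuming one probe struct per basic type. The names `ProbeInt`, `ProbeDouble`, `ALIGN_INT`, `ALIGN_DOUBLE` and `next_aligned` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One probe struct per basic type: a single byte followed by the type. */
struct ProbeInt    { unsigned char byte; int    number; };
struct ProbeDouble { unsigned char byte; double number; };

/* offsetof is an integer constant expression, so the compiler's own
   layout decisions can be captured as compile-time constants. */
enum {
    ALIGN_INT    = offsetof(struct ProbeInt, number),
    ALIGN_DOUBLE = offsetof(struct ProbeDouble, number)
};

/* First offset at or after `offset` that is a multiple of `align`. */
static size_t next_aligned(size_t offset, size_t align) {
    assert(align != 0);
    return (offset + align - 1) / align * align;
}
```

The constants come from whatever compiler builds the VM, so nothing is hard-coded per platform; the values themselves still differ from one ABI to another.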

Well, I would not pack the data on the stack - I would work with the native alignment because that is set to give the best performance. The compiler writer does not idly add padding to a structure; they put it there because it works 'best' for the architecture. If you decide you know better, you can expect the usual consequences - slower programs that sometimes fail and are not as portable.

I am also not convinced that I would write the code in the operator functions to assume that the stack contained a structure. I would pull the values off the stack via the Params argument, knowing what the correct offsets and types were. If I pushed an integer and a double, then I'd pull an integer and a double (or, maybe, in reverse order - I'd pull a double and an int). Unless you are planning an unusual VM, few functions will have many arguments.
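
To make that concrete, here is one way the operator could read its arguments by offset rather than through a struct cast. This is a sketch under assumptions: the int is taken to sit at offset 0 of the parameter area and the double at the next offset aligned for double; `ProbeDouble` and `double_param_offset` are hypothetical helpers; and the function name follows the renaming suggested earlier in this answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Probe struct used only to ask the compiler for the alignment of double. */
struct ProbeDouble { unsigned char byte; double number; };

/* Offset of the double parameter, assuming an int was stored at offset 0
   and the double was placed at the next offset aligned for double. */
static size_t double_param_offset(void) {
    size_t align = offsetof(struct ProbeDouble, number);
    return (sizeof(int) + align - 1) / align * align;
}

/* Reads an int and then a double from params, writes their sum to result. */
static void IntPlusDoubleGivesDouble(void *params, void *result) {
    const char *p = params;
    int    a;
    double b, sum;

    assert(params != NULL && result != NULL);
    memcpy(&a, p, sizeof a);                          /* first argument  */
    memcpy(&b, p + double_param_offset(), sizeof b);  /* second argument */
    sum = a + b;
    memcpy(result, &sum, sizeof sum);
}
```

The function has the same shape as the `Operator` typedef above, so it could be stored in an `Operator` variable and called the same way as `Plus`.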

score 1 · Accepted Answer

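Interesting post, and it shows that you have put in a lot of work. Almost the ideal SO post.

I do not have ready answers, so please bear with me. I will have to ask a few more questions :P

1). If I know the sizes of the types, is there a way to modify the push and pull functions to synchronize exactly with struct padding? (Modify them so the compiler takes care of it, as it does for the data size and endianness problems.)

Is this from a performance point of view only? Are you planning to introduce pointers along with the native arithmetic types?

2). If I pack the structure by one (using #pragma pack(1)): (2.1) Will the performance penalty be acceptable? (2.2) Will the program stability be at risk?

This is implementation-defined. It is not something you can count on across platforms.

3). How about padding by 2, 4, or 8? Which should be good for general 32-bit or 64-bit systems?

The value that matches the native word size ought to give you the best performance.

4). Can you guide me to documentation for the exact padding algorithm, say for GCC on x86?

I don't know of any off the top of my head. But I have seen code similar to this in use.

Note that you can specify attributes of variables using GCC (which also has something called default_struct __attribute__((packed)) that turns off padding).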
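
For reference, a short sketch of what the packed attribute does to a layout. `__attribute__((packed))` is a GCC extension (Clang accepts it too) and is not standard C; the struct names and `padding_saved` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Natural layout: the compiler may insert padding before number. */
struct Natural { unsigned char byte; double number; };

/* Packed layout: no padding, so number starts at offset 1. */
struct Packed { unsigned char byte; double number; } __attribute__((packed));

/* Bytes saved per struct by packing. */
static size_t padding_saved(void) {
    assert(sizeof(struct Natural) >= sizeof(struct Packed));
    return sizeof(struct Natural) - sizeof(struct Packed);
}
```

In the packed version `number` is misaligned, so on targets that require aligned access it is only safe to reach it through the struct member itself (which lets the compiler generate the right instructions), not through a `double *` taken from its address.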

score 1 · Accepted Answer

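There are some very good questions here, and many of them will get tangled up in important design issues. Still, most of us can see what you are working toward (another answer was posted just as I was writing, so you can see you are generating interest), and we can understand your English well enough: you are working through some compiler issues and some language design issues. That makes the question difficult to work, but since you are already working in JNI, there is hope...

For one thing, I would try to get away from pragmas; many, many people will disagree with that. For the canonical discussion of why, see the rationale for the D language's position on the issue. For another, there is a 16-bit pointer buried in your code.

The issues are nearly endless, well studied, and likely to get us buried in opposition and intramural intransigence. If I may, I suggest reading Kenneth Louden's home page as well as the Intel architecture manual. I have it, and I have tried to read it. Data structure alignment, along with many of the other issues you have raised, is deeply buried in historical compiler science and is likely to leave you awash in who knows what (slang or idiom for unforeseeable consequences).

With that said, here goes:

"C type sizes": What type sizes?
"Computer engineer before moving to software engineer": Ever studied microcontrollers? Have a look at some of Don Lancaster's work.
"Pascal, Delphi, and now Java and PHP programmer": Those are comparatively far removed from the fundamental architecture of processors, although plenty of people will show, or try to show, how they can be used to write powerful, fundamental routines. I suggest looking at David Eck's recursive descent parser to see exactly how to begin studying the matter. Kenneth Louden also has an implementation of "Tiny", which is an actual compiler. Not long ago I found something that I think was called asm dot org... very advanced, very powerful work was available for study there, but it is a long haul to start writing in assembler with the aim of getting into compiler science. In addition, most architectures differ from one processor to another in ways that are not consistent.
"Accessing existing libraries":

There are many libraries around, and Java has some good ones. I don't know about the others. One approach is to try to write a library. Java has a good base and leaves room for people who like to try to come up with something better. Start by improving Knuth-Morris-Pratt or something similar: there is no shortage of places to start. Try the Computer Programming Algorithms Directory and, for sure, look at the Dictionary of Algorithms and Data Structures at NIST.

"always_inline":

Not necessarily; see Dov Bulka. He holds a doctorate in computer science and is a proficient author in areas where time efficiency, reliability, robustness, and so on are not subject to the "business model" paradigm that gives us some of the "Oh! That doesn't matter" on issues that actually matter.

As a closing note, instrumentation and control make up over 60% of the actual market for accomplished programming skills such as you describe. For some reason, we hear mostly about the business model. Let me share an inside tidbit I have from a reliable source: from 10% to 60% or more of actual safety and property risk comes from vehicular issues rather than from burglary, theft, and the like. You will never hear appeals for "90 days busting minerals at the county mineral extraction facility!" for traffic tickets; in fact, most people do not even realize that traffic citations are (N.A., U.S.A.) class 4 misdemeanors and can actually be classified as such.

It sounds to me like you have taken a good step toward some good work, ...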
