c - 用 C 实现的解释器中的垃圾收集问题

Question

我正在为一种玩具程序语言开发一个爱好编译器/解释器，并且我已经实现了我打算探索的大部分功能，除了一个好的垃圾收集算法（类似于this guy）。我已经阅读了很多关于各种算法的内容，并且对如何实现它们有了大致的了解。我的语言运行时的早期迭代使用了引用计数，但我放弃了它以学习更高级的东西，所以我现在正在考虑标记和复制压缩算法。

我开始的第一个问题是阻止算法在本机扩展函数（即用 C 编写的函数）中收集“对象”。根集由解释器堆栈上的“对象”和符号表中的“对象”组成，我不应该对这些有太多麻烦，但是，如果在 C 函数中创建容器“对象”，然后填充子“对象”，我如何防止 GC 收集它们，因为它实际上不在解释器堆栈上或绑定到符号？

使实施 GC 更容易的事情：

我的语言中的所有“对象”都是内置类型（例如，不是面向对象的）
解释器堆栈只是一个指向结构的指针堆栈
符号表只是指向结构的指针数组

用户代码：

f = open('words.txt', 'r');
lines = readlines(f);
close(f);

解释器（解析后编译为字节码...）：

push文件名，open_mode
调用builtin_fopen它返回一个包装 aFILE*
将结果存储在符号中f
推动符号f
builtin_flines创建列表类型的调用l，然后使用 Cfread将文件的每一行作为字符串类型读取，并将其附加到列表中l
将结果存储在符号lines中，依此类推....

现在，如果 GC 在分配文件中包含一行的字符串之一时运行，则根集还没有对的任何引用l，因此应该收集它。关于如何更好地处理这个问题的任何想法？

score 3 · Accepted Answer

为解释器的堆专用一个单独的连续分配区域。永远不要在竞技场之外收集任何东西。
您始终拥有竞技场的当前顶部（假设它从较低地址增长到较高地址）。顶部以上的所有内容均不可收藏，但在根集中考虑。必须分配多个链接对象的内置函数将它们分配到顶部上方，然后将顶部向上移动，以便所有分配的对象立即进入可回收堆。如果收集发生在函数执行过程中，则顶部上方的对象将立即全部移动到新堆中。

score 1 · Accepted Answer

您需要为您的本机函数提供一个接口，通过该接口它们可以告诉垃圾收集器它们引用了哪些对象，然后让它们使用该接口。

最简单的方法可能是根本不让本机代码直接指向解释器/垃圾收集的数据。相反，您为本机代码提供对象的句柄并让它回调到运行时以从对象中获取值。在您的示例中，builtin_flines将调用运行时分配一个列表并取回它的句柄。然后它会读取行，并调用运行时将每一行附加到列表中，最后返回完整的列表。运行时将管理给定本机调用的所有句柄，在本机调用返回后释放它们。

score 1 · Accepted Answer

由于我是您提到的最初的“这个”人，尽管我可以根据我迄今为止在我的项目中设计的内容为您提供有关您的第一个问题的一些见解（我保证我最终会在博客上介绍它）。因此，首先，所有内存分配都通过一个 mutator 函数。输入参数是您正在创建的对象的类型，以及对指向它的指针类型对象的引用。然后在创建新对象时更新该指针对象。如果一个对象被分配给解释器运行时的 C 函数独占使用，那么它就是一个根对象。在这种情况下，NULL 作为第二个参数传递，并且该对象被添加到根对象列表中。现在稍后，如果该内部函数不再需要该对象，则必须从根对象列表中删除该对象。（它不会取消分配对象本身，因为最终将由垃圾收集例程处理）。哦，解释器堆栈本身也是解释器中的一个对象（列表类型或数组类型对象），因此指向它的指针也在根对象列表中（同样，另一个列表类型对象也是口译员知道）。指向根对象列表的指针是垃圾收集器需要知道的唯一指针。

Also, as for when to start a garbage collection run -- since memory is effectively unlimited on modern architectures, I've decided to kick off the garbage collector when X number of objects has been allocated. After running you have Y objects left. If Y still greater than Z percent of X, then X gets bumped up enough to make that so. Then I just hope that malloc() never fails (if it does, I just dump an error out and exit the interpreter).

Hope this helps, and hopefully someone else will add more clarifications since I am more of an amateur when it comes to language / interpreter design.

score -1 · Accepted Answer

Some complications:

When you input a line to be interpreted like 100 if X then gosub 5000

But 5000 does not exist yet, you are spaghetti coding... Maybe x does not have any assigned value or data type yet. If we don't index now, are we going wait till someone types "run" or executes a line directly from the prompt?

if we do index now to speed things up later, how will we know the last instance of "100" or "X" or "5000" gets removed?

What entry do we make in the master index of "things"? Assuming these things may include lines of basic code, strings, and other variables we want to handle by name or line number.

We want to find quickly, and use to strategically identify garbage collection potential when the need for collection arises.

How much static space do we burn on the index of things that may change in size? Which details besides label, location, and length are useful enough to justify indexing? Should we attempt to index empty space when a variable shrinks? Or just index the variable's largest historical size along with its current size? How do we identify those variables that change in size most frequently, and should we avoid cleaning them, or even deliberately pad them?

When do we clean up the entire mess? or is it better to defrag only enough free space to squeeze in something that cannot be otherwise jammed sideways into an existing hole?

Purposeful delays and waiting for "input" seem good targets that we might exploit to proactively clean up some of the mess. There is no assurance any basic program will have such deadtime.

Sorry this is not an answer, but the original question seems to invite some brainstorming towards a better scheme. We need a clear strategy that requires defining the entire problem.

c - 用 C 实现的解释器中的垃圾收集问题

4 回答 4

Related

Reference