
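I am looking for a way to allocate local variables to registers. I know of a couple of serious methods for doing this (namely, those mentioned on Wikipedia), but I'm stuck on how "spilling" is accomplished. Also, the relevant literature is quite intimidating. I'm hoping there is something simpler that will satisfy my priorities:

  1. Correctness: an algorithm that generates correct code regardless of how many local variables there are.
  2. Simplicity: something I can understand without having to read too much literature.
  3. Efficiency: it needs to be better than the current method, which is:

Translating an operation x = y # z to: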

movl y, %eax
movl z, %ebx
op %ebx, %eax
movl %eax, x

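As I'm targeting the Intel 386, some relevant constraints are:

  • Binary operations take two arguments, one of which is both a source and the destination. Unary operations take a single argument.
  • An operation can access only one memory location; binary operations therefore need at least one argument in a register.
  • At most six registers are available: %eax %ebx %ecx %edx %esi %edi. (%ebp could also be included as a last resort.)
  • There are special cases such as integer division and the return register, but I can ignore them for now.

There are three steps the compiler gets through at the moment:

  • i386ification: all operations are converted to the form a = a # b (or a = #a for unary operations).
  • Liveness analysis: the sets of live variables before and after each operation are determined.
  • Register allocation: an interference graph is built and coloured.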
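
To make the first step concrete, here is a minimal Python sketch of how a three-address operation x = y # z might be rewritten into the two-address form a = a # b. The tuple layout and the temporary name `$tmp` are assumptions made for illustration; they are not taken from the compiler described here.

```python
def to_two_address(dest, op, lhs, rhs):
    """Rewrite dest = lhs op rhs into steps of the form a = a op b.

    Returns a list of tuples: ("copy", dst, src) or ("op", op, dst, src).
    """
    if dest == lhs:
        # Already in two-address form.
        return [("op", op, dest, rhs)]
    if dest == rhs:
        # Copying lhs into dest would destroy rhs before it is read,
        # so compute in a temporary (hypothetical name) and copy back.
        tmp = "$tmp"
        return [("copy", tmp, lhs), ("op", op, tmp, rhs), ("copy", dest, tmp)]
    return [("copy", dest, lhs), ("op", op, dest, rhs)]
```

For a commutative operator the `dest == rhs` case could swap the operands instead of using a temporary; the sketch leaves that out to stay general.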

And then the compiler throws its crayons in the air and doesn't know what to do next.

Example

public int mf(int cr, int ci) {
    int i = 0;
    int zr = 0;
    int zi = 0;

    while (i < 100 && zr*zr + zi*zi < 4) {
        int t = zr * zr - zi * zi + cr;
        zi = 2 * zr * zi + ci;
        zr = t;

        i = i + 1;
    }
    return i;
}

Here's the rather pretty interference graph for the function, and the CFG with liveness information. The CFG image does require some vertical scrolling, unfortunately.

Seven colours were used. I would like to spill one of them (or the set of variables assigned that colour). The method of choosing which isn't too important. What gets tricky is how to deal with the spilt variables.

Let's say I spill "pink", which is the set of variables t, $t4, $t7. This means that those operations referring to one of these variables will access it from its position on the stack frame, rather than through a register. This should work for this example.

But what if the program was:

...
a = a + b
...

and both a and b had to be spilled? I can't emit an instruction addl b, a with two memory addresses. I would need another spare register to temporarily hold one of the operands, and that means spilling another colour. This suggests a general method of:

  1. If all variables can be coloured with r colours, great!
  2. Otherwise, spill some colours and their associated variables.
  3. If an operation exists that accesses two spilled variables, spill another colour and use the spare register for temporary storage for all such operations.
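
Step 3 can be sketched as follows, assuming one register (here `%edi`, chosen arbitrarily) is kept out of the colouring as a scratch register, and assuming operands are already rendered as AT&T-syntax strings. This only covers instructions such as addl and subl that accept a memory destination; two-operand imull needs a register destination, so it would need the destination loaded as well.

```python
SCRATCH = "%edi"  # assumption: reserved, never handed out by the colouring


def is_memory(operand):
    # Stack slots look like "-8(%ebp)"; registers look like "%eax".
    return "(" in operand


def emit_binary(op, src, dst):
    """Emit AT&T-syntax code for dst = dst op src."""
    if is_memory(src) and is_memory(dst):
        # Two memory operands are not encodable: route the source
        # through the scratch register first.
        return [f"movl {src}, {SCRATCH}", f"{op} {SCRATCH}, {dst}"]
    return [f"{op} {src}, {dst}"]
```

For example, `emit_binary("addl", "-8(%ebp)", "-4(%ebp)")` produces a load into the scratch register followed by `addl %edi, -4(%ebp)`, while any combination with at most one memory operand is emitted as a single instruction.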

At this point I would suspect that a lot more stuff is being spilled than necessary, and wonder if there is some smarter way to spill things, such as spilling part of a variable's lifetime, rather than the whole variable itself. Are there some simple(ish) techniques that I could use here? Again, I'm not aiming particularly high -- certainly not high enough to require reading anything too deep. ;-)

Specific problems

The main specific problem is: when a variable is spilled, how does this affect the instructions generated? Do all instructions using that variable need to access it directly in memory (from its stack position) ? How will this work if an operation uses two spilled variables? (The architecture does not permit instructions to access two distinct memory locations.)

Secondary problems are:

  • How do I determine where to insert load/store instructions, for correctness (and less importantly, efficiency) ?
  • Can I spill a variable for only that part of its lifetime when it is not in immediate use, and unspill it later? So that all instructions act on unspilled registers. A variable might live in different registers at different times.
  • Can I be a little more efficient with special cases? For example, %eax is used for the return value, so it would be nice if the variable to be returned happened to be allocated to that register by the time the return was encountered. Similarly, some registers are "callee-save", so if fewer variables happened to be live at the time of a function call, having them allocated to non-callee-save registers would mean I can avoid storing those registers.
  • Would SSA form help much (if at all) ? Being able to eliminate common subexpressions and evaluate constants might reduce(?) register pressure, but otherwise would it have any effect?

The aspects I'm not concerned about (right now) are:

  • Stack allocation and optimisation: it's implemented naively already, and can be optimised using the interference graph if need be.
  • Compile-time efficiency, just as long as it terminates. (NP-completeness does not imply a given algorithm should be avoided.)

Update

Sorry about the downtime -- I've been thinking about the answers given and trying to find an easy approach to take to start implementing some of the ideas. To be honest, I've been procrastinating... :-\

I found the very nice presentation (PPT, sadly):

http://www.cs.princeton.edu/courses/archive/spr05/cos320/notes/Register%20Allocation.ppt

Which answers the question about how to deal with specific operation needs (like using the same register for source and destination; or needing a certain register for some operations). What I'm not sure about is whether the Liveness-Colouring-Allocation cycle terminates.

I'll try to do some actual work soon and hopefully close the question.


2 Answers


I've used a greedy approach in a JVM allocator once, which worked pretty well. Basically start at the top of a basic block with all values stored on the stack. Then just scan the instructions forward, maintaining a list of registers which contain a value, and whether the value is dirty (needs to be written back). If an instruction uses a value which is not in a register (or not in the correct register), issue a load (or move) to put it in a free register before the instruction. If an instruction writes a value, ensure it is in a register and mark it dirty after the instruction.

If you ever need a register, spill a used register by deallocating the value from it, and writing it to the stack if it is dirty and live. At the end of the basic block, write back any dirty and live registers.

This scheme makes it clear exactly where all the loads and stores go, because you generate them as you go. It is easily adapted to instructions which take a value in memory, or which can take either of two arguments in memory, but not both.

If you're OK with having all data on the stack at every basic block boundary, this scheme works pretty well. It should give results similar to linear scan within a basic block, as it basically does very similar things.
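
A rough Python sketch of the scheme, to show where the loads and stores fall out. The instruction format (op, dst, src) meaning dst = dst op src, the `slots` mapping and the "first candidate" victim choice are all assumptions for illustration. It needs at least two registers, always loads both operands into registers, and writes back any dirty victim without checking liveness (conservative but correct).

```python
def allocate_block(instrs, registers, slots, live_out):
    """Greedy allocation for one basic block.

    instrs:   list of (op, dst, src), meaning dst = dst op src
    slots:    variable -> stack operand, e.g. "-4(%ebp)"
    live_out: variables still needed after the block
    All values are assumed to be on the stack at block entry.
    """
    in_reg = {}            # variable -> register currently holding it
    dirty = set()          # variables whose register is newer than the stack
    free = list(registers)
    out = []

    def spill_one(keep):
        # Naive choice: first resident variable the current instruction
        # does not use. A real heuristic (LRU, next use) would go here.
        victim = next(v for v in in_reg if v not in keep)
        reg = in_reg.pop(victim)
        if victim in dirty:
            out.append(f"movl {reg}, {slots[victim]}")
            dirty.discard(victim)
        return reg

    def ensure(var, keep):
        if var not in in_reg:
            reg = free.pop(0) if free else spill_one(keep)
            out.append(f"movl {slots[var]}, {reg}")
            in_reg[var] = reg
        return in_reg[var]

    for op, dst, src in instrs:
        keep = {dst, src}
        s = ensure(src, keep)
        d = ensure(dst, keep)
        out.append(f"{op} {s}, {d}")
        dirty.add(dst)

    # Block exit: write back whatever is dirty and still live.
    for var, reg in in_reg.items():
        if var in dirty and var in live_out:
            out.append(f"movl {reg}, {slots[var]}")
    return out
```

Swapping in a better `spill_one` or a register preference in `ensure` changes code quality only; the load/store placement stays correct.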

You can get arbitrarily complicated about how to decide which values to spill and which registers to allocate. Some lookahead can be useful, for example by marking each value with a specific register it needs to be in at some point in the basic block (e.g. eax for a return value, or ecx for a shift amount) and preferring that register when the value is first allocated (and avoiding that register for other allocations). But it is easy to separate out the correctness of the algorithm from the improvement heuristics.

I've used this allocator in an SSA compiler, YMMV.

Answered 2010-01-04T22:48:18.590

First: There is no smart way to do it. The problem is NP-complete ;-)

How spilling is done:

You run your register allocation algorithm and get a list of variables you have to spill. Now you can allocate some space on the stack at the beginning of your function. Link every spilled variable to a place on the stack. If you want to be smart, let variables with non-overlapping live ranges share the same stack slot. Whenever you need to spill a register, save it to memory and load it again when it is needed.
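
As a sketch of the slot-sharing idea, assuming each spilled variable's live range can be approximated by a single (first, last) pair of instruction indices, a greedy pass can reuse a slot as soon as its previous occupant is dead. The function name, the 4-byte slot size and the %ebp-relative offsets are illustrative assumptions.

```python
def assign_slots(live_ranges, slot_size=4):
    """Map each spilled variable to an offset from %ebp.

    live_ranges: {var: (first, last)}, inclusive instruction indices.
    Variables whose ranges do not overlap may share a slot.
    """
    busy_until = []  # slot index -> last instruction that still uses it
    offsets = {}
    for var, (first, last) in sorted(live_ranges.items(), key=lambda kv: kv[1][0]):
        for i, end in enumerate(busy_until):
            if end < first:
                # Previous occupant is dead before this range starts.
                busy_until[i] = last
                break
        else:
            busy_until.append(last)
            i = len(busy_until) - 1
        offsets[var] = -slot_size * (i + 1)
    return offsets
```

With made-up ranges `{"t": (0, 3), "$t7": (2, 5), "$t4": (4, 6)}`, `t` and `$t4` end up sharing the slot at offset -4 while `$t7` gets -8.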

How to handle eax:

Mark the register as filled, but do not store any variable in it (pre-allocation). This will make the code generator clear that register. To be smart, store the value in another register if that is beneficial.

Easy and correct ways to handle spilling:

Just spill everything. This assumes that every variable's live range is the whole program. It can be augmented by using something like LRU or a usage count to choose which registers should be freed.

The next best thing to do is probably linear scan register allocation. It should be quite easy to implement even when using pre-allocation. I suggest you look into the linked paper.
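
For orientation, here is a compact Python sketch of the core of linear scan, assuming each variable has a single live interval (start, end) and at least one register is available. It follows the usual "spill the interval that ends last" heuristic and leaves out pre-allocation and the insertion of spill code; the data layout is an assumption.

```python
def linear_scan(intervals, registers):
    """intervals: {var: (start, end)}. Returns (assignment, spilled)."""
    free = list(registers)
    active = []        # (end, var) for intervals holding a register, sorted by end
    assignment = {}
    spilled = set()
    for var, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts.
        while active and active[0][0] < start:
            _, old = active.pop(0)
            free.append(assignment[old])
        if free:
            assignment[var] = free.pop(0)
            active.append((end, var))
        else:
            last_end, last_var = active[-1]
            if last_end > end:
                # The active interval ending furthest away is spilled,
                # and the current interval takes over its register.
                assignment[var] = assignment.pop(last_var)
                spilled.add(last_var)
                active[-1] = (end, var)
            else:
                spilled.add(var)
        active.sort()
    return assignment, spilled
```

With intervals a:(0,10), b:(1,3), c:(2,8), d:(4,6) and two registers, the long-lived `a` is the one that gets spilled, and `d` reuses the register freed when `b` expires.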

Specific Answers

  1. What does correctness mean for you? Even simple allocation algorithms are correct if you do not make a programming error. Proving (mathematical) correctness is a lot more difficult. A store has to be inserted after the value is created and before its register is reused; a load has to be inserted after that store and before the value is needed again.

  2. Yes, if you program it that way. If your algorithm can handle a value being in different registers during its lifetime, you can use those optimizations.

  3. It's again up to you to implement certain improvements. One possibility would be to only block eax when it's needed, not for the whole program.

  4. Under certain conditions SSA does help. Interference graphs of SSA code are always chordal, meaning that every cycle of more than three nodes has a chord (an edge joining two nodes that are not adjacent in the cycle). This is a special case of graph coloring in which a minimum coloring can be found in polynomial time. Converting to SSA does not necessarily mean more or less register pressure. While SSA form usually has more variables, these tend to have shorter lifetimes.
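
To illustrate the polynomial-time claim in point 4: one standard way to colour a chordal graph optimally is to visit the nodes by maximum cardinality search and colour greedily in that order. This is a sketch under the assumption that the input really is chordal; the adjacency-set representation is chosen for illustration and the technique is not something described above.

```python
def color_chordal(graph):
    """graph: {node: set of neighbours}. Returns {node: colour index}."""
    weight = {v: 0 for v in graph}
    unvisited = set(graph)
    order = []
    # Maximum cardinality search: repeatedly pick the node with the
    # most already-visited neighbours.
    while unvisited:
        v = max(sorted(unvisited), key=lambda n: weight[n])
        order.append(v)
        unvisited.remove(v)
        for n in graph[v]:
            if n in unvisited:
                weight[n] += 1
    # Greedy colouring in visit order. On a chordal graph the already
    # coloured neighbours of each node form a clique, so this is optimal.
    color = {}
    for v in order:
        used = {color[n] for n in graph[v] if n in color}
        color[v] = next(c for c in range(len(graph)) if c not in used)
    return color
```

On a triangle a-b-c with an extra node d attached to c, this uses three colours, which matches the size of the largest clique.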

Answered 2009-12-25T22:09:48.947