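I'm currently writing a compiler for a custom asm-like programming language, and I'm really confused about how to do proper PC-relative addressing for data labels.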

main    LDA RA hello
        IPT #32
        HLT

hello   .STR "Hello, world!"

The pseudocode above produces the following hex when compiled:

31 80 F0 20 F0 0C 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64 21 00

Where 3180, F020 and F00C are the LDA, IPT and HLT instructions.

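As you can see from the code, the LDA instruction takes the label hello as a parameter. When compiled, it becomes the value 02, which means "incremented PC + 0x02" (if you look at the code, that is the position of the "Hello, world!" line relative to the LDA call). The thing is: .STR is not an instruction, since it only tells the compiler to add a (0-terminated) string at the end of the executable, so if there are other instructions after the hello label declaration, that offset will be wrong.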

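But I can't find a way to calculate the correct offset short of giving the compiler the ability to travel through time. Do I have to "compile" twice? First the data labels, then the actual instructions?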

1 Answer

Yes, most assemblers are (at least) two-pass - precisely because of forward references like these. Adding macro capabilities can add more passes.

Look at an assembly listing, not just the op-codes. As you said the actual offset is "2", I'm assuming memory is word-addressed.

0000 3180   main    LDA RA hello
0001 F020           IPT #32
0002 F00C           HLT

0003 4865   hello   .STR "Hello, world!"

The first two columns are the PC and opcode. I'm not sure how the LDA instruction has been encoded (where is the +2 offset in there?)

In the first pass, assuming all addressing is relative, the assembler would emit the fixed part of the op-code (covering the LDA RA part) along with a marker to show it needs to patch up the instruction with the address of hello in the second pass.

At this point it knows the size, but not the complete value, of the final machine language.

It then continues on, working out the address of each instruction and building its symbol table.

In the second pass, now knowing the above information, it patches each instruction by calculating relative offsets etc. It also often regenerates the entire output (including PC values).
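The two passes described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the real encoder for this language: the tuple-based `PROGRAM` representation, the helper names, the one-word-per-instruction size and the two-bytes-per-word packing of `.STR` are all hypothetical, and the sketch resolves offsets without producing actual op-codes.

```python
import math

# Hypothetical source representation: (label or None, mnemonic, operands).
PROGRAM = [
    ("main", "LDA", ["RA", "hello"]),
    (None, "IPT", ["#32"]),
    (None, "HLT", []),
    ("hello", ".STR", ["Hello, world!"]),
]

def size_in_words(mnemonic, operands):
    """Assumed sizes: one word per instruction, two bytes per word."""
    if mnemonic == ".STR":
        # String bytes plus the terminating zero, rounded up to whole words.
        return math.ceil((len(operands[0]) + 1) / 2)
    return 1

def first_pass(program):
    """Assign an address to every line and record where each label lands."""
    symbols, addresses, pc = {}, [], 0
    for label, mnemonic, operands in program:
        if label is not None:
            symbols[label] = pc
        addresses.append(pc)
        pc += size_in_words(mnemonic, operands)
    return symbols, addresses

def second_pass(program, symbols, addresses):
    """Replace label operands with offsets from the incremented PC."""
    resolved = []
    for (label, mnemonic, operands), pc in zip(program, addresses):
        out = []
        for op in operands:
            if mnemonic != ".STR" and op in symbols:
                out.append(symbols[op] - (pc + 1))
            else:
                out.append(op)
        resolved.append((pc, mnemonic, out))
    return resolved

symbols, addresses = first_pass(PROGRAM)
listing = second_pass(PROGRAM, symbols, addresses)
```

With these assumptions, `hello` lands at word 3 and the `LDA` at word 0 resolves to an offset of 2 from the incremented PC, which agrees with the value quoted in the question.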

Occasionally, something will be detected in the second pass which prevents it from continuing. For example, perhaps you can only reference objects within 256 words (-127 through +128), but the label hello turns out to be more than 128 words away. This means it should have used a two-word instruction (with an absolute address), which changes everything it learnt during the first pass.

This is often referred to as a 'fix up' error. The same thing can happen during the link phase.
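One way an assembler might recover from an out-of-range reference, instead of stopping, is to widen the offending instruction and redo the address assignment until nothing changes. The sketch below assumes a hypothetical item format `(label, kind, arg)`, a one-word short form, a two-word long form, and the -127 to +128 range from the example above; none of this comes from the language in the question.

```python
SHORT_MIN, SHORT_MAX = -127, 128  # range quoted in the example above

def assign_addresses(items, long_form):
    """Lay out items; references listed in long_form take two words."""
    symbols, addresses, pc = {}, [], 0
    for index, (label, kind, arg) in enumerate(items):
        if label is not None:
            symbols[label] = pc
        addresses.append(pc)
        if kind == "data":
            pc += arg  # arg is a size in words
        else:
            pc += 2 if index in long_form else 1
    return symbols, addresses

def relax(items):
    """Widen out-of-range references and repeat until the layout is stable."""
    long_form = set()
    while True:
        symbols, addresses = assign_addresses(items, long_form)
        widened = False
        for index, (label, kind, arg) in enumerate(items):
            if kind != "ref" or index in long_form:
                continue
            offset = symbols[arg] - (addresses[index] + 1)
            if not SHORT_MIN <= offset <= SHORT_MAX:
                long_form.add(index)
                widened = True
        if not widened:
            return symbols, addresses, long_form

# A reference that sits too far from its target (200 words of data between).
far = [("main", "ref", "hello"), (None, "data", 200), ("hello", "data", 7)]
far_symbols, far_addresses, far_long = relax(far)

# The same program with only 100 words in between stays in the short form.
near = [("main", "ref", "hello"), (None, "data", 100), ("hello", "data", 7)]
near_symbols, near_addresses, near_long = relax(near)
```

The loop terminates because an instruction is only ever widened, never narrowed. Note how widening the first instruction in `far` moves `hello` from word 201 to word 202, which is exactly the "changes everything it learnt" effect.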

Single-pass assemblers are only possible if you insist on 'define before use', in which case your code would report hello as an undefined symbol.
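To make the 'define before use' restriction concrete, here is a minimal sketch of a single pass that resolves each reference the moment it is seen. The line format `(label, referenced label, size in words)` and the function name are assumptions made for illustration only.

```python
def single_pass(lines):
    """lines: (label or None, referenced label or None, size in words)."""
    symbols, offsets, pc = {}, [], 0
    for label, target, size in lines:
        if label is not None:
            symbols[label] = pc
        if target is not None:
            if target not in symbols:
                # A forward reference cannot be resolved in one pass.
                raise NameError(f"undefined symbol: {target}")
            # Offset relative to the incremented PC, as in the question.
            offsets.append(symbols[target] - (pc + 1))
        pc += size
    return offsets

# Layout from the question: hello is used before it is defined.
forward = [("main", "hello", 1), (None, None, 1), (None, None, 1),
           ("hello", None, 7)]

# Same content with the string moved ahead of the code that uses it.
backward = [("hello", None, 7), ("main", "hello", 1), (None, None, 1),
            (None, None, 1)]
```

Calling `single_pass(forward)` raises `NameError` for `hello`, while `single_pass(backward)` succeeds because the label is already in the symbol table when the reference is reached.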

You also need to read up on "program sections". Whilst .STR is not an executable instruction, it is a directive to the assembler to place the binary representation of the string into the CODE section of the image (vs DATA).

answered 2016-08-26T20:24:58.620