assembly - The add instruction and the program counter on ARM

Question

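I am looking at a snippet of ARM assembly that uses the add instruction. The snippet below simply says: add the computed offset to the address in the program counter to find the location of the string stored at L_.str, where L_.str is the symbol (address) of a string in the data section.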

movw    r0, :lower16:(L_.str-(LPC1_0+4))
movt    r0, :upper16:(L_.str-(LPC1_0+4))
LPC1_0:
    add r0, pc

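The first two instructions (movw and movt) load a 32-bit number that represents the address of that string. I am in Thumb mode, right? That said, I am having trouble working out the overall memory layout. Is the following a correct representation of the code segment in memory? Also, are LPC1_0 and L_.str the base addresses of add r0, pc and of "A simple string", respectively? And what is the size of each box: 32 or 64 bits, depending on the architecture?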

--------------------------------------------
| movw    r0, :lower16:(L_.str-(LPC1_0+4)) |
--------------------------------------------
| movt    r0, :upper16:(L_.str-(LPC1_0+4)) |
-------------------------------------------- LPC1_0
| add r0, pc                               |
--------------------------------------------
                       .
                       .
                       .
-------------------------------------------- L_.str
| "A simple string"                        |
--------------------------------------------

If so, I could obtain the offset (the value that gets added to pc) from the difference L_.str-LPC1_0. However, a +4 is also taken into account here.

From the documentation for ADD (pc- or sp-relative):

ADD Rd, Rp, #expr

If Rp is the pc, the value used is: (the address of the current instruction + 4) AND &FFFFFFFC.

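So if Rp is pc, it seems I also have to account for 4 more bytes of offset. Fine, but where are those bytes added? Why are the 4 bytes accounted for in the mov instructions rather than just before the add instruction? Is this an optimization introduced by the compiler?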

score 1 · Accepted Answer

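My educated guess:

You want to get the "absolute" address of L_.str in memory. movw and movt appear to load immediate values, so the value is encoded inside the opcode.

The compiler calculates the offset between LPC1_0 and L_.str and subtracts another 4 (bytes).

The add r0, pc instruction then adds the pc+4 value.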

The +4 is added by the processor. I think this is because the pc is incremented quite early in the processor's logic, and the add can only read the value of pc afterwards. It is simpler to document that the value read is really pc+4 than to add extra logic in the processor to compute pc+4-4.
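The steps above can be checked with plain arithmetic. The sketch below is a Python model, not ARM code. The two addresses are made-up values chosen only for illustration, and the model assumes Thumb state, where reading pc in add r0, pc yields the address of that instruction plus 4.

```python
MASK32 = 0xFFFFFFFF

def movw_movt_immediates(target, add_addr):
    """Split the build-time constant target - (add_addr + 4) into 16-bit halves."""
    offset = (target - (add_addr + 4)) & MASK32
    return offset & 0xFFFF, offset >> 16

def run_sequence(lower16, upper16, add_addr):
    """Model movw, movt, then add r0, pc (Thumb state assumed)."""
    r0 = lower16                            # movw: set low half, clear high half
    r0 = (r0 & 0xFFFF) | (upper16 << 16)    # movt: set high half, keep low half
    pc_as_read = add_addr + 4               # pc reads as this instruction's address + 4
    return (r0 + pc_as_read) & MASK32

# Hypothetical addresses, for illustration only.
LPC1_0 = 0x00008008
L_str = 0x00012340

lo, hi = movw_movt_immediates(L_str, LPC1_0)
result = run_sequence(lo, hi, LPC1_0)
print(hex(lo), hex(hi), hex(result))
```

The -4 baked into the immediates and the +4 contributed by the pc read cancel, so r0 ends up holding the address of L_.str.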

The advantage of calculating the address of L_.str this way is that it is independent of where the code is relocated.
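A small Python sketch of why this holds, assuming the code and the string are moved together by the same amount (the addresses and the slide values are hypothetical): the constant encoded in movw/movt does not change, yet the sequence still produces the relocated address.

```python
MASK32 = 0xFFFFFFFF

def pc_relative_offset(target, add_addr):
    """Constant encoded in movw/movt: target - (add_addr + 4)."""
    return (target - (add_addr + 4)) & MASK32

def resolve(offset, add_addr):
    """What add r0, pc produces at run time (Thumb: pc reads as add_addr + 4)."""
    return (offset + add_addr + 4) & MASK32

# Hypothetical link-time addresses, for illustration only.
LPC1_0, L_str = 0x00008008, 0x00012340
offset = pc_relative_offset(L_str, LPC1_0)

def load_at(slide):
    """Return (encoded offset, computed address) when the image is moved by slide."""
    moved_offset = pc_relative_offset(L_str + slide, LPC1_0 + slide)
    return moved_offset, resolve(offset, LPC1_0 + slide)

for slide in (0x0, 0x10000, 0x40000000):
    moved_offset, address = load_at(slide)
    print(hex(slide), hex(moved_offset), hex(address))
```

If the linker or loader moved the string independently of the code, the offset would have to be recomputed; the sketch does not cover that case.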

score 1 · Accepted Answer

The normal position-independent "get the address of something" instruction would simply be adr r0, L_.str (which is equivalent to having the assembler/linker automatically calculate an appropriate offset for add r0, pc, #offset). However, the ARM architecture uses fixed-width encodings (ARM instructions are 32 bits wide, Thumb instructions are either 16 or 32 bits), so only a limited number of bits of the instruction are available to encode the immediate value for the offset, and the maximum range is limited. The maximum possible offset that a Thumb encoding of adr can support is +/-4095 bytes. Since the compiler has no idea how far apart the linker will put the sections, it can't safely emit adr for risk of the final offset being too big to assemble, so instead you get the 3-instruction generate immediate/add PC sequence. The advantage is that it can reach any 32-bit address; the tradeoff is that it takes up more space in the program image and instruction cache. adr alone is 2 or 4 bytes (depending on the offset and target register), whereas the movw/movt/add sequence weighs in at 10 bytes and takes at least twice as long to execute.
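To make the range limit concrete, here is a rough Python check of whether a target would be within reach of a Thumb adr. It takes the +/-4095 figure quoted above and assumes that adr measures the offset from the word-aligned pc value, (instruction address + 4) with the low two bits cleared. The function names and addresses are hypothetical, and a real assembler applies further encoding constraints that this sketch ignores.

```python
def adr_base(instr_addr):
    """pc value used by adr: (instruction address + 4), word-aligned."""
    return (instr_addr + 4) & ~3

def adr_can_reach(target, instr_addr, max_offset=4095):
    """True if the target lies within +/-max_offset bytes of the adr base."""
    return abs(target - adr_base(instr_addr)) <= max_offset

# Hypothetical addresses, for illustration only.
near = adr_can_reach(0x00008100, 0x00008008)   # a literal a few hundred bytes away
far = adr_can_reach(0x00012340, 0x00008008)    # a string placed in a distant section
print(near, far)
```

A string that lands in another section can easily fall outside that window, which is the case the movw/movt/add sequence is there to handle.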

As for why the PC offset is folded into the section offset, well, why wouldn't it be? Both are constant, so when the linker is calculating the distance between LPC1_0 and L_.str in the final image to encode the immediate value into the movw/movt instructions, it has absolutely nothing to gain by not adding the PC correction at the same time. That's why the 2-instruction fetch/execute offset of the original ARM's 3-stage pipeline was exposed in the first place, because it was considerably simpler to fix up addresses in the assembler/linker when building software, than to implement all the logic to "correct" it in hardware.
