gcc - Gcc inline assembly: what's wrong with the dynamic allocated register `r` in input operand?

Question

While testing GCC inline assembly, I use the test function below to display a character on the screen in the Bochs emulator. The code runs in 32-bit protected mode:

test() {
    char ch = 'B';
    __asm__ ("mov $0x10, %%ax\n\t" 
                "mov %%ax, %%es\n\t"
                "movl $0xb8000, %%ebx\n\t"
                "mov $0x04, %%ah\n\t" 
                "mov %0, %%al\n\t" 
                "mov %%ax, %%es: ((80 * 3 + 40) * 2)(%%ebx)\n\t" 
                ::"r"(ch):);
}

The result I'm getting is:

The red character on the screen isn't displaying B correctly. However, when I changed the input register r to c like this: ::"c"(ch):);, which is the last line of the above code, the character 'B' displays normally:

What's the difference? I accessed the video memory through the data segment directly after the computer entered into protected mode.

I traced the assembly code and found that the instruction was assembled to mov al, al when the r constraint is used. At that point the value of ax is 0x0010, so al is 0x10. That explains the result, but why did the compiler choose the al register? Isn't it supposed to choose a register that hasn't been used before? Adding a clobber list solved the problem.

score 4 · Accepted Answer

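As @MichaelPetch commented, you can use 32-bit addresses to access whatever memory you want from C. The asm gcc emits will assume a flat memory space, and will assume, for example, that it can copy esp to edi and use rep stos to zero some stack memory (this requires that %es has the same base as %ss).

I'd guess the best solution is to not use any inline asm at all, and instead use a global constant as the pointer to video memory. For example: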

// pointer is constant, but points to non-const memory
uint16_t *const vga_base = (uint16_t*)0xb8000;   // + whatever was in your segment

// offsets are scaled by 2.  Do some casting if you want the address math to treat offsets as byte offsets
void store_in_flat_memory(unsigned char c, uint32_t offset) {
  vga_base[offset] = 0x0400U | c;            // it matters that c is unsigned, so it zero-extends instead of sign-extending
}
    movzbl  4(%esp), %eax       # c, c
    movl    8(%esp), %edx       # offset, offset
    orb     $4, %ah   #, tmp95         # Super-weird, wtf gcc.  We get this even for -mtune=core2, where it causes a partial-register stall
    movw    %ax, 753664(%edx,%edx)  # tmp95, *_3   # the addressing mode scales the offset by two (sizeof(uint16_t)), by using it as base and index
    ret

From gcc6.1 on Godbolt (link below), with -O3 -m32.

Without the const, code like vga_base[10] = 0x4 << 8 | 'A'; would have to load the vga_base global and then offset from it. With the const, &vga_base[10] is a compile-time constant.
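The comment in store_in_flat_memory about c being unsigned can be checked in isolation. The following is a minimal sketch; cell_from_unsigned and cell_from_signed are hypothetical helpers written for this illustration, not part of the code above.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only: build a 16-bit text-mode cell
   with attribute 0x04 in the high byte and the character in the low byte. */

/* unsigned char zero-extends when promoted, so the high byte stays 0x04 */
static uint16_t cell_from_unsigned(unsigned char c) {
    return (uint16_t)(0x0400U | c);
}

/* signed char sign-extends when promoted, so a negative value
   sets every high bit and overwrites the attribute */
static uint16_t cell_from_signed(signed char c) {
    return (uint16_t)(0x0400U | c);
}
```

For a character code above 0x7f, such as 0xE9, the unsigned version yields 0x04E9, while the signed version (passed the same bit pattern, -23) yields 0xFFE9.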

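If you really want a segment:

Since you can't leave %es modified, you need to save and restore it. This is another reason to avoid using it in the first place. If you really want a special segment for something, set up %fs or %gs once and leave it set, so it doesn't affect the normal operation of any instructions that don't use a segment override.

There is built-in syntax for thread-local variables that uses %fs or %gs without inline asm. You might be able to take advantage of it to avoid inline asm altogether.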
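As a small illustration of that built-in syntax, here is a sketch using the GNU __thread storage class (C11 spells it _Thread_local). The names store_count and count_store are hypothetical. On x86 Linux targets gcc typically addresses such variables relative to %fs (64-bit) or %gs (32-bit); whether a freestanding kernel can reuse this depends on how it sets up those segments, which is an assumption outside this sketch.

```c
/* Hypothetical per-thread counter, for illustration only.
   __thread is a GNU C extension; no inline asm is needed
   for the compiler to emit a segment-relative access. */
static __thread unsigned int store_count;

/* Increment this thread's counter and return the new value. */
static unsigned int count_store(void) {
    return ++store_count;
}
```

Compiling this with gcc -O2 -S on a hosted x86 target should show the segment-relative addressing in the generated asm.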

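If you're using a custom segment, you could make its base address non-zero, so you don't need to add 0xb8000 yourself. However, Intel CPUs optimize for the flat-memory case, so address generation with a non-zero segment base is a couple of cycles slower, IIRC.

I did find a request for gcc to allow segment overrides without inline asm, and a question about adding segment support to gcc. When this was written, you couldn't do that.

Doing it manually in asm, with a dedicated segment

To look at the asm output, I put it on Godbolt with the -mx32 ABI, so args are passed in registers but addresses don't need to be sign-extended to 64 bits. (I wanted to avoid the noise of loading args from the stack in -m32 code. The -m32 asm for protected mode will look similar.)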

void store_in_special_segment(unsigned char c, uint32_t offset) {
    char *base = (char*)0xb8000;               // sizeof(char) = 1, so address math isn't scaled by anything

    // let the compiler do the address math at compile time, instead of forcing one 32bit constant into a register, and another into a disp32
    char *dst = base+offset;               // not a real address, because it's relative to a special segment.  We're using a C pointer so gcc can take advantage of whatever addressing mode it wants.
    uint16_t val = (uint32_t)c | 0x0400U;  // it matters that c is unsigned, so it zero-extends

    asm volatile ("movw  %[val], %%fs: %[dest]\n"
         : 
         : [val] "ri" (val),  // register or immediate
           [dest] "m" (*dst)
         : "memory"   // we write to something that isn't an output operand
    );
}
    movzbl  %dil, %edi        # dil is the low 8 of %edi (AMD64-only, but 32bit code prob. wouldn't put a char there in the first place)
    orw     $1024, %di        #, val   # gcc causes an LCP stall, even with -mtune=haswell, and with gcc 6.1
    movw  %di, %fs: 753664(%esi)    # val, *dst_2

void test_const_args(void) {
    uint32_t offset = (80 * 3 + 40) * 2;
    store_in_special_segment('B', offset);
}
    movw  $1090, %fs: 754224        #, MEM[(char *)754224B]

void test_const_offset(char ch) {
    uint32_t offset = (80 * 3 + 40) * 2;
    store_in_special_segment(ch, offset);
}
    movzbl  %dil, %edi  # ch, ch
    orw     $1024, %di        #, val
    movw  %di, %fs: 754224  # val, MEM[(char *)754224B]

void test_const_char(uint32_t offset) {
    store_in_special_segment('B', offset);
}
    movw  $1090, %fs: 753664(%edi)  #, *dst_4

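So this code gets gcc to do an excellent job of using an addressing mode to do the address math, and of doing as much as possible at compile time.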
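The constants in the compiler output above can be re-derived by hand, which is a quick way to confirm that the address math really was folded at compile time. A minimal sketch follows; the helper names and the VGA_COLS and VGA_BASE constants are hypothetical and introduced only for this check.

```c
#include <stdint.h>

/* Hypothetical constants for an 80-column text mode, for illustration only. */
enum { VGA_COLS = 80 };
#define VGA_BASE 0xb8000u   /* 753664 in decimal */

/* Byte offset of a cell: each cell is 2 bytes (character + attribute). */
static uint32_t vga_byte_offset(uint32_t row, uint32_t col) {
    return (row * VGA_COLS + col) * 2;
}

/* Linear address of a cell in a flat address space. */
static uint32_t vga_linear_address(uint32_t row, uint32_t col) {
    return VGA_BASE + vga_byte_offset(row, col);
}

/* Cell value with attribute 0x04 (red on black) in the high byte. */
static uint16_t vga_red_cell(unsigned char c) {
    return (uint16_t)(0x0400u | c);
}
```

Row 3, column 40 gives a byte offset of 560 and an address of 754224, and a red 'B' is 1090 (0x0442), matching the movw $1090, %fs: 754224 instruction in test_const_args.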

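Segment registers

If you do want to modify a segment register for every store, keep in mind that it's slow: Agner Fog's instruction tables stop listing mov sr, r after Nehalem, but on Nehalem it's a 6-uop instruction that includes 3 load uops (from the GDT, I assume). It has a throughput of one per 13 cycles. Reading a segment register is fine (e.g. push sr or mov r, sr). pop sr is even a bit slower.

I'm not even going to write code for this, because it's such a bad idea. Make sure you use clobber constraints to tell the compiler about every register you step on, or you will get hard-to-debug errors where the surrounding code stops working.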
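The clobber advice itself can be shown without touching a segment register. The following is a minimal sketch, assuming an x86 target and GNU C inline asm; make_cell is a hypothetical helper that mirrors the question's use of %ax as scratch, and the plain-C fallback is only there so the snippet still builds elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: builds a text-mode cell in %ax
   the same way the question's asm does, but declares %eax as clobbered. */
static uint16_t make_cell(unsigned char ch) {
    uint16_t cell;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movw $0x0400, %%ax\n\t"  /* scratch use of %ax: attribute 0x04 in %ah */
             "movb %1, %%al\n\t"       /* safe only because ch cannot be in %eax */
             "movw %%ax, %0"
             : "=r" (cell)             /* output: any register except %eax */
             : "q" (ch)                /* input: any byte-addressable register */
             : "eax");                 /* tell the compiler %eax is overwritten */
#else
    cell = (uint16_t)(0x0400u | ch);   /* non-x86 fallback, no inline asm */
#endif
    return cell;
}
```

Without the "eax" clobber, the compiler is free to place ch in %al, which is how the question's code ended up with mov %al, %al after %ax had already been overwritten.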

See the x86 tag wiki for GNU C inline asm info.
