assembly - Printing Unicode characters from assembly with wprintf on Linux x86-64

Question

I'm on Linux and just experimenting with nasm and gas. I can print a Unicode character from C++ using wprintf:

#include <wchar.h>
#include <locale.h>
#include <stdio.h>
int main() 
{
  //printf("helloworld"); // can't do this AND wprintf in same program
  setlocale(LC_ALL, "");
  wprintf(L"%lc",0x307E); //prints out japanese hiragana ma ま
}

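However, I'm very confused about how to do this in assembly (in both Intel and gas syntax). My main confusion is the .data section. I even gave gcc the -S switch to see how it does it. It emits 13 .string statements for the format string, many of them empty strings, with each character on its own .string. I read that in nasm you can turn a regular string into a wide string by putting it in dw instead of db. So naturally I tried .int in gas, but that didn't work well: it prints extra gray question marks. Here is my current code: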

.section .data
locale:
  .string ""
printformat:
  .int '%','l','c'
printwide:
  .int 0x307E,0
.section .text
.global _start
_start:
movq    $locale,%rsi
movq    $6,%rdi
call    setlocale
movq    $printformat,%rdi
movq    $printwide,%rsi
movq    $0,%rax
call    wprintf
movq    $2,%rdi
call    exit

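This produces 5 gray question marks followed by the hiragana ま (ma). You would think there should be a ,0 after '%','l','c', but that doesn't work: with it, only question marks are printed. The only way I could print the hiragana ma without any question marks was to skip the format string and load printwide into rdi.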

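Again, this is for educational purposes for now. So basically, how do you handle the format string in AT&T syntax and in Intel syntax? In C++ you just put an L in front of it. (Yes, I suppose you could change the %lc to hex, but I don't want to do that.)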

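EDIT: This works (I changed $printwide to printwide and changed printformat: to a run of .string statements like the gcc -S listing does). But why does it work, and is there a better way to write out the format than using so many .string statements? And how would you do it in Intel syntax?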

.section .data
locale:
    .string ""
printformat:
    .string "%"
    .string ""
    .string ""
    .string "l"
    .string ""
    .string ""
    .string "c"
    .string ""
    .string ""
    .string ""
    .string ""
    .string ""
    .string ""
printwide:
    .word 0x307E
.section .text
.global _start
_start:
movq    $locale,%rsi
movq    $6,%rdi
call    setlocale
movq    $printformat,%rdi
movq    printwide,%rsi
movq    $0,%rax
call    wprintf
movq    $2,%rdi
call    exit

score 1 · Accepted Answer

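I'm surprised by the answer. I guess that on 64-bit, wide characters are 32 bits (with glibc on Linux, wchar_t is 32 bits wide). I found this out by reading about nasm. You can create a UTF-16 string in Intel syntax like this: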

printformat dw __utf16__("%lc"),0

But it only worked when I did this:

printformat dd __utf32__("%lc"),0

So the equivalent in AT&T syntax is:

.long '%','l','c',0

I guess the gcc -S listing uses so many .string statements to make each character 32 bits wide:

.string "%" = 16 bits (the % and the automatic terminating zero), then another 8 bits from one empty string, then another 8 bits from another empty string.
