c - Printing a wchar_t as part of a wchar_t* string does not terminate

Question

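So, I found a bug in glibc that I would like to report. The problem is that printf() computes the wrong width for the grouping character in the no_NO.utf8 locale, and therefore does not leave enough padding on the left side of the string. I originally noticed this in the shell utility printf, but it seems to stem from the underlying printf function in libc, which I have verified with a small test program.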

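I have not touched C since university, so I am a bit rusty when it comes to writing a test case. So far my only problem is that when I use this grouping character as part of a string (a wchar_t array), the string is not terminated, and I am not sure what I am doing wrong.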

Here is the output of my little test driver:

$ gcc printf-test.c && ./a.out 
Using locale nb_NO.utf8
<1 234> (length 7 according to strlen)
<1 234> (length -1 according to wcswidth)

Using locale en_US.utf8
<  1,234> (length 7 according to strlen)
<  1,234> (length 7 according to wcswidth)

Width of character e280af: -1

Width of s0  4: (ABCD)
Width of s1  4: (ABCD)
Width of s2 -1: (

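Clearly something fishy is going on when the final string is printed, and it has to do with how I am trying to print a string containing the multibyte grouping character used in the nb_NO locale.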

Full source:

#define _XOPEN_SOURCE       /* See feature_test_macros(7) */
#include <wchar.h>
#include <stdio.h>
#include <locale.h>
#include <string.h>


void print_num(char *locale){ 
    printf("Using locale %s", locale);
    setlocale(LC_NUMERIC, locale);
    char buf[40];
    sprintf(buf,"%'7d", 1234);
    printf("\n<%s> (length %d according to strlen)\n", buf, (int) strlen(buf));

    wchar_t wbuf[40];
    swprintf(wbuf, 40, L"%'7d", 1234); 
    int wide_width = wcswidth (wbuf, 40);
    printf("<%s> (length %d according to wcswidth)\n", buf, wide_width);
    puts("");
}

int main(){
    print_num("nb_NO.utf8");
    print_num("en_US.utf8");

    // just trying to understand
    wchar_t wc = (wchar_t) 0xe280af; // is this a correct way of specifying the char e2 80 af?
    int width = wcwidth (wc);
    printf("Width of character %x: %d\n", (int) wc, width);

    wchar_t s0[] = L"ABCD";
    wchar_t s1[] = {'A','B','C', 'D', '\0'};
    wchar_t s2[] = {'A',wc,'B', '\0'}; // something fishy
    int widthOfS0 = wcswidth (s0, 4);
    int widthOfS1 = wcswidth (s1, 4);
    int widthOfS2 = wcswidth (s2, 4);
    printf("\nWidth of s0  %d: (%ls)", widthOfS0, s0);
    printf("\nWidth of s1  %d: (%ls)", widthOfS1, s1);
    printf("\nWidth of s2 %d: (%ls)", widthOfS2, s2); // this does not terminate the string

    return 0;
}

score 1 · Accepted Answer

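Maybe this is too obvious, but you need to use wprintf() to print wchar_t. Any string literal you write is terminated automatically, and so is an array you fill with individual characters, as long as you add the trailing 0 yourself. The cast, however, only changes the size and type of the value so that it "fits"; it does not convert the number in any way.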

#include <wchar.h>
#include <stdio.h>

#ifndef __STDC_ISO_10646__
    #pragma warning() // 16 bit wchar
#endif

int main(void){

    int ret;
    wchar_t W [] = {                  // 0x80AF
        U'\x42', (wchar_t)0x43, (wchar_t)0xE280AF 
    };

    // Use wprintf throughout: once stdout has been used with one kind of
    // function (byte or wide), the other kind must not be used on it.
    wprintf(L"Num cast %X -> %X \n", 0xE280AFu, (unsigned)(wchar_t)0xE280AF);

    wchar_t S1[] = {'A', W[0], 'C',  0};
    wchar_t S2[] = {'A', 'B',  W[1], 0};
    wchar_t S3[] = {'A', W[2], 'C',  0};

    ret = wprintf(L"wstr S1 -> (%ls)", S1);
    wprintf(L" / %i xchars printed \n", ret);

    ret = wprintf(L"wstr S2 -> (%ls)", S2); 
    wprintf(L" / %i xchars printed \n", ret);

    ret = wprintf(L"wstr S3 -> (%ls)", S3);
    wprintf(L" / %i xchars printed \n", ret);

    return 0;
}
