c++ - How to identify RTL strings in C++

Question

I need to know the direction of my text before printing it.

I am using Unicode characters.

How can I do this in C++?

score 6 · Accepted Answer

If you don't want to use ICU, you can always parse the Unicode Character Database yourself (e.g., with a Python script). The main file, UnicodeData.txt, is a semicolon-separated text file in which each line describes one code point. Look at the fifth field in each line: that is the character's bidirectional class. If it is R or AL, the character is RTL; if it is L, the character is LTR. The other classes are weak or neutral types (like numerals), which I guess you'd want to ignore. Using that info, you can generate a lookup table of all RTL characters and then use it in your C++ code. If you really care about code size, you can shrink the lookup table by storing ranges (instead of an entry for each character), since most characters come in blocks that share a BiDi class.
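The answer suggests a script for this step; to stay in one language, here is a minimal C++ sketch of the same parsing, assuming each input line follows the semicolon-separated layout described above. The names BidiRecord, ParseUnicodeDataLine and IsStrongRtl are made up for illustration and are not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// One parsed record: the code point and its bidirectional class.
struct BidiRecord {
    std::uint32_t codePoint;
    std::string bidiClass;  // e.g. "L", "R", "AL", "EN", "ON"
};

// Hypothetical helper: reads field 0 (code point in hex) and field 4
// (the fifth field, the bidi class) from one semicolon-separated line.
// Returns false and leaves `out` untouched if the line is malformed.
bool ParseUnicodeDataLine(const std::string& line, BidiRecord& out) {
    std::string fields[5];
    std::size_t start = 0;
    for (int i = 0; i < 5; ++i) {
        std::size_t end = line.find(';', start);
        if (end == std::string::npos) return false;  // too few fields
        fields[i] = line.substr(start, end - start);
        start = end + 1;
    }
    char* endPtr = nullptr;
    unsigned long cp = std::strtoul(fields[0].c_str(), &endPtr, 16);
    if (endPtr == fields[0].c_str() || *endPtr != '\0') return false;
    out.codePoint = static_cast<std::uint32_t>(cp);
    out.bidiClass = fields[4];
    return true;
}

// Strong right-to-left classes, as named in the answer above.
bool IsStrongRtl(const std::string& bidiClass) {
    return bidiClass == "R" || bidiClass == "AL";
}
```

Feeding every line of the file through ParseUnicodeDataLine and merging consecutive code points that share a class would give the range table mentioned above.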

Now, define a function called GetCharDirection(wchar_t ch) which returns an enum value (say: Dir_LTR, Dir_RTL or Dir_Neutral) by checking the lookup table.

Now you can define a function GetStringDirection(const wchar_t*) which runs through all characters in the string until it encounters a character which is not Dir_Neutral. This first non-neutral character in the string should set the base direction for that string. Or at least that's how ICU seems to work.
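The two functions described above could look roughly like this. It is only a sketch: the table kDirRanges is a tiny hand-written excerpt (basic Latin, Hebrew and Arabic letters) chosen for illustration, whereas a real table would be generated from the Unicode database, and the DirRange type is an assumption of this example.

```cpp
#include <cstddef>

enum CharDirection { Dir_LTR, Dir_RTL, Dir_Neutral };

// A block of consecutive code points sharing one direction.
struct DirRange {
    unsigned long first;
    unsigned long last;
    CharDirection dir;
};

// Illustrative excerpt only, NOT a complete table.
static const DirRange kDirRanges[] = {
    {0x0041, 0x005A, Dir_LTR},  // Latin capital letters A-Z
    {0x0061, 0x007A, Dir_LTR},  // Latin small letters a-z
    {0x05D0, 0x05EA, Dir_RTL},  // Hebrew letters
    {0x0620, 0x064A, Dir_RTL},  // Arabic letters
};

// Looks the character up in the range table; anything not listed
// is treated as neutral.
CharDirection GetCharDirection(wchar_t ch) {
    const unsigned long cp = static_cast<unsigned long>(ch);
    const std::size_t count = sizeof(kDirRanges) / sizeof(kDirRanges[0]);
    for (std::size_t i = 0; i < count; ++i) {
        if (cp >= kDirRanges[i].first && cp <= kDirRanges[i].last) {
            return kDirRanges[i].dir;
        }
    }
    return Dir_Neutral;
}

// The first non-neutral character decides the base direction.
// Returns Dir_Neutral if the string has no such character.
CharDirection GetStringDirection(const wchar_t* text) {
    for (; *text != L'\0'; ++text) {
        const CharDirection dir = GetCharDirection(*text);
        if (dir != Dir_Neutral) return dir;
    }
    return Dir_Neutral;
}
```

With this excerpt, GetStringDirection(L"123 abc") yields Dir_LTR because the digits and the space are skipped as neutral. A linear scan is fine for a handful of ranges; a sorted table with binary search would be the natural choice for a full one.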

score 5 · Accepted Answer

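You can use the ICU library, which has functions for this (ubidi_getDirection, ubidi_getBaseDirection).

The size of ICU can be reduced by rebuilding its database (usually about 15 MB) so that it contains only the converters and locales your project needs.

Reducing the size of ICU data: the conversion tables section at http://userguide.icu-project.org/icudata explains how to reduce the size of the database.

If you only need to support the most common encodings (US-ASCII, ISO-8859-1, UTF-7/8/16/32, SCSU, BOCU-1, CESU-8), you do not need the database at all.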

score 2 · Accepted Answer

Building on what Boaz Yaniv said earlier, maybe something like this would be easier and faster than parsing the whole file:

// Returns 1 if code point c falls in one of the hard-coded RTL ranges, 0 otherwise.
int aft_isrtl(int c){
  if (
    (c==0x05BE)||(c==0x05C0)||(c==0x05C3)||(c==0x05C6)||
    ((c>=0x05D0)&&(c<=0x05F4))||
    (c==0x0608)||(c==0x060B)||(c==0x060D)||
    ((c>=0x061B)&&(c<=0x064A))||
    ((c>=0x066D)&&(c<=0x066F))||
    ((c>=0x0671)&&(c<=0x06D5))||
    ((c>=0x06E5)&&(c<=0x06E6))||
    ((c>=0x06EE)&&(c<=0x06EF))||
    ((c>=0x06FA)&&(c<=0x0710))||
    ((c>=0x0712)&&(c<=0x072F))||
    ((c>=0x074D)&&(c<=0x07A5))||
    ((c>=0x07B1)&&(c<=0x07EA))||
    ((c>=0x07F4)&&(c<=0x07F5))||
    ((c>=0x07FA)&&(c<=0x0815))||
    (c==0x081A)||(c==0x0824)||(c==0x0828)||
    ((c>=0x0830)&&(c<=0x0858))||
    ((c>=0x085E)&&(c<=0x08AC))||
    (c==0x200F)||(c==0xFB1D)||
    ((c>=0xFB1F)&&(c<=0xFB28))||
    ((c>=0xFB2A)&&(c<=0xFD3D))||
    ((c>=0xFD50)&&(c<=0xFDFC))||
    ((c>=0xFE70)&&(c<=0xFEFC))||
    ((c>=0x10800)&&(c<=0x1091B))||
    ((c>=0x10920)&&(c<=0x10A00))||
    ((c>=0x10A10)&&(c<=0x10A33))||
    ((c>=0x10A40)&&(c<=0x10B35))||
    ((c>=0x10B40)&&(c<=0x10C48))||
    ((c>=0x1EE00)&&(c<=0x1EEBB))
  ) return 1;
  return 0;
}
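Note that the function above takes a full code point, and several of its ranges lie above 0xFFFF. If the text is stored as UTF-16 (as wchar_t strings are on Windows), those characters arrive as surrogate pairs and have to be combined first. A minimal sketch of that step, assuming the input is a std::u16string; DecodeUtf16 is a hypothetical helper name, and replacing unpaired surrogates with U+FFFD is a choice made for this example:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Converts UTF-16 code units into Unicode code points.
// A valid high/low surrogate pair becomes one code point;
// an unpaired surrogate is replaced by U+FFFD.
std::vector<std::uint32_t> DecodeUtf16(const std::u16string& text) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        std::uint32_t unit = text[i];
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < text.size()) {
            const std::uint32_t low = text[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                // Combine the pair into a code point above 0xFFFF.
                out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
                ++i;  // the low surrogate has been consumed
                continue;
            }
        }
        if (unit >= 0xD800 && unit <= 0xDFFF) unit = 0xFFFD;  // unpaired
        out.push_back(unit);
    }
    return out;
}
```

Each decoded value can then be passed on as aft_isrtl(static_cast<int>(cp)).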

score -1 · Accepted Answer

If you are using Windows GDI, GetFontLanguageInfo(HDC) appears to return a DWORD; if GCP_REORDER is set, the language requires reordering for display, as with Hebrew or Arabic.
