Which open-source C or C++ libraries can convert arbitrary UTF-32 to NFC?

Libraries that I think can do this so far: ICU, Qt, and GLib (not sure about that one).

I don't need any other complex Unicode support; just conversion from arbitrary but known-correct UTF-32 to UTF-32 in NFC form.

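I'm most interested in a library that can do this directly. For instance, Qt and ICU (as far as I can tell) both do everything via an intermediate conversion to and from UTF-16.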
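
To make the intermediate stage mentioned above concrete, here is a minimal sketch of what a UTF-32 to UTF-16 round trip involves. The function names `utf32_to_utf16` and `utf16_to_utf32` are hypothetical and are not part of Qt or ICU, and the code assumes the input is already valid, as the question states. It performs no normalization itself; it only illustrates the extra conversion work that wraps the normalization call.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode UTF-32 code points as UTF-16 code units.
// Assumes valid input (no surrogate code points, nothing above U+10FFFF).
std::u16string utf32_to_utf16(const std::u32string &in) {
    std::u16string out;
    out.reserve(in.size());
    for (char32_t cp : in) {
        if (cp < 0x10000) {
            // BMP code point: a single UTF-16 code unit
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary code point: split into a surrogate pair
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper: decode UTF-16 code units back to UTF-32.
// Assumes well-formed input; an unpaired surrogate is copied through as-is.
std::u32string utf16_to_utf32(const std::u16string &in) {
    std::u32string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t unit = in[i];
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < in.size()) {
            // High surrogate: combine with the following low surrogate
            const char32_t low = in[i + 1];
            out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
            ++i;
        } else {
            out.push_back(unit);
        }
    }
    return out;
}
```

Both passes are linear in the input length, so the cost of going through UTF-16 is an extra copy in each direction rather than a change in complexity.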

2 Answers

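ICU or Boost.Locale (which wraps ICU) will be your best option for a long, long time. The normalization mappings will match those used by most other software, which I assume is the point of this conversion.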

answered 2011-12-01T04:53:08.937

Here is the main part of the code I ended up using after deciding on ICU. I figured I should put it here in case it helps someone who tries this same thing.

#include <cassert>
#include <string>
#include <unicode/normalizer2.h>
#include <unicode/unistr.h>

std::string normalize(const std::string &unnormalized_utf8) {
    // FIXME: until ICU supports doing normalization over a UText
    // interface directly on our UTF-8, we'll use the insanely less
    // efficient approach of converting to UTF-16, normalizing, and
    // converting back to UTF-8.

    // Convert to UTF-16 string
    auto unnormalized_utf16 = icu::UnicodeString::fromUTF8(unnormalized_utf8);

    // Get a pointer to the global NFC normalizer
    UErrorCode icu_error = U_ZERO_ERROR;
    const auto *normalizer = icu::Normalizer2::getInstance(nullptr, "nfc", UNORM2_COMPOSE, icu_error);
    assert(U_SUCCESS(icu_error));

    // Normalize our string
    icu::UnicodeString normalized_utf16;
    normalizer->normalize(unnormalized_utf16, normalized_utf16, icu_error);
    assert(U_SUCCESS(icu_error));

    // Convert back to UTF-8
    std::string normalized_utf8;
    normalized_utf16.toUTF8String(normalized_utf8);

    return normalized_utf8;
}
answered 2013-02-03T01:37:41.820