On Debian testing (Boost 1.49), I get results that make no sense to me.

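"Muller" is considered less than "Müller" (as expected), but "Muller 2" is considered greater than "Müller 1", even though the names alone compare the other way around.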

Is there a bug in my code, in the underlying library, or in my expectations?

#include <locale.h>

#include <boost/locale.hpp>
#include <boost/assign/std/vector.hpp>
#include <boost/foreach.hpp>
#include <boost/assign/list_of.hpp>
#include <boost/algorithm/string/join.hpp>
#include <iostream>

int main(int argc, char **argv)
{
    setlocale(LC_ALL, "");

    std::cout << "backends: " <<
        boost::join(boost::locale::localization_backend_manager::global().get_all_backends(),
                    ", ") << std::endl;
    boost::locale::localization_backend_manager::global().select(argc > 2 ? argv[2] : "icu");
    std::locale loc = boost::locale::generator()(argc > 1 ? argv[1] : "de_DE.UTF-8");

    typedef boost::tuple<std::string, std::string> string_pair_t;
    std::vector<string_pair_t> pairs =
        boost::assign::tuple_list_of("Muller", "Müller")
        ("Muller 2", "Müller 1")
        ("Muller B", "Müller A");
    BOOST_FOREACH (const string_pair_t &pair, pairs) {
        const std::string &a = boost::get<0>(pair),
            &b = boost::get<1>(pair);
        int cmp = std::use_facet<boost::locale::collator<char> >(loc).
            compare(boost::locale::collator_base::identical, a, b);
        std::cout <<
            a << " and " << b <<
            " are " <<
            (cmp == 0 ? "identical" : "different") <<
            " (" <<
            (cmp < 0 ? '<' :
                   cmp > 0 ? '>' : '=') <<
            ")" << std::endl;
    }

    return 0;
}

Output on my system:

$ /tmp/mueller de_DE.utf-8 icu
backends: icu, posix, std
Muller and Müller are different (<)
Muller 2 and Müller 1 are different (>)
Muller B and Müller A are different (>)

1 Answer


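The unexpected result comes from a misunderstanding of how the Unicode Collation Algorithm works: it always starts at the lowest level, and if the strings already differ at that level, that result is used. Allowing higher levels only changes the result when the lower levels found no relevant difference between the strings. Here, "u" and "ü" differ only at the secondary (accent) level, so "Muller 2" versus "Müller 1" is already decided at the primary level by "2" versus "1".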
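To illustrate the level-by-level rule, here is a minimal toy sketch in plain C++. It is not how ICU or Boost.Locale implement collation: the names `Weights`, `weights_of`, `compare_level`, and `toy_compare` are hypothetical, only "ü" (U+00FC) gets a special weight, and code points stand in for real primary weights. It only shows why a primary difference late in the string wins over a secondary difference early in the string.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Toy collation weights for one character. Assumption: code points serve as
// primary weights; this is NOT the real UCA/CLDR weight table.
struct Weights {
    char32_t primary;   // base letter
    int secondary;      // 0 = no accent, 1 = accent
};

Weights weights_of(char32_t c) {
    if (c == U'\u00FC') return {U'u', 1};  // "ü": same base letter as "u", plus an accent
    return {c, 0};
}

// Compare two strings on a single level, across the WHOLE string.
template <typename Key>
int compare_level(const std::u32string &a, const std::u32string &b, Key key) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        const auto ka = key(weights_of(a[i]));
        const auto kb = key(weights_of(b[i]));
        if (ka < kb) return -1;
        if (ka > kb) return 1;
    }
    if (a.size() < b.size()) return -1;
    if (a.size() > b.size()) return 1;
    return 0;
}

// Multi-level comparison: the secondary level is consulted only if the
// primary level found no difference anywhere in the strings.
int toy_compare(const std::u32string &a, const std::u32string &b, int max_level) {
    const int primary = compare_level(a, b, [](Weights w) { return w.primary; });
    if (primary != 0 || max_level < 2) return primary;
    return compare_level(a, b, [](Weights w) { return w.secondary; });
}
```

With this sketch, `toy_compare(U"Muller", U"M\u00FCller", 2)` is negative because the primary level ties and the accent decides, while `toy_compare(U"Muller 2", U"M\u00FCller 1", 2)` is positive because "2" versus "1" already differs at the primary level and the accent is never looked at.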

answered 2012-08-29T12:45:41.903