
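Is my question the same as this unanswered one?

How to read Unicode XML values with rapidxml

But the content of my XML is encoded in UTF-8. I'm new to MS Visual Studio and C++.

My question is: how can we read a UTF-8 string into a wchar_t string?

Say I define a structure like this: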

typedef struct{
    vector<int> stroke_labels;
    int stroke_count;
    wchar_t* uni_val;
}WORD_DETAIL;

When I read the value from the XML, I use:

WORD_DETAIL this_detail;
this_detail.uni_val=curr_word->first_node("labelDesc")->first_node("annotationDetails")->first_node("codeSequence")->value();

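But the string that gets stored is not the UTF-8 text I expected; the characters come out corrupted.

My questions are:

  1. How do I read Unicode/UTF-8 values with rapidxml?
  2. Is there a simpler XML parser that can do the same thing?
  3. Any sample code would be greatly appreciated.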

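Section 2.1 here says:

Note that RapidXml performs no decoding - strings returned by name() and value() functions will contain text encoded using the same encoding as source file.

If my XML is encoded in UTF-8, what is the best way to get the value returned by ->value()?

Thanks in advance.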


1 Answer


Remember that RapidXML is an 'in-situ' parser: It parses the XML and modifies the content by adding null terminators in the correct places (and other things).
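To make the "in-situ" idea concrete, here is a toy sketch. It is not RapidXML's actual code; the function name toy_insitu_value is hypothetical and it only handles a single "<tag>text</tag>" element. It finds the text, writes a '\0' over the '<' of the closing tag, and returns a pointer into the caller's own buffer. Nothing is copied or decoded, which is why the bytes you get back are in whatever encoding the file was in.

```cpp
#include <cstring>

// Toy illustration of in-situ parsing (NOT RapidXML's implementation).
// Expects a mutable buffer like "<tag>text</tag>". The text is
// null-terminated in place and a pointer into 'buf' is returned,
// so the result is only valid while 'buf' is alive and unmodified.
char *toy_insitu_value(char *buf) {
    char *open_end = std::strchr(buf, '>');   // end of the opening tag
    if (open_end == nullptr) return nullptr;
    char *text = open_end + 1;                // text starts right after '>'
    char *close = std::strchr(text, '<');     // start of the closing tag
    if (close == nullptr) return nullptr;
    *close = '\0';                            // terminate the text in place
    return text;                              // points into buf, no copy made
}
```

With char buf[] = "<codeSequence>abc</codeSequence>", the returned pointer equals buf + 14, i.e. it lives inside buf itself.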

So the value() function is really just returning a char * pointer into your original data. If that's UTF-8, then RapidXML returns a pointer to a UTF-8 character string. In other words, you're already doing what you asked for in the question title.
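A quick way to see that the char * really holds UTF-8 is to compare its byte count with its code-point count. The sketch below uses a hypothetical helper, utf8_code_points, and assumes well-formed UTF-8. It counts every byte that is not a continuation byte (bit pattern 10xxxxxx). For example, the two CJK characters U+65E5 U+672C occupy six bytes but count as two code points.

```cpp
#include <cstddef>

// Counts Unicode code points in a null-terminated UTF-8 string.
// Continuation bytes have the form 10xxxxxx and are skipped; every
// other byte (ASCII or a multi-byte lead byte) starts a new code point.
// Assumes the input is valid UTF-8; no validation is performed.
std::size_t utf8_code_points(const char *s) {
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        unsigned char b = static_cast<unsigned char>(*s);
        if ((b & 0xC0) != 0x80) ++count;  // not a continuation byte
    }
    return count;
}
```

If this count is smaller than strlen() of the value, the string contains multi-byte UTF-8 sequences, which is exactly what the question's "corrupted characters" look like when they are treated as single bytes or reinterpreted as wchar_t.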

But in the code snippet you posted, you want to store a wchar_t pointer in a struct. First off, I recommend you don't do that at all, because of the memory ownership issues. Remember, you're meant to be using C++, not C. And if you really want to store a raw pointer, why not the UTF-8 one you already have? http://www.utf8everywhere.org/
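One way to avoid the ownership problem is to let the struct own a copy of the UTF-8 text in a std::string. This is a sketch under assumptions: WordDetail mirrors the question's WORD_DETAIL fields, and make_word_detail is a hypothetical helper. With a real RapidXML node you would simply assign node->value() to the std::string member.

```cpp
#include <string>
#include <vector>

// Variant of the question's WORD_DETAIL that owns its text.
// uni_val holds the UTF-8 bytes exactly as value() returned them,
// so it stays valid after the XML buffer is freed or reused.
struct WordDetail {
    std::vector<int> stroke_labels;
    int stroke_count = 0;
    std::string uni_val;  // UTF-8, copied from the parser's buffer
};

// Hypothetical helper: copies a null-terminated UTF-8 C string
// (for example the pointer returned by value()) into a WordDetail.
// A null pointer is treated as an empty string.
WordDetail make_word_detail(const char *utf8_value, int stroke_count) {
    WordDetail d;
    d.stroke_count = stroke_count;
    d.uni_val = (utf8_value != nullptr) ? utf8_value : "";
    return d;
}
```

Because the text is copied, later changes to the parser's buffer do not affect uni_val. Converting to a wide string can then be deferred to the single place that actually needs it.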

But because it's Windows, there's a (remote) chance you'll need to pass a wide char array to an API function. If so, you will need to convert UTF-8 to wide chars using the OS function MultiByteToWideChar:

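#include <windows.h>
#include <vector>

// Get the UTF-8 text (a char * into the parsed buffer)
char *str = xml->first_node("codeSequence")->value();

// Work out the size in wchar_t units; because the input length is -1,
// the count includes the terminating null
int size = MultiByteToWideChar(CP_UTF8, 0, str, -1, NULL, 0);

// A return value of 0 means the call failed (see GetLastError),
// so only allocate and convert when the size is positive
std::vector<wchar_t> wide;
if (size > 0) {
    wide.resize(size);
    MultiByteToWideChar(CP_UTF8, 0, str, -1, wide.data(), size);
}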
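If you are not on Windows, or just want to see what the conversion does, here is a portable sketch that decodes UTF-8 into a std::wstring by hand. This swaps in a hand-written decoder instead of the OS call; utf8_to_wide is a hypothetical name. Where wchar_t is 16 bits (Windows), code points above U+FFFF become UTF-16 surrogate pairs. Where it is 32 bits (most Unix-like systems), they are stored directly. As a simplification, malformed bytes become U+FFFD, and overlong forms and encoded surrogates are not rejected.

```cpp
#include <cstddef>
#include <string>

// Decodes UTF-8 into a std::wstring without OS calls (simplified:
// invalid lead bytes and truncated sequences become U+FFFD, but
// overlong encodings and surrogate code points are not rejected).
std::wstring utf8_to_wide(const std::string &in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b0 < 0x80)                 { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0)  { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0)  { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0)  { cp = b0 & 0x07; len = 4; }
        else { out.push_back(static_cast<wchar_t>(0xFFFD)); ++i; continue; }

        // Fold in the continuation bytes (each must look like 10xxxxxx)
        bool ok = i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) {
            out.push_back(static_cast<wchar_t>(0xFFFD));
            ++i;  // skip only the bad lead byte and resynchronise
            continue;
        }

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t: encode as a UTF-16 surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
        i += len;
    }
    return out;
}
```

For example, utf8_to_wide("\xE6\x97\xA5\xE6\x9C\xAC") yields the two wide characters U+65E5 and U+672C. On Windows, MultiByteToWideChar remains the simpler and better-tested choice.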
answered 2013-10-15T13:30:00.413