libxml2 seems to store all of its strings in UTF-8, as xmlChar *.
/**
* xmlChar:
*
* This is a basic byte in an UTF-8 encoded string.
* It's unsigned allowing to pinpoint case where char * are assigned
* to xmlChar * (possibly making serialization back impossible).
*/
typedef unsigned char xmlChar;
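Since libxml2 is a C library, it provides no routine for getting a std::wstring out of an xmlChar *. I'm wondering whether the prudent way to convert an xmlChar * to a std::wstring in C++11 is to use the mbstowcs C function, via something like this (work in progress):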
std::wstring xmlCharToWideString(const xmlChar *xmlString) {
    if(!xmlString){abort();} //provided string was null
    int charLength = xmlStrlen(xmlString); //excludes null terminator
    wchar_t *wideBuffer = new wchar_t[charLength];
    size_t wcharLength = mbstowcs(wideBuffer, (const char *)xmlString, charLength);
    if(wcharLength == (size_t)(-1)){abort();} //mbstowcs failed
    std::wstring wideString(wideBuffer, wcharLength);
    delete[] wideBuffer;
    return wideString;
}
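Edit: Just an FYI, I'm very aware of what xmlStrlen returns; it's the number of xmlChar used to store the string. I know that is not the number of characters but the number of unsigned char. It would have been less confusing if I had named it byteLength, but I thought charLength would be clearer since I have both charLength and wcharLength. As for the correctness of the code, wideBuffer will always be larger than or equal to the size required to hold the result (I believe), because characters that need more space than a wchar_t will be truncated (I think).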
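
For comparison, here is a minimal sketch of a locale-independent alternative. mbstowcs interprets its input according to the current LC_CTYPE locale, so it only decodes UTF-8 when a UTF-8 locale is active; the sketch decodes the bytes by hand instead. The helper name utf8ToWideString is hypothetical (it is not a libxml2 function), and the xmlChar typedef is repeated only so the code builds without the library. It assumes wchar_t holds UTF-16 when it is 16 bits wide and UTF-32 otherwise. The size bound from the edit above does hold for UTF-8: every code point takes at least one byte, and a 4-byte sequence produces at most two UTF-16 units, so the output never has more elements than the input has bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Stand-in for the libxml2 typedef so the sketch builds without the library.
typedef unsigned char xmlChar;

// Hypothetical helper (not part of libxml2): decodes UTF-8 by hand, so the
// result does not depend on the current C locale the way mbstowcs does.
std::wstring utf8ToWideString(const xmlChar *utf8) {
    assert(utf8 != nullptr);  // precondition: caller passes a valid string
    // Smallest code point that legitimately needs 1, 2, 3, or 4 bytes.
    static const char32_t minimum[4] = {0x0, 0x80, 0x800, 0x10000};
    std::wstring out;
    std::size_t i = 0;
    while (utf8[i] != 0) {
        const unsigned char lead = utf8[i++];
        char32_t cp;
        int extra;  // number of continuation bytes expected after the lead
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        for (int k = 0; k < extra; ++k) {
            const unsigned char c = utf8[i];
            // Also catches a string that ends mid-sequence (c == 0).
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
            ++i;
        }
        if (cp < minimum[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("overlong or out-of-range UTF-8 sequence");
        if (sizeof(wchar_t) == 2 && cp >= 0x10000) {
            // 16-bit wchar_t: emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}
```

Because the function appends to a std::wstring directly, there is no manually sized buffer to get wrong and nothing to delete. Invalid input raises std::runtime_error rather than calling abort(), which is a design choice of this sketch and not something the original code does.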