I'm using Arabica as a wrapper around Xerces-c to parse XML. The sample code below returns the correct name from .getNodeName(), but .getNodeValue() returns an empty string instead of the value:
#include <iostream>
#include <string>
#include <boost/filesystem.hpp>
#include <SAX/helpers/CatchErrorHandler.hpp>
#include <DOM/SAX2DOM/SAX2DOM.hpp>
#include <DOM/io/Stream.hpp>   // operator<< for DOM nodes

namespace bfs = boost::filesystem;

bool readXML(bfs::path xmlfullfile)
{
    // first check to see if the file exists
    if (!bfs::is_regular_file(xmlfullfile)) return false;

    Arabica::SAX2DOM::Parser<std::string> domParser;
    Arabica::SAX::CatchErrorHandler<std::string> eh;
    Arabica::DOM::Document<std::string> xmlDoc;
    Arabica::SAX::InputSource<std::string> is;

    domParser.setErrorHandler(eh);
    is.setSystemId(xmlfullfile.string());
    domParser.parse(is);

    if (!eh.errorsReported())
    {
        xmlDoc = domParser.getDocument();
        xmlDoc.normalize();
        Arabica::DOM::NodeList<std::string> objects = xmlDoc.getElementsByTagName("object");
        for (size_t t = 0; t < objects.getLength(); t++)
        {
            Arabica::DOM::Node<std::string> object = objects.item(t);
            Arabica::DOM::NodeList<std::string> values = object.getChildNodes();
            for (size_t u = 0; u < values.getLength(); u++)
            {
                values.item(u).normalize();
                std::string name = values.item(u).getNodeName();
                std::string val = values.item(u).getNodeValue();
                std::cout << "Node streaming = \"" << values.item(u) << "\", meaning that name = \"" << name << "\" and value = \"" << val << "\"" << std::endl;
            }
        }
        return true;
    }
    else
    {
        std::cerr << eh.errors() << std::endl;
        eh.reset();
        return false;
    }
}
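For reference, the W3C DOM Core specification defines nodeName and nodeValue per node type: an Element's nodeName is its tag name but its nodeValue is null, while a Text node's nodeName is "#text" and its nodeValue is the character data. The sketch below models just that rule with a hypothetical `Node` struct (not Arabica's type) to show why `<name>somename</name>` reports name = "name" and value = "": the string "somename" lives in the element's child Text node, not in the element itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, minimal stand-in for a DOM node (NOT Arabica's API);
// only the two node types that appear in the question's output are modelled.
enum class NodeType { Element, Text };

struct Node {
    NodeType type;
    std::string tagName;         // used only by Element nodes
    std::string data;            // used only by Text nodes
    std::vector<Node> children;  // used only by Element nodes
};

// nodeName per DOM Core: the tag name for an Element, "#text" for a Text node.
std::string domNodeName(const Node& n) {
    return n.type == NodeType::Element ? n.tagName : std::string("#text");
}

// nodeValue per DOM Core: null for an Element (modelled here as ""),
// the character data for a Text node.
std::string domNodeValue(const Node& n) {
    return n.type == NodeType::Text ? n.data : std::string();
}

// Builds <tag>text</tag>: an Element whose only child is a Text node.
Node makeSimpleElement(const std::string& tag, const std::string& text) {
    return Node{NodeType::Element, tag, "", {Node{NodeType::Text, "", text, {}}}};
}
```

With `makeSimpleElement("name", "somename")`, the element itself yields name "name" and value "", and its first child yields name "#text" and value "somename", which mirrors the output in the question.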
The sample XML I'm trying to parse is:
<annotation>
<filename>1a.jpg</filename>
<folder>Sample</folder>
<source>
<database>Some database</database>
<annotation>Annotator</annotation>
<image>Some source</image>
</source>
<size>
<width>3264</width>
<height>1840</height>
<depth>0</depth>
</size>
<segmented>0</segmented>
<object>
<name>somename</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<occluded>0</occluded>
<bndbox>
<xmin>48</xmin>
<ymin>671</ymin>
<xmax>3213</xmax>
<ymax>1616</ymax>
</bndbox>
</object>
</annotation>
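Because this XML is indented, a non-validating parser keeps the line breaks and spaces between tags as Text nodes, so the children of `<object>` alternate between whitespace-only #text nodes and elements (7 text nodes around 6 elements here). The sketch below uses a hypothetical flat `Node` struct, not Arabica's type, to show filtering children down to elements by node type; in real DOM code the equivalent check would presumably compare getNodeType() against the element node type, which is a standard DOM Level 2 method.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for a child node (NOT Arabica's API).
enum class NodeType { Element, Text };

struct Node {
    NodeType type;
    std::string tagName;  // used only by Element nodes
    std::string data;     // used only by Text nodes
};

// True for a Text node made only of whitespace (the indentation between tags).
bool isWhitespaceText(const Node& n) {
    if (n.type != NodeType::Text) return false;
    for (unsigned char c : n.data)
        if (!std::isspace(c)) return false;
    return true;
}

// Keeps only the Element children, skipping the #text nodes produced
// by indentation.
std::vector<Node> elementChildren(const std::vector<Node>& children) {
    std::vector<Node> out;
    for (const Node& n : children)
        if (n.type == NodeType::Element) out.push_back(n);
    return out;
}
```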
The output looks like this:
Node streaming = "
", meaning that name = "#text" and value = "
"
Node streaming = "<name>somename</name>", meaning that name = "name" and value = ""
Node streaming = "
", meaning that name = "#text" and value = "
"
Node streaming = "<pose>Unspecified</pose>", meaning that name = "pose" and value = ""
Node streaming = "
", meaning that name = "#text" and value = "
"
Node streaming = "<truncated>0</truncated>", meaning that name = "truncated" and value = ""
Node streaming = "
", meaning that name = "#text" and value = "
"
Node streaming = "<difficult>0</difficult>", meaning that name = "difficult" and value = ""
Node streaming = "
", meaning that name = "#text" and value = "
"
Node streaming = "<occluded>0</occluded>", meaning that name = "occluded" and value = ""
Node streaming = "
", meaning that name = "#text" and value = "
"
Node streaming = "<bndbox>
<xmin>48</xmin>
<ymin>671</ymin>
<xmax>3213</xmax>
<ymax>1616</ymax>
</bndbox>", meaning that name = "bndbox" and value = ""
Node streaming = "
", meaning that name = "#text" and value = "
"
I'm not sure what I'm doing wrong. Since getNodeName() returns the correct name (except, of course, for the #text nodes), it seems odd to me that getNodeValue() returns nothing.
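One way to get at "the value" of an element such as `<pose>` is to read the character data of its Text children rather than the element's own nodeValue. DOM Level 3 calls this textContent: the concatenation of the text of all descendant Text nodes in document order. The sketch below implements that rule over a hypothetical `Node` struct, not Arabica's type. In Arabica itself, a plausible equivalent for simple elements is calling getFirstChild() and then getNodeValue() on the result (both standard DOM Level 2 methods); whether Arabica also exposes a getTextContent() method is not assumed here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node (NOT Arabica's API).
enum class NodeType { Element, Text };

struct Node {
    NodeType type;
    std::string tagName;         // used only by Element nodes
    std::string data;            // used only by Text nodes
    std::vector<Node> children;  // used only by Element nodes
};

// Concatenates the character data of every Text descendant in document
// order, following the DOM Level 3 definition of textContent for elements
// (comments and processing instructions are not modelled here).
std::string textContent(const Node& n) {
    if (n.type == NodeType::Text) return n.data;
    std::string out;
    for (const Node& child : n.children) out += textContent(child);
    return out;
}
```

Note that for a container such as `<bndbox>` this also picks up the indentation text nodes, so the result includes line breaks between the numbers; for leaf elements like `<pose>` it is just "Unspecified".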