
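I'm using Arabica as a wrapper around Xerces-c to parse XML. The sample code below returns the correct name from the .getNodeName() method, but not the correct value from the .getNodeValue() method: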

#include <iostream>
#include <string>
#include <boost/filesystem.hpp>
#include <DOM/SAX2DOM/SAX2DOM.hpp>
#include <SAX/helpers/CatchErrorHandler.hpp>
#include <DOM/io/Stream.hpp>   // operator<< for DOM nodes

namespace bfs = boost::filesystem;

bool readXML(bfs::path xmlfullfile) 
{
  // first check to see if the file exists
  if (!bfs::is_regular_file(xmlfullfile)) return false;

  Arabica::SAX2DOM::Parser<std::string> domParser;
  Arabica::SAX::CatchErrorHandler<std::string> eh;
  Arabica::DOM::Document<std::string> xmlDoc; 
  Arabica::SAX::InputSource<std::string> is;

  domParser.setErrorHandler(eh);
  is.setSystemId(xmlfullfile.string());
  domParser.parse(is);

  if(!eh.errorsReported()) 
  {
    xmlDoc = domParser.getDocument();
    xmlDoc.normalize();

    Arabica::DOM::NodeList<std::string> objects = xmlDoc.getElementsByTagName("object");
    for (size_t t = 0; t < objects.getLength(); t++) 
    {
      Arabica::DOM::Node<std::string> object = objects.item(t);
      Arabica::DOM::NodeList<std::string> values = object.getChildNodes(); 
      for (size_t u = 0; u < values.getLength(); u++) 
      {
        values.item(u).normalize(); 
        std::string name = values.item(u).getNodeName(); 
        std::string val = values.item(u).getNodeValue(); 
        std::cout << "Node streaming = \"" << values.item(u) << "\", meaning that name = \"" << name << "\" and value = \"" << val << "\"" << std::endl; 
      }
    }
    return true;
  } else {
    std::cerr << eh.errors() << std::endl;
    eh.reset();
    return false;
  }
}

The sample XML I'm trying to parse is:

<annotation>
    <filename>1a.jpg</filename>
    <folder>Sample</folder>
    <source>
        <database>Some database</database>
        <annotation>Annotator</annotation>
        <image>Some source</image>
    </source>
    <size>
        <width>3264</width>
        <height>1840</height>
        <depth>0</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>somename</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <occluded>0</occluded>
        <bndbox>
            <xmin>48</xmin>
            <ymin>671</ymin>
            <xmax>3213</xmax>
            <ymax>1616</ymax>
        </bndbox>
    </object>
</annotation>

The output looks like this:

Node streaming = "
                ", meaning that name = "#text" and value = "
                "
Node streaming = "<name>somename</name>", meaning that name = "name" and value = ""
Node streaming = "
                ", meaning that name = "#text" and value = "
                "
Node streaming = "<pose>Unspecified</pose>", meaning that name = "pose" and value = ""
Node streaming = "
                ", meaning that name = "#text" and value = "
                "
Node streaming = "<truncated>0</truncated>", meaning that name = "truncated" and value = ""
Node streaming = "
                ", meaning that name = "#text" and value = "
                "
Node streaming = "<difficult>0</difficult>", meaning that name = "difficult" and value = ""
Node streaming = "
                ", meaning that name = "#text" and value = "
                "
Node streaming = "<occluded>0</occluded>", meaning that name = "occluded" and value = ""
Node streaming = "
                ", meaning that name = "#text" and value = "
                "
Node streaming = "<bndbox>
                        <xmin>48</xmin>
                        <ymin>671</ymin>
                        <xmax>3213</xmax>
                        <ymax>1616</ymax>
                </bndbox>", meaning that name = "bndbox" and value = ""
Node streaming = "
        ", meaning that name = "#text" and value = "
        "

I'm not sure what I'm doing wrong. Since getNodeName() returns the correct name (when it isn't #text, of course), I find it strange that getNodeValue() returns nothing.


2 Answers


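You are also counting whitespace-only text nodes. Adding a DTD that does not allow text nodes at that position might help. A non-validating parser has to report all whitespace nodes and is not allowed to make assumptions about what is ignorable and what is not.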

Bottom line: if you want to get rid of the whitespace-only text nodes, you have to do that yourself in your DOM code.

answered 2013-01-11T10:13:07.350

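After comparing my code with some other XML libraries, I found the solution. Apparently an element node's value is not a simple text field: you have to get the first child of the leaf element (a text node) to access the text value. I don't know whether the way I did it is the best way, but here is the code in case anyone else runs into the same problem: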

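for (size_t u = 0; u < values.getLength(); u++) 
{
  std::string name = values.item(u).getNodeName();
  if (name == "#text") continue;  // skip the whitespace text nodes between elements
  // The element's text lives in its first child (a text node); guard against
  // empty elements such as <tag/>, which have no children at all.
  std::string val = values.item(u).hasChildNodes()
                        ? values.item(u).getFirstChild().getNodeValue()
                        : std::string();
  std::cout << "Node streaming = \"" << values.item(u) << "\", meaning that name = \"" << name << "\" and value = \"" << val << "\"" << std::endl; 
}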

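Note: production code should take into account that not all nodes are simple leaf nodes (for example <bndbox>, whose first child is a whitespace text node). So my code is only half the solution.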

answered 2012-12-28T21:03:29.210