c++ - TinyXML2 C++ - Extracting specific data from old/malformed XML files

Question

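I'm trying to search through some fairly old blocks of XML (the documents date from 1999), but I'm having a bit of trouble getting TinyXML2 to behave as expected. I can grab certain pieces, but I run into problems when one element is nested inside another. Take this sample: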

  <SUBJECT><TITLE>Mathematics</TITLE></SUBJECT>
     <AREA><TITLE>Arithmetic</TITLE></AREA>
     <SECTION><TITLE>Whole Numbers</TITLE></SECTION> 
        <TOPIC GRADELEVEL="4"><TITLE>Introduction to Numbers</TITLE></TOPIC> 
          <DESCRIPTION><TITLE>Description</TITLE></DESCRIPTION>  
             <FIELDSPACE>
                <PARA>To represent each conceivable number by means of a separate
                  little picture or number symbol is impossible. Therefore the civilizations of
                  the past all developed a certain pattern whereby they could write down numbers,
                  by making use of a small number of symbols. </PARA>
             </FIELDSPACE> 
             <FIELDSPACE>
                <PARA>Today, we use the Hindu-Arabic system, which first of all is
                  decimal, because we make use of only 10 different symbols, namely,</PARA>
                <LITERALLAYOUT>     0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.</LITERALLAYOUT>
             </FIELDSPACE>
             <FIELDSPACE>
                <PARA>Secondly, a place value applies. This means that if only 1
                  digit is written down then it is that number, such as a 3, a 6, or an 8.</PARA>
             </FIELDSPACE>
             <FIELDSPACE>
                <PARA>Thirdly, only the addition principle is built into our number
                  symbols.</PARA>
                <PARA>In other words,</PARA>
                <LITERALLAYOUT>     135 means 100 + 300 + 5</LITERALLAYOUT>
                <LITERALLAYOUT>     6.3 means 6 + three tenths = 6 + <EQUATION>
<INLINEGRAPHIC FILEREF="Mathematics/Arithmetic/WholeNumbers/IntroductionNumbers/eq.png" />
</EQUATION></LITERALLAYOUT>
                <LITERALLAYOUT>     and two and a quarter = <EQUATION>
<INLINEGRAPHIC FILEREF="Mathematics/Arithmetic/WholeNumbers/IntroductionNumbers/eq2.png" />
</EQUATION></LITERALLAYOUT>
                <PARA>means</PARA>
                <LITERALLAYOUT>     two plus a quarter = <EQUATION>
<INLINEGRAPHIC FILEREF="Mathematics/Arithmetic/WholeNumbers/IntroductionNumbers/eq3.png" />
</EQUATION></LITERALLAYOUT>
             </FIELDSPACE>

Here is what I wrote:

    XMLDocument doc;
    Resource::resource_t *f = Resource::Open("IntroductionNumbers.xml"); // File load

        if (!f)
            return;

        doc.Parse((const char*)f->buffer, f->size);
        Resource::Close(f);

        XMLElement *pElem;
        pElem = doc.FirstChildElement();

        if (!pElem)
            return;
        for (pElem = pElem->FirstChildElement(); pElem; pElem = pElem->NextSiblingElement())
        {
            if (!strcmp(pElem->Value(), "SUBJECT"))
            {
                // Print what's in pElem->FirstChildElement("TITLE")->GetText()
                // This works fine.
            }
            else if (!strcmp(pElem->Value(), "AREA"))
            {
                // Print what's in pElem->FirstChildElement("TITLE")->GetText()
                // This works fine.
            }
...
...
...
             else if (!strcmp(pElem->Value(), "TOPIC"))
            {
                 char *temp;
                 temp = msprintf("%s - Section %s", pElem->FirstChildElement("TITLE")->GetText(), pElem->FirstAttribute()->Value());
                // Print what's in temp
                // This still works!
            }
             else if (!strcmp(pElem->Value(), "FIELDSPACE"))
            {
                // I can print PARA or FIELDSPACE, but I can't seem to read LITERALLAYOUT, EQUATION, or INLINEGRAPHIC.
            }
        }

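I need generic code rather than code specific to this one file: there are hundreds of XML files, and I need to write something that can parse all of them. How would I get at the information inside LITERALLAYOUT/EQUATION/INLINEGRAPHIC?

Thanks in advance!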

score 0 · Accepted Answer

Just building on the previous answer. This is what you have:

<LITERALLAYOUT>xxxxxxxxx
    <EQUATION>
        <INLINEGRAPHIC FILEREF="Mathematics/Arithmetic/WholeNumbers/IntroductionNumbers/eq.png" />
    </EQUATION>
</LITERALLAYOUT>

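You have two things going on here. When you reach LITERALLAYOUT, you can call GetText, and it will return xxxxxxxxx.

But you have a choice. If you want it to be generic, you have to iterate over all the child elements of the LITERALLAYOUT pointer. If you don't want to do that, you have to extract the first child explicitly, for example: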

XMLElement *pLITERALLAYOUT = xxxx; // You get this pointer.

XMLElement *pEQUATION = pLITERALLAYOUT->FirstChildElement("EQUATION");
if (pEQUATION != nullptr)
{
    // Now get the INLINEGRAPHIC element
    XMLElement *pINLINEGRAPHIC = pEQUATION->FirstChildElement("INLINEGRAPHIC");

    if (pINLINEGRAPHIC != nullptr)
    {
        // Attribute() returns nullptr when FILEREF is missing.
        const char *FILEREF = pINLINEGRAPHIC->Attribute("FILEREF");
    }
}

See? You have to know the right way to navigate the XML file.

score 0 · Accepted Answer

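EQUATION has no string value here. It contains no text in the markup, so you won't get anything back. You need to look at the attributes of the element inside EQUATION, e.g. ig->Attribute("FILEREF"), where ig is a pointer to the structure representing the INLINEGRAPHIC element.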
