In an iOS app that uses libxml2, I am parsing this HTML fragment (it is part of a larger page):
...
<span class="ingredient">
  <span class="amount">
    <span class="value">500 </span>
    <span class="type">g</span>
  </span>
  <a href="...">bread flour</a>
  or
  <span class="ingredient">
    <span class="amount">
      <span class="value">500 </span>
      <span class="type">g</span>
    </span>
    <span class="name">
      <a href="...">all-purpose flour</a>
    </span>
  </span>
</span>
...
I only need to extract the text: "500 g bread flour or 500 g all-purpose flour".
The XPath query
//span[@class="ingredient"]
returns the following parsed NSDictionary result:
{
    nodeAttributeArray = (
        {
            attributeName = class;
            nodeContent = ingredient;
        }
    );
    nodeChildArray = (
        {
            nodeAttributeArray = (
                {
                    attributeName = class;
                    nodeContent = amount;
                }
            );
            nodeChildArray = (
                {
                    nodeAttributeArray = (
                        {
                            attributeName = class;
                            nodeContent = value;
                        }
                    );
                    nodeContent = 500;
                    nodeName = span;
                },
                {
                    nodeAttributeArray = (
                        {
                            attributeName = class;
                            nodeContent = type;
                        }
                    );
                    nodeContent = g;
                    nodeName = span;
                }
            );
            nodeContent = "";
            nodeName = span;
        },
        {
            nodeAttributeArray = (
                {
                    attributeName = href;
                    nodeContent = "http://www.food.com/library/flour-64";
                }
            );
            nodeContent = "bread flour";
            nodeName = a;
        },
        {
            nodeAttributeArray = (
                {
                    attributeName = class;
                    nodeContent = ingredient;
                }
            );
            nodeChildArray = (
                {
                    nodeAttributeArray = (
                        {
                            attributeName = class;
                            nodeContent = amount;
                        }
                    );
                    nodeChildArray = (
                        {
                            nodeAttributeArray = (
                                {
                                    attributeName = class;
                                    nodeContent = value;
                                }
                            );
                            nodeContent = 500;
                            nodeName = span;
                        },
                        {
                            nodeAttributeArray = (
                                {
                                    attributeName = class;
                                    nodeContent = type;
                                }
                            );
                            nodeContent = g;
                            nodeName = span;
                        }
                    );
                    nodeContent = "";
                    nodeName = span;
                },
                {
                    nodeAttributeArray = (
                        {
                            attributeName = class;
                            nodeContent = name;
                        }
                    );
                    nodeChildArray = (
                        {
                            nodeAttributeArray = (
                                {
                                    attributeName = href;
                                    nodeContent = "http://www.food.com/library/flour-64";
                                }
                            );
                            nodeContent = "all-purpose flour";
                            nodeName = a;
                        }
                    );
                    nodeContent = "";
                    nodeName = span;
                }
            );
            nodeContent = "";
            nodeName = span;
        }
    );
    nodeContent = or;
    nodeName = span;
}
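The problem is that the root dictionary's "nodeContent" is the text "or", while all the tags are stored as children of the root node, so the order of the pieces is lost: I cannot tell that the "or" actually sits in the middle. When I concatenate all the text, I get the string "or 500 g bread flour 500 g all-purpose flour".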
Can anyone think of a way to extract the plain text with a single XPath query, or to have the XPath engine return the elements as an ordered list?
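
For illustration, here is a minimal sketch of the ordering I am after. It uses Python's standard-library ElementTree, not libxml2 or the NSDictionary wrapper above, and the helper name ordered_text is made up for this example. It collects every text node in document order, so the "or" stays between the two ingredients. As far as I understand, the XPath 1.0 string() function and libxml2's xmlNodeGetContent() define an element's string value the same way (descendant text concatenated in document order), but I have not verified how either behaves through the wrapper I am using.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, with the elided href values kept as "...".
FRAGMENT = """
<span class="ingredient">
  <span class="amount">
    <span class="value">500 </span>
    <span class="type">g</span>
  </span>
  <a href="...">bread flour</a>
  or
  <span class="ingredient">
    <span class="amount">
      <span class="value">500 </span>
      <span class="type">g</span>
    </span>
    <span class="name">
      <a href="...">all-purpose flour</a>
    </span>
  </span>
</span>
""".strip()


def ordered_text(element):
    """Return the element's text in document order, with whitespace collapsed."""
    # itertext() yields each text node (including text that follows a child
    # element) in the order it appears in the document.
    raw = "".join(element.itertext())
    # Collapse runs of spaces and newlines left over from the markup layout.
    return " ".join(raw.split())


root = ET.fromstring(FRAGMENT)  # the outer <span class="ingredient">
print(ordered_text(root))
```

The point of the sketch is that the stray "or" is just another text node between sibling elements, so any traversal that visits text nodes and elements together in document order keeps it in place.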