Here's some awesome XML:
<root>
<section>Here is some text<mightbe>a tag</mightbe>might <not attribute="be" />. Things are just<label>a mess</label>but I have to parse it because that's what needs to be done and I can't <font stupid="true">control</font> the source. <p>Why are there p tags here?</p>Who knows, but there may or may not be spaces around them so that's awesome. The point here is, there's node soup inside the section node and no definition for the document.</section>
</root>
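I just want to get the text from the section node and all of its child nodes as a single string. Note, however, that there may or may not be spaces around the child nodes, so I want to pad each child node's text with a space.
Here is a more precise example of what the input might look like and what I'd like the output to be: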
<root>
<sample>A good story is the<book>Hitchhikers Guide to the Galaxy</book>. It was published<date>a long time ago</date>. I usually read at<time>9pm</time>.</sample>
</root>
I'd like the output to be:
A good story is the Hitchhikers Guide to the Galaxy. It was published a long time ago. I usually read at 9pm.
Notice that there are no spaces around the child nodes, so I need to pad them; otherwise the words run together.
I tried using this sample code:
string output = "";
XDocument doc = XDocument.Parse(xml);
foreach (var node in doc.Root.Elements("section"))
{
    // ToString() on an XNode returns its markup, tags included
    output += String.Join(" ", node.Nodes().Select(x => x.ToString()).ToArray()) + " ";
}
But the output still includes the child tags, so it doesn't work.
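One direction that might get closer, offered only as a rough sketch: walk the text nodes instead of the elements, trim each piece, and join the pieces with a single space. The `NodeSoup` class and `Flatten` method are hypothetical names made up for illustration, and the final regex that pulls punctuation back against the preceding word is an assumption about the desired output, not something the XML itself guarantees.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class NodeSoup
{
    // Hypothetical helper: collects every text node under the element,
    // trims each one, and joins the non-empty pieces with a single space.
    public static string Flatten(XElement element)
    {
        var pieces = element.DescendantNodes()
                            .OfType<XText>()
                            .Select(t => t.Value.Trim())
                            .Where(s => s.Length > 0);

        string joined = string.Join(" ", pieces);

        // Assumption: a space in front of punctuation is unwanted,
        // so "Galaxy ." becomes "Galaxy.".
        return Regex.Replace(joined, @"\s+([.,;:!?])", "$1");
    }
}

public static class Program
{
    public static void Main()
    {
        string xml = "<root><sample>A good story is the<book>Hitchhikers Guide to the Galaxy</book>. " +
                     "It was published<date>a long time ago</date>. I usually read at<time>9pm</time>.</sample></root>";

        XDocument doc = XDocument.Parse(xml);
        foreach (var node in doc.Root.Elements("sample"))
        {
            Console.WriteLine(NodeSoup.Flatten(node));
        }
    }
}
```

This always inserts a space at tag boundaries, so inline tags that split a single word (for example `un<b>believ</b>able`) would be broken apart, which may or may not be acceptable for this kind of input.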
Any suggestions here?
TL;DR: I was handed node-soup XML and want to stringify it with padding around the child nodes.