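A while back, I wrote a program that imports ONIX files into a retail database system. (ONIX is an XML standard that publishers use to distribute their catalogue information.) The process loads the XML file straight into a DataSet, which works well enough for most of the files we receive, but occasionally there are exceptions.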

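In this particular case, the file I am trying to import contains HTML tags in the product description field. This wreaks havoc on the standard DataSet.ReadXml() method, because it tries to interpret the HTML tags as XML. Some ONIX files wrap this content in CDATA sections, which avoids the problem, but here the publisher chose to use an attribute on the tag to mark the field as HTML-formatted, like this: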

    <othertext>
        <d102>03</d102>
        <d104 textformat="05">
            <p>Enter a world where bloody battles, and heroic deeds combine in the historic struggle to unite Britain in the face of a common enemy.</p>
            <p>The third instalment in Bernard Cornwell’s King Alfred series, follows on from the outstanding previous novels The Last Kingdom and The Pale Horseman.</p>
            <p>The year is 878 and the Vikings have been thrown out of Wessex. Uhtred, fresh from fighting for Alfred in the battle to free Wessex, travels north to seek revenge for his father's death, killed in a bloody raid by Uhtred's old enemy, renegade Danish lord, Kjartan.</p>
            <p>While Kjartan lurks in his formidable stronghold of Dunholm, the north is overrun by chaos, rebellion and fear. Together with a small band of warriors, Uhtred plans his attack on his enemy, revenge fuelling his anger, resolute on bloody retribution. But, he finds himself betrayed and ends up on a desperate slave voyage to Iceland. Rescued by a remarkable alliance of old friends and enemies, he and his allies, together with Alfred the Great, are free to fight once more in a battle for power, glory and honour.</p>
            <p>‘The Lords of the North’ is a tale of England's making, a powerful story of betrayal, struggle and romance, set in an England torn apart by turmoil and upheaval.</p>
        </d104>
    </othertext>

The textformat="05" attribute indicates HTML.

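Short of writing custom code to interpret the HTML, is there still a way to import this with ReadXml(), or do I need to insert CDATA sections programmatically first to work around it?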

Note: I don't want to strip out the HTML tags, because the data will be displayed on a website.

1 Answer

Here is a LINQPad program that should find the textformat=05 nodes and wrap their contents in a CDATA section. See this Stack Overflow post.

void Main()
{
    string xml = @"<othertext>
            <d102>03</d102>
            <d104 textformat=""05"">
                <p>Enter a world where bloody battles, and heroic deeds combine in the historic struggle to unite Britain in the face of a common enemy.</p>
                <p>The third instalment in Bernard Cornwell’s King Alfred series, follows on from the outstanding previous novels The Last Kingdom and The Pale Horseman.</p>
                <p>The year is 878 and the Vikings have been thrown out of Wessex. Uhtred, fresh from fighting for Alfred in the battle to free Wessex, travels north to seek revenge for his father's death, killed in a bloody raid by Uhtred's old enemy, renegade Danish lord, Kjartan.</p>
                <p>While Kjartan lurks in his formidable stronghold of Dunholm, the north is overrun by chaos, rebellion and fear. Together with a small band of warriors, Uhtred plans his attack on his enemy, revenge fuelling his anger, resolute on bloody retribution. But, he finds himself betrayed and ends up on a desperate slave voyage to Iceland. Rescued by a remarkable alliance of old friends and enemies, he and his allies, together with Alfred the Great, are free to fight once more in a battle for power, glory and honour.</p>
                <p>‘The Lords of the North’ is a tale of England's making, a powerful story of betrayal, struggle and romance, set in an England torn apart by turmoil and upheaval.</p>
            </d104>
        </othertext>";

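    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.LoadXml(xml);

    // Select every child of <othertext> flagged as HTML (textformat="05").
    var nodes = xmlDoc.SelectNodes("//othertext/*[@textformat='05']");
    foreach (XmlNode node in nodes)
    {
        // Capture the inner markup as a string before clearing the element.
        var cdata = xmlDoc.CreateCDataSection(node.InnerXml);
        node.InnerText = string.Empty;   // removes the <p> child elements
        node.AppendChild(cdata);         // re-adds the markup as one CDATA section
        node.InnerXml.Dump();            // Dump() is LINQPad-specific
    }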
}
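
Outside LINQPad, the same idea could be packaged roughly as below, with the result fed to DataSet.ReadXml(). This is only a sketch under a few assumptions: the class and method names (OnixCDataSketch, WrapHtmlInCData, LoadIntoDataSet) are made up for illustration, the XPath is widened to any element carrying textformat="05", and the embedded HTML is assumed to be well-formed XML, since XmlDocument.LoadXml() throws on markup such as an unclosed `<br>`.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Xml;

public static class OnixCDataSketch
{
    // Hypothetical helper: wraps the markup inside every element carrying
    // textformat="05" in a CDATA section and returns the modified document.
    public static XmlDocument WrapHtmlInCData(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Copy the matches into a list first so the loop does not modify
        // the document while the XPath result is still being enumerated.
        var htmlNodes = doc.SelectNodes("//*[@textformat='05']")
                           .Cast<XmlNode>()
                           .ToList();

        foreach (XmlNode node in htmlNodes)
        {
            string markup = node.InnerXml;          // e.g. "<p>...</p>"
            while (node.HasChildNodes)
                node.RemoveChild(node.FirstChild);  // attributes are kept
            node.AppendChild(doc.CreateCDataSection(markup));
        }
        return doc;
    }

    // Hypothetical helper: feeds the in-memory document to DataSet.ReadXml().
    public static DataSet LoadIntoDataSet(XmlDocument doc)
    {
        var dataSet = new DataSet();
        using (var reader = new XmlNodeReader(doc))
        {
            dataSet.ReadXml(reader);
        }
        return dataSet;
    }
}
```

A caller would chain the two, for example `OnixCDataSketch.LoadIntoDataSet(OnixCDataSketch.WrapHtmlInCData(xmlText))`. The table and column names that ReadXml() infers depend on the structure of the real ONIX file, so they are worth inspecting before mapping them to the database.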
Answered 2013-07-16T03:13:03.063