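So, I have a huge XML file, and I want to remove all CDATA sections and replace the contents of each CDATA node with a safe, HTML-encoded text node.
Just stripping out the CDATA markers with a regex will of course break the parsing. Is there a LINQ, XmlDocument, or XmlTextWriter technique for swapping the CDATA out for encoded text?
I'm not too concerned with the final encoding yet, just with how to replace the sections using the encoding of my choice.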
Original example
---
<COLLECTION type="presentation" autoplay="false">
<TITLE><![CDATA[Rights & Responsibilities]]></TITLE>
<ITEM id="2802725d-dbac-e011-bcd6-005056af18ff" presenterGender="male">
<TITLE><![CDATA[Watch the demo]]></TITLE>
<LINK><![CDATA[_assets/2302725d-dbac-e011-bcd6-005056af18ff/presentation/presentation-00000000.mp4]]></LINK>
</ITEM>
</COLLECTION>
---
Would become
<COLLECTION type="presentation" autoplay="false">
<TITLE>Rights &amp; Responsibilities</TITLE>
<ITEM id="2802725d-dbac-e011-bcd6-005056af18ff" presenterGender="male">
<TITLE>Watch the demo</TITLE>
<LINK>_assets/2302725d-dbac-e011-bcd6-005056af18ff/presentation/presentation-00000000.mp4</LINK>
</ITEM>
</COLLECTION>
I guess the ultimate goal is to move to JSON. I tried this:
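// Requires: using System.Xml; using Newtonsoft.Json;
XmlDocument doc = new XmlDocument();
doc.Load(Server.MapPath("~/somefile.xml"));
// Json.NET serializes each CDATA section as a "#cdata-section" property
string jsonText = JsonConvert.SerializeXmlNode(doc);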
But I end up with ugly nodes, namely the "#cdata-section" keys. Redeveloping the front end to accept these would take WAAAAY too many hours.
"COLLECTION":[{"@type":"whitepaper","TITLE":{"#cdata-section":"SUPPORTING DOCUMENTS"}},{"@type":"presentation","@autoplay":"false","TITLE":{"#cdata-section":"Demo Presentation"},"ITEM":{"@id":"2802725d-dbac-e011-bcd6-005056af18ff","@presenterGender":"male","TITLE":{"#cdata-section":"Watch the demo"},"LINK":{"#cdata-section":"_assets/2302725d-dbac-e011-bcd6-005056af18ff/presentation/presentation-00000000.mp4"}