I have an XML document with a tag that contains a user-entered message, and I would like to avoid unnecessary escaping of characters.

According to the link below, the only strictly illegal characters are "<" and "&".

Note: Only the characters "<" and "&" are strictly illegal in XML. The greater than character is legal, but it is a good habit to replace it.

http://www.w3schools.com/xml/xml_syntax.asp

But with some parsers I ran into problems with the sequence ]]>. Is this due to bugs in those parsers, or is it really defined as illegal somewhere in the XML standard?

Example message:

<?xml version="1.0" encoding="UTF-8" ?> 
<root>
  <message>&lt;!-- -- -- &lt;![CDATA[&quot;TEST&quot;]]></message>
  <signature>Evil</signature>
</root>

As you can see, < and & are escaped, and this message is parsed successfully by C++ tinyxml and Java JAXB. However, Firefox 20.0.1 and IE 8.0 tell me

XML Parsing Error: not well-formed

and

The literal string ']]>' is not allowed in element content.

respectively.

Is this behavior really required by the standard?

EDIT: It seems I should have searched some more; see "Legally use CDATA in XML". So I guess the XML parsers in Firefox and IE are just broken?

1 Answer

From the XML specification (emphasis mine):

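The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and MUST, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.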

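This means that whenever ]]> is not acting as the delimiter that marks the end of a CDATA section for the XML parser reading the document, it is not legal to leave it unescaped, even if it occurs outside the context of any CDATA section.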
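
To illustrate the rule quoted above, here is a minimal Python sketch that escapes only what the specification requires in element content: "&", "<", and the ">" of a literal ]]>. The helper names escape_text and is_well_formed are hypothetical and not part of any library; the check relies on Python's xml.etree.ElementTree, which uses the Expat parser, so it shows the behavior of one parser only.

```python
import xml.etree.ElementTree as ET


def escape_text(text: str) -> str:
    """Escape only what XML 1.0 requires inside element content."""
    # "&" must be handled first so the entities added below are not re-escaped.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    # A lone ">" is legal; only the sequence "]]>" must not appear literally.
    return text.replace("]]>", "]]&gt;")


def is_well_formed(document: str) -> bool:
    """Return True if Python's Expat-based parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# The user message from the question, before any escaping.
message = '<!-- -- -- <![CDATA["TEST"]]>'

# Same shape as the question's document: "<" escaped, "]]>" left literal.
literal_end = "<root><message>&lt;![CDATA[]]></message></root>"

# The same kind of content with the ">" of "]]>" escaped as well.
escaped = "<root><message>" + escape_text(message) + "</message></root>"

print(escape_text(message))
print(is_well_formed(literal_end))
print(is_well_formed(escaped))
```

Parsing the escaped document and reading the text of the message element gives back the original message unchanged, so the extra escape costs nothing in terms of content.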

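I'm not familiar with the XML parsers that browsers use internally, but given that this requirement is in place for compatibility reasons, your guess seems to be correct.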

Answered 2013-04-19T09:25:38.177