4

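I am trying to create an XML object with <cfxml>. I escape the data with XMLFormat(). The XML contains some invalid characters, for example '»'. I added this character to the XML doctype as follows: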

<!ENTITY raquo "»">

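The HTML text is not well formatted, but most of it works with my code. Some of the texts, however, contain control characters, and I get the following error:

An invalid XML character (Unicode: 0x13) was found in the element content of the document.

I tried adding the Unicode character to the doctype, and I also tried this solution. Neither worked...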
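The error above is consistent with the XML 1.0 rule for legal characters: most control characters below 0x20 may not appear in a document at all, so declaring them in the doctype does not help. The following is a minimal Java sketch of that rule, assuming XML 1.0; `XmlCharCheck` and `isValidXmlChar` are illustrative names, not part of any library.

```java
// Sketch: checks a code point against the XML 1.0 "Char" production.
// XmlCharCheck and isValidXmlChar are hypothetical names used for illustration.
class XmlCharCheck {
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD   // tab, line feed, carriage return
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isValidXmlChar(0x13)); // false: the control character from the error
        System.out.println(isValidXmlChar(0xBB)); // true: '»' (U+00BB) is a legal character
    }
}
```

Note that '»' itself passes the check. What a parser rejects is the named entity `&raquo;` when it has not been declared.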


3 Answers

2

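Here is working cfscript code for cleaning XML. It has two methods: one cleans the high international characters, the other cleans only the low ASCII characters that break XML. If you find more characters, just extend the filter rules.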

<cfscript>    
    function cleanHighAscii(text){
        var buffer = createObject("java", "java.lang.StringBuffer").init();
        var pattern = createObject("java", "java.util.regex.Pattern").compile(javaCast( "string", "[^\x00-\x7F]" ));
        var matcher = pattern.Matcher(javaCast( "string", text));

        while(matcher.find()){
            var value = matcher.group();
            var asciiValue = asc(value);

            if ((asciiValue == 8220) OR (asciiValue == 8221))
                value = """";
            else if ((asciiValue == 8216) || (asciiValue == 8217))
                value = "'";
            else if (asciiValue == 8230)
                value = "...";
            else
                value = "&###asciiValue#;";

            matcher.AppendReplacement(buffer, javaCast( "string", value ));
        }

        matcher.AppendTail(buffer);
        return buffer.ToString();
    }

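    function removeSubAscii(text){
        // Strip the control characters that XML 1.0 forbids (tab, LF and CR are kept).
        // They are removed rather than encoded: a reference such as &#26; is rejected too.
        return reReplace(text, "[\x00-\x08\x0B\x0C\x0E-\x1F]", "", "all");
    }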

    function XMLSafe(text){
        text = cleanHighAscii(text);
        text = removeSubAscii(text);
        return text;
    }
</cfscript>
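For readers calling the same `java.util.regex` classes from plain Java, the replacement loop in `cleanHighAscii` might look like the sketch below. It is an approximation, not the answer's code: `HighAsciiCleaner` is a hypothetical class name, the character is read as a code point, and `Matcher.quoteReplacement` is added so that a replacement containing `$` or `\` would be taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: replaces non-ASCII characters with plain equivalents or numeric references.
// HighAsciiCleaner is a hypothetical name used for illustration.
class HighAsciiCleaner {
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7F]");

    static String cleanHighAscii(String text) {
        Matcher matcher = NON_ASCII.matcher(text);
        StringBuffer buffer = new StringBuffer();
        while (matcher.find()) {
            // Read the whole code point so characters outside the BMP are handled as one unit.
            int cp = matcher.group().codePointAt(0);
            String value;
            if (cp == 8220 || cp == 8221) {
                value = "\"";              // curly double quotes
            } else if (cp == 8216 || cp == 8217) {
                value = "'";               // curly single quotes
            } else if (cp == 8230) {
                value = "...";             // horizontal ellipsis
            } else {
                value = "&#" + cp + ";";   // decimal character reference
            }
            matcher.appendReplacement(buffer, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(cleanHighAscii("\u201Chi\u201D \u00BB")); // prints "hi" &#187;
    }
}
```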

Another possibility is to use the CF10 function encodeForXML():

https://learn.adobe.com/wiki/display/coldfusionen/EncodeForXML

Or use the ESAPI that ships with CF10 directly, or add the ESAPI jar from the OWASP site https://www.owasp.org/index.php/ESAPI_Overview to your older CF:

var esapi = createObject("java", "org.owasp.esapi.ESAPI");
var esapiEncoder = esapi.encoder();
return esapiEncoder.encodeForXML(text);
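
To show what an XML encoder has to do at a minimum, here is a hand-rolled Java sketch that escapes the five predefined XML entities. It is not the ESAPI or ColdFusion implementation, which may escape more characters; `XmlEscapeSketch` and `escapeXml` are hypothetical names.

```java
// Sketch: escapes the five characters that have predefined entities in XML.
// XmlEscapeSketch is a hypothetical name; this is not the ESAPI encoder.
class XmlEscapeSketch {
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);        // everything else is copied unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("a < b & c")); // prints a &lt; b &amp; c
    }
}
```

Escaping markup characters this way does not remove control characters such as 0x13, so in this sketch they would still need to be stripped separately.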
answered 2013-08-22T14:03:36.253
0

Try using &#187; instead of ». For example, this CFML:

<cfxml variable="x"><?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc
[
    <!ENTITY raquo "&#187;">
]>
<doc>
    Hello, &raquo; !
</doc>
</cfxml>

<cfdump var="#x#">
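
The same entity declaration can be checked outside ColdFusion. The sketch below parses an equivalent document with the JDK's built-in DOM parser and reads back the element text; it assumes the parser's default settings, which allow a DOCTYPE with an internal subset. `EntityDemo`, `SAMPLE`, and `parseText` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Sketch: declares &raquo; via a numeric character reference and parses the document.
// EntityDemo, SAMPLE and parseText are hypothetical names used for illustration.
class EntityDemo {
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE doc [ <!ENTITY raquo \"&#187;\"> ]>\n"
        + "<doc>Hello, &raquo; !</doc>";

    static String parseText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // The entity reference is expanded, so the text holds the real character.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new RuntimeException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseText(SAMPLE)); // text content with U+00BB in place of &raquo;
    }
}
```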
answered 2012-12-06T18:50:29.730
-1

Pass your XML string through this method and it should solve your problem.

It keeps only the characters that are valid in XML. If you want to replace invalid characters with some other character instead of dropping them, you can modify the method below to do that.

public String stripNonValidXMLCharacters(String in) {
    if (in == null || in.isEmpty()) return ""; // vacancy test.
    StringBuilder out = new StringBuilder(); // Used to hold the output.

    // Walk the string by code point rather than by char: a 16-bit char can never
    // reach 0x10000, so supplementary characters would otherwise be dropped.
    for (int i = 0; i < in.length(); ) {
        int current = in.codePointAt(i); // Used to reference the current character.
        i += Character.charCount(current);
        if ((current == 0x9) ||
            (current == 0xA) ||
            (current == 0xD) ||
            ((current >= 0x20) && (current <= 0xD7FF)) ||
            ((current >= 0xE000) && (current <= 0xFFFD)) ||
            ((current >= 0x10000) && (current <= 0x10FFFF)))
            out.appendCodePoint(current);
    }
    return out.toString();
}
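
A more compact alternative to the loop is a single regular expression over the same ranges. This is a sketch assuming Java 7 or later, where `\x{...}` code point escapes are available in `java.util.regex`; `XmlCharStripper` and `strip` are hypothetical names.

```java
// Sketch: removes every character outside the ranges XML 1.0 allows.
// XmlCharStripper is a hypothetical name used for illustration.
class XmlCharStripper {
    // Negated class: anything that is NOT tab, LF, CR or in the legal ranges.
    private static final String INVALID_XML_CHARS =
        "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]";

    static String strip(String in) {
        return in == null ? "" : in.replaceAll(INVALID_XML_CHARS, "");
    }

    public static void main(String[] args) {
        System.out.println(strip("a\u0013b")); // prints ab
    }
}
```

To substitute a placeholder instead of deleting, the second argument of `replaceAll` can be changed, for example to a space.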
answered 2013-08-21T08:33:34.260