I am using StringEscapeUtils to escape and unescape HTML. I have the following code:

import org.apache.commons.lang.StringEscapeUtils;

public class EscapeUtils {

    public static void main(String args[]) {

        String string = "    4-Spaces    ,\"Double Quote\", 'Single Quote', \\Back-Slash\\, /Forward Slash/ ";

        String escaped = StringEscapeUtils.escapeHtml(string);
        String myEscaped = escapeHtml(string);

        String unescaped = StringEscapeUtils.unescapeHtml(escaped);
        String myUnescaped = StringEscapeUtils.unescapeHtml(myEscaped);

        System.out.println("Real String: " + string);
        System.out.println();
        System.out.println("Escaped String: " + escaped);
        System.out.println("My Escaped String: " + myEscaped);
        System.out.println();
        System.out.println("Unescaped String: " + unescaped);
        System.out.println("My Unescaped String: " + myUnescaped);
        System.out.println();
        System.out.println("Comparison:");
        System.out.println("Real String == Unescaped String: " + string.equals(unescaped));
        System.out.println("Real String == My Unescaped String: " + string.equals(myUnescaped));
        System.out.println("Unescaped String == My Unescaped String: " + unescaped.equals(myUnescaped));

    }

    public static String escapeHtml(String s) {
        String escaped = "";
        if(null != s) {
            escaped = StringEscapeUtils.escapeHtml(s);
            escaped = escaped.replaceAll(" ","&nbsp;");
            escaped = escaped.replaceAll("'","&#39;");
            escaped = escaped.replaceAll("\\\\","&#92;");
            escaped = escaped.replaceAll("/","&#47;");
        }
        return escaped;
    }

}

Output:

Real String:     4-Spaces    ,"Double Quote", 'Single Quote', \Back-Slash\, /Forward Slash/ 

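Escaped String:     4-Spaces    ,&quot;Double Quote&quot;, 'Single Quote', \Back-Slash\, /Forward Slash/ 
My Escaped String: &nbsp;&nbsp;&nbsp;&nbsp;4-Spaces&nbsp;&nbsp;&nbsp;&nbsp;,&quot;Double&nbsp;Quote&quot;,&nbsp;&#39;Single&nbsp;Quote&#39;,&nbsp;&#92;Back-Slash&#92;,&nbsp;&#47;Forward&nbsp;Slash&#47;&nbsp;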

Unescaped String:     4-Spaces    ,"Double Quote", 'Single Quote', \Back-Slash\, /Forward Slash/ 
My Unescaped String:     4-Spaces    ,"Double Quote", 'Single Quote', \Back-Slash\, /Forward Slash/ 

Comparison:
Real String == Unescaped String: true
Real String == My Unescaped String: false
Unescaped String == My Unescaped String: false

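escaped is the real string after escaping, and unescaped is the result of unescaping it. myEscaped is first escaped by the same process, and then a few more characters are replaced with their HTML codes. myUnescaped is the unescaped form of myEscaped, and its content looks the same as the real string.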

The output shows that the real string, unescaped, and myUnescaped have the same content. However, as the comparison section shows, myUnescaped is equal to neither string nor unescaped.

I don't understand what is happening here. Can anyone explain?

2 Answers

This happens because, while escaping the HTML, you are replacing ' ' with &nbsp;:

public static String escapeHtml(String s) {
    String escaped = "";
    if(null != s) {
        escaped = StringEscapeUtils.escapeHtml(s);
        escaped = escaped.replaceAll(" ","&nbsp;"); // HERE
        escaped = escaped.replaceAll("'","&#39;");
        escaped = escaped.replaceAll("\\\\","&#92;");
        escaped = escaped.replaceAll("/","&#47;");
    }
    return escaped;
}

StringEscapeUtils.escapeHtml itself does not escape ' '. Here is the example from their site:

"bread" & "butter" 

becomes

&quot;bread&quot; &amp; &quot;butter&quot;

This means StringEscapeUtils.escapeHtml preserves spaces.

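When myEscaped is unescaped, each &nbsp; becomes a non-breaking space (U+00A0), which prints like an ordinary space (U+0020) but is a different character. If you remove escaped = escaped.replaceAll(" ","&nbsp;"); from escapeHtml, unescaped and myUnescaped match.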

answered 2013-04-25T07:05:19.253

Following Apurv's answer, I analyzed the byte arrays of the strings.

String:        32,  32,  32,  32,  52,  45,  83, 112,  97,  99, 101, 115,  32,  32,  32,  32,  44,  34,  68, 111, 117,  98, 108, 101,  32,  81, 117, 111, 116, 101,  34,  44,  32,  39,  83, 105, 110, 103, 108, 101,  32,  81, 117, 111, 116, 101,  39,  44,  32,  92,  66,  97,  99, 107,  45,  83, 108,  97, 115, 104,  92,  44,  32,  47,  70, 111, 114, 119,  97, 114, 100,  32,  83, 108,  97, 115, 104,  47,  32
unescaped :    32,  32,  32,  32,  52,  45,  83, 112,  97,  99, 101, 115,  32,  32,  32,  32,  44,  34,  68, 111, 117,  98, 108, 101,  32,  81, 117, 111, 116, 101,  34,  44,  32,  39,  83, 105, 110, 103, 108, 101,  32,  81, 117, 111, 116, 101,  39,  44,  32,  92,  66,  97,  99, 107,  45,  83, 108,  97, 115, 104,  92,  44,  32,  47,  70, 111, 114, 119,  97, 114, 100,  32,  83, 108,  97, 115, 104,  47,  32
myUnescaped:  -96, -96, -96, -96,  52,  45,  83, 112,  97,  99, 101, 115, -96, -96, -96, -96,  44,  34,  68, 111, 117,  98, 108, 101, -96,  81, 117, 111, 116, 101,  34,  44, -96,  39,  83, 105, 110, 103, 108, 101, -96,  81, 117, 111, 116, 101,  39,  44, -96,  92,  66,  97,  99, 107,  45,  83, 108,  97, 115, 104,  92,  44, -96,  47,  70, 111, 114, 119,  97, 114, 100, -96,  83, 108,  97, 115, 104,  47, -96

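It seems that in myUnescaped the spaces have been converted to byte value -96 (0xA0, the non-breaking space) instead of 32 (the ordinary space).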
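A byte dump like the one above can be reproduced with the standard library alone. The following is a minimal sketch, not the code used for the listing above: the class name ByteDump and the helper dump are hypothetical, and ISO-8859-1 is assumed as the encoding so that U+00A0 maps to the single byte 0xA0, which Java prints as the signed value -96.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteDump {

    // Encode with ISO-8859-1 (an assumption) so every char below U+0100
    // becomes exactly one byte, then format the bytes as a list.
    static String dump(String s) {
        return Arrays.toString(s.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        String plain = "a b";       // ordinary space, U+0020
        String nbsp = "a\u00A0b";   // non-breaking space, U+00A0

        System.out.println(dump(plain));        // [97, 32, 98]
        System.out.println(dump(nbsp));         // [97, -96, 98]
        System.out.println(plain.equals(nbsp)); // false
    }
}
```

Both strings look identical when printed, which is why the comparison fails even though the console output appears the same.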

So I wrote an unescapeHtml method as follows. It first replaces &nbsp; with a space and then uses StringEscapeUtils to unescape the HTML.

public static String unescapeHtml(String s) {
    String unescaped = "";
    if(null != s) {
        unescaped = s.replaceAll("&nbsp;", " ");
        unescaped = StringEscapeUtils.unescapeHtml(unescaped);
    }
    return unescaped;
}

Then I obtained myUnescaped using the following code.

String myUnescaped = unescapeHtml(myEscaped);

This gave me a myUnescaped string that is equal to both string and unescaped.
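A variation on the same idea is to normalize after unescaping rather than before, by replacing the character U+00A0 itself. This is only a sketch under assumptions: NbspNormalizer and normalizeSpaces are hypothetical names, and the literal "a\u00A0b" stands in for whatever StringEscapeUtils.unescapeHtml returns. Note that it also turns any non-breaking space that was in the original text into an ordinary space.

```java
class NbspNormalizer {

    // Replace every non-breaking space (U+00A0) with an ordinary space (U+0020).
    // Returns an empty string for null, mirroring the methods in this thread.
    static String normalizeSpaces(String s) {
        return s == null ? "" : s.replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        // Stand-in for the result of unescaping "a&nbsp;b" (an assumption).
        String unescaped = "a\u00A0b";

        System.out.println("a b".equals(unescaped));                  // false
        System.out.println("a b".equals(normalizeSpaces(unescaped))); // true
    }
}
```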

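Alternatively, I can replace &nbsp; with &#32;. This does not require me to write the unescapeHtml method. The updated escapeHtml method is as follows.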

public static String escapeHtml(String s) {
    String escaped = "";
    if(null != s) {
        escaped = StringEscapeUtils.escapeHtml(s);
        escaped = escaped.replaceAll(" ","&#32;");    //updated line 
        escaped = escaped.replaceAll("'","&#39;");
        escaped = escaped.replaceAll("\\\\","&#92;");
        escaped = escaped.replaceAll("/","&#47;");
    }
    return escaped;
}
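
Why the choice of entity matters can be illustrated without Commons Lang. The sketch below uses a deliberately tiny, hypothetical decoder (toyUnescape in a class EntityDemo) that understands only &nbsp; and decimal numeric entities. It is not how StringEscapeUtils is implemented. It only shows that &nbsp; decodes to character 160, while a numeric entity such as &#32; decodes to character 32.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDemo {

    private static final Pattern NUMERIC = Pattern.compile("&#(\\d+);");

    // Toy decoder (hypothetical): handles only &nbsp; and decimal
    // numeric entities with code points that fit in a single char.
    static String toyUnescape(String s) {
        String named = s.replace("&nbsp;", "\u00A0");
        Matcher m = NUMERIC.matcher(named);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1));
            // quoteReplacement keeps '\' and '$' from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String viaNbsp = toyUnescape("a&nbsp;b");
        String viaNumeric = toyUnescape("a&#32;b");

        System.out.println((int) viaNbsp.charAt(1));    // 160
        System.out.println((int) viaNumeric.charAt(1)); // 32
        System.out.println("a b".equals(viaNbsp));      // false
        System.out.println("a b".equals(viaNumeric));   // true
    }
}
```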
answered 2013-04-25T10:10:44.523