java - Escape HTML except for some special characters

Question

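To prevent HTML code injection and cross-site scripting, I built a filter for service requests that escapes certain characters using the following method: StringEscapeUtils.escapeHtml(text)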

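However, this also escapes some UTF-8 characters such as äöü. As a workaround I use an excludeList: before calling StringEscapeUtils.escapeHtml, each listed value is replaced with its hash code, and after the call the hash values are converted back to the original strings. This solves the problem, but it is not a very elegant solution!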

    import java.util.Map;
    import java.util.TreeMap;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.apache.commons.lang.StringEscapeUtils;

    String[] excludeList = {"ü", "Ü", "ö", "Ö", "ä", "Ä", "ß"};

    private static String escapeHtml(String text, String[] exclusionList) {
        TreeMap<Integer, String> excludeTempMap = new TreeMap<Integer, String>();

        // replace characters from exclusionList in the text with their equivalent hashCode
        // (caveat: the placeholder is a plain number, e.g. 252 for "ü", so it can
        // collide with digits that already occur in the text)
        for (String excludePart : exclusionList) {
            Matcher matcher = Pattern.compile(excludePart, Pattern.MULTILINE).matcher(text);

            while (matcher.find()) {
                String match = matcher.group();
                Integer matchHash = match.hashCode();

                text = matcher.replaceFirst(String.valueOf(matchHash));

                excludeTempMap.put(matchHash, match);

                matcher.reset(text);
            }
        }

        // escape malicious html characters
        text = StringEscapeUtils.escapeHtml(text);

        // replace back characters from exclusionList from hash values to string
        for (Map.Entry<Integer, String> excludeEntry : excludeTempMap.entrySet()) {
            text = text.replaceAll(
                String.valueOf(excludeEntry.getKey()),
                excludeEntry.getValue()
            );
        }

        return text;
    }

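Does anyone have a hint on how to achieve this with a better solution? Is there a better library that can be used to whitelist certain language-specific characters?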
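One possible direction, offered only as a minimal sketch: instead of escaping everything and then restoring the excluded characters, escape only the characters that carry meaning in HTML markup (&, <, >, " and ') and pass everything else through unchanged. The class `HtmlBasicEscaper` and the method `escapeHtmlBasic` below are hypothetical names written for this illustration and are not part of any library. The sketch assumes the response is served with a UTF-8 charset, so characters such as äöü do not need entity encoding.

```java
// Hypothetical helper (not a library API): escapes only the five characters
// that are significant in HTML markup and leaves all other characters,
// including ä, ö, ü and ß, untouched.
final class HtmlBasicEscaper {

    private HtmlBasicEscaper() {
        // utility class, no instances
    }

    static String escapeHtmlBasic(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                case '\'':
                    out.append("&#39;");
                    break;
                default:
                    // everything else is copied as-is
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

Under these assumptions, calling `HtmlBasicEscaper.escapeHtmlBasic("<b>Grüße</b>")` would yield `&lt;b&gt;Grüße&lt;/b&gt;`, with no exclusion list and no placeholder round trip. Whether escaping these five characters is sufficient depends on where the text is emitted (element content, attribute value, script, URL), so this should be checked against the actual output contexts of the service.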
