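I have the following code, which is supposed to remove all HTML from the parts of a string that are delimited by dollar signs (there may be more than one such part). It works, but I also need to keep the dollar signs themselves. Any suggestions are appreciated.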

private static String removeMarkupBetweenDollars(String input){
    if ((input.length()-input.replaceAll("\\$","").length())%2!=0)
    {
        throw new RuntimeException("Missing or extra: dollar");
    }
    Pattern pattern = Pattern.compile("\\$(.*?)\\$",Pattern.DOTALL);
    Matcher matcher = pattern.matcher(input);

    StringBuffer sb =new StringBuffer();

    while (matcher.find()) {
        // prepending does NOT work if something is in front of the first dollar
        matcher.appendReplacement(sb,matcher.group(1).replaceAll("\\<.*?\\>", ""));
        sb.append("$"); //note this manual appending
    }
    matcher.appendTail(sb);
    System.out.println(sb.toString());

    return sb.toString();
}

Thanks for your help! A simple input and the output I expect:

String input = "<p>$<em>something</em>$</p>  <p>anything else</p>";
String output = "<p>$something$</p>  <p>anything else</p>";

A more complex input and expected output:

String input = "<p>$ bar  <b>foo</b>  bar <span style=\"text-decoration: underline;\">foo</span>  $</p><p>another foos</p> $ foo bar <em>bar</em>$";
String output = "<p>$ bar  foo  bar foo  $</p><p>another foos</p> $ foo bar bar$";

2 Answers

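Your code needs only a few small adjustments: use matcher.group() (the whole match, dollar signs included) instead of matcher.group(1), and pass the result through Matcher.quoteReplacement so the dollar signs are not interpreted as group references: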

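import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static String removeMarkupBetweenDollars(String input) {
    // An odd number of dollar signs means one delimiter is unpaired.
    if ((input.length() - input.replaceAll("\\$", "").length()) % 2 != 0) {
        throw new RuntimeException("Missing or extra: dollar");
    }

    // Reluctant match of everything between two dollar signs, across line breaks.
    Pattern pattern = Pattern.compile("\\$(.*?)\\$", Pattern.DOTALL);
    Matcher matcher = pattern.matcher(input);

    StringBuffer sb = new StringBuffer();

    while (matcher.find()) {
        // group() is the whole match, so both dollar signs are kept.
        String s = matcher.group().replaceAll("<[^>]+>", "");
        // Escape '$' and '\' so the replacement is inserted literally.
        matcher.appendReplacement(sb, Matcher.quoteReplacement(s));
    }
    matcher.appendTail(sb);

    return sb.toString();
}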
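
To see why the quoting step matters, here is a minimal, self-contained sketch. The class name DollarMarkupDemo and the helper unquotedReplacementIsRejected are made up for illustration, and the dollar-count check is left out for brevity. In a replacement string, a '$' is read as the start of a group reference, so a replacement that itself contains dollar signs has to be quoted first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the method body mirrors the answer above.
final class DollarMarkupDemo {
    private static final Pattern BETWEEN_DOLLARS =
            Pattern.compile("\\$(.*?)\\$", Pattern.DOTALL);

    static String removeMarkupBetweenDollars(String input) {
        Matcher matcher = BETWEEN_DOLLARS.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // Whole match (dollar signs included) with the tags stripped out.
            String stripped = matcher.group().replaceAll("<[^>]+>", "");
            // Quote it so '$' is copied literally instead of parsed as "$n".
            matcher.appendReplacement(sb, Matcher.quoteReplacement(stripped));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    // Returns true if an unquoted replacement containing '$' is rejected.
    static boolean unquotedReplacementIsRejected() {
        try {
            "x".replaceAll("x", "$something$");
            return false;
        } catch (IllegalArgumentException e) {
            return true; // "$s" is not a valid group reference
        }
    }

    public static void main(String[] args) {
        String simple = "<p>$<em>something</em>$</p>  <p>anything else</p>";
        String complex = "<p>$ bar  <b>foo</b>  bar <span style=\"text-decoration: underline;\">foo</span>  $</p>"
                + "<p>another foos</p> $ foo bar <em>bar</em>$";
        System.out.println(removeMarkupBetweenDollars(simple));
        System.out.println(removeMarkupBetweenDollars(complex));
        System.out.println(unquotedReplacementIsRejected());
    }
}
```

Run against the two inputs from the question, this should print the two expected output strings given there, with markup outside the dollar signs left untouched.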
answered 2012-07-19T19:12:21.937

String output = input.replaceAll("\\$<.*?>(.*?)<.*?>\\$", "\\$$1\\$");

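A key point in the regular expression is the ? in .*?: it makes the match reluctant ("non-greedy"), which means "consume as little input as possible". Without it, the quantifiers are greedy and consume as much as possible, so a match would run through to the end of a later occurrence of $<html>foo</html>$ in the input, if there is one.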
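
The difference can be checked with a small sketch that runs the same replacement twice, once with the reluctant pattern and once with a greedy variant. The class name GreedyVsReluctantDemo and the GREEDY constant are invented for this comparison; the greedy pattern is simply the original with each ? after .* removed.

```java
// Hypothetical comparison of reluctant and greedy quantifiers.
final class GreedyVsReluctantDemo {
    // The pattern from this answer: every .*? stops as early as it can.
    static final String RELUCTANT = "\\$<.*?>(.*?)<.*?>\\$";
    // Assumed greedy variant: every .* grabs as much as it can.
    static final String GREEDY = "\\$<.*>(.*)<.*>\\$";

    static String strip(String input, String regex) {
        // "\\$" in the replacement is a literal dollar sign, "$1" is group 1.
        return input.replaceAll(regex, "\\$$1\\$");
    }

    public static void main(String[] args) {
        String input = "<p>$<em>something</em>$</p> <p>and $<em>anything</em>$ else</p>";
        System.out.println(strip(input, RELUCTANT));
        System.out.println(strip(input, GREEDY));
    }
}
```

With the greedy variant, a single match spans from the first dollar sign to the last one, so the result collapses to <p>$anything$ else</p> and the first quoted section is lost.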

Here is a test:

public static void main(String[] args) throws Exception {
    String input = "<p>$<em>something</em>$</p> <p>and $<em>anything</em>$ else</p>";
    String output = input.replaceAll("\\$<.*?>(.*?)<.*?>\\$", "\\$$1\\$");
    System.out.println(output);
}

Output:

<p>$something$</p> <p>and $anything$ else</p>
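
One caveat worth checking before relying on this one-liner: the pattern requires a tag immediately after the opening dollar sign and another immediately before the closing one. The sketch below (class and method names are made up) feeds it the more complex input from the question, where text sits between the dollar signs and the tags, and compares the result with the expected output.

```java
// Hypothetical check of the one-liner against the question's complex input.
final class OneLinerCaveatDemo {
    static String stripWithOneLiner(String input) {
        return input.replaceAll("\\$<.*?>(.*?)<.*?>\\$", "\\$$1\\$");
    }

    public static void main(String[] args) {
        String input = "<p>$ bar  <b>foo</b>  bar <span style=\"text-decoration: underline;\">foo</span>  $</p>"
                + "<p>another foos</p> $ foo bar <em>bar</em>$";
        String expected = "<p>$ bar  foo  bar foo  $</p><p>another foos</p> $ foo bar bar$";

        String actual = stripWithOneLiner(input);
        System.out.println(actual);
        // Shows whether the one-liner produced the output the question asks for.
        System.out.println(actual.equals(expected));
    }
}
```

On this input the result does not equal the expected string: the first dollar sign is followed by a space rather than a tag, so no match can begin there and <b>foo</b> is left in place. For input shaped like that, a loop that strips tags from each dollar-delimited section is probably the safer route.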
answered 2012-07-19T18:58:30.710