0

I am trying to remove &nbsp; from my string using a regular expression. Here is the program.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class MyTest {

    private static final StringBuffer testRegex = 
        new StringBuffer("<FONT&nbsp;style=\"BACKGROUND-COLOR:&nbsp;#ff6600\">Test</font></p><br><p>" +
        "<FONT&nbsp;style=\"BACKGROUND-COLOR:&nbsp;#ff6600\">Test</font></p><br><p>" +
        "<FONT&nbsp;style=\"BACKGROUND-COLOR:&nbsp;#ff6600\">Test</font>" +
        "<BLOCKQUOTE&nbsp;style=\"MARGIN-RIGHT:&nbsp;0px\"&nbsp;dir=ltr><br><p>Test</p><strong>" +
        "<FONT&nbsp;color=#333333>TestTest</font></strong></p><br><p>Test</p></blockquote>" +
        "<br><p>TestTest</p><br><BLOCKQUOTE&nbsp;style=\"MARGIN-RIGHT:&nbsp;0px\"&nbsp;dir=ltr><br><p>" +
        "<FONT&nbsp;style=\"BACKGROUND-COLOR:&nbsp;#ffcc66\">TestTestTestTestTest</font><br>" +
        "<p>TestTestTestTest</p></blockquote><br><p>" +
        "<FONT&nbsp;style=\"BACKGROUND-COLOR:&nbsp;#003333\">TestTestTest</font></p><p>" +
        "<FONT&nbsp;style=\"BACKGROUND-COLOR:&nbsp;#003399\">TestTest</font></p><p>&nbsp;</p>");

    //"This&nbsp;is&nbsp;test<P>Tag&nbsp;Tag</P>";

    public static void main(String[] args) {
        System.out.println("***Testing***");
        String temp = checkRegex(testRegex);
        System.out.println("***FINAL = "+temp);

    }

    private static String checkRegex(StringBuffer sample){
        Pattern pattern = Pattern.compile("<[^>]+?&nbsp;[^<]+?>");      
        Matcher matcher = pattern.matcher(sample);      
        while (matcher.find()) {
            int start = matcher.start();
            int end = matcher.end();
            String group = matcher.group();
            System.out.println("start = "+start+" end = "+end+"" +"***GROUP = "+group);

            String substring = sample.substring(start, end);
            System.out.println(" Substring = "+substring);
            String replacedSubString = substring.replaceAll("&nbsp;"," ");  
            System.out.println("Replaced Substring = "+replacedSubString);

            sample.replace(start, end, replacedSubString);
            System.out.println(" NEW SAMPLE = "+sample);

        }
        System.out.println("********WHILE OVER ********");
        return sample.toString();
    }

    }

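I am getting a java.lang.StringIndexOutOfBoundsException at the line while (matcher.find()). I am currently using Java's Pattern and Matcher to find &nbsp; and replace it with " ". Does anyone know what causes this? What should I do to remove the extra &nbsp; from my string?

Thanks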

4

3 Answers

1

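Call matcher.reset(); after sample.replace(start, end, replacedSubString);

This is because the matcher was created against the original contents of sample. Once you replace part of the buffer, the text becomes shorter (each &nbsp; is six characters, while the space that replaces it is one), but the matcher keeps working with the old positions, so end can point to an invalid position. You therefore need to call matcher.reset(); after every replace.

For example, if start is 0 and end is 5 and you replace &nbsp; with a space, end will point to an invalid position; if end then points past the length of the string, the find method throws a StringIndexOutOfBoundsException.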
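The loop from the question with the reset added can be sketched as follows. This is a minimal sketch rather than the asker's full program: the class name ResetAfterReplaceSketch and the shortened sample string are made up for illustration, and the debug printing is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class ResetAfterReplaceSketch {

    // Same pattern as in the question: a tag that contains at least one &nbsp;
    private static final Pattern TAG_WITH_NBSP = Pattern.compile("<[^>]+?&nbsp;[^<]+?>");

    static String checkRegex(StringBuffer sample) {
        Matcher matcher = TAG_WITH_NBSP.matcher(sample);
        while (matcher.find()) {
            int start = matcher.start();
            int end = matcher.end();
            String replaced = matcher.group().replace("&nbsp;", " ");
            sample.replace(start, end, replaced);
            // The buffer just changed length, so discard the matcher's state;
            // the next find() starts again from index 0 of the updated text.
            matcher.reset();
        }
        return sample.toString();
    }

    public static void main(String[] args) {
        // Shortened sample, assumed for illustration only.
        StringBuffer sample = new StringBuffer(
                "<FONT&nbsp;style=\"BACKGROUND-COLOR:&nbsp;#ff6600\">Test</font><p>&nbsp;</p>");
        System.out.println(checkRegex(sample));
    }
}
```

The loop terminates because a tag that has already been cleaned no longer contains &nbsp; and so cannot match again. The &nbsp; between the p tags is left alone, since the pattern only matches inside a tag.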


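If the string is large, resetting may become a major performance bottleneck, because reset makes the matcher start matching again from the beginning. You can use this instead: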

 matcher.region(start,sample.length());

This will resume matching from the position of the last match!
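A minimal sketch of the region variant, under the same assumptions as before (the class name RegionAfterReplaceSketch and the sample string are invented for illustration). Matcher.region resets the matcher and restricts the search to the given range, so passing the current sample.length() also makes the matcher pick up the new, shorter length.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class RegionAfterReplaceSketch {

    private static final Pattern TAG_WITH_NBSP = Pattern.compile("<[^>]+?&nbsp;[^<]+?>");

    static String checkRegex(StringBuffer sample) {
        Matcher matcher = TAG_WITH_NBSP.matcher(sample);
        while (matcher.find()) {
            int start = matcher.start();
            int end = matcher.end();
            String replaced = matcher.group().replace("&nbsp;", " ");
            sample.replace(start, end, replaced);
            // Reset the matcher and search only from the start of the tag that
            // was just cleaned up to the new end of the buffer.
            matcher.region(start, sample.length());
        }
        return sample.toString();
    }

    public static void main(String[] args) {
        // Sample input assumed for illustration only.
        StringBuffer sample = new StringBuffer(
                "<FONT&nbsp;color=#333333>A</font><BLOCKQUOTE&nbsp;dir=ltr>B</blockquote>");
        System.out.println(checkRegex(sample));
    }
}
```

Starting the region at start rather than at 0 means earlier, already processed text is never scanned again.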

answered 2013-04-23T05:17:38.013
0
// change the group and it is source string is automatically updated

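There is no way to modify a String in Java (strings are immutable), so what you are asking for is not possible.

Removing or replacing a pattern in a string can be done with a call such as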

someString = someString.replaceAll(toReplace, replacement);
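As a small illustration of why the result has to be assigned back, here is a sketch with invented names (ImmutableStringSketch, stripNbsp) and an invented sample value. replaceAll returns a new String and leaves the original untouched.

```java
// Hypothetical class name, used only for this sketch.
class ImmutableStringSketch {

    // Returns a new String; the argument itself is never modified.
    static String stripNbsp(String input) {
        // The first argument of replaceAll is a regex; "&nbsp;" contains no
        // regex metacharacters, so it matches literally here.
        return input.replaceAll("&nbsp;", " ");
    }

    public static void main(String[] args) {
        String original = "This&nbsp;is&nbsp;test";
        String cleaned = stripNbsp(original);
        System.out.println(original); // still contains &nbsp;
        System.out.println(cleaned);
    }
}
```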

To transform the matched substring, as your line suggests,

m.group().replaceAll("something","");

the best solution is probably to use a StringBuffer for the result, together with

Matcher.appendReplacement and Matcher.appendTail.

Example:

String regex = "ipsum";
String sourceString = "lorem ipsum dolor sit";

Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(sourceString);
StringBuffer sb = new StringBuffer();

while (m.find()) {
    // For example: transform match to upper case
    String replacement = m.group().toUpperCase();
    m.appendReplacement(sb, replacement);
}

m.appendTail(sb);

sourceString = sb.toString();

System.out.println(sourceString); // "lorem IPSUM dolor sit"
answered 2013-04-23T05:36:35.037
0

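You need to create a new StringBuffer to hold the replaced string, then use the appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb) methods of the Matcher class to do the replacement. There may be a way to do this in place, but the approach above is the most straightforward way to do it.

Here is your checkRegex method rewritten: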

private static String checkRegex(String inputString){
    Pattern pattern = Pattern.compile("<[^>]+?&nbsp;[^<]+?>");      
    Matcher matcher = pattern.matcher(inputString);

    // Create a new StringBuffer to hold the string after replacement
    StringBuffer replacedString = new StringBuffer();

    while (matcher.find()) {
        // matcher.group() returns the substring that matches the whole regex
        String substring = matcher.group();
        System.out.println(" Substring = "+substring);

        String replacedSubstring = substring.replaceAll("&nbsp;"," "); 
        System.out.println("Replaced Substring = "+replacedSubstring);


        // appendReplacement is a clean approach to append the text which comes
        // before a match, and append the replacement text for the matched text

        // Note that appendReplacement will interpret $ in the replacement string
        // with special meaning (for referring to text matched by capturing group).
        // Matcher.quoteReplacement is necessary to provide a literal string as
        // replacement
        matcher.appendReplacement(replacedString, Matcher.quoteReplacement(replacedSubstring));

        System.out.println(" NEW SAMPLE = "+replacedString);
    }

    // appendTail is used to append the text after the last match to the
    // replaced string.
    matcher.appendTail(replacedString);

    System.out.println("********WHILE OVER ********");
    return replacedString.toString();
}
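
The comment about $ in the method above can be seen in a small sketch. The class name QuoteReplacementSketch, the helper names, and the sample strings are all invented for illustration. With quoteReplacement the text "$5" is inserted literally; without it, appendReplacement reads "$5" as a reference to capturing group 5, which this pattern does not have, and throws an IndexOutOfBoundsException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class QuoteReplacementSketch {

    // Inserts the replacement as literal text.
    static String replaceLiteral(String input, String regex, String literal) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(literal));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Passes the replacement through unquoted, so $n is read as a group reference.
    static String replaceRaw(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("cost: PRICE", "PRICE", "$5"));
        try {
            replaceRaw("cost: PRICE", "PRICE", "$5");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unquoted replacement failed: " + e.getMessage());
        }
    }
}
```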
answered 2013-04-23T05:38:23.037