0

I have the following string:

String input = "Remove from em?ty sentence 1? Remove from sentence 2! But not from ip address 190.168.10.110!";

I want to remove the punctuation marks that sit in the proper positions. My output needs to be:

String str = "Remove from em?ty sentence 1 Remove from sentence 2 But not from ip address 190.168.10.110";

I am using the following code:

while (stream.hasNext()) { 
    token = stream.next();
    char[] tokenArray = token.toCharArray();
    token = token.trim();

    if(token.matches(".*?[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}[\\.\\?!]+")){
        System.out.println("case2");
        stream.previous();
        int len = token.length()-1;
        for(int i = token.length()-1; i>7; i--){
            if(tokenArray[i]=='.'||tokenArray[i]=='?'||tokenArray[i]=='!'){
                --len;
            }
            else
                break;
        }
        stream.set(token.substring(0, len+1));
    }
    else if(token.matches(".*?\\b[a-zA-Z_0-9]+\\b[\\.\\?!]+")){
        System.out.println("case1");
        stream.previous();
        str = token.replaceAll("[\\.\\?!]+", "");
        stream.set(str);

        System.out.println(stream.next());                          
    }
}

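Each "token" comes from the "input" string. Can you point out what I am doing wrong in the regex or in the logic?

A punctuation mark counts as punctuation when it ends a sentence. It does not count when it sits inside an IP address or inside a word such as "!true" or "emp?ty" (leave those alone). It may be followed by a space or by the end of the string.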

4

5 Answers

1

You can use this pattern:

\\p{Punct}(?=\\s|$)

and replace each match with nothing (an empty string).

Example:

String subject = "Remove from em?ty sentence 1? Remove from sentence 2! But not from ip address 190.168.10.110!";
String regex = "\\p{Punct}(?=\\s|$)";
String result = subject.replaceAll(regex, "");
System.out.println(result);
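
If only sentence-ending marks should be removed, a narrower character class may be safer than \p{Punct}, which also matches characters such as ')' or '%'. The sketch below is one possible variation, not part of the answer above: it assumes that only '.', '?' and '!' count, and the class name SentencePunctuation is hypothetical. The + quantifier also strips runs such as "?!".

```java
import java.util.regex.Pattern;

class SentencePunctuation {
    // Assumption: only '.', '?' and '!' count as sentence-ending punctuation.
    // The lookahead requires whitespace or end of input right after the run,
    // so marks inside words or inside an IP address are left alone.
    private static final Pattern TRAILING = Pattern.compile("[.?!]+(?=\\s|$)");

    static String strip(String text) {
        return TRAILING.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "Remove from em?ty sentence 1? Remove from sentence 2! "
                + "But not from ip address 190.168.10.110!";
        System.out.println(strip(input));
    }
}
```

Precompiling the pattern avoids recompiling it on every call, which matters if the method runs once per token.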

answered 2013-10-06T12:29:24.847
0
String input = "Remove from em?ty sentence 1? Remove from sentence 2! But not from ip address 190.168.10.110!";
System.out.println(input.replaceAll("[?!]", ""));

gives the output:

Remove from emty sentence 1 Remove from sentence 2 But not from ip address 190.168.10.110

answered 2013-10-06T12:38:56.757
0

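Something like this might work. It captures everything you want to keep in group 1, then matches the punctuation that matters to you: [,.!?]

Just replace with $1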

    # ([^\pL\pN\s]*[\pL\pN](?:[\pL\pN_-]|\pP(?=[\pL\pN\pP_-]))*)|[,.!?]
    # "([^\\pL\\pN\\s]*[\\pL\\pN](?:[\\pL\\pN_-]|\\pP(?=[\\pL\\pN\\pP_-]))*)|[,.!?]"

    (                              # (1 start)
         [^\pL\pN\s]* [\pL\pN] 
         (?:
              [\pL\pN_-] 
           |  \pP 
              (?= [\pL\pN\pP_-] )
         )*
    )                              # (1 end)
 |  
    [,.!?] 
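
A minimal sketch of how this pattern could be applied from Java, using the escaped string form shown above. The class name KeepGroupOne is hypothetical. It relies on Java's replaceAll appending nothing for $1 when group 1 did not take part in the match, which is what happens whenever the second alternative matches a stray punctuation mark.

```java
import java.util.regex.Pattern;

class KeepGroupOne {
    // Group 1 captures a "word": optional leading symbols, a letter or digit,
    // then letters, digits, '_', '-' or punctuation that is itself followed
    // by more word material. The second alternative matches leftover marks.
    private static final Pattern WORD_OR_MARK = Pattern.compile(
            "([^\\pL\\pN\\s]*[\\pL\\pN](?:[\\pL\\pN_-]|\\pP(?=[\\pL\\pN\\pP_-]))*)|[,.!?]");

    static String strip(String text) {
        // "$1" puts the captured word back; a leftover mark is replaced by
        // an empty group 1, so it disappears.
        return WORD_OR_MARK.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String input = "Remove from em?ty sentence 1? Remove from sentence 2! "
                + "But not from ip address 190.168.10.110!";
        System.out.println(strip(input));
    }
}
```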

answered 2013-10-06T16:10:20.460
0

Why not use

string.replaceAll("[?!] ", " ");

answered 2013-10-06T12:44:42.250
0

I would do it the other way around.

// Remove a punctuation mark only when a space directly follows it.
token = token.replaceAll("[.!:?;](?= )", "");

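Here I am assuming that the punctuation mark has a trailing space. This leaves only the final punctuation mark of the last sentence, which you can remove separately.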
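
The two passes described here could look like the following sketch. The class name TwoPassStrip and the choice of '.', '!', ':', '?' and ';' as the mark set are assumptions for illustration.

```java
class TwoPassStrip {
    static String strip(String text) {
        // Pass 1: drop a mark only when a space follows it
        // (the trailing-space assumption stated above).
        String pass1 = text.replaceAll("[.!:?;](?= )", "");
        // Pass 2: drop whatever marks remain at the very end of the string.
        return pass1.replaceAll("[.!:?;]+$", "");
    }

    public static void main(String[] args) {
        String input = "Remove from em?ty sentence 1? Remove from sentence 2! "
                + "But not from ip address 190.168.10.110!";
        System.out.println(strip(input));
    }
}
```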

answered 2013-10-06T12:46:50.620