
I am correcting some email addresses obtained from Tesseract OCR.

Here is my code:

if (email != null) {
    email = email.replaceAll(" ", "");
    email = email.replaceAll("caneer", "career");
    email = email.replaceAll("canaer", "career");
    email = email.replaceAll("canear", "career");
    email = email.replaceAll("caraer", "career");
    email = email.replaceAll("carear", "career");
    email = email.replace("|", "l");
    email = email.replaceAll("}", "j");
    email = email.replaceAll("j3b", "job");
    email = email.replaceAll("gmaii.com", "gmail.com");
    email = email.replaceAll("hotmaii.com", "hotmail.com");
    email = email.replaceAll(".c0m", ".com");
    email = email.replaceAll(".coin", ".com");
    email = email.replaceAll("consuit", "consult");
}
return email;

But the output is wrong.

Input:

amrut=ac.hrworks@g mai|.com

Output:

lalcl.lhlrlwlolrlklsl@lglmlalil|l.lclolml

But when I assign the result to a new string after each replacement, it works fine. Why don't successive assignments to the same string work?


6 Answers


You'll notice in the Javadoc for String.replaceAll() that the first argument is a regular expression.

The period (.), the pipe (|) and the brace (}) all have special meanings in a regex. You need to escape all of them, for example:

email = email.replaceAll("gmaii\\.com", "gmail.com");
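To make the point about the period concrete, here is a minimal sketch (the class and method names are illustrative only) contrasting the unescaped and escaped patterns. Unescaped, "." matches any single character, so "gmaii.com" also matches text with no dot in that position.

```java
// Illustrative demo; DotEscapeDemo and its methods are hypothetical names.
class DotEscapeDemo {
    // Unescaped: "." is a regex wildcard and matches ANY character.
    static String unescaped(String s) {
        return s.replaceAll("gmaii.com", "gmail.com");
    }

    // Escaped: "\\." in Java source is the regex \. , i.e. a literal dot.
    static String escaped(String s) {
        return s.replaceAll("gmaii\\.com", "gmail.com");
    }

    public static void main(String[] args) {
        String noDot = "user@gmaiixcom";
        System.out.println(unescaped(noDot));          // user@gmail.com (the 'x' was swallowed)
        System.out.println(escaped(noDot));            // user@gmaiixcom (no match, unchanged)
        System.out.println(escaped("user@gmaii.com")); // user@gmail.com
    }
}
```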
answered 2013-02-12T05:44:31.160

(Is this Java?)

Note that in Java, replaceAll takes a regular expression, and a dot matches any character. You need to escape the dot or use

somestring.replaceAll(Pattern.quote("gmail.com"), "replacement");
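A small self-contained sketch of the Pattern.quote approach (the class and method names are hypothetical). Pattern.quote makes the whole search text literal; Matcher.quoteReplacement does the same for the replacement side, which only matters when the replacement contains $ or \ but is a safe habit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative demo; QuoteDemo and fixDomain are hypothetical names.
class QuoteDemo {
    // Pattern.quote makes every character of ".c0m" match literally.
    static String fixDomain(String s) {
        return s.replaceAll(Pattern.quote(".c0m"), Matcher.quoteReplacement(".com"));
    }

    public static void main(String[] args) {
        System.out.println(fixDomain("a@b.c0m")); // a@b.com
        System.out.println(fixDomain("abc0m"));   // abc0m (no literal dot, so no match)
        // For comparison, the unquoted pattern treats "." as a wildcard:
        System.out.println("abc0m".replaceAll(".c0m", ".com")); // a.com
    }
}
```

When no regex features are needed at all, String.replace(CharSequence, CharSequence) also replaces every occurrence and treats its arguments literally, so it avoids the escaping question entirely.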

Also note the typo here:

email = emai.replaceAll("canear", "career");

It should be

email = email.replaceAll("canear", "career");
answered 2013-02-12T05:52:08.890

You have to escape the . as \\. like this:

if (email != null) {
    email = email.replaceAll(" ", "");
    email = email.replaceAll("caneer", "career");
    email = email.replaceAll("canaer", "career");
    email = email.replaceAll("canear", "career");
    email = email.replaceAll("caraer", "career");
    email = email.replaceAll("carear", "career");
    email = email.replace("|", "l");
    email = email.replaceAll("}", "j");
    email = email.replaceAll("j3b", "job");
    email = email.replaceAll("gmaii\\.com", "gmail.com");
    email = email.replaceAll("hotmaii\\.com", "hotmail.com");
    // The replacement string is not a regex, so ".com" keeps its dot literally.
    email = email.replaceAll("\\.c0m", ".com");
    email = email.replaceAll("\\.coin", ".com");
    email = email.replaceAll("consuit", "consult");
}
return email;
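An alternative sketch that sidesteps regex escaping altogether: keep the fixes in an ordered table and apply them with String.replace, which is literal. The class name and the table layout are hypothetical; the fix pairs themselves are the ones from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the entries mirror the question's replacements.
class OcrEmailCleaner {
    // LinkedHashMap keeps insertion order, so fixes run in the order listed.
    private static final Map<String, String> FIXES = new LinkedHashMap<>();
    static {
        FIXES.put(" ", "");
        FIXES.put("caneer", "career");
        FIXES.put("canaer", "career");
        FIXES.put("canear", "career");
        FIXES.put("caraer", "career");
        FIXES.put("carear", "career");
        FIXES.put("|", "l");
        FIXES.put("}", "j");
        FIXES.put("j3b", "job");
        FIXES.put("gmaii.com", "gmail.com");
        FIXES.put("hotmaii.com", "hotmail.com");
        FIXES.put(".c0m", ".com");
        FIXES.put(".coin", ".com");
        FIXES.put("consuit", "consult");
    }

    static String clean(String email) {
        if (email == null) {
            return null;
        }
        // String.replace(CharSequence, CharSequence) replaces every occurrence
        // literally, so ".", "|" and "}" need no escaping here.
        for (Map.Entry<String, String> fix : FIXES.entrySet()) {
            email = email.replace(fix.getKey(), fix.getValue());
        }
        return email;
    }

    public static void main(String[] args) {
        System.out.println(clean("amrut=ac.hrworks@g mai|.com")); // amrut=ac.hrworks@gmail.com
    }
}
```

Order still matters with this design: for example "}" must become "j" before the "j3b" fix runs, exactly as in the original sequence of calls.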
answered 2013-02-12T05:47:12.337

Once you realize that the first argument of replaceAll() is a regex, you can cut down the number of replacements.

For example, you can catch the likely misspellings of the word career with a single regex:

email = email.replaceAll("ca[nr][ea][ea]r", "career");
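A quick check, as a sketch with hypothetical names, that one character-class pattern covers all five OCR misreads from the question. Each [..] matches exactly one of the listed characters, so no | is needed inside the brackets (inside a class, | would itself be matched as a literal pipe). The pattern is somewhat broader than the original list: it also matches, say, "canaar", and "career" itself, which is harmless.

```java
// Illustrative demo; CareerRegexDemo and fixCareer are hypothetical names.
class CareerRegexDemo {
    // [nr], [ea], [ea]: each bracket matches a single character from the set.
    static String fixCareer(String s) {
        return s.replaceAll("ca[nr][ea][ea]r", "career");
    }

    public static void main(String[] args) {
        String[] misreads = {"caneer", "canaer", "canear", "caraer", "carear"};
        for (String m : misreads) {
            System.out.println(m + " -> " + fixCareer(m + "@x.com"));
        }
    }
}
```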

answered 2013-02-12T06:11:21.390

You are using some regex special characters.

Escape them with a backslash (written \\ in a Java string literal) or use the Pattern.quote method.
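The pipe is the character most likely to produce the strange output in the question. The interleaved l characters in the reported output look consistent with an unescaped replaceAll("|", "l") (an assumption, since the posted code uses replace, which is literal). A minimal sketch with hypothetical names: an unescaped "|" is an alternation of two empty patterns, so it matches the empty string at every position.

```java
// Illustrative demo; PipeEscapeDemo and its methods are hypothetical names.
class PipeEscapeDemo {
    // Unescaped "|" matches the empty string at every position (never the
    // pipe itself), so the replacement is inserted between all characters.
    static String unescaped(String s) {
        return s.replaceAll("|", "l");
    }

    // "\\|" in Java source is the regex \| , which matches a literal pipe.
    static String escaped(String s) {
        return s.replaceAll("\\|", "l");
    }

    public static void main(String[] args) {
        System.out.println(unescaped("ai|")); // lalil|l
        System.out.println(escaped("ai|"));   // ail
    }
}
```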

answered 2013-02-12T05:47:21.810

I suspect you didn't realize that the first argument of replaceAll is a regular expression.

The characters ., | and } may be interpreted differently from what you expect:

.   Any character (may or may not match line terminators)

http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html

For whitespace, you are better off using

\s  A whitespace character: [ \t\n\x0B\f\r]

and escaping the other special characters with a leading \\.
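A minimal sketch (hypothetical names) of the \s suggestion: unlike a literal " ", the predefined class also removes tabs, newlines and the other whitespace characters listed above, which OCR output can easily contain.

```java
// Illustrative demo; WhitespaceDemo and stripWhitespace are hypothetical names.
class WhitespaceDemo {
    // "\\s" in Java source is the regex \s (space, \t, \n, \x0B, \f, \r);
    // "+" makes each run of whitespace a single match.
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripWhitespace("g mai\tl.com\n")); // gmail.com
    }
}
```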

answered 2013-02-12T05:47:37.960