
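I have to split a line of text into words, and I am confused about which regex to use. I have looked everywhere for a regex that matches a word and found ones similar to the one in this post, but I want it in Java (Java does not accept a bare \ in an ordinary string literal).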

Regex to match words and those with an apostrophe

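I have tried the regex from each answer, but I am not sure how to build the Java version (I assumed all regexes were the same). If I replace \ with \\ in the regexes I see, the regex doesn't work.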
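On the escaping point: in a Java string literal the backslash is itself an escape character, so a regex token such as `\w` has to be written as `\\w` in source code. A minimal sketch of that rule (the class name `EscapeDemo` and the sample words are made up for illustration):

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // The regex \w+ written as a Java string literal: the backslash is doubled.
    static final String WORD = "\\w+";

    public static void main(String[] args) {
        // The literal "\\w+" holds three characters: \, w and +
        System.out.println(WORD.length());                  // prints 3
        // Pattern.matches requires the whole input to match the regex.
        System.out.println(Pattern.matches(WORD, "food"));  // prints true
        System.out.println(Pattern.matches(WORD, "don't")); // prints false
    }
}
```

The last line prints `false` because `\w` does not match an apostrophe, which is why a plain `\w+` is not enough for words like `don't`.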

I also tried looking it up myself and came to this page: http://www.regular-expressions.info/reference.html

But I can't quite wrap my head around the advanced regex techniques.

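I'm using String.split(regex string here) to separate my string. For example, if I'm given the following: "I like to eat but I don't like to eat everyone's food, or they'll starve." I want to match: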

I
like
to
eat
but
I
don't
like
to
eat
everyone's
food
or
they'll
starve

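I also don't want to match '' or '''' or ' ' or '.'' or other permutations. My delimiter condition should be something like: [match any word characters][also match an apostrophe if it is preceded by a word character, and then match the word characters after it, if there are any]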

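All I have is a simple regex that matches words, [\w], but I'm not sure how to use lookahead or lookbehind to match the apostrophe and then the rest of the word.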
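The condition described above can be written without lookahead or lookbehind: a run of word characters, optionally followed by an apostrophe and more word characters. The sketch below assumes that reading of the requirement and uses `Matcher.find()` to collect the words instead of `String.split`; the class name `WordFinder` and the method `findWords` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordFinder {
    // \w+        a run of word characters
    // (?:'\w+)*  zero or more groups of an apostrophe followed by word characters
    private static final Pattern WORD = Pattern.compile("\\w+(?:'\\w+)*");

    // Returns every word found in the text, in order of appearance.
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String input = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
        for (String word : findWords(input)) {
            System.out.println(word);
        }
    }
}
```

Because an apostrophe is only accepted between word characters, inputs such as `''` or `'.'` produce no matches. Note that `\w` also matches digits and underscores.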


2 Answers


Using WhirlWind's answer on the page mentioned in my comment, you can do the following:

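import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sample input, including apostrophe-only tokens that should not match.
String candidate = "I \n"+
    "like \n"+
    "to "+
    "eat "+
    "but "+
    "I "+
    "don't "+
    "like "+
    "to "+
    "eat "+
    "everyone's "+
    "food "+
    "''  ''''  '.' ' "+
    "or "+
    "they'll "+
    "starv'e'";

// Alternatives are tried left to right: 'word, word'word, word', word.
String regex = "('\\w+)|(\\w+'\\w+)|(\\w+')|(\\w+)";
Matcher matcher = Pattern.compile(regex).matcher(candidate);
// find() advances to the next match on each call.
while (matcher.find()) {
  System.out.println("> matched: `" + matcher.group() + "`");
}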

It prints:

> matched: `I`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `but`
> matched: `I`
> matched: `don't`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `everyone's`
> matched: `food`
> matched: `or`
> matched: `they'll`
> matched: `starv'e`

You can find a running example here: http://ideone.com/pVOmSK
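One thing to be aware of with this pattern: the `('\\w+)` and `(\\w+')` alternatives keep a leading or trailing apostrophe attached to a word when nothing better matches. `starv'e'` yields `starv'e` only because the `(\\w+'\\w+)` alternative is tried first. The sketch below runs the same regex on a sentence containing a quoted phrase; the class name `QuotedWords`, the method `findWords`, and the sample sentence are chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedWords {
    // Same alternation as above: 'word, word'word, word', word.
    private static final Pattern WORD =
            Pattern.compile("('\\w+)|(\\w+'\\w+)|(\\w+')|(\\w+)");

    // Collects every match of the alternation, in order of appearance.
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [Hey, there, Don't, eat, 'the, mystery, meat']
        System.out.println(findWords("Hey there! Don't eat 'the mystery meat'."));
    }
}
```

If quote marks around a phrase should not be kept, dropping the first and third alternatives would be one way to avoid them.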

answered 2012-11-29T19:26:42.447

The following regex seems to cover your sample string correctly, but it doesn't cover your scenario for the apostrophe.

[\s,.?!"]+

Java Code:

String input = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
String[] inputWords = input.split("[\\s,.?!\"]+");

If I understand correctly, the apostrophe should be left alone as long as it is after a word character. This next regex should cover the above plus the special case for the apostrophe.

(?<!\w)'|[\s,.?"!][\s,.?"'!]*

Java Code:

String input = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
String[] inputWords = input.split("(?<!\\w)'|[\\s,.?\"!][\\s,.?\"'!]*");
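For reference, the split above can be wrapped in a small runnable class. This is only a sketch: the class name `SplitWords` and the method `splitWords` are hypothetical, and the regex is the one shown above, unchanged.

```java
import java.util.Arrays;
import java.util.List;

class SplitWords {
    // (?<!\w)'              an apostrophe that is NOT preceded by a word character
    // [\s,.?"!][\s,.?"'!]*  a delimiter, followed by any further delimiters or apostrophes
    private static final String DELIMITERS = "(?<!\\w)'|[\\s,.?\"!][\\s,.?\"'!]*";

    // Splits the text on the delimiter regex and returns the pieces as a list.
    static List<String> splitWords(String text) {
        return Arrays.asList(text.split(DELIMITERS));
    }

    public static void main(String[] args) {
        String input = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
        for (String word : splitWords(input)) {
            System.out.println(word);
        }
    }
}
```

`String.split` drops trailing empty strings but keeps a leading one, so an input that starts with a delimiter (for example `'Hey`) returns an empty first element that the caller may want to skip.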

If I run the second regex on the string "Hey there! Don't eat 'the mystery meat'." I get the following words in my string array (note that the last word keeps its trailing apostrophe):

Hey
there
Don't
eat
the
mystery
meat'
answered 2012-12-02T02:08:42.263