java - Converting a simple Ruby regular expression to Java

Question

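After a few years away working with Ruby, I am returning to Java. I am looking for short, idiomatic Java code that does the same thing as the following Ruby statement: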

some_string.scan(/[\w|\']+/)

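The expression above creates an array from a string. The elements of the array are all the parts of some_string that consist of word characters (\w) or apostrophes (\'), so that "John's" is not split into two words.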

For example:

"(The farmer's daughter) went to the market".scan(/[\w|\']+/)

=>

["The", "farmer's", "daughter", ...]

Update

I understand that the solution will use something like the following:

String[] words = sentence.split(" ");

I just need the regex part that goes into split().

score 3 · Accepted Answer

Java has no built-in scan method that does this in a single call, so you need to write the loop yourself. This is easy to do with the Matcher class from Java's regex package.

import java.util.*;        // List and its implementations
import java.util.regex.*;

String yourString = "(The farmer's daughter) went to the supermarket";

/* The regex syntax is basically identical to Ruby, except that you need
 * to specify your regex as a normal string literal, and therefore you need to 
 * double up on your backslashes. The other differences between my regex and 
 * yours are all things that I think you need to change about the Ruby version
 * as well. */
Pattern p = Pattern.compile("[\\w']+");
Matcher m = p.matcher(yourString);
List<String> words = new ArrayList<String>();
while (m.find()) {
   words.add(m.group());
}

I am not sure what the relative merits of using Matcher versus Scanner are in this case.
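The differences the code comment above alludes to presumably include the `|` inside the brackets: within a character class, `|` is a literal pipe character rather than alternation, so `[\w|\']+` also matches pipes. A minimal sketch of the effect (the class name `PipeInClassDemo` and the helper `scan` are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PipeInClassDemo {
    // Hypothetical helper: collects every match of regex in input,
    // roughly what Ruby's String#scan does for a pattern without groups.
    static List<String> scan(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "salt|pepper isn't";
        // With | inside the class, the pipe is treated as part of a word.
        System.out.println(scan("[\\w|']+", s)); // [salt|pepper, isn't]
        // Without it, the pipe separates two words.
        System.out.println(scan("[\\w']+", s));  // [salt, pepper, isn't]
    }
}
```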

score 2 · Accepted Answer

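Regular expressions should behave more or less the same even across languages. In this case the only difference is that you have to escape the backslashes. Escaping the single quote as \' is allowed in a Java string literal but not required.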

If we write /[\w']+/ in Ruby, then in Java we write Pattern.compile("[\\w\']+").
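To make the escaping concrete, here is a small sketch (the class name `EscapeDemo` and the helper `accepts` are illustrative only). It compares the spellings of the pattern that appear in this thread: with a bare quote, with a string-level `\'`, and with a regex-level `\\'`.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: true if the whole text matches the regex.
    static boolean accepts(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // The compiler turns \' into a plain ', so these two literals are the same string.
        System.out.println("[\\w\']+".equals("[\\w']+"));   // true
        // Here the regex engine itself sees \' (an escaped quote), which still matches '.
        System.out.println(accepts("[\\w\\']+", "farmer's")); // true
        System.out.println(accepts("[\\w']+", "farmer's"));   // true
    }
}
```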

Oh, and Scanners can scan Strings too!

final String s = "The farmer's daughter went to the market";
Scanner sc = new Scanner(s);
Pattern p = Pattern.compile("[\\w\\']+");
while (sc.hasNext(p)) { System.out.println(sc.next(p)); }
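One caveat, as far as I can tell: hasNext(p) tests the whole next whitespace-delimited token against the pattern, so with the question's original input the loop above would stop immediately at "(The", which contains a parenthesis. A sketch that searches inside the text with Scanner.findInLine instead (the class and method names are illustrative, and single-line input is assumed):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class ScannerScanDemo {
    // Hypothetical helper: pulls every match of [\w']+ out of a single-line string.
    static List<String> words(String line) {
        Pattern p = Pattern.compile("[\\w']+");
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(line)) {
            String w;
            // findInLine ignores delimiters and returns null once nothing more matches.
            while ((w = sc.findInLine(p)) != null) {
                out.add(w);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("(The farmer's daughter) went to the market"));
    }
}
```

For multi-line text the Matcher loop is probably the simpler choice, since findInLine only looks as far as the next line separator.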

This is not exactly the same, but why not split the string on whitespace, i.e. at the word boundaries?

"The farmer's daughter went to the market".split("\\s+");

score 0 · Accepted Answer

How about

String[] words = test.split("[^a-zA-Z0-9']+");

or

words = test.split("[^\\w']+");

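These patterns differ from your Ruby example because you are using Ruby's String#scan, where you supply a pattern that matches the words. Java's String#split is like Ruby's method of the same name: you supply a pattern that matches the separators between your words.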
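One practical consequence of matching separators rather than words: when the input starts with a separator, as the question's example does with "(", split returns an empty first element. A sketch that drops it (the class name `SplitWordsDemo` and the helper `words` are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

class SplitWordsDemo {
    // Hypothetical helper: splits on runs of characters that are neither
    // word characters nor apostrophes, then discards empty pieces.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String part : text.split("[^\\w']+")) {
            if (!part.isEmpty()) { // skip the empty piece produced by a leading separator
                out.add(part);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String test = "(The farmer's daughter) went to the market";
        // The leading "(" is a separator, so the first array element is empty.
        System.out.println(test.split("[^\\w']+")[0].isEmpty()); // true
        System.out.println(words(test));
    }
}
```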
