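I am using the Java Pattern class, with the regular expression specified as a string.
For example, given the input: I like being spiderman : "peter parker"
the result should list spiderman and "peter parker" as separate tokens, with the quoted phrase kept together as one token. Thanks.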
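    try (BufferedReader br = new BufferedReader(new FileReader(f))) {
        // Read the whole file into one string. readLine() strips line
        // terminators, so add a space to keep words on adjacent lines apart.
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line).append(' ');
        }
        String everything = sb.toString();

        List<String> result = new ArrayList<String>();
        // Either a quoted run (non-greedy) or a run of non-space characters.
        Pattern pat = Pattern.compile("([\"'].*?[\"']|[^ ]+)");
        // Group 0 means each whole match becomes one token.
        PatternTokenizer pt = new PatternTokenizer(new StringReader(everything), pat, 0);
        pt.reset(); // the TokenStream contract expects reset() before incrementToken()
        while (pt.incrementToken()) {
            result.add(pt.getAttribute(CharTermAttribute.class).toString());
        }
        pt.end();
        pt.close();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }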
So my guess is that the quoted form "some word" does not work because each token is itself a string. Any hints? Thanks.
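
To check the regex in isolation, here is a minimal sketch that applies the same pattern with plain java.util.regex.Matcher instead of Lucene. The class name QuotedTokenDemo and the tokenize helper are made up for illustration, and this only approximates what a tokenizer in "group 0" mode does; it is not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class QuotedTokenDemo {
    // Same pattern as in the question: a quoted run first, otherwise non-space characters.
    private static final Pattern TOKEN = Pattern.compile("([\"'].*?[\"']|[^ ]+)");

    // Collects every whole match (group 0) as one token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [I, like, being, spiderman, :, "peter parker"]
        System.out.println(tokenize("I like being spiderman : \"peter parker\""));
    }
}
```

If this keeps the quoted phrase together, the pattern itself is probably not the problem, and the tokenizer setup or the input file (for example, typographic quotes instead of straight ones) would be the next thing to check.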