java - Grouping sentences separated by a specific word

Question

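I am trying to group two sub-sentences of any reasonable length that are separated by a specific word ("AND" in the example), where the second sub-sentence is optional. Some examples:

Case 1: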

foo sentence A AND foo sentence B

should give

"foo sentence A" --> matching group 1

"AND" --> matching group 2 (optionally)

"foo sentence B" --> matching group 3

Case 2:

foo sentence A

should give

"foo sentence A" --> matching group 1
"" --> matching group 2 (optionally)
"" --> matching group 3

I tried the following regex

(.*) (AND (.*))?$

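It works, but only if, in case 2, I put a space at the very end of the string; otherwise the pattern does not match. If I move the space before "AND" inside the parenthesized group, then in case 1 the matcher puts the whole string into the first group. I wondered about lookahead and lookbehind assertions, but I am not sure they can help me. Any suggestions? Thanks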

score 2 · Accepted Answer

How about just using

String split[] = sentence.split("AND");

This splits the sentence at your word and gives you an array of the sub-parts.
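As a minimal sketch of what this plain split returns for the two cases in the question (the class and method names here are made up for illustration): the spaces around "AND" stay attached to the parts, and the separator itself is dropped, so there is no equivalent of group 2.

```java
import java.util.Arrays;

class SplitOnAnd {
    // Hypothetical helper: splits on the literal text "AND".
    // Spaces next to the separator remain in the parts, and "AND" is discarded.
    static String[] splitRaw(String sentence) {
        return sentence.split("AND");
    }

    public static void main(String[] args) {
        // Case 1: two parts, each still carrying the space that was next to "AND".
        System.out.println(Arrays.toString(splitRaw("foo sentence A AND foo sentence B")));
        // Case 2: no separator, so the array holds only the original sentence.
        System.out.println(Arrays.toString(splitRaw("foo sentence A")));
    }
}
```

Note that the pattern "AND" also matches inside longer words such as "BRAND", so this only behaves as intended when the separator cannot appear within another word.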

score 2 · Accepted Answer

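Description

This regex returns the requested parts of the string in the requested groups. The word and is optional; if it is not found in the string, the entire string is placed into group 1. All the \s*? parts are there to force the captured groups to have their whitespace trimmed automatically.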

^\s*?\b(.*?)\b\s*?(?:\b(and)\b\s*?\b(.*?)\b\s*?)?$


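Groups

0 gets the entire matched string
1 gets the string before the separator word and; if there is no and, the entire string appears here
2 gets the separator word, in this case and
3 gets the second part of the string

Java code example:

Case 1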

import java.util.regex.Pattern;
import java.util.regex.Matcher;
class Module1 {
  public static void main(String[] asd) {
    String sourcestring = "foo sentence A AND foo sentence B";
    Pattern re = Pattern.compile("^\\s*?\\b(.*?)\\b\\s*?(?:\\b(and)\\b\\s*?\\b(.*?)\\b\\s*?)?$", Pattern.CASE_INSENSITIVE);
    Matcher m = re.matcher(sourcestring);
    if (m.find()) {
      // Print group 0 (the whole match) followed by each capture group.
      for (int groupIdx = 0; groupIdx < m.groupCount() + 1; groupIdx++) {
        System.out.println("[" + groupIdx + "] = " + m.group(groupIdx));
      }
    }
  }
}

$matches Array:
(
    [0] => foo sentence A AND foo sentence B
    [1] => foo sentence A
    [2] => AND
    [3] =>  foo sentence B
)

Case 2, using the same regex

import java.util.regex.Pattern;
import java.util.regex.Matcher;
class Module1 {
  public static void main(String[] asd) {
    String sourcestring = "foo sentence A";
    Pattern re = Pattern.compile("^\\s*?\\b(.*?)\\b\\s*?(?:\\b(and)\\b\\s*?\\b(.*?)\\b\\s*?)?$", Pattern.CASE_INSENSITIVE);
    Matcher m = re.matcher(sourcestring);
    if (m.find()) {
      // Print group 0 (the whole match) followed by each capture group.
      for (int groupIdx = 0; groupIdx < m.groupCount() + 1; groupIdx++) {
        System.out.println("[" + groupIdx + "] = " + m.group(groupIdx));
      }
    }
  }
}

$matches Array:
(
    [0] => foo sentence A
    [1] => foo sentence A
)

score 2 · Accepted Answer

I would use this regex:

^(.*?)(?: (AND) (.*))?$

Explanation:

The regular expression:

(?-imsx:^(.*?)(?: (AND) (.*))?$)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      AND                      'AND'
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      .*                       any character except \n (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
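A minimal Java sketch of how this pattern could be applied to both cases from the question. The class name AndGroups and the helper groups are invented for illustration, and mapping unmatched groups to "" is an assumption based on the output the question asks for (Java itself returns null for a group that did not participate in the match).

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AndGroups {
    private static final Pattern P = Pattern.compile("^(.*?)(?: (AND) (.*))?$");

    // Hypothetical helper: returns {group 1, group 2, group 3},
    // replacing groups that did not participate in the match with "".
    static String[] groups(String sentence) {
        Matcher m = P.matcher(sentence);
        if (!m.matches()) {
            return null; // for example, input containing a line break, which '.' does not match
        }
        String[] out = new String[3];
        for (int i = 1; i <= 3; i++) {
            String g = m.group(i);
            out[i - 1] = (g == null) ? "" : g;
        }
        return out;
    }

    public static void main(String[] args) {
        // Case 1: sentence, separator, sentence.
        System.out.println(Arrays.toString(groups("foo sentence A AND foo sentence B")));
        // Case 2: only the first sentence; groups 2 and 3 come back empty.
        System.out.println(Arrays.toString(groups("foo sentence A")));
    }
}
```

The lazy (.*?) in group 1 is what lets the optional part claim " AND ..." when it is present; with a greedy first group the whole string would end up in group 1.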

score 0 · Accepted Answer

Change your regex to make the space after the first sentence optional:

(.*?\\S) ?(AND (.*))?$

Or you could use split() to consume the AND and any surrounding spaces:

String[] sentences = sentence.split("\\s*AND\\s*");
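A short sketch of the split() variant, assuming the two sample sentences from the question. The class name and both helper methods are hypothetical, and the second helper adds \b word boundaries, which is an extra not present in the answer, to show one way of keeping words such as "BRAND" intact.

```java
import java.util.Arrays;

class TrimmedSplit {
    // Splits on "AND" and swallows any whitespace around it.
    static String[] split(String sentence) {
        return sentence.split("\\s*AND\\s*");
    }

    // Assumed variant: only splits when "AND" stands alone as a word.
    static String[] splitWholeWord(String sentence) {
        return sentence.split("\\s*\\bAND\\b\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("foo sentence A AND foo sentence B")));
        System.out.println(Arrays.toString(split("foo sentence A")));
        // "BRAND" contains "AND"; the word-boundary variant leaves it alone.
        System.out.println(Arrays.toString(splitWholeWord("BRAND new AND shiny")));
    }
}
```

As with any split-based approach, the separator itself is consumed, so the caller has to infer "group 2" from whether the array has more than one element.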

score 0 · Accepted Answer

Your case 2 is a bit strange...

But I would do

String[] parts = sentence.split("(?<=AND)|(?=AND)");

Then check parts.length. If the length == 1, it is case 2: the array holds only the sentence, and you can add empty strings as your "group 2/3".

In case 1 you get parts directly:

[foo sentence A , AND,  foo sentence B]
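The length check described above could be sketched as follows. The class name, the toGroups helper, the trimming of each part, and the choice to ignore anything beyond the third part are all assumptions made for this illustration rather than part of the answer.

```java
import java.util.Arrays;

class LookaroundSplit {
    // Hypothetical helper: splits before and after "AND" using zero-width
    // lookarounds, so the separator is kept as its own array element.
    static String[] toGroups(String sentence) {
        String[] parts = sentence.split("(?<=AND)|(?=AND)");
        String[] groups = {"", "", ""}; // case 2 leaves groups 2 and 3 empty
        // Assumption: only the first three parts matter; extra ANDs are ignored.
        for (int i = 0; i < parts.length && i < 3; i++) {
            groups[i] = parts[i].trim(); // drop the spaces left next to "AND"
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toGroups("foo sentence A AND foo sentence B")));
        System.out.println(Arrays.toString(toGroups("foo sentence A")));
    }
}
```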
