
I have a large piece of text:

"Big piece of text. This sentence includes 'regexp' word. And this
sentence doesn't include that word"

I need to find the substring that starts with 'this' and ends with 'word' but does not contain the word 'regexp'.

In this case, the string "this sentence doesn't include that word" is exactly what I want to get back.

How can I do this with a regular expression?


2 Answers


With the ignore-case option enabled, the following should work:

\bthis\b(?:(?!\bregexp\b).)*?\bword\b

Example: http://www.rubular.com/r/g6tYcOy8IT

Explanation:

\bthis\b           # match the word 'this', \b is for word boundaries
(?:                # start group, repeated zero or more times, as few as possible
   (?!\bregexp\b)    # fail if 'regexp' can be matched (negative lookahead)
   .                 # match any single character
)*?                # end group
\bword\b           # match 'word'

The \b on either side of each word ensures that you don't match inside longer words, for example the 'this' in 'thistle' or the 'word' in 'wordy'.

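The pattern works by checking, at every character between the start word and the end word, that the excluded word does not begin at that position. If it does, the attempt from that start word fails and the engine moves on to the next 'this'.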
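As a rough illustration, the pattern can be tried out in Python. This is only a sketch: Python is an assumption here (the answer itself links to a Rubular demo), `re.IGNORECASE` stands in for the ignore-case option, and `find_spans` is a hypothetical helper name. The sample text is the one from the question, joined onto a single line.

```python
import re

# Pattern taken from the answer; re.IGNORECASE plays the role of the ignore-case option.
PATTERN = re.compile(r"\bthis\b(?:(?!\bregexp\b).)*?\bword\b", re.IGNORECASE)


def find_spans(text):
    """Return every 'this ... word' span that does not contain the word 'regexp'."""
    return [m.group(0) for m in PATTERN.finditer(text)]


# Sample text from the question, written as one line (an assumption of this sketch).
text = (
    "Big piece of text. This sentence includes 'regexp' word. "
    "And this sentence doesn't include that word"
)

print(find_spans(text))                     # ["this sentence doesn't include that word"]
print(find_spans("thistle is wordy"))       # [] because \b rejects 'thistle' and 'wordy'
print(find_spans("this is a wordy word"))   # ['this is a wordy word']
```

Note that `.` does not match a line break by default in Python, so for text that spans several lines the `re.DOTALL` flag would probably be needed as well.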

Answered 2012-08-08T17:25:39.510

Use a lookahead assertion.

When you want to check that a string does not contain some other substring, you can write:

/^(?!.*substring)/

You also have to anchor 'this' to the beginning of the line and 'word' to the end of the line:

/^this(?!.*substring).*word$/
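A minimal sketch of how this anchored pattern behaves, written in Python as an assumption (the pattern itself is not tied to one language). The placeholder `substring` is filled in with `regexp`, and `line_matches` is a hypothetical helper name.

```python
import re

# "substring" from the pattern above is replaced with "regexp" for this sketch.
LINE_PATTERN = re.compile(r"^this(?!.*regexp).*word$")


def line_matches(line):
    """True if the line starts with 'this', ends with 'word' and lacks 'regexp'."""
    return LINE_PATTERN.search(line) is not None


print(line_matches("this sentence doesn't include that word"))  # True
print(line_matches("this sentence includes 'regexp' word"))     # False: lookahead sees 'regexp'
print(line_matches("and this sentence ends with word"))         # False: does not start with 'this'
```

Because the lookahead sits right after `this`, it scans the whole remainder of the line once, rather than being re-checked at every character.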

Another problem here is that you don't want to find lines, you want to find sentences (if I understand your task correctly).

So the solution looks like this:

perl -e '
  local $/;                      # slurp mode: read the whole input at once
  $_=<>;
  while($_ =~ /(.*?[.])/g) {     # walk through the text one sentence at a time
    $s=$1;
    print $s if $s =~ /^this(?!.*substring).*word[.]$/
  };'

Usage example:

$ cat 1.pl
local $/;
$_=<>;
while($_ =~ /(.*?[.])/g) {
    $s=$1;
    print $s if $s =~ /^\s*this(?!.*regexp).*word[.]/i;
};

$ cat 1.txt
This sentence has the "regexp" word. This sentence doesn't have the word. This sentence does have the "regexp" word again.

$ cat 1.txt | perl 1.pl 
 This sentence doesn't have the word.
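
For comparison, the same two-step idea (split the text into sentences, then filter each one) might look like this in Python. This is a sketch under assumptions: the function name `matching_sentences` is hypothetical, and the input is the content of 1.txt shown above.

```python
import re

# Shortest run of characters ending in a period, as in the Perl loop.
SENTENCE = re.compile(r".*?[.]")
# Same filter as in 1.pl: optional leading whitespace, 'this', no 'regexp', then 'word.'
WANTED = re.compile(r"^\s*this(?!.*regexp).*word[.]", re.IGNORECASE)


def matching_sentences(text):
    """Split text into period-terminated sentences and keep the ones that match."""
    return [s for s in SENTENCE.findall(text) if WANTED.search(s)]


text = (
    'This sentence has the "regexp" word. '
    "This sentence doesn't have the word. "
    'This sentence does have the "regexp" word again.'
)

print(matching_sentences(text))  # [" This sentence doesn't have the word."]
```

As in the Perl output, the kept sentence retains its leading space, because the sentence splitter starts each match right after the previous period.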
Answered 2012-08-08T17:21:52.170