some text and some text too bad,
some too&nbsp; bad again some bad
and other words bad, it is too       bad 

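I am trying to replace every occurrence of the word "bad" with "good", with one exception:

If the word "too" comes before "bad", then "bad" should not be changed to "good". There can be one or more spaces between "too" and "bad", or even the HTML space "&nbsp;".

So after the regex has processed the text, it should be: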

    some text and some text too bad,
    some too&nbsp; bad again some good
    and other words good, it is too       bad 

I tried something like this, but it doesn't work properly.

$text =~ s/(too(\s+|\s*&nbsp;\s*))bad/good/ig;

Please help.


2 Answers


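I don't believe this can be done conveniently with a regular expression. It is made more complicated because the notion of a word is not well defined: for example, you want "&nbsp;bad" to be treated as the word "bad".

This program works by tokenizing the string into words and separators, and then changing every occurrence of "bad" to "good" unless it is preceded by "too" (ignoring case). I have included commas, periods, colons and semicolons, along with whitespace and &nbsp;, in the list of possible separators. You may need to adjust it to get the result you expect.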

use strict;
use warnings;

my $text = <<END;
some text and some text too bad,
some too&nbsp; bad again some bad
and other words bad, it is too       bad 
END

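# The capturing group makes split return the separators too, so words sit at
# even indices and separators at odd ones.
my @tokens = split /((?:[\s,;.:]|&nbsp;)+)/, $text;

# Start at index 0 so a leading "bad" is not skipped, and only look two
# tokens back when a previous word actually exists.
for my $i (grep { lc $tokens[$_] eq 'bad' } 0 .. $#tokens) {
  $tokens[$i] = 'good' unless $i >= 2 and lc $tokens[$i-2] eq 'too';
}

print join '', @tokens;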

Output

some text and some text too bad,
some too&nbsp; bad again some good
and other words good, it is too       bad 
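
To see why the loop looks two positions back, it can help to inspect what split returns when its pattern contains a capturing group. The following is a minimal sketch using the same separator pattern; the one-line input `$sample` is made up for illustration and is not part of the question.

```perl
use strict;
use warnings;

# Hypothetical one-line sample, not taken from the question
my $sample = 'too&nbsp; bad, so bad';

# Because the whole separator pattern is captured, split keeps each
# separator in the returned list, between the words it separated.
my @tokens = split /((?:[\s,;.:]|&nbsp;)+)/, $sample;

# Print every token with its index; brackets make the whitespace visible
printf "%d: [%s]\n", $_, $tokens[$_] for 0 .. $#tokens;
```

For this input the list has seven elements: the words "too", "bad", "so", "bad" at indices 0, 2, 4 and 6, and the separators at indices 1, 3 and 5. The word before any given word is therefore always two positions earlier.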
answered 2013-10-25T12:16:57.727

You could try decoding the HTML spaces and then applying a regular expression that checks whether the preceding word is "too".

#!/usr/bin/env perl

use strict;
use warnings;
use HTML::Entities;

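while ( <DATA> ) {
    # Turn &nbsp; into the non-breaking space character (0xA0), in place
    _decode_entities($_, { nbsp => "\xA0" });
    # \xA0 is listed explicitly because \s may not match it in a byte string;
    # /i and \b handle "Bad" and avoid touching words such as "badly"
    s/(\w+)((?:\s|\xA0)+)bad\b/lc $1 eq 'too' ? $& : "$1$2good"/egi;
    # Re-encode 0xA0 (and any other special characters) as entities
    encode_entities($_);
    print $_;
}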

__DATA__
some text and some text too bad,
some too&nbsp; bad again some bad
and other words bad, it is too       bad

Run it like this:

perl script.pl

This produces:

some text and some text too bad,
some too&nbsp; bad again some good
and other words good, it is too       bad
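
A possible variant that skips the entity decoding step is to let a single substitution match either the protected phrase or a standalone "bad", and put the protected phrase back unchanged. This is only a sketch under the assumption that `&nbsp;` is the sole entity that can appear between the two words; it is not the method used in the answer above, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Sample lines taken from the question
my $text = join "\n",
    'some text and some text too bad,',
    'some too&nbsp; bad again some bad',
    'and other words bad, it is too       bad';

# The first alternative captures "too", then whitespace and/or &nbsp;, then
# "bad", so that phrase can be re-inserted as is. The second alternative
# matches any other standalone "bad", which becomes "good".
(my $result = $text) =~
    s/(\btoo(?:\s|&nbsp;)+bad\b)|\bbad\b/defined $1 ? $1 : 'good'/gie;

print "$result\n";
```

Because the longer alternative is tried first at each position, a "bad" that belongs to a "too ... bad" phrase is consumed as part of that phrase and never reaches the second alternative.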
answered 2013-10-25T12:21:52.483