regex - Perl regular expression to match one fixed keyword and two other variable keywords

Question

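I need to write a regular expression in Perl to do the following.

The start line contains keyword 1 (such as "this is keyword1"), and the end line contains either keyword 2 (such as "end1 here") or keyword 3 (such as "end2 here"). For example, the text file might look like this: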

*********** this is keyword1***********
*****
..
*******apple***********
******
..
*********** this is keyword1***********
*****
..
*******orange***********
******
..
*********** this is keyword1***********
*****
..
*******orange***********
******
..

My task is to match blocks like these:

*********** this is keyword1***********
*****
..(comment: no "this is keyword1" here)
*******apple***********

or

*********** this is keyword1***********
*****
.. (comment: no "this is keyword1" here)
*******orange***********

Thanks for your help!

score 1 · Accepted Answer

Original suggested solution

Note that "apple" was originally spelled "end1 here", and "orange" was originally spelled "end2 here".

#!/usr/bin/env perl
use strict;
use warnings;

my $printing = 0;

while (<>)
{
    $printing = 1 if m/this is keyword1/;
    print if $printing;
    $printing = 0 if m/end[12] here/;
}

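If you want to exclude the end line from the output, move that test above the print. If you want to exclude the start line from the output, move that test below the print. Clearly, if you cannot combine the two end patterns as easily as in this example, you can simply use two lines: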

    $printing = 0 if m/the first end pattern/;
    $printing = 0 if m/a radically different end marker/;

For the sample data, the output is:

*********** this is keyword1***********
*****
..
*******end1 here***********
*********** this is keyword1***********
*****
..
*******end1 here***********
*********** this is keyword1***********
*****
..
*******end2 here***********
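
The variations described earlier (moving the tests around the print, and using two separate end tests) can be sketched as follows. This is only an illustration: the sub name `inner_lines` and the sample lines are made up here and are not part of the original program. It keeps only the lines strictly between a start line and an end line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): returns the lines
# strictly between a start line and an end line, dropping both delimiters.
sub inner_lines {
    my @lines    = @_;
    my $printing = 0;
    my @kept;
    for (@lines) {
        # End tests come before the "print" step, so end lines are dropped.
        $printing = 0 if m/end1 here/;
        $printing = 0 if m/end2 here/;
        push @kept, $_ if $printing;
        # Start test comes after the "print" step, so start lines are dropped.
        $printing = 1 if m/this is keyword1/;
    }
    return @kept;
}

# Made-up sample input for illustration only.
my @sample = (
    "** this is keyword1 **\n",
    "body 1\n",
    "** end1 here **\n",
    "ignored\n",
    "** this is keyword1 **\n",
    "body 2\n",
    "** end2 here **\n",
);
print inner_lines(@sample);
```

With this made-up input, only `body 1` and `body 2` are kept. Collecting into an array instead of printing directly is a choice made for the sketch so the result can be inspected; the original program prints as it goes.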

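Revised requirements - revised program

A simple way to meet the revised output requirement is to accumulate the lines into a string while the flag is set (the flag is called $saving here rather than $printing):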

my $saving = 0;
my $result;

while (<>)
{
    $saving  = 1  if m/this is keyword1/;
    $result .= $_ if $saving;
    $saving  = 0  if m/end[12] here/;
}

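However, this does not slurp the whole file into memory, nor does it use m//g, so it does not follow the mechanism specified in the revised requirements.

With the revised requirements, I think this code does more or less what you want: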

#!/usr/bin/env perl
use strict;
use warnings;

my $file;
{
    local $/;
    $file = <>;
}

my $result = '';    # start empty so the final print is safe when nothing matches
while ($file =~ m/(^[^\n]*this is keyword1.*?end[12] here[^\n]*$)/gms)
{
    print "Found: $1\n";
    $result .= "$1\n";
}

print "Overall set of matched material:\n";
print $result;

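Clearly, you can omit the print in the loop if you do not want to see each passage as it is found. Note the use of the non-greedy .*? to stop the scan at the nearest end marker, and the use of ^ and $ together with the /m (multi-line) modifier to pick up whole lines.

The output for the sample data is: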

Found: *********** this is keyword1***********
*****
..
*******end1 here***********
Found: *********** this is keyword1***********
*****
..
*******end1 here***********
Found: *********** this is keyword1***********
*****
..
*******end2 here***********
Overall set of matched material:
*********** this is keyword1***********
*****
..
*******end1 here***********
*********** this is keyword1***********
*****
..
*******end1 here***********
*********** this is keyword1***********
*****
..
*******end2 here***********
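
To see why the non-greedy .*? matters in the slurp-and-match program, here is a small comparison sketch. The sample text and the helper name `count_blocks` are invented for this illustration; the two patterns differ only in `.*?` versus `.*`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up sample text containing two separate blocks.
my $text = join '',
    "** this is keyword1 **\n",
    "a\n",
    "** end1 here **\n",
    "between\n",
    "** this is keyword1 **\n",
    "b\n",
    "** end2 here **\n";

# Hypothetical helper: count how many times a pattern matches in a string.
# The patterns passed in have no capture groups, so a list-context m//g
# returns one element per match.
sub count_blocks {
    my ($string, $re) = @_;
    my $count = () = $string =~ /$re/g;
    return $count;
}

# Same pattern twice; only the quantifier in the middle differs.
my $lazy   = qr/^[^\n]*this is keyword1.*?end[12] here[^\n]*$/ms;
my $greedy = qr/^[^\n]*this is keyword1.*end[12] here[^\n]*$/ms;

print "non-greedy: ", count_blocks($text, $lazy),   " block(s)\n";
print "greedy:     ", count_blocks($text, $greedy), " block(s)\n";
```

On this input the non-greedy pattern finds two blocks, while the greedy one finds a single match that runs from the first start line to the last end line, swallowing everything in between.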

Re-revised requirements - re-revised solution

#!/usr/bin/env perl
use strict;
use warnings;

my $file;
{
    local $/;
    $file = <>;
}

my $result = '';    # start empty so the final print is safe when nothing matches
while ($file =~ m/(^[^\n]*this is keyword1.*?(apple|orange)[^\n]*$)/gms)
{
    print "Found: $1\n";
    $result .= "$1\n";
}

print "Overall set of matched material:\n";
print $result;

Sample data

*********** this is keyword1***********
*****
..
*******orange***********
******
..
*********** this is keyword1***********
*****
..
*******orange***********
******
..
*********** this is keyword1***********
*****
..
*******apple***********
******

Sample output

Found: *********** this is keyword1***********
*****
..
*******orange***********
Found: *********** this is keyword1***********
*****
..
*******orange***********
Found: *********** this is keyword1***********
*****
..
*******apple***********
Overall set of matched material:
*********** this is keyword1***********
*****
..
*******orange***********
*********** this is keyword1***********
*****
..
*******orange***********
*********** this is keyword1***********
*****
..
*******apple***********

score 0 · Accepted Answer

My previous answer missed your revised requirements. Here is the updated code:

#!/usr/bin/env perl

use 5.012;
use strict;
use warnings;

my $text = do { local $/; <DATA> };
my $pat = qr{
    (
        [^\n]*?
        keyword1
        .*?
        (?:apple|orange)
        [^\n]*?
        \n
    )
}sx;

my $result;

while ($text =~ /$pat/g) {
    $result .= "[[[\n$1]]]\n";
}

say $result;


__DATA__
*********** this is keyword1***********
*****
..(comment: no "this is keyword1" here)
*******apple***********
*****
..
*********** this is keyword1***********
*****
..
*******apple***********
******
..
*********** this is keyword1***********
*****
.. (comment: no "this is keyword1" here)
*******orange***********
*****
..
*********** this is keyword1***********
*****
..
*******orange***********
******
..
*********** this is keyword1***********
*****
..
*******orange***********
******
..

Output:

[[[
*********** this is keyword1***********
*****
..(comment: no "this is keyword1" here)
*******apple***********
]]]
[[[
*********** this is keyword1***********
*****
..
*******apple***********
]]]
[[[
*********** this is keyword1***********
*****
.. (comment: no "this is keyword1" here)
*******orange***********
]]]
[[[
*********** this is keyword1***********
*****
..
*******orange***********
]]]
[[[
*********** this is keyword1***********
*****
..
*******orange***********
]]]

The brackets are there to verify visually that the correct blocks were matched.
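
The question also asks that no second "this is keyword1" appear inside a matched block. The patterns in this thread rely on non-greedy matching, which does not by itself enforce that when a start line has no end line before the next start line. One possible way to enforce it, offered here only as a sketch and not as part of either answer, is a negative lookahead on each character of the body. The helper name `strict_blocks` and the sample text are invented for the illustration. Note that this sketch is meant for data without the "(comment: ...)" annotation lines, since those lines themselves contain the start keyword.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return blocks that begin on a "this is keyword1" line,
# end on a line containing "apple" or "orange", and contain no further
# "this is keyword1" between the start line and the end keyword.
sub strict_blocks {
    my ($text) = @_;
    my $pat = qr{
        ^ [^\n]* this\ is\ keyword1 [^\n]* \n   # whole start line
        (?: (?! this\ is\ keyword1 ) . )*?      # body, never entering a new start keyword
        (?:apple|orange) [^\n]* \n              # rest of the end line
    }msx;
    my @blocks;
    while ($text =~ /($pat)/g) {
        push @blocks, $1;
    }
    return @blocks;
}

# Made-up sample: the first start line has no end line of its own.
my $sample = join '',
    "** this is keyword1 **\n",
    "x\n",
    "** this is keyword1 **\n",
    "y\n",
    "** apple **\n",
    "z\n",
    "** this is keyword1 **\n",
    "w\n",
    "** orange **\n";

print "[[[\n$_]]]\n" for strict_blocks($sample);
```

On this sample the orphaned first start line is skipped, and the two complete blocks (ending in apple and orange) are reported. A purely non-greedy pattern would instead begin its first match at the orphaned start line and include the second start line inside the block.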
