perl - Best way to keep track of the previous and next lines in Perl

Question

In Perl, what is the best (or correct) way to keep information about the previous and/or next line? For example, with the following code:

while (<IN>) {
   print;
}

How do I change it so that a line is not printed if the previous or next line in the file matches foo, and is printed otherwise?

Could you give a code example? Thanks.

score 2 · Accepted Answer

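You can read your lines into an array. When you see the line that signals a match, pop the previous line off the end of the array and skip the line that follows. Once you've read everything, you can print the array: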

use strict;
use warnings;
use feature qw(say);
use autodie;  # open() dies with a useful message on failure.
              # Reading past end of file is not an error, so autodie ignores it.

use constant FILE_NAME => "some_name.txt";

open my $fh, "<", FILE_NAME;   # no "or die" needed under autodie

my @output;
LINE:
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line eq "foo" ) {
        pop @output;   # Drop the line before foo
        <$fh>;         # Read and discard the line after foo
        next LINE;     # Skip foo itself; don't push it onto the array
    }
    push @output, $line;
}
close $fh;

From there, you can print out the array, which no longer contains the lines you didn't want printed:

for my $line ( @output ) {
   say $line;
}

The only problem is memory: the whole filtered file is held in the array, so if your file is very large you may run out of memory.

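One way around this is to use a buffer. You store lines in an array, and each time you push a new line onto it, you shift the oldest one off and print it. If the line just read is foo, you reset the array instead. In this case, the buffer holds at most one line: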

#! /usr/bin/env perl

use strict;
use warnings;
use autodie;
use feature qw(say);

my @buffer;
LINE:
while ( my $line = <DATA> ) {
    chomp $line;
    if ( $line eq "foo" ) {
        @buffer = ();    #Empty buffer of previous line
        <DATA>;           #Get rid of the next line
        next LINE;       #Foo doesn't get pushed into the buffer
    }
    push @buffer, $line;
    if ( @buffer > 1 ) {    #Buffer is "full"
        say shift @buffer; #Print out previous line
    }
}
#
# Empty out buffer
#
for my $line ( @buffer ) {
    say $line;
}
__DATA__
2
3
4
5
6
7
8
9
10
11
12
13
1
2
foo
3
4
5
foo
6
7
8
9
foo

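Note that when I skip the next line, I may well try to read past the end of the file. That's fine: at end of file, <DATA> (or <$fh>) returns undef, and I can simply ignore it. When I get back to the top of the loop, the while condition reads again, gets undef, and the loop ends.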
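To check the end-of-file point, here is a minimal sketch of the same buffer technique wrapped in a hypothetical filter_with_buffer function that reads from an in-memory filehandle instead of DATA (both are assumptions for illustration). The input ends with foo, so the extra read after it returns undef, which is simply discarded; the loop condition then also sees undef and stops.

```perl
use strict;
use warnings;

# Hypothetical helper: the buffer technique from the answer, applied to a
# string via an in-memory filehandle. Returns the lines that are kept.
sub filter_with_buffer {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my ( @buffer, @kept );
    LINE:
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line eq 'foo' ) {
            @buffer = ();           # drop the line before foo
            my $after = <$fh>;      # line after foo; undef if foo was last
            next LINE;              # foo itself is never kept
        }
        push @buffer, $line;
        push @kept, shift @buffer if @buffer > 1;   # buffer "full": emit oldest
    }
    push @kept, @buffer;            # flush whatever is left in the buffer
    close $fh;
    return @kept;
}

my @kept = filter_with_buffer("1\n2\nfoo\n3\n4\nfoo\n");
print "$_\n" for @kept;
```

With this input only 1 survives: 2 and 4 come right before a foo, 3 comes right after one, and the foo lines themselves are never kept.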

score 2 · Accepted Answer

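I don't see any specific criteria for "best", so I'll give you a solution that may be "best" and that differs from the ones proposed so far. You can use Tie::File to treat the whole file as an array, and then iterate over that array by index. The previous and next lines are $index-1 and $index+1, respectively; you only have to make sure the index doesn't fall outside the bounds of the array. Here's an example: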

#!/usr/bin/env perl

use strict;
use warnings;
use 5.010;          # just for "say"
use Tie::File;

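tie my @array, 'Tie::File', "filename"
    or die "Cannot tie 'filename': $!";

for my $i (0..$#array) {
    # Check each neighbour only if it exists, so the first and last
    # lines are still compared with the one neighbour they do have.
    my $prev_matches = $i > 0       && $array[$i-1] =~ /foo/;
    my $next_matches = $i < $#array && $array[$i+1] =~ /foo/;
    next if $prev_matches || $next_matches;
    say $array[$i];
}

untie @array;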

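If it's more convenient, you can pass a filehandle instead of a filename. Tie::File also has options to control memory usage or to change what counts as a "line" if you need to. See the documentation for details.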
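One detail worth knowing: by default Tie::File opens the file read-write and creates it if it does not exist. For a filter that only reads, its mode option can be set to O_RDONLY (from Fcntl). A sketch under those assumptions, with a hypothetical helper lines_not_near_foo that collects the kept lines instead of printing them directly:

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);   # constant for Tie::File's mode option
use Tie::File;

# Hypothetical helper: return the lines of $file whose neighbours do not
# match /foo/, opening the file read-only so it is never created or modified.
sub lines_not_near_foo {
    my ($file) = @_;
    tie my @lines, 'Tie::File', $file, mode => O_RDONLY
        or die "Cannot tie '$file': $!";
    my @kept;
    for my $i ( 0 .. $#lines ) {
        my $prev_is_foo = $i > 0       && $lines[ $i - 1 ] =~ /foo/;
        my $next_is_foo = $i < $#lines && $lines[ $i + 1 ] =~ /foo/;
        push @kept, $lines[$i] unless $prev_is_foo || $next_is_foo;
    }
    untie @lines;
    return @kept;   # elements have no trailing newline (autochomp is on)
}

if (@ARGV) {
    print "$_\n" for lines_not_near_foo( $ARGV[0] );
}
```

Run it as `perl script.pl input.txt`. Note that a foo line is itself kept when neither of its neighbours matches, since the question only asks about the previous and next lines.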

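Either way, if you are comfortable with arrays and like to think in terms of them, this is another way to do what you want, and it may be conceptually simpler.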

score 2 · Accepted Answer

Update: simplified the explanation.

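Basically, if you want to decide whether to print the current line based on the contents of two other lines, you need to keep track of those two extra lines. Here is a simple script with everything hard-coded: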

#!/usr/bin/env perl

use strict;
use warnings;

my $prev = undef;
my $candidate = scalar <DATA>;

while (defined $candidate) {
    my $next = <DATA>;
    unless (
        (defined($prev) && ($prev =~ /foo/)) ||
        (defined($next) && ($next =~ /foo/))
    ) {
        print $candidate;
    }
    ($prev, $candidate) = ($candidate, $next);
}

__DATA__
1
2
foo
3
4
5
foo
6
foo
7
8
9
foo

We can generalize this into a function that takes a filehandle and a test (as a subroutine reference):

#!/usr/bin/env perl

use strict; use warnings;

print_mid_if(\*DATA, sub{ return !(
    (defined($_[0]) && ($_[0] =~ /foo/)) ||
    (defined($_[1]) && ($_[1] =~ /foo/))
)} );

sub print_mid_if {
    my $fh = shift;
    my $test = shift;

    my $prev = undef;
    my $candidate = scalar <$fh>;

    while (defined $candidate) {
        my $next = <$fh>;
        print $candidate if $test->($prev, $next);
        ($prev, $candidate) = ($candidate, $next);
    }
}

__DATA__
1
2
foo
3
4
5
foo
6
foo
7
8
9
foo
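Because print_mid_if takes any filehandle and any test, it can be reused without changes. A sketch, assuming print_mid_if is copied from the script above, that reads from an in-memory filehandle, uses a hypothetical test that drops lines next to an ERROR line, and captures the printed output in a string by temporarily select-ing another output handle:

```perl
use strict;
use warnings;

# print_mid_if, copied from the answer above.
sub print_mid_if {
    my ( $fh, $test ) = @_;
    my $prev      = undef;
    my $candidate = scalar <$fh>;
    while ( defined $candidate ) {
        my $next = <$fh>;
        print $candidate if $test->( $prev, $next );
        ( $prev, $candidate ) = ( $candidate, $next );
    }
}

# Hypothetical test: print a line unless a neighbour contains ERROR.
my $not_next_to_error = sub {
    my ( $prev, $next ) = @_;
    return !( ( defined $prev && $prev =~ /ERROR/ )
           || ( defined $next && $next =~ /ERROR/ ) );
};

# Both input and output live in memory.
my $input = "ok 1\nok 2\nERROR\nok 3\nok 4\n";
open my $in, '<', \$input or die "Cannot open input: $!";
my $output = '';
open my $out, '>', \$output or die "Cannot open output: $!";

my $old = select $out;    # plain print() now writes to $out
print_mid_if( $in, $not_next_to_error );
select $old;              # restore the previous default output handle
close $out;
close $in;

print $output;
```

Here ok 2 and ok 3 are dropped because they touch the ERROR line, so the captured output is ok 1, ERROR and ok 4.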

score 1 · Accepted Answer

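I would read the file into an array, with each line as one element, and then do the comparisons by index. The only real design consideration is the size of the file, since the whole thing is read into memory.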
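A minimal sketch of that approach, using a hypothetical keep_lines_not_near function that slurps a filehandle into an array and checks each element's neighbours by index (the pattern is passed in as a qr// object, also an assumption for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: read every line into memory, then keep a line only
# if neither neighbour matches $pattern. Newlines are left on each line.
sub keep_lines_not_near {
    my ( $fh, $pattern ) = @_;
    my @lines = <$fh>;    # list context: one element per line
    my @kept;
    for my $i ( 0 .. $#lines ) {
        my $prev_hit = $i > 0       && $lines[ $i - 1 ] =~ $pattern;
        my $next_hit = $i < $#lines && $lines[ $i + 1 ] =~ $pattern;
        push @kept, $lines[$i] unless $prev_hit || $next_hit;
    }
    return @kept;
}

if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open '$ARGV[0]': $!";
    print keep_lines_not_near( $fh, qr/foo/ );
    close $fh;
}
```

This is the simplest of the approaches, at the cost of holding the whole file in memory at once.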
