perl - How do I grep a paragraph in Perl?

Question

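I have a log file that needs to be reformatted into something readable. The file has no fixed number of lines per entry, no fixed key values, and a random amount of whitespace. The only constant is a single log file header line, which can be used to tell where each application log entry starts and ends.

Sample log file: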

Log File header
<text>
<text>
Log File header
<text>

After the script has formatted it, it should look like this:

Log File header
<text>
<text>

<space>

Log File header
<text>
<text>

So I need some advice on how to pick out the whole paragraph each time the Perl script detects "Log File header".

Here is the Perl grep script:

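#!/usr/bin/perl
use strict;
use warnings;

#use 5.010; # must be present to import the new 5.10 functions, notice 
#that it is 5.010 not 5.10

my $file = "/root/Desktop/Logfiles.log";
# Three-argument open with a lexical filehandle
open(my $log, '<', $file) or die "The file $file has the error of:\n =>  $!";

my @lines = <$log>;
close($log);

# Keeps only the header lines themselves, not the text that follows them
my @array = grep { /Log File header/ } @lines;

print @array;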

Could someone offer some advice on the code? Thanks.

score 0 · Accepted Answer

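So you just want vertical space between the sections of your log file?

There are a few approaches, particularly since you know the header will always be on a line entirely of its own. In all of the examples below, assume that @lines has already been populated from your input file.

The first technique: insert space before each header: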

foreach my $line ( @lines ) {
    if ( $line =~ m/Log File header/ ) {
        print( "\n\n\n" ); # or whatever you want <space> to be
    }

    print( $line );
}
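
Note that the loop above also prints the separator before the very first header. A minimal sketch of a variant that skips that leading gap and returns the text rather than printing it might look like the following. The helper name space_before_headers and the sample lines are made up for illustration and are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): builds the formatted
# text and only adds the separator between sections, not before the first one.
sub space_before_headers {
    my ($space, @lines) = @_;
    my $seen_header = 0;
    my $out = '';
    for my $line (@lines) {
        if ($line =~ /Log File header/) {
            $out .= $space if $seen_header;   # skip the gap before the first header
            $seen_header = 1;
        }
        $out .= $line;
    }
    return $out;
}

# Made-up sample input standing in for the contents of @lines
my @lines = ("Log File header\n", "<text>\n", "<text>\n",
             "Log File header\n", "<text>\n");
print space_before_headers("\n\n\n", @lines);
```

Returning a string instead of printing directly is only a convenience here; it makes the result easy to write to a different file or to check.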

The next technique is to use a regular expression to search and replace blocks of text:

my $space = "\n\n\n"; # or whatever you want <space> to be
my $everything = join( "", @lines );
$everything =~ s/(Log File header.*?)(?=Log File header)/$1$space/sg;
print( $everything );

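Some explanation of the regex. The (?= means "look ahead": it has to match, but it does not form part of the text being replaced. The /sg modifiers mean: s lets . match newline characters as well, and g performs a global search and replace. The .*? means select anything, but as little as possible to satisfy the expression (non-greedy), which is very important in this application.

Update: edited the first technique, where I had failed to explicitly specify which variable to match against.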
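
The same lookahead idea can also be used to cut the log into whole paragraphs, which is closer to "grepping a paragraph". The following is only a sketch under assumptions: the helper name split_blocks, the sample text, and the search term ERROR are all invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): splits the whole log
# into blocks, one per header. The lookahead (?=...) matches at each header
# without consuming it, so every block keeps its own "Log File header" line.
# ^ with /m restricts the split points to the start of a line.
sub split_blocks {
    my ($text) = @_;
    return split /^(?=Log File header)/m, $text;
}

# Made-up sample standing in for join( "", @lines )
my $everything = "Log File header\nfirst entry\n"
               . "Log File header\nsecond entry ERROR\n";

my @blocks = split_blocks($everything);

# grep now works on whole paragraphs instead of single lines;
# "ERROR" is an invented search term.
my @matched = grep { /ERROR/ } @blocks;

print join("\n\n\n", @blocks);   # same vertical spacing as the techniques above
```

Once the blocks are separate list elements, grep selects or rejects an entire section at a time, which the line-by-line grep in the question cannot do.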
