我会稍微升级您的时间匹配正则表达式,以分别捕获时间的组成部分。我们是否必须担心在午夜之前开始构建并运行到第二天凌晨?
#!/usr/bin/env perl
use strict;
use warnings;
my $oldtime = ""; # hh:mm:ss for end of long interval
my $oldlineno = 0; # line number in the file of second line
my $oldoffset = 0; # offset in seconds from midnight of second command
my $olddiff = 0; # time taken for longest command
sub hhmmss
{
my($time) = @_;
my(@tm) = (int($time/3600), int($time/60)%60, $time%60);
return @tm;
}
while (<>)
{
chomp;
next unless m/^((\d\d):(\d\d):(\d\d))\s+/;
my $newoffset = (($2 * 60) + $3) * 60 + $4;
if ($oldoffset == 0)
{
$oldtime = $1;
$olddiff = 0;
$oldoffset = $newoffset;
$oldlineno = $.;
}
elsif (($newoffset - $oldoffset) > $olddiff)
{
$oldtime = $1;
$olddiff = $newoffset - $oldoffset;
$oldoffset = $newoffset;
$oldlineno = $.;
}
}
if ($oldoffset != 0)
{
my $prvlineno = $oldlineno - 1;
my $newoffset = $oldoffset - $olddiff;
my(@tm) = hhmmss($newoffset);
printf "line $prvlineno: %.2d:%.2d:%.2d\n", $tm[0], $tm[1], $tm[2];
print "line $oldlineno: $oldtime\n";
@tm = hhmmss($olddiff);
printf "diff: %.2d:%.2d:%.2d\n", $tm[0], $tm[1], $tm[2];
}
给定数据文件 ( data
) 和上面的脚本 ( dt.pl
):
17:31:16 line1
17:31:18 line2
17:31:29 line3
17:33:59 line4
18:00:21 line5
18:21:03 line6
18:41:25 line7
19:51:54 line8
19:52:34 line9
下面的 scriptlet 产生如下所示的输出:
$ for i in $(seq 1 9); do sed ${i}q data | perl dt.pl; done | so
line 0: 17:31:16
line 1: 17:31:16
diff: 00:00:00
line 1: 17:31:16
line 2: 17:31:18
diff: 00:00:02
line 2: 17:31:18
line 3: 17:31:29
diff: 00:00:11
line 3: 17:31:29
line 4: 17:33:59
diff: 00:02:30
line 4: 17:33:59
line 5: 18:00:21
diff: 00:26:22
line 4: 17:33:59
line 5: 18:00:21
diff: 00:26:22
line 6: 18:00:21
line 7: 18:41:25
diff: 00:41:04
line 7: 18:41:25
line 8: 19:51:54
diff: 01:10:29
line 7: 18:41:25
line 8: 19:51:54
diff: 01:10:29
$
我很想听听您在编写任何代码之前是如何考虑这个问题的。
这显然是一个问题,需要记录前一行信息的(相关部分)以计算它与当前行之间的差异。您还需要保持当前的最大差异,直到您阅读第二个匹配行才能正式确定。这推动了设计。代码中的大重复可以通过无条件地分配 3 个值和$olddiff
有条件地分配第四个 ( ) 来减少。在那之后,它主要是一个机械和战术的问题。
像这样跨多行匹配是一个麻烦的过程;你必须处理保持适当的状态。部分是经验问题;这种事情你做了几十次之后,下一次就不需要那么长时间了。