1

The program below prints the following data:

 Wed,Jun,13,10:37:34,2012,759,41,0,30,10,0,0,1
 Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2
 Wed,Jun,13,10:39:34,2012,758,42,0,32,10,0,0,0
 Wed,Jun,13,10:40:35,2012,758,42,0,29,11,0,0,2
 Wed,Jun,13,10:41:35,2012,761,39,0,34,5,0,0,0
 Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3
 Wed,Jun,13,10:43:35,2012,754,46,0,29,17,0,0,0

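I need to output the maximum value (for example, 769) for every 5-minute interval. Ideally the intervals would be 10:00:00 - 10:05:00 and so on. The times are in military (24-hour) format. What is the best way to do this? Please note that I am a beginner in Perl. Here is my code: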

#!/usr/bin/perl

# This program displays the max thread count at 5 minute intervals and writes the lines to a CSV file.

use strict;
use warnings;
use diagnostics;

# Initialize functions
my @data;
my $line;
my @L1;
#my $outFivemin = "log_5min.csv";
#open (FiveMin, ">> $outFivemin");

# Open the error_log and read every line into @data
open(FH, "error_log") or die "Cannot open error_log: $!";
@data = <FH>;
close(FH);

# Filter the results to MPMStats only
sub findLines {
    my @return = ();
    foreach $line (@data) {
        if ( ($line =~ /notice/) && ($line =~ /rdy/) ) {  
                $line =~ s/ /,/g;   
                my @L1 = split(/|notice|\[|,mpmstats:,|\t|rdy,|bsy,|rd,|wr,|ka,|log,|dns,|cls,/, $line);
                $line =~ s/|notice|\[|,mpmstats:,|\t|rdy,|bsy,|rd,|wr,|ka,|log,|dns,|cls,//g;                   
                push @return, join("", @L1);
        }
    }
    return @return;
}

# Initializers for my data
my($dayOfWeek1,$month1,$dayOfMonth1,$time,$year1,$rdy,$bsy,$rd,$wr,$ka,$log,$dns);
my($cls);

# Create a 2D array
my @L2 = &findLines;
foreach my $line (@L2){
    ($dayOfWeek1, $month1, $dayOfMonth1, $time, $year1, $rdy, $bsy, $rd, $wr, $ka, $log, $dns, $cls) = split(/,/, $line);
    print "$dayOfWeek1,$month1,$dayOfMonth1,$time,$year1,$rdy,$bsy,$rd,$wr,$ka,$log,$dns,$cls";
}
4 Answers

4

I suggest you manipulate the date/time in each record to provide a five-minute key, and keep the maximum value for each key.

For example, if a record starts with Wed,Jun,13,10:37:34,2012 then the appropriate key is Jun 13 10:35 2012.

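Normally this would be a hash, but the output most likely needs to be in chronological order, and providing sortable date/time strings would take extra work and modules, so the program below uses an array of pairs instead.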
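For comparison, here is a minimal sketch of what the hash-based variant might look like. It assumes the core Time::Piece module is available to build keys that sort chronologically; the helper name sortable_bucket and the inline @records sample are hypothetical and only there for illustration.

```perl
use strict;
use warnings;
use Time::Piece;    # core module; provides strptime and an object-returning gmtime

# Hypothetical helper: turn a CSV record into a sortable key that names
# the five-minute interval the record falls in.
sub sortable_bucket {
    my ($record) = @_;
    my (undef, $mon, $day, $time, $year) = $record =~ /([^,\s]+)/g;
    my $t = Time::Piece->strptime("$year $mon $day $time", '%Y %b %d %H:%M:%S');
    my $start = $t->epoch - $t->epoch % 300;    # 300 seconds = 5 minutes
    return gmtime($start)->strftime('%Y-%m-%d %H:%M');
}

# Sample records copied from the question (assumed input format)
my @records = (
    'Wed,Jun,13,10:37:34,2012,759,41,0,30,10,0,0,1',
    'Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2',
    'Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3',
);

# Keep the largest sixth field seen for each interval
my %max;
for my $record (@records) {
    my $key   = sortable_bucket($record);
    my $value = (split /,/, $record)[5];
    $max{$key} = $value if !defined $max{$key} or $value > $max{$key};
}

print "$_ $max{$_}\n" for sort keys %max;
```

Because the keys are formatted year first with zero-padded fields, a plain string sort puts them in chronological order. The trade-off is the extra date parsing for every record.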

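The program works by applying a regex substitution (s///) to the time (fourth) field. It replaces the minutes and seconds with the two-digit minute value at which the interval starts: the seconds are ignored and the minutes are rounded down to a multiple of five.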
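The rounding step can be sketched on its own as follows. This is only an illustration under the assumption that the field always looks like HH:MM:SS; the helper name five_minute_bucket is hypothetical, and it pads the minutes to two digits so that 10:04:59 becomes 10:00.

```perl
use strict;
use warnings;

# Hypothetical helper: round an HH:MM:SS time down to the start of its
# five-minute interval, zero-padding the hour and minute.
sub five_minute_bucket {
    my ($time) = @_;
    my ($hour, $min) = $time =~ /^(\d+):(\d+):\d+$/
        or return;    # not a time in the expected format
    return sprintf '%02d:%02d', $hour, 5 * int($min / 5);
}

print five_minute_bucket($_), "\n" for qw(10:37:34 10:40:35 10:04:59);
```

For the three sample times this prints 10:35, 10:40 and 10:00.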

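A new pair [$range, $value] is pushed onto the @maxima array if the array is empty or if we are in a different $range. Otherwise, the $value element of the latest pair is updated if we have found a new maximum.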

Note that the program expects the log file name on the command line, and defaults to error_log if none is given.

use strict;
use warnings;

@ARGV = ('error_log') unless @ARGV;

my @maxima;

while (<>) {

  my @fields = /([^,\s]+)/g;
  next unless @fields;
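  # Round the minutes down to a multiple of five, zero-padded (e.g. 10:02:10 -> 10:00)
  $fields[3] =~ s|(\d+):\d\d$|sprintf '%02d', 5*int($1/5)|e;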

  my $range = join ' ', @fields[1..4];
  my $value = $fields[5];

  if (@maxima == 0 or $range ne $maxima[-1][0]) {
    push @maxima, [$range, $value];
  }
  else {
    $maxima[-1][1] = $value if $maxima[-1][1] < $value;
  }
}

for (@maxima) {
  printf "Maximum for five minutes starting %s is %d\n", @$_;
}

Output

Maximum for five minutes starting Jun 13 10:35 2012 is 767
Maximum for five minutes starting Jun 13 10:40 2012 is 769

Update

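Now that I understand you want the whole record, for each five-minute interval, that contains the maximum value of field 6, I have written this modified code.

It also works on the contents of the @L2 array instead of reading from a file.

I am sure this would be better coded as a while loop that reads the file and generates the output directly from there, but unless you show us some of the log file data I can't offer a better alternative than this.

This program continues from the point where you populate @L2 in your own program.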

my @L2 = findLines();

my @maxima;

for my $record (@L2) {

  my @fields = $record =~ /([^,\s]+)/g;
  next unless @fields;

  my @range = @fields[1..4];
  # Round the minutes down to a multiple of five, zero-padded
  $range[2] =~ s|(\d+):\d\d$|sprintf '%02d', 5*int($1/5)|e;
  my $range = join ' ', @range;
  my $value = $fields[5];

  if (@maxima == 0 or $range ne $maxima[-1][0]) {
    push @maxima, [$range, $value, $record];
  }
  else {
    @{$maxima[-1]}[1,2] = ($value, $record) if $maxima[-1][1] < $value;
  }
}

print $_->[2] for @maxima;

Output

 Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2
 Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3
answered 2012-06-28T16:36:16.903
3

Something along these lines should do the trick...

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

# Somewhere to store the data
my %data;

# Process the input a line at a time
while (<DATA>) {
  # Split the input line on commas and colons.
  # Assign the bits we need to variables.
  my ($mon,$day,$hr,$min,$sec,$yr,$val) = (split /[,:]/)[1 .. 7];

  # Normalise the minute value to five-minute increments,
  # i.e. 37 becomes 35, 42 becomes 40 (zero-padded, so 2 becomes 00)
  $min = sprintf '%02d', int($min / 5) * 5;

  # Push the value onto an array that is stored in %data using
  # a key generated from the timestamp.
  # Note that we use the 5-min normalised value of the minute so that
  # all values from the same five minute period end up in the same array.
  push @{$data{"$yr-$mon-$day $hr:$min"}}, $val;
}

# For each key in the hash (i.e. each five-minute increment)...
foreach (sort keys %data) {
  # ... sort the array numerically and grab the last element
  # (which will be the largest)
  my $max = (sort { $a <=> $b } @{$data{$_}})[-1];
  # Say something useful
  say "$_ - $max";
}

__DATA__
Wed,Jun,13,10:37:34,2012,759,41,0,30,10,0,0,1
Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2
Wed,Jun,13,10:39:34,2012,758,42,0,32,10,0,0,0
Wed,Jun,13,10:40:35,2012,758,42,0,29,11,0,0,2
Wed,Jun,13,10:41:35,2012,761,39,0,34,5,0,0,0
Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3
Wed,Jun,13,10:43:35,2012,754,46,0,29,17,0,0,0
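
As a possible variation on the last step, the maximum of each array can be taken without sorting it. The following is a minimal sketch, assuming %data has already been filled as in the program above; the two sample entries are typed in by hand from the question's data, and max comes from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(max);    # core module

# Hand-built stand-in for the %data hash the program above produces
my %data = (
    '2012-Jun-13 10:35' => [ 759, 767, 758 ],
    '2012-Jun-13 10:40' => [ 758, 761, 769, 754 ],
);

# max scans each list once instead of sorting it
for my $key (sort keys %data) {
    printf "%s - %d\n", $key, max @{ $data{$key} };
}
```

For these two intervals it prints 767 and 769 respectively.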
answered 2012-06-28T16:02:25.243
-1

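Oops, I mistakenly assumed that your CSV output was the data file being parsed.

Ignore the answer below.

Here is a solution that prints out the original comma-separated lines. The maximum value and the time are also available for printing, but I chose to create a comma-separated file with the results. :-)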

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;

my %interval;
my $csv = Text::CSV_XS->new ({ binary => 1 }) or
     die "Cannot use CSV: ".Text::CSV_XS->error_diag ();

open my $fh, "<", "o33.txt" or die "o33.txt: $!";
while (my $row = $csv->getline ($fh)) {
    my ($time, $amt) = @$row[3,5];
    my ($hr, $min) = split /:/, $time;
    my $key = sprintf "%02d:%02d", $hr, int($min/5) * 5;

    if (exists $interval{$key}) {
        if ($interval{$key}{amt} < $amt) {
            $interval{$key}{amt} = $amt;
            $interval{$key}{data} = $row;
        }
    }
    else { # first time in this 5 minute interval
        $interval{$key}{amt} = $amt;
        $interval{$key}{data} = $row;
    }
}
$csv->eof or $csv->error_diag ();
close $fh or die $!;


$csv->eol ("\r\n");
open $fh, ">", 'junk.csv' or die $!;

for my $time (sort keys %interval) {
    $csv->print($fh, $interval{$time}{data});
}

close $fh or die $!;

The output in 'junk.csv' is:

Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2
Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3
answered 2012-06-28T22:12:30.080
-1

This should work (not tested). It picks up at your loop, starting from my @L2 = &findLines

use Text::CSV_XS;    # needed for the CSV writer used further down

my %interval;
my %month;
@month{qw/ jan feb mar apr may jun jul aug sep oct nov dec /} = '01' .. '12';

# Create a 2D array 
my @L2 = &findLines;
foreach my $line (@L2){ 
    #($dayOfWeek1, $month1, $dayOfMonth1, $time, $year1, $rdy, $bsy, $rd, $wr, $ka, $log, $dns, $cls) = split(/,/, $line); 
    #print "$dayOfWeek1,$month1,$dayOfMonth1,$time,$year1,$rdy,$bsy,$rd,$wr,$ka,$log,$dns,$cls"; 
    my ($dow, $mon, $day, $hr, $min, $sec, $yr, $amt) = split /[:,]/, $line, 9;
    my $key = sprintf "%4d-%02d-%02d %02d:%02d",
                $yr, $month{lc $mon}, $day, $hr, int($min / 5) * 5;

    if (exists $interval{$key}) {
        if ($interval{$key}{amt} < $amt) {
            $interval{$key}{amt} = $amt;
            $interval{$key}{data} = [split ",", $line];
        }
    }
    else { # first time in this 5 minute interval
        $interval{$key}{amt} = $amt;
        $interval{$key}{data} = [split ",", $line];
    }
} 

my $csv = Text::CSV_XS->new ({ binary => 1 }) or
     die "Cannot use CSV: ".Text::CSV_XS->error_diag ();

$csv->eol ("\r\n");
open my $fh, ">", 'junk.csv' or die $!;

for my $time (sort keys %interval) {
    $csv->print($fh, $interval{$time}{data});
}

close $fh or die $!;

I hope this gets you closer to a good way of solving the problem.
Update: added the first field to the split and changed it from 8 parts to 9.

answered 2012-06-29T02:33:12.997