
I have a log file with content like this:

Mon Nov 19 11:00:01 2012
Host: myserver
accurev-ent inuse: 629


Mon Nov 19 12:00:01 2012
Host: myserver
accurev-ent inuse: 629

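Using Perl, I figured out how to strip the blank lines and put the non-blank lines into an array. Now I am trying to match the current month, day, and year. That is, I am trying to pick out all the entries that contain May, 21, and 2013. (The file is produced by a script that runs 24 times a day.) I don't need the hh:mm:ss data.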

I have been trying to pattern match along these lines:

foreach $prod (@prod)
{
  # Sun May 19 02:00:01 2013
  if ($prod =~ ((/Sun May 19/) && $prod =~(/2013$/)) )
  {
    print "Howdy! \n"; # just using to indicate success
  }
}  

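Can I do this with pattern matching, or should I split each line and match on the pieces? By the way, once I find a match, I need to put the lines containing inuse into an array and find the highest number for that day.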

4 Answers

#!/usr/bin/env perl
use strict;
use warnings;
use POSIX qw(strftime);

# The active regex looks for today's date.
# The commented-out regex looks for dates in the current month.
# If you provide a suitable timestamp (seconds since the epoch),
# you can generate the pattern for an arbitrary date by changing
# time (a function call) to $timestamp.
# %b is the abbreviated month name ("Nov", "May"), as used in the log.
my $pattern = strftime("%b %d \\d+:\\d+:\\d+ %Y", localtime(time));
# my $pattern = strftime("%b \\d+ \\d+:\\d+:\\d+ %Y", localtime(time));
# print "$pattern\n";
my $regex = qr/$pattern/;

# Read the log lines from the files named on the command line (or stdin).
chomp(my @prod = <>);

foreach my $prod (@prod)
{
    # print "Check: $prod\n";
    if ($prod =~ $regex)
    {
        print "$prod\n";
    }
}

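This uses strftime (from POSIX) to create a regex string with the current month and year in the correct places, and with strings of digits where the day and time components should be. It then creates a quoted regex with qr// and applies it to each entry in the @prod array. You can make the \d+ matches more stringent if you wish; whether that is worth doing depends on the cost of an extraneous match. (One version of the regex is more lenient than it might be, recognizing May 99 and May 00, and May 20130, and so on; both versions let invalid times through.) All of these can be fixed by tweaking the regex without materially affecting the answer.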
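As a rough sketch of the tightening mentioned above, the pattern could be anchored and given fixed-width fields. This assumes the log always uses the abbreviated English month name and a four-digit year; the helper name date_regex is made up for illustration, and the fixed date is built with POSIX mktime only so that the demonstration does not depend on today's date.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use POSIX qw(strftime mktime);

# Hypothetical helper: build an anchored regex for the calendar day
# that contains $timestamp (seconds since the epoch, local time).
sub date_regex {
    my ($timestamp) = @_;
    my @t     = localtime($timestamp);
    my $month = strftime("%b", @t);   # abbreviated month name (locale-dependent)
    my $day   = $t[3];                # day of month, 1..31, unpadded
    my $year  = $t[5] + 1900;
    # The day may be padded with a space or a zero; the time fields
    # are restricted to valid two-digit hour, minute, and second values.
    return qr/^\w{3} \Q$month\E [ 0]?$day (?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d $year$/;
}

# Noon on 19 May 2013, local time (month is 0-based, year is offset from 1900).
my $regex = date_regex(mktime(0, 0, 12, 19, 4, 113));

for my $line ('Sun May 19 02:00:01 2013',
              'Sun May 19 02:00:01 20130',
              'Sun May 19 25:00:01 2013') {
    print "$line\n" if $line =~ $regex;
}
```

Only the first of the three sample lines should be printed; the other two are rejected by the year anchor and the hour range respectively.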

answered 2013-05-21T16:23:31.947

Quick and dirty regex:

my @prod = ('Mon Nov 19 11:00:01 2012', 'accurev-ent inuse: 629');
foreach my $prod (@prod)
{
  # Sun May 19 02:00:01 2013
  if ($prod =~ /^\w+ (\w+) (\d+) ..:..:.. (\d+)$/)
  {
    print "Howdy: $3 $1 $2\n";
  }

  if ($prod =~ /inuse: (\d+)$/)
  {
    print "Yo: $1\n";
  }
}

Yields

Howdy: 2012 Nov 19
Yo: 629
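Building on the two patterns above, one possible way to get the day's maximum (the second half of the question) is to remember the most recent date line and track the largest inuse value seen while that date is the wanted one. This is only a sketch: the function name max_inuse_for, the "Mon DD YYYY" key format, and the sample lines are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the largest "inuse" value logged on $want
# (formatted like "May 21 2013"), or undef if that date never appears.
sub max_inuse_for {
    my ($want, @lines) = @_;
    my ($current, $max);
    for my $line (@lines) {
        if ($line =~ /^\w+ (\w+) +(\d+) ..:..:.. (\d+)$/) {
            # Date line: remember which day the following lines belong to.
            # Adding 0 normalizes a zero-padded day such as "05" to "5".
            $current = "$1 " . ($2 + 0) . " $3";
        }
        elsif ($line =~ /inuse: (\d+)$/
               && defined($current) && $current eq $want) {
            $max = $1 if !defined($max) || $1 > $max;
        }
    }
    return $max;
}

# Made-up sample lines, already stripped of blank lines and newlines.
my @prod = (
    'Tue May 21 02:00:01 2013', 'Host: myserver', 'accurev-ent inuse: 62',
    'Tue May 21 03:00:01 2013', 'Host: myserver', 'accurev-ent inuse: 1200',
    'Sun May 19 02:00:01 2013', 'Host: myserver', 'accurev-ent inuse: 9999',
);

my $max = max_inuse_for('May 21 2013', @prod);
print defined($max) ? "Max: $max\n" : "No entries\n";   # prints "Max: 1200"
```

Keeping the running maximum avoids building a second array of inuse lines, although collecting them and calling max from List::Util would work just as well.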
answered 2013-05-21T16:20:44.980

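You say you need the total for each day. Here is my attempt. I hope the comments I have added are sufficient. I am fairly sure this could be done with regex backreferences, but I did not have much luck with them, so I used array indexes instead.

I figured I would correct my misreading, so why not.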

open(my $fh, '<', 'stackoverflow.data') or die "Cannot open stackoverflow.data: $!";
my @prod = <$fh>;
close($fh);

# Strip newlines.
chomp(@prod);

my $data; # Hash to store data.


for (my $i = 0; $i < $#prod; ) {           # $i is advanced at the bottom of the loop.
    my $date  = $prod[$i];                 # First line.
    my $host  = $prod[$i + 1];             # Second line.
    my $inuse = parseInuse($prod[$i + 2]); # Third line.

    $date =~ /^\w+ (\w+) (\d+) .+? (\d+)$/;
    $date = "$1 $2 $3";

    # Initialize inuse value for date.
    if (!defined($data->{$date})) {
        $data->{$date} = 0;
    }

    # Replace stored inuse value if current loop inuse is greater.
    if ($inuse > $data->{$date}) {
        $data->{$date} = $inuse;
    }

    print "Processing $i raw($prod[$i]) sep(date: $date, host: $host, inuse: $inuse) split($inuse)\n";

    # Skip blank line;
    $i += ($prod[$i + 3] =~ m/^\s*?$/) ? 4 : 3;
}

print "\nTotals:\n";
my $matchdate = 'May 19 2013'; # Set to undef to show all.
#$matchdate = undef;

foreach my $date (sort keys %{$data}) {
    if (defined($matchdate) && $date ne $matchdate) {
        next;
    }
    print "$date: $data->{$date}\n";
}


sub parseInuse
{
    my $i = shift;

    my @parts = split(': ', $i);
    $i = $parts[1];    # A single array element is accessed with $, not @.
    $i =~ s/\s+//g;    # Remove any whitespace around the number.

    return $i;
}



# Mon Nov 19 11:00:01 2012
# Host: myserver
# accurev-ent inuse: 629
# 
# Mon Nov 19 12:00:01 2012
# Host: myserver
# accurev-ent inuse: 800
# 
# Sun May 19 02:00:01 2013
# Host: myserver
# accurev-ent inuse: 629
# 
# Sun May 19 02:00:01 2013
# Host: myserver
# accurev-ent inuse: 1000
answered 2013-05-21T17:01:13.947

use strict;
use warnings;
use 5.012;

use DateTime::Format::Strptime;
use List::Util qw/max/;

local $/ = "\n\n";
my $parser = DateTime::Format::Strptime->new(
    pattern   => '%a %b %d %H:%M:%S %Y',
    locale    => 'en_US',
    time_zone => 'America/Chicago',
); 
my @records;
for my $record (<DATA>) {
  my ($timestamp, $host, $inuse) = split ("\n", $record);
  $host =~ s/Host: //;
  $inuse =~ s/accurev-ent inuse: //;
  push @records, { timestamp => $parser->parse_datetime($timestamp), 
                   host => $host,
                   inuse => $inuse,
                 };
}

say max map {$_->{inuse}} grep {$_->{timestamp}->ymd() eq '2013-05-21' } @records;

__DATA__
Mon Nov 19 11:00:01 2012
Host: myserver
accurev-ent inuse: 629

Mon Nov 19 12:00:01 2012
Host: myserver
accurev-ent inuse: 629

Sun May 19 02:00:01 2013
Host: myserver
accurev-ent inuse: 629

Tue May 21 02:00:01 2013
Host: myserver
accurev-ent inuse: 1200

Tue May 21 02:00:01 2013
Host: myserver
accurev-ent inuse: 62

Tue May 21 02:00:01 2013
Host: myserver
accurev-ent inuse: 29

Gives:

1200

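You can change the scope of the filter fairly simply by changing the test used in the grep (for example, the maximum between 8 a.m. and 10 p.m., the maximum within a week, and so on).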
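To illustrate, here is a sketch of a time-window filter (8 a.m. up to, but not including, 10 p.m.). To stay free of CPAN dependencies it swaps DateTime::Format::Strptime for the core Time::Piece module, which is a substitution and not what the code above uses; with DateTime objects the same test could be written with $_->{timestamp}->hour. The helper name max_in_window and the sample records are made up for illustration.

```perl
use strict;
use warnings;
use Time::Piece;
use List::Util qw(max);

# Made-up sample records: [timestamp string, inuse count].
my @raw = (
    ['Tue May 21 02:00:01 2013', 1200],
    ['Tue May 21 09:00:01 2013',   62],
    ['Tue May 21 15:00:01 2013',  629],
    ['Tue May 21 23:00:01 2013',  900],
);

# Parse each timestamp into a Time::Piece object, mirroring the
# record layout (timestamp, inuse) used in the answer above.
my @records = map {
    +{ timestamp => Time::Piece->strptime($_->[0], '%a %b %d %H:%M:%S %Y'),
       inuse     => $_->[1] }
} @raw;

# Hypothetical helper: largest inuse value among the records whose
# hour of day falls in the half-open range [$from, $to).
sub max_in_window {
    my ($from, $to, @recs) = @_;
    return max map  { $_->{inuse} }
               grep { $_->{timestamp}->hour >= $from
                      && $_->{timestamp}->hour < $to } @recs;
}

print max_in_window(8, 22, @records), "\n";   # prints 629
```

Only the grep condition changes between filters; a one-week window, for instance, would compare the timestamps against two boundary dates instead of checking the hour.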

answered 2013-05-21T18:02:16.410