perl - Need a Perl script to count the images found in a web log

Question

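I have web log files and I'm having a lot of trouble because I'm new to Perl. I just need a script that counts each type of image found. I can list the matching lines, but I'm not sure how to count them, for example "x jpg and x gif were viewed".

My code so far looks like this: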

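use strict;
use warnings;

open my $fh, '<', 'jan28.log' or die "Cannot open jan28.log: $!";
while (my $line = <$fh>) {
    # Print every line that mentions a .jpg, .gif or .tiff file.
    if ($line =~ /\.jpg/) {
        print $line;
    }
    elsif ($line =~ /\.gif/) {
        print $line;
    }
    elsif ($line =~ /\.tiff/) {
        print $line;
    }
}
close $fh;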

The web log looks like this:

24.131.83.162 - - [28/Jan/2007:00:00:00 -0500] "GET /~taler/images/index_09.jpg   HTTP/1.1" 200 1563
207.46.98.53 - - [28/Jan/2007:00:00:04 -0500] "GET /%7Edist/programs/PhD/PhDGuide/guideA.htm HTTP/1.0" 200 19090
74.6.74.184 - - [28/Jan/2007:00:00:12 -0500] "GET /%7Embsclass/hall_of_fame/myicon.ico HTTP/1.0" 200 760
58.68.24.3 - - [28/Jan/2007:00:00:16 -0500] "GET /~dtipper/tipper.html HTTP/1.1" 200 5896
58.68.24.3 - - [28/Jan/2007:00:00:16 -0500] "GET /~dtipper/gifs/head.jpg HTTP/1.1" 200 18318

score 2 · Accepted Answer

use strict;
use warnings;
use feature qw( say );
use URI qw( );

my $jpegs = 0;
my $gifs  = 0;
while (<>) {
   chomp;
   my ($req, $code) = /^(?:\S+\s+){3}\[[^\]]*\] "([^"]*)"\s*(\S+)/
      or next;

   $code >= 200 && $code < 300
      or next;

   my ($meth, $url) = split(' ', $req);
   $url = URI->new($url, 'http');

   my $path = $url->path;
   if    ($path =~ /\.jpe?g\z/i) { ++$jpegs; }
   elsif ($path =~ /\.gif\z/i  ) { ++$gifs; }
}

say "There were $jpegs jpgs and $gifs gifs viewed";
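
If more image types need counting (the question also mentions tiff), a hash keyed by extension avoids keeping one counter variable per type. The following is a minimal sketch using only core Perl, assuming the log format shown in the question. The sub name `count_images` and the list of extensions are illustrative choices, not part of the answer above, and the sample lines are read from an in-memory string instead of a real log file. Unlike the answer above, it does not use the URI module and simply strips any query string by hand.

```perl
use strict;
use warnings;

# Hypothetical helper: count successful image requests per extension.
# Takes a filehandle and returns a hash reference such as { jpg => 2 }.
sub count_images {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        # Pull the request path and the status code out of the log line.
        my ($path, $code) = $line =~ /"\S+\s+(\S+)[^"]*"\s+(\d{3})/
            or next;
        next unless $code >= 200 && $code < 300;

        $path =~ s/[?#].*//s;    # drop any query string or fragment

        # Normalise case and fold .jpeg into jpg and .tif into tiff.
        if ($path =~ /\.(jpe?g|gif|tiff?)\z/i) {
            my $ext = lc $1;
            $ext = 'jpg'  if $ext eq 'jpeg';
            $ext = 'tiff' if $ext eq 'tif';
            $count{$ext}++;
        }
    }
    return \%count;
}

# Sample lines taken from the question, read via an in-memory filehandle.
my $sample = <<'END_LOG';
24.131.83.162 - - [28/Jan/2007:00:00:00 -0500] "GET /~taler/images/index_09.jpg   HTTP/1.1" 200 1563
207.46.98.53 - - [28/Jan/2007:00:00:04 -0500] "GET /%7Edist/programs/PhD/PhDGuide/guideA.htm HTTP/1.0" 200 19090
74.6.74.184 - - [28/Jan/2007:00:00:12 -0500] "GET /%7Embsclass/hall_of_fame/myicon.ico HTTP/1.0" 200 760
58.68.24.3 - - [28/Jan/2007:00:00:16 -0500] "GET /~dtipper/tipper.html HTTP/1.1" 200 5896
58.68.24.3 - - [28/Jan/2007:00:00:16 -0500] "GET /~dtipper/gifs/head.jpg HTTP/1.1" 200 18318
END_LOG

open my $fh, '<', \$sample or die "Cannot open in-memory log: $!";
my $count = count_images($fh);
close $fh;

printf "%-4s %d\n", $_, $count->{$_} for sort keys %$count;
```

For the five sample lines this reports two jpg requests and nothing else, since the .ico, .htm and .html requests are not in the extension list. To run it against a real log, open the file instead of the in-memory string and pass that handle to `count_images`.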

score 0 · Accepted Answer

Try this (in a shell):

perl -wane '
    END{
        print "there were $hash{$_} items for $_\n" for sort keys %hash;
    }

    # Count only lines that match, keyed by the captured extension.
    $hash{$1}++ if m!.*\.(jpe?g|gif|ico)\b!i;
' filename.txt

If you want a real script with the same logic, the Deparse module can help:

$ perl -MO=Deparse -wane '
END{
    print "there were $hash{$_} items for $_\n" for sort keys %hash;
}

# Count only lines that match, keyed by the captured extension.
$hash{$1}++ if m!.*\.(jpe?g|gif|ico)\b!i;
' filename.txt

The "deparsed" script:

BEGIN { $^W = 1; }
LINE: while (defined($_ = <ARGV>)) {
    our(@F) = split(' ', $_, 0);
    sub END {
        print "there were $hash{$_} items for $_\n" foreach (sort keys %hash);
    }
    ++$hash{$1} if /.*\.(jpe?g|gif|ico)\b/i;
}
-e syntax OK

score 0 · Accepted Answer

This is a basic example, but there are probably log parser modules on CPAN.

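use strict;
use warnings;
use File::Open::OOP qw(oopen);
use Data::Dump qw(dump);

my $fh = oopen 'log';
my %hash;
while ( my $row = $fh->readline ) {
  # Capture the extension of the requested file; skip lines that are
  # not GET requests for a path ending in an extension.
  my ($ext) = $row =~ /"GET \/\S*\.(\w+)\s/ or next;
  $hash{$ext} += 1;
}
dump(%hash);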

Output for the sample:

$ perl script.pl

("html", 1, "ico", 1, "jpg", 2, "htm", 1)

$
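
A hash has no defined order in Perl, so the order of the dumped pairs can differ between runs. A small sketch of a sorted report, most frequent extension first, may be easier to read. The counts are hard-coded from the sample output above, and `format_report` is a hypothetical helper name, not something provided by the modules used in the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a count hash into report lines,
# most frequent first, ties broken alphabetically.
sub format_report {
    my (%count) = @_;
    return map  { sprintf '%-5s %d', $_, $count{$_} }
           sort { $count{$b} <=> $count{$a} or $a cmp $b }
           keys %count;
}

# Counts copied from the sample output above.
my %hash = (html => 1, ico => 1, jpg => 2, htm => 1);

print "$_\n" for format_report(%hash);
```

With these counts, jpg is listed first, followed by htm, html and ico in alphabetical order.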
