I'm having trouble getting the array columns into a consistent format. I have the following output:

Mon,Jun,25,14:39:29,2012,971,29,0,25,0,0,0,4,Mon,Jun,25,14:39:29,2012,25,mod_was_ap22_http.c
    Mon,Jun,25,14:40:29,2012,972,28,0,25,0,0,0,3,Mon,Jun,25,14:40:29,2012,3,mod_sm22.cpp,22,mod_was_ap22_http.c
    Mon,Jun,25,14:41:29,2012,973,27,0,24,0,0,0,3,Mon,Jun,25,14:41:29,2012,24,mod_was_ap22_http.c
    Mon,Jun,25,14:42:29,2012,974,26,0,20,0,0,0,6,Mon,Jun,25,14:42:29,2012,1,mod_sm22.cpp,19,mod_was_ap22_http.c
    Mon,Jun,25,14:43:29,2012,971,29,0,26,0,0,0,3,Mon,Jun,25,14:43:29,2012,2,mod_sm22.cpp,24,mod_was_ap22_http.c
    Mon,Jun,25,14:44:30,2012,957,43,0,41,0,0,0,2,Mon,Jun,25,14:44:30,2012,1,mod_sm22.cpp,40,mod_was_ap22_http.c
    Mon,Jun,25,14:45:30,2012,963,37,0,35,0,0,0,2,Mon,Jun,25,14:45:30,2012,2,mod_sm22.cpp,32,mod_was_ap22_http.c
    Mon,Jun,25,14:46:30,2012,972,28,0,24,1,1,0,2,Mon,Jun,25,14:46:30,2012,24,mod_was_ap22_http.c,1,ApacheModule.cpp
    Mon,Jun,25,14:47:30,2012,961,39,1,37,0,0,0,1,Mon,Jun,25,14:47:30,2012,37,mod_was_ap22_http.c,1,ApacheModule.cpp
    Mon,Jun,25,14:48:30,2012,968,32,0,30,0,0,0,2,Mon,Jun,25,14:48:30,2012,30,mod_was_ap22_http.c
    Mon,Jun,25,14:49:30,2012,972,28,0,25,0,0,0,3,Mon,Jun,25,14:49:30,2012,1,mod_sm22.cpp,24,mod_was_ap22_http.c

Columns I would like displayed: DayOfWeek,Month,Day,Time,Year,Rdy,Bsy,Rd,Wr,Ka,Log,Dns,Cls,AP22,SM22,ApacheModule

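Currently the last three columns (AP22, SM22, ApacheModule) are out of order; the rest are correct. The rows are not consistent with that format: a row sometimes has ap22 first, sometimes sm22 first, and sometimes has none or all three of the modules. The number before each module name belongs to that module. How can I convert the data into a consistent format?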

Note that the second date in each row and the names mod_was_ap22_http.c, mod_sm22.cpp, and ApacheModule.cpp will be removed from the final array.

This is my code so far:

# This program parses an error log for the necessary information and outputs it in CSV format.

# chunks of your input to ignore, see below... 
my %ignorables = map { $_ => 1 } qw([notice mpmstats: rdy bsy rd wr ka log dns cls bsy: in);  

# 3-arg open is safer than 2, lexical my $fh better than a global FH glob 
open my $error_fh, '<', 'iset_error_log' or die "Cannot open iset_error_log: $!";

sub findLines {
    my($item,@result)=("");
    # Iterates over the lines in the file, putting each into $_ 
    while (<$error_fh>) {      

        # Select only those lines that contain '[notice'
        if (/\[notice/) {          

            # A line containing the word 'rdy' starts a new output record
            if (/\brdy\b/){
                push @result,"$item\n";
                $item="";

            }
            else {
                $item.=",";
            }

            # Split the line into fields, separated by spaces, skip the %ignorables         
            my @line = grep { not defined $ignorables{$_} } split /\s+/;    

            # More cleanup         
            s/^\[|notice|\]//g for @line; # strip the leading '[', the word 'notice', and any ']' from each field

            # Output the line.  
            @line = join(",", @line);          
            s/,,/,/g for @line;
            map $item.=$_, @line;
            }
        } 
        @result
    }  

my @array = &findLines;
foreach my $line (@array){
    print $line; #This is where I would like to organize the lines if possible.
}

My input file looks like this:

[Mon Jun 25 07:51:17 2012] [notice] mpmstats: rdy 990 bsy 10 rd 0 wr 7 ka 0 log 0 dns 0 cls 3
[Mon Jun 25 07:51:17 2012] [notice] mpmstats: bsy: 2 in mod_sm22.cpp, 5 in mod_was_ap22_http.c
[Mon Jun 25 08:08:17 2012] [notice] mpmstats: rdy 974 bsy 26 rd 1 wr 24 ka 0 log 0 dns 0 cls 1
[Mon Jun 25 08:08:17 2012] [notice] mpmstats: bsy: 1 in mod_sm22.cpp, 23 in mod_was_ap22_http.c, 1 in ApacheModule.cpp        Mon,Jun,25,14:38:29,2012,962,38,0,36,0,0,0,2,Mon,Jun,25,14:38:29,2012,3,mod_sm22.cpp,33,mod_was_ap22_http.c

    [Mon Jun 25 21:54:41 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 0 ka 0 log 0 dns 0 cls 1
    [Mon Jun 25 21:55:41 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 0 ka 0 log 0 dns 0 cls 1
    [Mon Jun 25 21:59:41 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 1 ka 0 log 0 dns 0 cls 0
    [Mon Jun 25 21:59:41 2012] [notice] mpmstats: bsy: 1 in mod_was_ap22_http.c
    [Mon Jun 25 22:00:41 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 1 ka 0 log 0 dns 0 cls 0
    [Mon Jun 25 22:00:41 2012] [notice] mpmstats: bsy: 1 in mod_was_ap22_http.c
    [Mon Jun 25 22:03:41 2012] [notice] mpmstats: rdy 998 bsy 2 rd 0 wr 2 ka 0 log 0 dns 0 cls 0
    [Mon Jun 25 22:03:41 2012] [notice] mpmstats: bsy: 2 in mod_was_ap22_http.c
    [Mon Jun 25 22:08:42 2012] [notice] mpmstats: rdy 998 bsy 2 rd 0 wr 2 ka 0 log 0 dns 0 cls 0
    [Mon Jun 25 22:08:42 2012] [notice] mpmstats: bsy: 2 in mod_was_ap22_http.c
    [Mon Jun 25 22:21:42 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 1 ka 0 log 0 dns 0 cls 0
    [Mon Jun 25 22:21:42 2012] [notice] mpmstats: bsy: 1 in mod_was_ap22_http.c
    [Mon Jun 25 22:22:42 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 1 ka 0 log 0 dns 0 cls 0
    [Mon Jun 25 22:22:42 2012] [notice] mpmstats: bsy: 1 in mod_was_ap22_http.c
    [Mon Jun 25 22:31:42 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 0 ka 0 log 0 dns 0 cls 1
    [Mon Jun 25 22:32:42 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 1 ka 0 log 0 dns 0 cls 0
    [Mon Jun 25 22:32:42 2012] [notice] mpmstats: bsy: 1 in mod_was_ap22_http.c
    [Mon Jun 25 23:06:43 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 1 ka 0 log 0 dns 0 cls 0
    [Mon Jun 25 23:06:43 2012] [notice] mpmstats: bsy: 1 in mod_was_ap22_http.c
1 Answer

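You'll probably want to reorder the columns while you still have the fields split, before using join to turn them back into a line of text.

You just need to do a swap.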

# 0        ,1    ,2  ,3   ,4   ,5  ,6  ,7 ,8 ,9 ,10 ,11 ,12 ,13  ,14  ,15
# DayOfWeek,Month,Day,Time,Year,Rdy,Bsy,Rd,Wr,Ka,Log,Dns,Cls,AP22,SM22,ApacheModule
#
# Sometimes the last 2 fields are missing and 13 comes before 14 and 15 in the 
# input, so fix that.
if (@line < 16) {
    push @line, '', ''; # or whatever you want for blanks
}

@line = @line[0..12,14,15,13]; # rearrange the array
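
If the module counts can appear in any order, or be missing altogether, a swap by position may not cover every row. As a minimal sketch (not the asker's code), the count that precedes each module name can be stored in a hash keyed by module name and then emitted in a fixed order. The helper name normalize_fields, the choice of 0 for a module that does not appear, and the assumption that fields 13 to 17 hold the repeated timestamp are all illustrative, based on the sample output in the question.

```perl
use strict;
use warnings;

# Fixed output order for the module columns: AP22, SM22, ApacheModule.
my @MODULES = qw(mod_was_ap22_http.c mod_sm22.cpp ApacheModule.cpp);

# normalize_fields() is a hypothetical helper, not part of the original script.
# It takes one comma-joined record, as printed by the question's code, and
# returns the 16 fields in a fixed order.
sub normalize_fields {
    my ($record) = @_;
    chomp $record;
    my @f = split /,/, $record;

    # Fields 0..12: first timestamp plus the rdy..cls counters.
    # Fields 13..17: timestamp repeated from the "bsy:" line (dropped here).
    # Fields 18 onward: (count, module name) pairs in arbitrary order.
    my @fixed = @f[0 .. 12];
    my @pairs = @f[18 .. $#f];    # empty list when there was no "bsy:" line

    my %count;
    while (my ($n, $module) = splice @pairs, 0, 2) {
        $count{$module} = $n if defined $module;
    }

    # Assumption: a module that did not appear is reported as 0.
    return (@fixed, map { $count{$_} // 0 } @MODULES);
}

print join(',', normalize_fields(
    'Mon,Jun,25,14:40:29,2012,972,28,0,25,0,0,0,3,'
  . 'Mon,Jun,25,14:40:29,2012,3,mod_sm22.cpp,22,mod_was_ap22_http.c'
)), "\n";
# prints Mon,Jun,25,14:40:29,2012,972,28,0,25,0,0,0,3,22,3,0
```

Keying on the module name means the order in the log no longer matters, and the second date and the module names drop out of the result as a side effect.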

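Also, if you use empty strings ('') for the empty fields, your regex s/,,/,/g will undo that: short lines missing the last fields will go back to lacking the proper 13 and 14 fields.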
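
To see the effect, here is a small sketch with made-up field values (the six-element array is illustrative, not taken from the log). Two empty trailing fields are joined, then the substitution from the question is applied to the joined string.

```perl
use strict;
use warnings;

# Illustrative record whose last two fields are intentionally blank.
my @fields = ('Mon', 'Jun', '25', '24', '', '');

my $joined = join ',', @fields;            # "Mon,Jun,25,24,,"
(my $collapsed = $joined) =~ s/,,/,/g;     # "Mon,Jun,25,24,"

# A negative limit makes split keep trailing empty fields.
my @before = split /,/, $joined,    -1;
my @after  = split /,/, $collapsed, -1;

printf "before: %d fields, after: %d fields\n", scalar @before, scalar @after;
# prints before: 6 fields, after: 5 fields
```

One way around this is to drop the empty elements from the array before calling join, so the s/,,/,/g cleanup on the joined string is no longer needed.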

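Based on this and the various questions you've asked before, I strongly recommend getting a copy of Modern Perl (free to download or available to buy) or Learning Perl to get a better handle on the language as a whole. I've recently read much of the former and liked it, and I got most of my initial Perl knowledge from an early edition of the latter.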

Answered 2012-07-03T16:30:13.627