
I need a regular expression to parse an Apache error log file

For example, here is a portion of /var/log/httpd/error_log:

[Sun Sep 02 03:34:01 2012] [notice] Digest: done
[Sun Sep 02 03:34:01 2012] [notice] Apache/2.2.15 (Unix) DAV/2 mod_ssl/2.2.15 OpenSSL/1.0.0- fips SVN/1.6.11 configured -- resuming normal operations
[Sun Sep 02 03:34:01 2012] [error] avahi_entry_group_add_service_strlst("localhost") failed: Invalid host name
[Sun Sep 02 08:01:14 2012] [error] [client 216.244.73.194] File does not exist: /var/www/html/manager
[Sun Sep 02 11:04:35 2012] [error] [client 58.218.199.250] File does not exist: /var/www/html/proxy

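I want a regular expression that uses spaces as delimiters but does not split on the spaces embedded inside a field. The Apache error log format takes one of these two forms: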

[DAY MMM DD HH:MM:SS YYYY] [MSG_TYPE] DESCRIPTOR: MESSAGE

[DAY MMM DD HH:MM:SS YYYY] [MSG_TYPE] [SOURCE IP] ERROR: DETAIL

I created two regular expressions. The first one is

^(\[[\w:\s]+\]) (\[[\w]+\]) (\[[\w\d.\s]+\])?([\w\s/.(")-]+[\-:]) ([\w/\s]+)$

This one is simple: it just matches the content as it is.

What I want is something like the following regular expression, which I also created:

      (?<=|\s)([\w:\S]+)

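This one does not give me the output I want, because it does not keep the embedded spaces. So I need a regular expression that captures each field as a group, including its embedded spaces, and uses the space as the delimiter between fields. Please help me with the logic!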

My code

void regexparser(CharBuffer cb) {
    try {
        // Splits the buffer into lines, then matches each line against the log pattern.
        Pattern linePattern = Pattern.compile(".*\r?\n");
        Pattern csvpat = Pattern.compile(
            "^\\[([\\w:\\s]+)\\] \\[([\\w]+)\\] (\\[([\\w\\d.\\s]+)\\])?([\\w\\s/.(\")-]+[\\-:]) ([\\w/\\s].+)",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.MULTILINE);
        Matcher lm = linePattern.matcher(cb);
        Matcher pm = null;

        while (lm.find()) {
            CharSequence cs = lm.group();

            // Reuse one matcher for every line.
            if (pm == null)
                pm = csvpat.matcher(cs);
            else
                pm.reset(cs);

            while (pm.find()) {
                // Group 4 is the optional "[client IP]" field.
                if (pm.group(4) == null)
                    System.out.println(pm.group(1) + " " + pm.group(2) + " " + pm.group(5) + " " + pm.group(6));
                else
                    System.out.println(pm.group(1) + " " + pm.group(2) + " " + pm.group(4) + " " + pm.group(5) + " " + pm.group(6));
            }
        }
    } catch (Exception e) {
        // The posted snippet is cut off before this point; a generic handler is assumed.
        e.printStackTrace();
    }
}

1 Answer


I agree that this task should be done with an existing solution for parsing Apache logs.

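However, if you want to try something for learning purposes, you might start with this. Instead of parsing everything in one huge regular expression, I do it in small steps that are easier to read: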

Code

#!/usr/bin/env perl

use strict;
use warnings;
use DateTime::Format::Strptime;
use feature 'say';

# iterate log lines
while (defined(my $line = <DATA>)) {
    chomp $line;

    # prepare
    my %data;
    my $strp = DateTime::Format::Strptime->new(
        pattern => '%a %b %d %H:%M:%S %Y',
    );

    # consume date/time
    next unless $line =~ s/^\[(\w+ \w+ \d+ \d\d:\d\d:\d\d \d{4})\] //;
    $data{date} = $strp->parse_datetime($1);

    # consume message type
    next unless $line =~ s/^\[(\w+)\] //;
    $data{type} = $1;

    # "[source ip]" alternative
    if ($line =~ s/^\[(\w+) ([\d\.]+)\] //) {
        @data{qw(source ip)} = ($1, $2);

        # consume "error: detail"
        next unless $line =~ s/([^:]+): (.*)//;
        @data{qw(error detail)} = ($1, $2);
    }

    # "descriptor: message" alternative
    elsif ($line =~ s/^([^:]+): (.*)//) {
        @data{qw(descriptor message)} = ($1, $2);
    }

    # invalid
    else {
        next;
    }

    # something left: invalid
    next if length $line;

    # parsed ok: output
    say "$_: $data{$_}" for keys %data;
    say '-' x 40;
}

__DATA__
[Sun Sep 02 03:34:01 2012] [notice] Digest: done
[Sun Sep 02 03:34:01 2012] [notice] Apache/2.2.15 (Unix) DAV/2 mod_ssl/2.2.15 OpenSSL/1.0.0- fips SVN/1.6.11 configured -- resuming normal operations
[Sun Sep 02 03:34:01 2012] [error] avahi_entry_group_add_service_strlst("localhost") failed: Invalid host name
[Sun Sep 02 08:01:14 2012] [error] [client 216.244.73.194] File does not exist: /var/www/html/manager
[Sun Sep 02 11:04:35 2012] [error] [client 58.218.199.250] File does not exist: /var/www/html/proxy

Output

descriptor: Digest
date: 2012-09-02T03:34:01
type: notice
message: done
----------------------------------------
descriptor: avahi_entry_group_add_service_strlst("localhost") failed
date: 2012-09-02T03:34:01
type: error
message: Invalid host name
----------------------------------------
detail: /var/www/html/manager
source: client
ip: 216.244.73.194
date: 2012-09-02T08:01:14
error: File does not exist
type: error
----------------------------------------
detail: /var/www/html/proxy
source: client
ip: 58.218.199.250
date: 2012-09-02T11:04:35
error: File does not exist
type: error
----------------------------------------

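Note that, according to your format description, the second line is invalid and is ignored by the program: it contains no "DESCRIPTOR: MESSAGE" pair.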
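Since your own code is in Java, here is a rough sketch of the same step-by-step idea in Java. It is only an illustration under a few assumptions: the class name ErrorLogParser and the method parse are made up for this example, the date is kept as a plain string rather than parsed into a date object, and a line that fits neither of your two formats simply yields null.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from any library.
class ErrorLogParser {
    // Each pattern consumes exactly one field from the start of the remaining text.
    private static final Pattern DATE =
        Pattern.compile("^\\[(\\w+ \\w+ \\d+ \\d\\d:\\d\\d:\\d\\d \\d{4})\\] ");
    private static final Pattern TYPE = Pattern.compile("^\\[(\\w+)\\] ");
    private static final Pattern SOURCE = Pattern.compile("^\\[(\\w+) ([\\d.]+)\\] ");
    private static final Pattern PAIR = Pattern.compile("^([^:]+): (.*)$");

    /** Returns the parsed fields, or null if the line fits neither format. */
    static Map<String, String> parse(String line) {
        Map<String, String> data = new LinkedHashMap<>();

        // Consume the date/time field.
        Matcher m = DATE.matcher(line);
        if (!m.find()) return null;
        data.put("date", m.group(1));
        line = line.substring(m.end());

        // Consume the message type.
        m = TYPE.matcher(line);
        if (!m.find()) return null;
        data.put("type", m.group(1));
        line = line.substring(m.end());

        // Optional "[source ip]" field.
        m = SOURCE.matcher(line);
        boolean hasSource = m.find();
        if (hasSource) {
            data.put("source", m.group(1));
            data.put("ip", m.group(2));
            line = line.substring(m.end());
        }

        // The rest must be "ERROR: DETAIL" or "DESCRIPTOR: MESSAGE".
        m = PAIR.matcher(line);
        if (!m.find()) return null;
        data.put(hasSource ? "error" : "descriptor", m.group(1));
        data.put(hasSource ? "detail" : "message", m.group(2));
        return data;
    }

    public static void main(String[] args) {
        String[] lines = {
            "[Sun Sep 02 03:34:01 2012] [notice] Digest: done",
            "[Sun Sep 02 08:01:14 2012] [error] [client 216.244.73.194] File does not exist: /var/www/html/manager"
        };
        for (String l : lines) {
            Map<String, String> d = parse(l);
            System.out.println(d == null ? "invalid: " + l : d.toString());
        }
    }
}
```

Because every pattern is anchored with ^ and the matched prefix is cut off before the next step, spaces inside a field (the date, or "File does not exist") never act as delimiters. As with the Perl version, the "Apache/2.2.15 ... resuming normal operations" line has no colon-separated pair, so parse returns null for it.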

Answered 2012-09-24T13:32:10.327