
I want to parse a file that contains data like this:

05\/26\/2013 06:09:47 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=radash@abc.com\,ou=People\,o=zeb.com - 06:09:47 - http - uizweb_zam -  - 2uid=bolched@abc.com
05\/26\/2013 06:09:48 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=rad-ash2s@abc.com\,ou=People\,o=zeb.com - 06:09:48 - http - uizweb_zam -  - 2uid=bolchedssd@abc.com
05\/26\/2013 06:09:49 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.43 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=sjhsjdh@abc.com\,ou=People\,o=zeb.com - 06:09:49 - http - uizweb_zam -  - 2uid=kjsdsdjhjsh@abc.com

and get:

05/26/2013 06:09:49  and uid=radash@abc.com,ou=People,o=zeb.com 
05/26/2013 06:09:48  and uid=rad-ash2s@abc.com,ou=People,o=zeb.com

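I tried split('-'), but it does not work because, as you can see, some lines, such as the second one above, contain a '-' inside a value (rad-ash2s@abc.com). Other parts of the data sometimes contain '-' as well.

Please help.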


2 Answers


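You are better off using a regular expression. With a regular expression, I can quickly capture the parts of the string I want by wrapping them in parentheses (...). See the regular expression pages on Perldoc for what the various regular expression metacharacters mean.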

#! /usr/bin/env perl

use 5.12.0;
use warnings;
use autodie;

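while ( my $line = <DATA> ) {
    chomp $line;
    $line =~ s/\\//g;    # Remove all backslashes
    # Capture the date and time before the first " -", then the first uid= token
    if ( my ( $date, $uid ) = $line =~ /^(.+?) -.+?(uid=\S+)/ ) {
        say qq($date and $uid);
    }
    else {
        warn "Skipping unparsable line $.\n";
    }
}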

__DATA__
05\/26\/2013 06:09:47 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=radash@abc.com\,ou=People\,o=zeb.com - 06:09:47 - http - uizweb_zam -  - 2uid=bolched@abc.com
05\/26\/2013 06:09:48 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=rad-ash2s@abc.com\,ou=People\,o=zeb.com - 06:09:48 - http - uizweb_zam -  - 2uid=bolchedssd@abc.com
05\/26\/2013 06:09:49 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.43 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=sjhsjdh@abc.com\,ou=People\,o=zeb.com - 06:09:49 - http - uizweb_zam -  - 2uid=kjsdsdjhjsh@abc.com
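
As a rough sketch of how the two capture groups in the pattern work, the same regular expression can be spelled out with the /x modifier and wrapped in a small function. The name extract_date_uid is hypothetical and not part of the answer above; the sample line is the second line of the question's data.

```perl
use strict;
use warnings;

# Hypothetical helper: in list context, returns (date, uid) for one log line,
# or an empty list when the line does not match.
sub extract_date_uid {
    my ($line) = @_;
    $line =~ tr/\\//d;    # delete every backslash
    return $line =~ m{
        ^ (.+?)         # group 1: shortest run before the first " -" (date and time)
        [ ] -           # the space and hyphen that begin the "-0700" offset
        .+?             # skip ahead lazily
        ( uid= \S+ )    # group 2: first "uid=" token, up to the next whitespace
    }x;
}

my $sample = '05\/26\/2013 06:09:48 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=rad-ash2s@abc.com\,ou=People\,o=zeb.com - 06:09:48 - http - uizweb_zam -  - 2uid=bolchedssd@abc.com';

if ( my ( $date, $uid ) = extract_date_uid($sample) ) {
    print "$date and $uid\n";
}
```

Because the second group stops at the first whitespace, the hyphen inside rad-ash2s is kept, and the later 2uid= field is never reached.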
Answered 2013-06-05T18:34:07.517

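This program does what you ask. The field separator appears to be ' - ', a hyphen with a space on each side, which leaves an empty penultimate (eleventh) field.

The program expects the name of the input file as a parameter on the command line.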

use strict;
use warnings;

while (<>) {
  chomp;
  tr/\\//d;                          # delete all backslashes
  my @fields = split /\x20-\x20/;    # split on space, hyphen, space
  printf "%s and %s\n", @fields[0,6];    # first field (date) and seventh (uid)
}

With your own data, this produces

05/26/2013 06:09:47 -0700 and uid=radash@abc.com,ou=People,o=zeb.com
05/26/2013 06:09:48 -0700 and uid=rad-ash2s@abc.com,ou=People,o=zeb.com
05/26/2013 06:09:49 -0700 and uid=sjhsjdh@abc.com,ou=People,o=zeb.com
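
To see the field layout described above, here is a minimal sketch that splits one sample line on ' - ' and prints every field with its index. The helper name split_fields is hypothetical, and the limit of -1 is an addition that is not in the program above; it only makes sure trailing empty fields would be kept.

```perl
use strict;
use warnings;

# Hypothetical helper: strips backslashes and splits one log line on " - ".
sub split_fields {
    my ($line) = @_;
    $line =~ tr/\\//d;    # delete every backslash
    # A limit of -1 keeps trailing empty fields, which split drops by default.
    return split / - /, $line, -1;
}

my $sample = '05\/26\/2013 06:09:48 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=rad-ash2s@abc.com\,ou=People\,o=zeb.com - 06:09:48 - http - uizweb_zam -  - 2uid=bolchedssd@abc.com';

my @fields = split_fields($sample);

# Print each field in brackets so empty fields and stray spaces are visible.
for my $i ( 0 .. $#fields ) {
    printf "%2d: [%s]\n", $i, $fields[$i];
}
```

For this line the split gives twelve fields: index 6 holds the uid with its hyphen intact, and index 10 is the empty field. Fields 3 and 5 keep one trailing space, because the source has two spaces before the hyphen there.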
Answered 2013-06-05T17:07:51.443