I have a file containing long lines like this:

XEP.101     :1804 000000:I:XEPInfoFormat:Status=ok:TID=00000000516F6161-000874C3-00003E19-62F2B0C6:CallType=gprs:CallStart=20130415210553:CallDuration=4334:ServedParty=724044024363999:ServedLocation=724:OtherParty=TIM:OtherLocation=tim.br:ServedZone=ZO00001:OtherZone=ZP32363:TariffZone=ZN1261:CUST_ID=58922505:CO_ID=58891164:account=8327813:MSISDN=554599836655:theoretical_cost_value=33.323525:BA_Line_Main_value=NA:Tariff=TM_PL5PR:FU_Packs_used=FU_PLWI2:SNCODE_FU=1350_1250_1_BA_FU_PLWI2_Byt_Internet2:MCs_used=NO:bcd=20100319,bcp=P1M:InputFilename=201304172345.000020:EipFilename=/gold/rte/data/RatedEvents/EIP/10/101201304172345.000020:RtxFilename=/gold/rte/data/RatedEvents/RTX/01/OPSCGOLD_20130418000000_1011917.xml:BadrateFilename=/gold/rte/data/BadRate/bad_rate_xep10.201304172345.000020.tmp:FILE=/gold/rte/data/IncomingCDRs/ASN1/010/GPRS99+GPRS99-46299-1304172357-SA.TTF;TICKET=660

So I have this condition to match such a line in Perl:

if ( $line =~ m/XEP.[0-9].*:(\d{4}) (\d{2})(\d{2})(\d{2}).*XEPInfoFormat:Status=(\w*):TID=(\S*):CallType=(\w*):CallStart=(\d*):CallDuration=(\d*):ServedParty=(\d*):ServedLocation=(\d*):OtherParty=(\w*):OtherLocation=(\w*):ServedZone=(\w*):OtherZone=(\w*):TariffZone=(\w*):CUST_ID=(\d*):CO_ID=(\d*):account=(\d*):MSISDN=(\d*):theoretical_cost_value=(\d*)\.(\d*):BA_Line_Main_value=(\w*):Tariff=(\w*):FU_Packs_used=(\w*):SNCODE_FU=(\w*):MCs_used=(\w*):bcd=(\d*),bcp=(\w*):InputFilename=(\d*)\.(\d*):EipFilename=\/\w*\/\w*\/\w*\/\w*\/\w*\/(\d*)\/(\d*)\.(\d*).*FILE=\/\w*\/\w*\/\w*\/\w*\/\w*\/(\d*)\/(\w*)\+(\w*)-(\d*)-(\d*)-(\w*).(\w*);TICKET=(\d*)/ ) {

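That works fine for me: it matches and gives me the results. However, I'd like to make it more flexible. For example, I'd like to match the whole line while specifying one of the fields in the match (say, the TID= field) as an option to my script. So what I want to do is: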

use Getopt::Std;
getopts("Ch:t:",\%opts);

if ( $opts{t} ) {
    $TIDS = $opts{t};
} else {
    $TIDS = '/S*';
}

So I'm trying to substitute the variable $TIDS, set via getopts -t, into my match:

if ( $line =~ m/XEP.[0-9].*:(\d{4}) (\d{2})(\d{2})(\d{2}).*XEPInfoFormat:Status=(\w*):TID=(${TIDS})

So if I pass an argument with the -t option, for example:

perl-script.pl -t 888894343

I want my whole regex to match like this:

if ( $line =~ m/XEP.[0-9].*:(\d{4}) (\d{2})(\d{2})(\d{2}).*XEPInfoFormat:Status=(\w*):TID=(888894343)

But if I don't specify it, I want it to match like this:

if ( $line =~ m/XEP.[0-9].*:(\d{4}) (\d{2})(\d{2})(\d{2}).*XEPInfoFormat:Status=(\w*):TID=(/S*)

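I know I could simply match every line with (/S*) and then add a simple if condition like the one below, but that would cost performance, because there are a lot of lines like the example I gave. So I'd prefer a flexible match.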

print "$line\n" if $6 eq $TIDS;

Does anyone have any ideas? I tried quotemeta, and wrapping my regex in single quotes and double quotes, but nothing worked.

3 Answers

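Another suggestion. There is no need to check the TID and parse the line's values in one go: you can run a very quick check on the record first, and then parse it (using the hash technique or a regex of your choice) only if it is of interest.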

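while (my $line = <>) {
  # Skip records whose TID is not the requested one; \Q...\E keeps the
  # option value from being interpreted as regex syntax
  next if $opts{t} and $line !~ /:TID=\Q$opts{t}\E:/;
  # Parse and process record
}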
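
The quick check can also be done without a regex at all. The following is only a sketch under assumptions: `wanted_record` is a hypothetical helper name, and the sample records are shortened stand-ins for the real lines. It uses the built-in `index`, which does a plain substring search, so the option value needs no escaping.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the record should be parsed further.
sub wanted_record {
    my ($line, $tid) = @_;
    # No -t value given: every record is of interest
    return 1 unless defined $tid && length $tid;
    # Plain substring search for ":TID=<value>:", no regex metacharacters involved
    return index($line, ":TID=$tid:") >= 0;
}

# Shortened, made-up sample records for illustration only
my @lines = (
    'XEP.101 :1804 000000:I:XEPInfoFormat:Status=ok:TID=888894343:CallType=gprs',
    'XEP.101 :1804 000000:I:XEPInfoFormat:Status=ok:TID=123:CallType=gprs',
);

for my $line (@lines) {
    next unless wanted_record($line, '888894343');
    print "parse: $line\n";   # the full parse would go here
}
```

Whether this is measurably faster than the regex check depends on the data, so it is worth benchmarking on a real file before relying on it.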
answered 2013-04-20T19:16:01.883

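The main reason your code doesn't work is that you are using '/S*', which matches a slash followed by zero or more S characters, instead of '\S*', which matches zero or more non-whitespace characters.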
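
As a minimal sketch of the corrected idea, the TID part of the pattern can be chosen once and compiled with `qr//`. The helper name `build_record_regex` is hypothetical, and the pattern and sample line are shortened versions of the ones in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: compile the (shortened) record regex once.
sub build_record_regex {
    my ($tid) = @_;
    # A literal TID when one is given, otherwise '\S*' (backslash, not slash)
    my $tid_pat = (defined $tid && length $tid) ? quotemeta($tid) : '\S*';
    return qr/XEPInfoFormat:Status=(\w*):TID=($tid_pat):CallType=(\w*)/;
}

# Shortened sample record based on the line in the question
my $line = 'XEP.101     :1804 000000:I:XEPInfoFormat:Status=ok:'
         . 'TID=00000000516F6161-000874C3-00003E19-62F2B0C6:CallType=gprs:CallStart=20130415210553';

my $any   = build_record_regex(undef);        # no -t option given
my $exact = build_record_regex('00000000516F6161-000874C3-00003E19-62F2B0C6');
my $other = build_record_regex('888894343');  # a TID that is not in this line

print "any:   $2\n" if $line =~ $any;
print "exact: $2\n" if $line =~ $exact;
print "other: no match\n" unless $line =~ $other;
```

Compiling the pattern outside the read loop means the choice between a fixed TID and `\S*` is made only once, not per line.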

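However, rather than using a regex, I think it is better to split each record into fields using split /:/. Also, all the fields after the first four have the form name=value, so it is convenient to put them into a hash for easy access. Then all you have to do is check if ($opts{t} eq $params{TID}) { ... }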

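This code demonstrates the idea. I have used Data::Dump to display the contents of the %params hash that is built. It isn't clear whether the information in the first four fields is important, but I have extracted them into @params in case they are needed.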

use strict;
use warnings;

use Data::Dump;

my %opts = (t => 888894343);

while (my $line = <DATA>) {
  chomp $line;
  my %params = $line =~ /([^:=]+)=([^:=]+)/g;
  ddx \%params;
  #next if $opts{t} and $params{TID} ne $opts{t};
  my @params = (split /:/, $line, 5)[0..3];
  ddx \@params;
  #print $line;
}

__DATA__
XEP.101     :1804 000000:I:XEPInfoFormat:Status=ok:TID=00000000516F6161-000874C3-00003E19-62F2B0C6:CallType=gprs:CallStart=20130415210553:CallDuration=4334:ServedParty=724044024363999:ServedLocation=724:OtherParty=TIM:OtherLocation=tim.br:ServedZone=ZO00001:OtherZone=ZP32363:TariffZone=ZN1261:CUST_ID=58922505:CO_ID=58891164:account=8327813:MSISDN=554599836655:theoretical_cost_value=33.323525:BA_Line_Main_value=NA:Tariff=TM_PL5PR:FU_Packs_used=FU_PLWI2:SNCODE_FU=1350_1250_1_BA_FU_PLWI2_Byt_Internet2:MCs_used=NO:bcd=20100319,bcp=P1M:InputFilename=201304172345.000020:EipFilename=/gold/rte/data/RatedEvents/EIP/10/101201304172345.000020:RtxFilename=/gold/rte/data/RatedEvents/RTX/01/OPSCGOLD_20130418000000_1011917.xml:BadrateFilename=/gold/rte/data/BadRate/bad_rate_xep10.201304172345.000020.tmp:FILE=/gold/rte/data/IncomingCDRs/ASN1/010/GPRS99+GPRS99-46299-1304172357-SA.TTF;TICKET=660

Output

# para.pl:11: {
#   account                => 8327813,
#   BA_Line_Main_value     => "NA",
#   BadrateFilename        => "/gold/rte/data/BadRate/bad_rate_xep10.201304172345.000020.tmp",
#   bcd                    => "20100319,bcp",
#   CallDuration           => 4334,
#   CallStart              => 20130415210553,
#   CallType               => "gprs",
#   CO_ID                  => 58891164,
#   CUST_ID                => 58922505,
#   EipFilename            => "/gold/rte/data/RatedEvents/EIP/10/101201304172345.000020",
#   FILE                   => "/gold/rte/data/IncomingCDRs/ASN1/010/GPRS99+GPRS99-46299-1304172357-SA.TTF;TICKET",
#   FU_Packs_used          => "FU_PLWI2",
#   InputFilename          => "201304172345.000020",
#   MCs_used               => "NO",
#   MSISDN                 => 554599836655,
#   OtherLocation          => "tim.br",
#   OtherParty             => "TIM",
#   OtherZone              => "ZP32363",
#   RtxFilename            => "/gold/rte/data/RatedEvents/RTX/01/OPSCGOLD_20130418000000_1011917.xml",
#   ServedLocation         => 724,
#   ServedParty            => 724044024363999,
#   ServedZone             => "ZO00001",
#   SNCODE_FU              => "1350_1250_1_BA_FU_PLWI2_Byt_Internet2",
#   Status                 => "ok",
#   Tariff                 => "TM_PL5PR",
#   TariffZone             => "ZN1261",
#   theoretical_cost_value => 33.323525,
#   TID                    => "00000000516F6161-000874C3-00003E19-62F2B0C6",
# }
# para.pl:14: ["    XEP.101     ", "1804 000000", "I", "XEPInfoFormat"]
answered 2013-04-20T18:41:02.573

If you are trying to use quotemeta on a variable such as a command-line argument, you need to do something like this:

$foo = quotemeta($ARGV[0]);
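
To illustrate what quotemeta changes, here is a small sketch. The value 'tim.br' is a stand-in for whatever would come from $ARGV[0], and the test strings are made up.

```perl
use strict;
use warnings;

my $raw    = 'tim.br';          # stand-in for a command-line value
my $quoted = quotemeta($raw);   # every non-word character gets a backslash

# Unquoted, the dot is a regex wildcard and matches any character
print "raw pattern matches timXbr\n"    if 'timXbr' =~ /\A$raw\z/;
# Quoted, the dot only matches a literal dot
print "quoted pattern rejects timXbr\n" unless 'timXbr' =~ /\A$quoted\z/;
# \Q...\E applies the same quoting inline, inside the pattern
print "inline quoting matches tim.br\n" if 'tim.br' =~ /\A\Q$raw\E\z/;
```

The inline `\Q...\E` form is often the simpler choice when the variable is used in a single pattern.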
answered 2013-04-20T18:23:50.747