0

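I'm trying to create a Perl hash from an input string, but a plain 'split' doesn't work because the values may contain quotes. Below is a sample input string and my (desired) resulting hash: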

my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,MOB,123,KEY,VALUE":TIME,"08:01:59":FIN,0';

my %hash = 
  (
   CREATE     => '',
   USER       => '',
   TEL        => '12345678',
   MOB        => '444001122',
   Type       => 'Whatever',
   ATTRIBUTES => 'ID,0,MOB,123,KEY,VALUE',
   TIME       => '08:01:59',
   FIN        => '0',
  );

The input string can be of any length, and the number of keys is not fixed.

Thanks!

-hq

4 Answers

5

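Use Text::CSV. It handles comma-separated value files correctly.

Update

The standard module does not seem able to parse your input format, even with sep_char and allow_loose_quotes. So you have to do the heavy lifting yourself, but you can still use Text::CSV to parse each key-value pair: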

#!/usr/bin/perl
use warnings;
use strict;
use feature qw(say);

use Data::Dumper;

use Text::CSV;

my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';

my @fields = split /:/, $command;
my %hash;
my $csv = Text::CSV->new();

my $i = 0;
while ($i <= $#fields) {
    if (1 == $fields[$i] =~ y/"//) {
        my $j = $i;
        $fields[$i] .= ':' . $fields[$j] until 1 == $fields[++$j] =~ y/"//;
        $fields[$i] .= ':' . $fields[$j];
        splice @fields, $i + 1, $j - $i, ();
    }
    $csv->parse($fields[$i]);
    my ($key, $value) = $csv->fields;
    $hash{$key} = $value // ''; # key-only fields (CREATE, USER) have no value
    $i++;
}

print Dumper \%hash;
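
The merging step in the loop above can also be looked at in isolation. The following is only a sketch of the same idea, not the code from the answer: the helper name rejoin_quoted is made up for illustration, and it assumes that double quotes are always balanced within one value and never escaped.

```perl
use strict;
use warnings;

# Hypothetical helper: glue back together the pieces that split /:/
# cut in the middle of a double-quoted value.
sub rejoin_quoted {
    my @pieces = @_;
    my @fields;
    while (@pieces) {
        my $field = shift @pieces;
        # An odd number of quotes means the closing quote is in a later piece.
        while (($field =~ tr/"//) % 2 && @pieces) {
            $field .= ':' . shift @pieces;
        }
        push @fields, $field;
    }
    return @fields;
}

my @fields = rejoin_quoted(split /:/, 'TEL,12345678:TIME,"08:01:59":FIN,0');
print "$_\n" for @fields;
```

Each element of the returned list is then a complete KEY or KEY,value field that can be handed to Text::CSV as before.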
answered 2013-02-02T13:33:12.190
3

As far as I can see, the most obvious candidate, Text::CSV, doesn't handle this format correctly, so a home-grown regex solution is the only option.

use strict;
use warnings;

my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';

my %config;
for my $field ($command =~ /(?:"[^"]*"|[^:])+/g) {
  my ($key, $val) = split /,/, $field, 2;
  ($config{$key} = $val // '') =~ s/"([^"]*)"/$1/;
}

use Data::Dumper;
print Data::Dumper->Dump([\%config], ['*config']);

Output

%config = (
            'TIME' => '08:01:59',
            'MOB' => '444001122',
            'Type' => 'Whatever',
            'CREATE' => '',
            'TEL' => '12345678',
            'ATTRIBUTES' => 'ID,0,KEY,VALUE',
            'USER' => '',
            'FIN' => '0'
          );

If you have Perl v5.10 or later, you have the convenient (?| ... ) branch-reset regex group, which lets you write this:

use 5.010;
use warnings;

my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';

my %config = $command =~ /(\w+) (?| , " ([^"]*) " | , ([^:"]*) | () )/gx;

use Data::Dumper;
print Data::Dumper->Dump([\%config], ['*config']);

This produces the same result as the code above.
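
To see what the (?| ... ) group does, it may help to apply the same pattern to one field at a time. This is just an illustrative sketch: the helper name field_pair and the ^ anchor are additions for the example, not part of the one-liner.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: split one "KEY" or "KEY,value" field into a pair.
sub field_pair {
    my ($field) = @_;
    # Inside (?| ... ) every alternative numbers its capture from the same
    # starting point, so whichever branch matches, the value lands in $2.
    my ($key, $value) = $field =~ /^(\w+) (?| , " ([^"]*) " | , ([^:"]*) | () )/x;
    return ($key, $value);
}

for my $field ('TIME,"08:01:59"', 'TEL,12345678', 'CREATE') {
    my ($key, $value) = field_pair($field);
    print "$key => '$value'\n";
}
```

Without the branch reset, the three alternatives would be numbered $2, $3 and $4, and the list returned by the /g match would no longer alternate cleanly between keys and values.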

answered 2013-02-02T14:02:58.700
2

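This looks like something Text::ParseWords can handle. The quotewords subroutine splits the input on the delimiter :, ignoring delimiters inside quotes. That gives us the basic list of items, shown first in the output as $VAR1. After that, it's a simple matter of parsing the comma-separated items with a regex whose second capture is optional, to accommodate empty tags such as CREATE and USER.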

use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;

while (<DATA>) {
    chomp;
    my @list = quotewords(':', 0, $_);
    my %hash = map { my ($k, $v) = /([^,]+),?(.*)/; $k => $v; } @list;
    print Dumper \@list, \%hash;
}

__DATA__
CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0

Output:

$VAR1 = [
          'CREATE',
          'USER',
          'TEL,12345678',
          'MOB,444001122',
          'Type,Whatever',
          'ATTRIBUTES,ID,0,KEY,VALUE',
          'TIME,08:01:59',
          'FIN,0'
        ];
$VAR2 = {
          'TIME' => '08:01:59',
          'MOB' => '444001122',
          'Type' => 'Whatever',
          'CREATE' => '',
          'TEL' => '12345678',
          'ATTRIBUTES' => 'ID,0,KEY,VALUE',
          'USER' => '',
          'FIN' => '0'
        };
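
The second argument to quotewords is the keep flag; the code above passes 0, which is why the quotes are already gone from the list. A small sketch of the difference follows, with split_fields being a made-up wrapper name used only for this example.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical wrapper: split on ':' with the keep flag passed through.
sub split_fields {
    my ($line, $keep) = @_;
    return quotewords(':', $keep, $line);
}

my $line = 'ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';

print "keep = 0:\n";
print "  $_\n" for split_fields($line, 0);   # quotes stripped
print "keep = 1:\n";
print "  $_\n" for split_fields($line, 1);   # quotes retained
```

Note that quotewords also treats single quotes and backslashes specially, so this approach assumes the input contains no stray apostrophes.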
answered 2013-02-02T15:13:03.697
0
my %hash = $command =~ /([^:,]+)(?:,((?:[^:"]|"[^"]*")*))?/g;
s/"([^"]*)"/$1/g
   for grep defined, values %hash;
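
The snippet above leaves key-only fields such as CREATE with an undefined value rather than the empty string asked for in the question. One possible way to wrap it, as a sketch only: the sub name parse_command is an assumption, and the //= line is an addition that requires Perl 5.10 or later.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical wrapper around the regex above.
sub parse_command {
    my ($command) = @_;
    my %hash = $command =~ /([^:,]+)(?:,((?:[^:"]|"[^"]*")*))?/g;
    for my $value (values %hash) {   # values() aliases the hash entries
        $value //= '';               # key-only fields such as CREATE
        $value =~ s/"([^"]*)"/$1/g;  # drop the surrounding quotes
    }
    return %hash;
}

my %hash = parse_command('CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,MOB,123,KEY,VALUE":TIME,"08:01:59":FIN,0');
print "$_ => '$hash{$_}'\n" for sort keys %hash;
```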
answered 2013-02-02T14:18:51.813