
I want to parse a text file and put its contents into a hash. My file looks like this:

key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key3 val
key4 val,val
key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val

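My keys come before the space, and my values are the comma-separated list of elements after the space. Some lines have no key because the values continue across several lines.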

So I would like a hash like this (I'm most familiar with Python):

hash={'key1':[val,val,...],'key2':[val,val,...]} 

My code:

my %hashNames;
open INFILE, "./file.txt" or die $!;
my @temp = ();

while (my $line = <INFILE>)
{

    my @names = split /[\t,]/, $line;
    my $ID = $names[0];
    if ( $line =~ /\t/ )
    {

        my @temp=();
        for (my $i = 1; $i < @names; $i +=1)
        {
            push (@temp, $names[$i]);
        }

    }
    else
    {   

        for (my $i = 0; $i < @names; $i +=1)
        {
            push (@temp, $names[$i]);
        }       
    }
}

5 Answers

3

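Your problem is that newlines no longer delimit your records. One way to handle this is to disable the default input record separator $/, which is not valid for this file, and emulate a valid one: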

use strict;
use warnings;
use Data::Dumper;

my %hash;
my $file;
{
    local $/;         # disable input record separator
    $file = <DATA>;   # entire file here now!
}

for my $line (split /^(?=\S+ )/m, $file) {  # records begin this way now
    $line =~ s/\n//g;                       # remove newlines
    my ($key, $val) = split ' ', $line, 2;  # divide into two fields
    $hash{$key} = [ split /,/, $val ];      # store the data
}

print Dumper \%hash;

__DATA__
key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key3 val
key4 val,val
key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val

Explanation:

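  • Splitting on /^(?=\S+ )/m: with the /m modifier, ^ matches at the start of every line in the string, not only at the start of the string. Combined with the lookahead for "non-whitespace followed by a space", this emulates an input record separator.
  • By adding a LIMIT of 2 to split, the record is divided into exactly two fields: the key and everything after it.
  • We store the values directly in the hash by wrapping a split statement in an anonymous array, [ ... ].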
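
To make the record splitting easier to try on other input, here is a minimal sketch that wraps the same steps in a subroutine taking a string instead of reading from DATA. The name parse_records is made up for illustration. It assumes, as above, that every record starts with a key followed by a space and that continuation lines contain no space.

```perl
use strict;
use warnings;

# parse_records is a hypothetical helper name, not part of the answer above.
# Assumption: each record starts with "key " and continuation lines have no space.
sub parse_records {
    my ($text) = @_;
    my %hash;
    for my $record (split /^(?=\S+ )/m, $text) {   # split before each line that starts with "key "
        $record =~ s/\n//g;                         # join continuation lines
        my ($key, $val) = split ' ', $record, 2;    # key, then everything else
        next unless defined $val;                   # skip blank or malformed records
        $hash{$key} = [ split /,/, $val ];          # store the values as an array reference
    }
    return \%hash;
}

my $parsed = parse_records("key1 a,b,\nc,d\nkey2 e\n");
print "$_ => @{ $parsed->{$_} }\n" for sort keys %$parsed;
```

With this sample string the script prints "key1 => a b c d" and "key2 => e".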
answered 2013-03-18T12:56:16.607
2

Using the Parse::RecDescent module:

#! /usr/bin/env perl

use strict;
use warnings;

use Parse::RecDescent;

our %hash;
my $p = Parse::RecDescent->new(q!
  hash: entry(s?)
  entry: key value(s /,/)  { $::hash{$item[1]} = [ @{ $item[2] } ] }
  key: /\S+/
  value: /([^,\n]|\\,])+/
!);
die "$0: failed to create parser" unless defined $p;

my $text = do {{ local $/; <DATA> }};
$p->hash($text) or die "$0: parse failed";

for (sort keys %hash) {
  print "$_ => val x ", scalar @{ $hash{$_} }, "\n";
}

__DATA__
key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key3 val
key4 val,val
key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val

Output:

key1 => val x 22
key2 => val x 22
key3 => val x 1
key4 => val x 2
key5 => val x 52
answered 2013-03-18T13:53:54.023
1

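The difficulty here is that your records are terminated by "a newline that is not preceded by a comma". Unfortunately, the input record separator $/ cannot be set to a regex. This leaves three comfortable solutions: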

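  1. Load the whole file into memory. This isn't as bad as it sounds, because we end up holding the same amount of information in the hash anyway. We can then split /(?<!,)\n/ to get the actual records.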

    my %hash = do {
      local $/; # set to undef, for slurp
      map {
        my ($key, $vals) = split /\s+/, $_, 2; # split on first whitespace, into two strings
        $key => [ split /\s*,\s*/, $vals ];    # return a list of a key and a value array
      } split /(?<!,)\n/, <FILE>;              # split the file into records
    };
    
  2. We could write a readline replacement that buffers the input and can terminate lines with a regex.

  3. We can treat a trailing comma as a line-continuation character.

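    my %hash;
    while (<FILE>) {
      $_ .= <FILE> while /,\n\z/ && !eof(FILE);  # a trailing comma continues the record; stop at end of file
      chomp;                                      # drop the final newline so it doesn't stick to the last value
      my ($key, $value) = split /\s+/, $_, 2;
      push @{ $hash{$key} }, split /\s*,\s*/, $value; # allow multiple occurrences of one key, simply append values to list.
    }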
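
Option 2 has no code above. A minimal sketch of such a reader might look like the following, assuming the terminator is the same /(?<!,)\n/ used in option 1. The name make_record_reader and the 4096-byte chunk size are choices made here for illustration, and the in-memory filehandle only stands in for a real file.

```perl
use strict;
use warnings;

# make_record_reader is a hypothetical helper: it returns a closure that
# yields one record per call, where a record ends at a match of $terminator.
sub make_record_reader {
    my ($fh, $terminator) = @_;
    my $buffer = '';
    my $eof    = 0;
    return sub {
        while (1) {
            # Accept a match only if it ends before the buffer does (or we are
            # at EOF); a match touching the end could change with more input.
            if ($buffer =~ $terminator and ($eof or $+[0] < length $buffer)) {
                my ($start, $end) = ($-[0], $+[0]);
                my $record = substr $buffer, 0, $start;
                substr($buffer, 0, $end) = '';      # drop record and terminator
                return $record;
            }
            if ($eof) {
                return unless length $buffer;       # nothing left: signal end
                my $record = $buffer;               # unterminated last record
                $buffer = '';
                return $record;
            }
            my $read = read($fh, $buffer, 4096, length $buffer);  # append a chunk
            die "read failed: $!" unless defined $read;
            $eof = 1 if $read == 0;
        }
    };
}

my $data = "key1 a,b,\nc,d\nkey2 e\n";
open my $fh, '<', \$data or die $!;                 # in-memory file for the demo
my $next_record = make_record_reader($fh, qr/(?<!,)\n/);

my %hash;
while (defined(my $record = $next_record->())) {
    my ($key, $vals) = split /\s+/, $record, 2;
    next unless defined $vals;                      # skip blank records
    push @{ $hash{$key} }, split /\s*,\s*/, $vals;
}
print "$_ => @{ $hash{$_} }\n" for sort keys %hash;
```

Unlike slurping, this keeps only one buffer's worth of unparsed input in memory at a time, at the cost of rescanning the buffer after each read.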
    
answered 2013-03-18T13:00:52.540
0

Here you go:

my %results;
my $key;
while(my $line = <INFILE>) {
    chomp($line);
    my @items = split(/[ ,]+/, $line);  # the key is followed by a space, the values by commas
    $key = shift @items;
    $results{$key} = \@items;
}

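This works for the simple case, but not for the situation you describe:

Some lines have no key because the values continue across several lines.

To handle that, you have to explain how to detect whether the next line starts with a key or is a continuation of values. If you know that, you can put the test in an if statement and use the previous key to add the new values to the hash: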

my %results;
my $key;
while(my $line = <INFILE>) {
    chomp($line);
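    my @items = split(/[ ,]+/, $line);  # the key is followed by a space, the values by commas
    my $tmpkey = shift @items;
    if (is_real_key($tmpkey)) {         # is_real_key is a placeholder for your own test
        $key = $tmpkey;                 # new record: the first field is the key
        $results{$key} = \@items;
    } else {
        push (@{$results{$key}}, $tmpkey, @items);  # continuation: append to the previous key
    }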
}
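
is_real_key is left undefined above. As a rough sketch only, one possible test is shown here. It assumes keys look like the sample data ("key" followed by digits) and that no value ever has that shape, which may not hold for your real file. parse_lines is likewise a hypothetical wrapper so the loop can be run on a list of lines.

```perl
use strict;
use warnings;

# Hypothetical test: assumes keys look like the sample data ("key" plus digits)
# and that no value ever has that shape. Adapt the pattern to your real keys.
sub is_real_key {
    my ($token) = @_;
    return defined $token && $token =~ /\Akey\d+\z/;
}

# Hypothetical wrapper around the loop, taking lines instead of a filehandle.
sub parse_lines {
    my @lines = @_;
    my (%results, $key);
    for my $line (@lines) {
        chomp $line;
        my @items = split /[ ,]+/, $line;    # space after the key, commas between values
        my $first = shift @items;
        next unless defined $first;          # skip blank lines
        if (is_real_key($first)) {
            $key = $first;                   # new record
            $results{$key} = [@items];
        } elsif (defined $key) {
            push @{ $results{$key} }, $first, @items;  # continuation of the previous key
        }
    }
    return \%results;
}

my $results = parse_lines("key1 a,b,\n", "c,d\n", "key2 e\n");
print "$_ => @{ $results->{$_} }\n" for sort keys %$results;
```

For these three lines the script prints "key1 => a b c d" and "key2 => e".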
answered 2013-03-18T12:55:54.327
0
#!/usr/bin/perl

use strict;
use warnings;
use feature 'say';

use Data::Dumper;

my $res_hash = {};
my ($current_key, $values);
while ( my $line = <DATA> ) {
  chomp $line;
  if ( index($line, ' ') > 0 ) {
    # A line containing a space starts a new record: store the previous one first.
    push @{ $res_hash->{$current_key} }, split(/,/, $values) if defined $current_key;
    ($current_key, $values) = split /\s/, $line, 2;
  } else {
    # No space: this is a continuation line, so append it to the pending values.
    $values .= $line;
  }
}
# Store the last record, whether or not it had continuation lines.
push @{ $res_hash->{$current_key} }, split(/,/, $values) if defined $current_key;

say "result:".Dumper($res_hash);



__DATA__
key1 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key2 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val
key3 val
key4 val,val
key5 val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,val,
val,val,val,val,val,val,val,val,val,val,val,val,val,val,val
answered 2013-03-18T13:08:01.120