perl - Perl: Building arrays inside a complex hash

Question

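To make my data easier to access, I want to store my tabular data in a complex hash. I am trying to build up a "HoHoHoA" (a hash of hashes of hashes of arrays) as the script loops over my data, following the guidelines in "perldsc":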

push @ { $hash{$column[$i]}{$date}{$hour} }, $data[$i];

The script compiles and runs without problems, but it does not add any data to the hash:

print $hash{"Frequency Min"}{"09/07/08"}{"15"};

This prints nothing, even though the key should exist. Running "exists" on the hash shows that the key is not there.

The data file I am reading looks like this:

DATE       TIME     COLUMN1 COLUMN2 COLUMN3...    
09/06/2008 06:12:56 56.23   54.23   56.35...
09/06/2008 06:42:56 56.73   55.28   54.52...
09/06/2008 07:12:56 57.31   56.79   56.41...
09/06/2008 07:42:56 58.24   57.30   58.86...
.
.
.

I want to group the values of each column into an array for any given date and hour, hence the three hash levels {COLUMN}, {DATE} and {HOUR}.

The resulting structure would look like this:

%monthData = (
               "COLUMN1" => {
                                    "09/06/2008" => {
                                                      "06" => [56.23,56.73...],
                                                      "07" => [57.31,58.24...]
                                                    }
                            },
               "COLUMN2" => {
                                    "09/06/2008" => {
                                                      "06" => [54.23,55.28...],
                                                      "07" => [56.79,57.30...]
                                                    }
                            },
               "COLUMN3" => {
                                    "09/06/2008" => {
                                                      "06" => [56.35,54.52...],
                                                      "07" => [56.41,58.86...]
                                                    }
                            }
             );

Here is my code:

use feature 'switch';
open DATAFILE, "<", $fileName or die "Unable to open $fileName !\n";

    my %monthData;

    while ( my $line = <DATAFILE> ) {

        chomp $line;

        SCANROWS: given ($row) {

            when (0) { # PROCESS HEADERS

                @headers = split /\t\t|\t/, $line;
            }

            default {

                @current = split /\t\t|\t/, $line;
                my $date =  $current[0];
                my ($hour,$min,$sec) = split /:/, $current[1];

                # TIMESTAMP FORMAT: dd/mm/yyyy\t\thh:mm:ss

                SCANLINE: for my $i (2 .. $#headers) {

                    push @{ $monthData{$headers[$i]}{$date}{$hour} }, $current[$i];

                }
            }
        }
    }

    close DATAFILE;

    foreach (@{ $monthData{"Active Power N Avg"}{"09/07/08"}{"06"} }) {
        $sum += $_;
        $count++;
    }

    $avg = $sum/$count; # $sum and $count are not initialized to begin with.
    print $avg; # hence $avg is also not defined.

Hope my need is clear enough. How can I append values to an array inside these sub-hashes?

score 7 · Accepted Answer

This should do it for you.

#!/usr/bin/perl

use strict;
use warnings;

use List::Util qw/sum/;
sub avg { sum(@_) / @_ }

my $fileName = shift;

open my $fh, "<", $fileName
    or die "Unable to open $fileName: $!\n";

my %monthData;

chomp(my @headers = split /\t+/, <$fh>);

while (<$fh>) {
    chomp;
    my %rec;
    @rec{@headers} = split /\t+/;
    my ($hour) = split /:/, $rec{TIME}, 2;

    for my $key (grep { not /^(DATE|TIME)$/ } keys %rec) {
        push @{ $monthData{$key}{$rec{DATE}}{$hour} }, $rec{$key};
    }
}

for my $column (keys %monthData) {
    for my $date (keys %{ $monthData{$column} }) {
        for my $hour (keys %{ $monthData{$column}{$date} }) {
            my $avg = avg @{ $monthData{$column}{$date}{$hour} };
            print "average of $column for $date $hour is $avg\n";
        }
    }
}

Things to pay attention to:

strict and warnings pragmas
List::Util module to get the sum function
putting an array in scalar context to get the number of items in the array (in the avg function)
the safer three argument version of open
the lexical filehandle (rather than the old bareword style filehandle)
reading the headers first outside the loop to avoid having to have special logic inside it
using a hash slice to get the file data into a structured record
avoiding splitting the time more than necessary with the third argument to split
avoiding useless variables by only specifying the variable we want to catch in the list assignment
using grep to prevent the DATE and TIME keys from being put in %monthData
the nested for loops each dealing with a level in the hash
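
Several of these points can be seen in isolation in the minimal sketch below. It uses two made-up, tab-separated rows held in an array instead of a file; the names %month_data and @rows and the sample values are assumptions for illustration, not part of the program above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory sample standing in for the tab-separated data file.
my @headers = qw(DATE TIME COLUMN1 COLUMN2);
my @rows    = (
    "09/06/2008\t06:12:56\t56.23\t54.23",
    "09/06/2008\t06:42:56\t56.73\t55.28",
);

my %month_data;
for my $row (@rows) {
    # Hash slice: assign every field to its header name in one statement.
    my %rec;
    @rec{@headers} = split /\t+/, $row;

    # The limit of 2 stops split after the first ':'; the one-element list
    # on the left keeps only the hour.
    my ($hour) = split /:/, $rec{TIME}, 2;

    for my $key ( grep { !/^(?:DATE|TIME)$/ } @headers ) {
        # push autovivifies the intermediate hashes and the array itself.
        push @{ $month_data{$key}{ $rec{DATE} }{$hour} }, $rec{$key};
    }
}

my @values = @{ $month_data{COLUMN1}{'09/06/2008'}{'06'} };
my $count  = @values;    # an array in scalar context yields its length
print "COLUMN1 has $count values at hour 06: @values\n";
```

Note that the lookup has to use exactly the key strings that were stored, here the date with a four-digit year ('09/06/2008') and the two-digit hour ('06'); a differently formatted date would simply find nothing.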

score 2 · Accepted Answer

I hope the following program populates the data structure you want:

#!/usr/bin/perl                        

use strict;
use warnings;
use Data::Dumper;

open my $fh, '<', 'input' or die $!;

my @headers;
for ( split /\t/, ~~ <$fh> ) {
    chomp;
    push @headers, $_ unless /^\t?$/;
}

my %monthData;
while (<$fh>) {
    my @line;
    for ( split /\t/ ) {
        chomp;
        push @line, $_ unless /^\t?$/;
    }

    for my $i ( 2 .. $#headers ) {
        my ($hour) = split /:/, $line[1];
        push @{ $monthData{ $headers[$i] }->{ $line[0] }->{$hour} }, $line[$i];
    }
}

print Dumper \%monthData;
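
One idiom in the program above may be unfamiliar: the "~~" in front of <$fh>. As far as I can tell it is simply two bitwise negations, used only to put the readline in scalar context so that a single line is read; scalar(<$fh>) is the more conventional spelling. The sketch below compares the two on a made-up in-memory file (the contents of $text are hypothetical, and it assumes the 'bitwise' feature is not enabled, as in the program above):

```perl
use strict;
use warnings;

# Hypothetical four-line "file" held in a string (tab-separated).
my $text = join '',
    "DATE\tTIME\tCOLUMN1\n",
    "09/06/2008\t06:12:56\t56.23\n",
    "09/06/2008\t06:42:56\t56.73\n",
    "09/06/2008\t07:12:56\t57.31\n";

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @via_tildes = ( ~~ <$fh> );        # one line, despite the list assignment
my @via_scalar = ( scalar <$fh> );    # same effect, conventional spelling
my @remaining  = <$fh>;               # list context: every line that is left
close $fh;

printf "%d + %d + %d lines read\n",
    scalar @via_tildes, scalar @via_scalar, scalar @remaining;
```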

score 1 · Accepted Answer

Here's how I would write a program to do that.

#! /usr/bin/env perl
use strict;
use warnings;
use 5.010; # for say and m'(?<name>)'

use YAML;
use Data::Dump 'dump';

my(%data,%original);
while( my $line = <> ){
  next unless $line =~ m'
    ^ \s*
      (?<day>   0?[1-9] | [12][0-9] | 3[0-1] ) /
      (?<month> 0?[1-9] | 1[0-2] ) /
      (?<year>  [0-9]{4} )
      \s+
      (?<hour>   [01]?[0-9] | 2[0-3] ) :
      (?<minute> [0-5]?[0-9] ) :
      (?<second> [0-5]?[0-9] )
      \s+
      (?<columns> .* )
  'x;
  my @columns = split ' ', $+{columns};

  push @{
    $data{ $+{year}  }
         { $+{month} }
         { $+{day}   }
         { $+{hour}  }
  }, \@columns; # or [@columns]

  # If you insist on having it in that data structure you can do this:
  my $count = 1;
  my $date = "$+{day}/$+{month}/$+{year}";
  for my $column ( @columns ){
    my $col = 'COLUMN'.$count++;
    push @{ $original{$col}{$date}{$+{hour}} }, $column;
  }
}

say Dump \%data, \%original; # YAML
say dump \%data, \%original; # Data::Dump

Given this input:

DATE       TIME     COLUMN1 COLUMN2 COLUMN3
09/06/2008 06:12:56 56.23   54.23   56.35
09/06/2008 06:42:56 56.73   55.28   54.52
09/06/2008 07:12:56 57.31   56.79   56.41
09/06/2008 07:42:56 58.24   57.30   58.86

Run it as either "perl program.pl datafile" or "perl program.pl < datafile" to get the following output.

YAML

---
2008:
  06:
    09:
      06:
        -
          - 56.23
          - 54.23
          - 56.35
        -
          - 56.73
          - 55.28
          - 54.52
      07:
        -
          - 57.31
          - 56.79
          - 56.41
        -
          - 58.24
          - 57.30
          - 58.86
---
COLUMN1:
  09/06/2008:
    06:
      - 56.23
      - 56.73
    07:
      - 57.31
      - 58.24
COLUMN2:
  09/06/2008:
    06:
      - 54.23
      - 55.28
    07:
      - 56.79
      - 57.30
COLUMN3:
  09/06/2008:
    06:
      - 56.35
      - 54.52
    07:
      - 56.41
      - 58.86

Data::Dump

(
  {
    2008 => {
          "06" => {
                "09" => {
                      "06" => [["56.23", "54.23", "56.35"], ["56.73", "55.28", "54.52"]],
                      "07" => [["57.31", "56.79", "56.41"], ["58.24", "57.30", "58.86"]],
                    },
              },
        },
  },
  {
    COLUMN1 => {
                 "09/06/2008" => { "06" => ["56.23", "56.73"], "07" => ["57.31", "58.24"] },
               },
    COLUMN2 => {
                 "09/06/2008" => { "06" => ["54.23", "55.28"], "07" => ["56.79", "57.30"] },
               },
    COLUMN3 => {
                 "09/06/2008" => { "06" => ["56.35", "54.52"], "07" => ["56.41", "58.86"] },
               },
  },
)
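
The named captures carry most of the weight in this approach, so here is a minimal sketch of that one step on a single line. The sample line and the %stamp copy are assumptions for illustration; the hour, minute, and second ranges are written so that 00 is accepted.

```perl
use strict;
use warnings;
use 5.010;    # named captures and say

# Hypothetical sample line in the same layout as the input above.
my $line = "09/06/2008 07:00:56 57.31   56.79   56.41";

my ( %stamp, @columns );
if ( $line =~ m{
        ^ \s*
        (?<day>    [0-9]{1,2} ) /
        (?<month>  [0-9]{1,2} ) /
        (?<year>   [0-9]{4}   )
        \s+
        (?<hour>   [01]?[0-9] | 2[0-3] ) :
        (?<minute> [0-5]?[0-9] ) :
        (?<second> [0-5]?[0-9] )
        \s+
        (?<columns> .* )
    }x )
{
    # %+ is overwritten by the next successful match, so copy what is needed.
    %stamp   = %+;
    @columns = split ' ', $stamp{columns};
    say "$stamp{year}-$stamp{month}-$stamp{day} hour $stamp{hour}: @columns";
}
```

Copying %+ into an ordinary hash is optional in a loop this small, but it avoids surprises if another pattern match is added later between the match and the use of the captures.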
