
I have a file.txt that contains data like this:

word = blabla
a = 1
b = 2
c = 3
word = blabla_b
a = 11
b = 22
c = 33

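(Basically, this file is produced by some Java code that writes the number for "a" to the file, then the numbers for "b" and "c", inside a loop. Obviously a, b, c and 1, 2, 3 are just an example; I have different names and different numbers :-))

What I need to do is:

  1. Read from a file given on the command line
  2. Write it to a CSV file (to be opened in Excel), with the average at the bottom of each column (except "word", since it is not a number). (I know how to calculate an average inside Excel, but I want the script to do it by itself.)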

The end result should look like this:

[image: screenshot of the expected spreadsheet]

I wrote something non-generic, but it's pretty dumb, and I'm sure there must be a more elegant way!

use strict;
use warnings;

my $in_file  = shift or die "usage: $0 input_file\n";
my $fileName = "CSV_file";
open my $out_file, '>', "$fileName.csv" or die "can't open $fileName.csv: $!";
my @fields = ("word", "a", "b", "c");
# first write the header line, one heading per column
print $out_file join(",", @fields), "\n";
open my $stats, '<', $in_file or die "Error opening file $in_file: $!";
my $counter = 0;
# one avg and one sum variable per column - not clever once I have many more columns!
my ($avgA, $avgB, $avgC) = (0, 0, 0);
my ($sumA, $sumB, $sumC) = (0, 0, 0);

my $numOfRows = 0;
while (<$stats>)
{
    chomp;
    next unless /=/;    # skip blank lines
    my ($name, $number) = split /\s*=\s*/;
    print $out_file $number;

    # add to the sum of each column (for the average calculation later)
    if    ($counter == 1) { $sumA += $number; }
    elsif ($counter == 2) { $sumB += $number; }
    elsif ($counter == 3) { $sumC += $number; }

    if ($counter == $#fields) # end of row
    {
        print $out_file "\n";
        $counter = 0;
        $numOfRows++;
    }
    else
    {
        print $out_file ",";
        $counter++;
    }
}
die "no complete rows found in $in_file\n" unless $numOfRows;
$avgA = $sumA / $numOfRows;
$avgB = $sumB / $numOfRows;
$avgC = $sumC / $numOfRows;

print $out_file "AVG:,$avgA,$avgB,$avgC\n";
close $stats;
close $out_file or die "can't close $fileName.csv: $!";

2 Answers


Here is Dave Sherohman's solution with a "fix" so that it gives the correct result (@column_sum changed to a hash, %column_sum).

use warnings;
use strict;
use 5.010;

my @word;
my %raw_values;
my $record_number = -1;
my %column_sum;# changed to hash (was array)

while (my $line = <DATA>) {
  my ($col, $val) = $line =~ /(\w+)\s*=\s*(.*)/;
  if ($col eq 'word') {
    $record_number++;
    $word[$record_number] = $val;
  } else {
    $raw_values{$col}[$record_number] = $val;
    $column_sum{$col} += $val if $val =~ /^\d+$/; #changed from array to hash
  } 
}

say 'word,', join(',', sort keys %raw_values);

for my $rec (0 .. $#word) {
  my @row = ($word[$rec]);
  for my $col (sort keys %raw_values) {
    push @row, ($raw_values{$col}[$rec] // '---');  # // keeps a legitimate 0 value
  }
  say join ',', @row;
}

say join(',', 'AVG', map { $column_sum{$_} / @word } sort keys %column_sum);

A solution using a hash of hashes:

#!/usr/bin/perl
use strict;
use warnings;

my (%data, %sum, %seen, @cols, $word);
while (<DATA>) {
    my ($col, $val) = /^(\w+)\s*=\s*(\w+)$/;
    if ($col eq 'word') {
        $word = $val;   
    }
    else {
        push @cols, $col unless $seen{$col}++;
        $data{$word}{$col} = $val;
        $sum{$col} += $val;
    }
}

print join(",", 'word', @cols), "\n";

for my $word (sort keys %data) {
    print join(",", $word, map {$data{$word}{$_} || 0} @cols), "\n";    
}

print join(",", 'AVE', map {$sum{$_} / keys %data} @cols), "\n";

Output:

word,a,b,c
blabla,1,2,3
blabla_b,11,22,33
blabla_c,111,0,333
xyzzy,42,42,42
AVE,41.25,16.5,102.75
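
Both scripts above read from DATA and print to STDOUT, while the question asks for an input file given on the command line and a CSV file as output. Here is a minimal sketch of how the hash-of-hashes approach could be wrapped to do that. The helper name to_csv_lines, the optional second argument and the default output name CSV_file.csv (borrowed from the question's code) are assumptions for illustration, not part of either answer. Unlike the script above, it keeps the words in file order instead of sorting them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: turns "name = value" lines into CSV lines
# (header, one row per word in file order, then an AVG row).
sub to_csv_lines {
    my @lines = @_;
    my (%data, %sum, %seen_col, %seen_word, @cols, @words, $word);
    for my $line (@lines) {
        # Skip blank or malformed lines instead of warning on undef.
        my ($col, $val) = $line =~ /^\s*(\w+)\s*=\s*(\w+)\s*$/ or next;
        if ($col eq 'word') {
            $word = $val;
            push @words, $word unless $seen_word{$word}++;
        }
        elsif (defined $word) {
            push @cols, $col unless $seen_col{$col}++;
            $data{$word}{$col} = $val;
            $sum{$col} += $val;
        }
    }
    my @out = ( join(',', 'word', @cols) );
    for my $w (@words) {
        # Missing values are written as 0, as in the answer above.
        push @out, join(',', $w, map { $data{$w}{$_} // 0 } @cols);
    }
    # Average over all records; a missing value counts as 0.
    push @out, join(',', 'AVG', map { $sum{$_} / @words } @cols) if @words;
    return @out;
}

# Only do file I/O when an input file was given on the command line.
if (@ARGV) {
    my ($in_file, $out_file) = @ARGV;
    $out_file //= 'CSV_file.csv';    # assumed default, taken from the question
    open my $in, '<', $in_file or die "can't open $in_file: $!";
    my @csv = to_csv_lines(<$in>);
    close $in;
    open my $out, '>', $out_file or die "can't open $out_file: $!";
    print {$out} "$_\n" for @csv;
    close $out or die "can't close $out_file: $!";
}
```

Assuming the script is saved as to_csv.pl (a made-up name), it would be run as: perl to_csv.pl file.txt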
answered 2013-05-28T19:55:48.790

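I know I need a data structure, I'm just not sure which one.. a list, a hash or an array. – user1584314

Personally, I'd use a hash of arrays for the main data, plus a couple of auxiliary arrays to keep track of the list of words (so that they stay in order) and the sum of the values in each column. Something like the following: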

#!/usr/bin/env perl    

use warnings;
use strict;
use 5.010;

my @word;
my %raw_values;
my $record_number = -1;

while (my $line = <DATA>) {
  my ($col, $val) = $line =~ /(\w+)\s*=\s*(.*)/;
  if ($col eq 'word') {
    $record_number++;
    $word[$record_number] = $val;
  } else {
    $raw_values{$col}[$record_number] = $val;
  }
}

say 'word,', join(',', sort keys %raw_values);

for my $rec (0 .. $#word) {
  my @row = ($word[$rec]);
  for my $col (sort keys %raw_values) {
    push @row, ($raw_values{$col}[$rec] // '---');  # // keeps a legitimate 0 value
  }
  say join ',', @row;
}

my @column_sum;
for my $col (sort keys %raw_values) { 
  my $sum = 0;
  for my $val (@{$raw_values{$col}}) {
    $sum += $val if defined $val && $val =~ /^\d+$/;
  }
  push @column_sum, $sum;
} 

say 'AVG:', join(',', map { $_ / scalar @word } @column_sum);

__DATA__
word = blabla
a = 1
b = 2
c = 3
word = blabla_b
a = 11
b = 22
c = 33
word=blabla_c
a=111
c=333
word=xyzzy
a=42
b=42
c=42

Output:

word,a,b,c
blabla,1,2,3
blabla_b,11,22,33
blabla_c,111,---,333
xyzzy,42,42,42
AVG:41.25,16.5,102.75
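
Note that the b average of 16.5 above is 66 divided by all four records, so the missing b of blabla_c effectively counts as zero. The question doesn't say whether that is wanted. If missing values should instead be left out of the average, one option is to keep a per-column count next to the per-column sum. This is only a sketch under that assumption; column_averages is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns column name => average, computed only over
# the records that actually have a numeric value for that column.
sub column_averages {
    my @lines = @_;
    my (%sum, %count);
    for my $line (@lines) {
        my ($col, $val) = $line =~ /(\w+)\s*=\s*(.*\S)/ or next;
        next if $col eq 'word';
        next unless $val =~ /^-?\d+(?:\.\d+)?$/;    # numeric values only
        $sum{$col}   += $val;
        $count{$col} += 1;
    }
    my %avg;
    $avg{$_} = $sum{$_} / $count{$_} for keys %sum;
    return %avg;
}

# Same sample data as in the answer above (blabla_c has no b).
my @sample = (
    'word = blabla',   'a = 1',  'b = 2',  'c = 3',
    'word = blabla_b', 'a = 11', 'b = 22', 'c = 33',
    'word=blabla_c',   'a=111',  'c=333',
    'word=xyzzy',      'a=42',   'b=42',   'c=42',
);
my %avg = column_averages(@sample);
print join(',', 'AVG', map { $avg{$_} } sort keys %avg), "\n";
# prints AVG,41.25,22,102.75
```

With a count per column, b is averaged over the three records that have it (66 / 3 = 22), while a and c are unchanged.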

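Edit: Corrected the average calculation. But honestly, now that I've seen it (after writing my own corrected version), I have to say I prefer Chris Charley's solution, which computes the averages by turning @column_sum into a hash, over my own, which loops over %raw_values an extra time to compute @column_sum correctly. I guess I was too focused on treating @column_sum as an array and never considered changing it to a hash.

answered 2013-05-28T14:28:16.550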