perl - Summing a column of numbers in a text file using Perl

Question

Ok, so I'm very new to Perl. I have a text file containing 4 columns of data (date, time, file size, file name). I need to write a small script that opens the file and computes the average file size. I've read a lot online, but I still can't figure out how to do it. This is what I have so far, but I'm not sure if I'm even close to doing this correctly.

#!/usr/bin/perl

open FILE, "files.txt";
#@array = File;

while(FILE){
    #chomp;

    ($date, $time, $numbers, $type) = split(/ /,<FILE>);

    $total += $numbers;

}
print"the total is $total\n";

This is how the data looks in the file. These are just a few of them. I need to get the numbers in the third column.

12/02/2002  12:16 AM              86016 a2p.exe
10/10/2004  11:33 AM               393 avgfsznew.pl
11/01/2003  04:42 PM             38124 c2ph.bat

score 16 · Accepted Answer

Your program is quite close to working. With these changes it will do exactly what you want.

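Always put use strict and use warnings at the start of your program, and declare all of your variables with my. That will help you find many simple errors that you might otherwise overlook.
Use lexical file handles and the three-parameter form of open, and always check the return status of any open call.
Declare the $total variable outside the loop. Declaring it inside the loop means it is created and destroyed each time around the loop, so it cannot accumulate a total.
Declare a $count variable in the same way. You will need it to calculate the average.
Using while (FILE) {...} just tests that FILE is true. You need to read from it instead, so you must use the readline operator, like <FILE>.
You want the default call to split (without any parameters), which returns all the non-space fields in $_ as a list.
You need to add a variable to the assignment to allow for the AM or PM field in each line.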

Here is a modification of your code that works fine.

use strict;
use warnings;

open my $fh, '<', "files.txt" or die $!;

my $total = 0;
my $count = 0;

while (<$fh>) {

    my ($date, $time, $ampm, $numbers, $type) = split;

    $total += $numbers;
    $count += 1;

}

print "The total is $total\n";
print "The count is $count\n";
print "The average is ", $total / $count, "\n";

Output

The total is 124533
The count is 3
The average is 41511
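One thing the program above does not guard against is an empty file or blank lines in the listing: the first makes the final division fail, and the second adds undefined values to the total. Here is a minimal sketch of one way to handle both, assuming the same five-field layout. The helper name average_size is made up for illustration, and the sample lines are the ones from the question with a blank line added.

```perl
use strict;
use warnings;

# Hypothetical helper: takes listing lines, returns (total, count, average).
# The average is undef when no usable line was seen.
sub average_size {
    my @lines = @_;
    my ($total, $count) = (0, 0);

    for my $line (@lines) {
        my ($date, $time, $ampm, $size, $name) = split ' ', $line;

        # Skip blank or malformed lines instead of adding undef to the total.
        next unless defined $size && $size =~ /^\d+$/;

        $total += $size;
        $count++;
    }

    my $average = $count ? $total / $count : undef;
    return ($total, $count, $average);
}

my @sample = (
    "12/02/2002  12:16 AM              86016 a2p.exe\n",
    "\n",
    "10/10/2004  11:33 AM               393 avgfsznew.pl\n",
    "11/01/2003  04:42 PM             38124 c2ph.bat\n",
);

my ($total, $count, $average) = average_size(@sample);
print "The average is $average\n" if defined $average;
```

Returning undef for the average, rather than 0, lets the caller tell "no files" apart from "files of size zero".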

score 14 · Accepted Answer

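It is tempting to use Perl's awk-like auto-split option. There are 5 columns: three containing date and time information, then the size, and then the name.

The first version of the script that I wrote is also the most verbose: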

perl -n -a -e '$total += $F[3]; $num++; END { printf "%12.2f\n", $total / ($num + 0.0); }'

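The -a (auto-split) option splits each line at white space into the array @F. Combined with the -n option (which makes Perl run in a loop that reads the file name arguments in turn, or standard input, without printing each line), the code adds $F[3] (the fourth column, counting from 0) to $total, which is automatically initialized to zero on first use. It also counts the lines in $num. The END block is executed when all the input has been read; it uses printf() to format the value. The + 0.0 ensures that the arithmetic is done in floating point rather than integer arithmetic. This is very similar to the awk script: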

awk '{ total += $4 } END { print total / NR }'
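To see why the size ends up in $F[3], it can help to split one sample line by hand. The sketch below assumes that -a behaves like split ' ' on each line, which is its documented default; the helper name autosplit_fields is invented for this example.

```perl
use strict;
use warnings;

# Split one listing line the way -a does by default: on runs of
# whitespace, ignoring any leading whitespace.
sub autosplit_fields {
    my ($line) = @_;
    return split ' ', $line;
}

my @F = autosplit_fields("12/02/2002  12:16 AM              86016 a2p.exe\n");

# Index: 0 = date, 1 = time, 2 = AM/PM, 3 = size, 4 = name
printf "%d: %s\n", $_, $F[$_] for 0 .. $#F;
```

awk numbers fields from 1, which is why the same column is $4 in the awk version.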

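First drafts of programs are seldom optimal; or, at least, I'm not that good a programmer. Revisions help.

Perl was designed, in part, as an awk killer. For a long time Perl shipped with a program, a2p, for converting awk scripts to Perl (and s2p for converting sed scripts to Perl); since Perl 5.22 both are distributed separately on CPAN. Perl also has an automatic (built-in) variable that keeps track of the number of lines read. It has several names. The tersest is $.; the mnemonic name $NR is available if you put use English; in the script, and so is $INPUT_LINE_NUMBER. So using $num is not necessary. It also turns out that Perl does floating point division anyway, so the + 0.0 part was unnecessary. This leads to the next versions: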

perl -MEnglish -n -a -e '$total += $F[3]; END { printf "%12.2f\n", $total / $NR; }'

Or:

perl -n -a -e '$total += $F[3]; END { printf "%12.2f\n", $total / $.; }'

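You can tune the print format to suit your whims and fancies. This is essentially the script I'd use in the long term; it is fairly clear without being long-winded in any way. The script could be split over multiple lines if you wanted. It is a simple enough task that the legibility of the one-liner is not a problem, IMNSHO. The beauty of this is that you don't have to fuss with split, arrays, and read loops yourself; Perl does most of that for you. (Granted, it does blow up on empty input; the fix is trivial; see below.)

Recommended version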

perl -n -a -e '$total += $F[3]; END { printf "%12.2f\n", $total / $. if $.; }'

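The if $. tests whether the number of lines read is zero; if $. is zero, the printf and the division are skipped, so the script prints nothing when given no input.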
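For anyone who prefers a script to a one-liner, the same logic can be spelled out over several lines. This is only a sketch: the helper name column_average is hypothetical, and it reads from an in-memory filehandle so that it runs without files.txt being present.

```perl
use strict;
use warnings;

# Hypothetical helper: average of the fourth whitespace-separated field
# over every line of an open filehandle; returns undef for empty input.
sub column_average {
    my ($fh) = @_;
    my $total = 0;

    while (my $line = <$fh>) {
        my @F = split ' ', $line;
        $total += $F[3];
    }

    # $. holds the number of the last line read from $fh, so it is 0
    # when nothing was read and the division is skipped.
    return $. ? $total / $. : undef;
}

# Demo data: the three lines from the question.
my $data = <<'EOT';
12/02/2002  12:16 AM              86016 a2p.exe
10/10/2004  11:33 AM               393 avgfsznew.pl
11/01/2003  04:42 PM             38124 c2ph.bat
EOT

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $average = column_average($fh);
printf "%12.2f\n", $average if defined $average;
close $fh;
```

Note that $. must be read before the handle is closed, because closing the handle resets it.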

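There is a noble (or ignoble) game called "Code Golf" that was much played in the early days of Stack Overflow, but Code Golf questions are no longer considered good questions. The object of Code Golf is to write a program that does a particular task in as few characters as possible. If you're not too worried about the format of the output and you're using at least Perl 5.10, you can play Code Golf with this and compress it further: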

perl -Mv5.10 -n -a -e '$total += $F[3]; END { say $total / $. if $.; }'

And, clearly, there are a lot of unnecessary spaces and letters in there:

perl -Mv5.10 -nae '$t+=$F[3];END{say$t/$.if$.}'

However, that is not as clear as the recommended version.

score 2 · Accepted Answer

#!/usr/bin/perl

use warnings;
use strict;

open my $file, "<", "files.txt" or die "Cannot open files.txt: $!";
my ($total, $cnt);
while(<$file>){
        $total += (split(/\s+/, $_))[3];
        $cnt++;
}
close $file;
print  "number of files: $cnt\n";
print  "total size: $total\n";
printf "avg: %.2f\n", $total/$cnt;

Or you can use awk:

awk '{t+=$4} END{print t/NR}' files.txt

score 1 · Accepted Answer

Try doing this:

#!/usr/bin/perl -l

use strict; use warnings;

open my $file, '<', "my_file" or die "open error [$!]";

my ($total, $count);

while (<$file>){
    chomp;
    next if /^$/;
    my ($date, $time, $x, $numbers, $type) = split;
    $total += $numbers;
    $count++;
}

print "the average is " . $total/$count . " and the total is $total";

close $file;

score 0 · Accepted Answer

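This solution opens the file and loops through each line of it. It then splits each line into five variables by splitting on one or more whitespace characters.

open opens the file for reading ("<") and raises an error with or die "..." if that fails
my ($total, $cnt) are the column total and the count of files added
while(<FILE>) { ... } loops through each line of the file using the file handle and stores the line in $_
chomp removes the input record separator from $_. On Unix, the default separator is a newline, \n
split(/\s+/, $_) splits the current line, held in $_, on the delimiter \s+. \s matches a whitespace character and the + after it means "1 or more", so each line is split on runs of one or more whitespace characters.

Next we update $total and $cnt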

#!/usr/bin/perl

open FILE, "<", "files.txt" or die "Error opening file: $!";
my ($total, $cnt);

while(<FILE>){
  chomp;
  my ($date, $time, $am_pm, $numbers, $type) = split(/\s+/, $_); 
  $total += $numbers;
  $cnt++; 
}
close FILE;

print "the total is $total and count of $cnt\n";
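One difference between split(/\s+/, $_) as used here and the bare split used in other answers is worth knowing. If a line begins with whitespace, /\s+/ produces an empty first field, which shifts every column by one, while the special pattern ' ' (the default for a bare split) skips leading whitespace. The sample data has no leading whitespace, so both behave the same there. A small sketch, with helper names made up for the illustration:

```perl
use strict;
use warnings;

# Hypothetical helpers that return the fourth field under each split style.
sub size_with_regex {
    my ($line) = @_;
    my @fields = split /\s+/, $line;
    return $fields[3];
}

sub size_with_space {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return $fields[3];
}

my $plain    = "12/02/2002  12:16 AM   86016 a2p.exe";
my $indented = "   12/02/2002  12:16 AM   86016 a2p.exe";

print size_with_regex($plain),    "\n";   # 86016
print size_with_space($plain),    "\n";   # 86016
print size_with_regex($indented), "\n";   # AM (the empty leading field shifts the columns)
print size_with_space($indented), "\n";   # 86016
```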

score 0 · Accepted Answer

It is as simple as this:

perl -F -lane '$a+=$F[3];END{print "The average size is ".$a/$.}' your_file

Tested as follows:

> cat temp
12/02/2002  12:16 AM              86016 a2p.exe
10/10/2004  11:33 AM               393 avgfsznew.pl
11/01/2003  04:42 PM             38124 c2ph.bat

Now run it:

> perl -F -lane '$a+=$F[3];END{print "The average size is ".$a/$.}' temp
The average size is 41511
>

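Explanation: -F -a means the line is stored as an array. The default separator is a space or a tab. So now $F[3] holds the size of the file. The sizes in the 4th column are summed until all lines have been processed. END is executed after all the lines in the file have been processed.

So $. at the end gives the number of lines, and $a/$. gives the average.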
