perl - Word count, line count, and byte size of a file in Perl

Question

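I am trying to write a Perl program that takes a file from the command line, counts its lines (this part works), displays the file size in bytes (this only works when I put the print statement inside the while loop, but I think that prints the size of each line, since it differs on every iteration), counts the total number of words, and counts the occurrences of a specific search word given on the command line.

How can I make it work as intended?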

#!/usr/bin/perl

use strict;
use warnings;

my $linecount = 0;
my $wordcount = 0;
my $filesize = 0;
my $search = <>;

while (defined(my $file = <>)) {
    chomp($file);

    my $filesize = length $file;

    if (m/$search/){
        my $wordcount++;
    }

    $linecount = 1;
    $linecount++;
}

print "Size of file in bytes: $filesize\n";
print "Number of line(s): $linecount\n";
print "Number of occurences of $search: $wordcount\n";

score 1 · Accepted Answer

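"I am using ./perl1.pl testfile.txt hello. hello is my search word."

Reading from <> tells perl to read lines from all the files named on the command line (or from STDIN if none are given).

Since the command-line arguments are stored in @ARGV, and hello in your example is not a file, it should be removed from @ARGV and stored elsewhere (my $search = pop @ARGV;).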

use strict;
use warnings;

my $linecount = 0;
my $wordcount = 0;
my $search = pop @ARGV;
my ($file) = @ARGV;
my $filesize = -s $file;

while (my $line = <>) {
    chomp($line);

    $wordcount++ while $line =~ /$search/g;

    $linecount++;
}

print "Size of file in bytes: $filesize\n";
print "Number of line(s): $linecount\n";
print "Number of occurrences of $search: $wordcount\n";

score 1 · Accepted Answer

Quick code review:

use strict;
use warnings;

A very good choice. These pragmas provide information about your code, and help you avoid mistakes.

my $search = <>;

Here you take the first line of the input as the search string. This is probably not what you want. If you are searching through a file, I am guessing the file does not contain the search word in the first line. What you probably are trying to do is access the command line arguments, which are found in @ARGV.

my $search = shift;

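This is the idiomatic way to access @ARGV. Outside a subroutine, shift with no argument is short for shift @ARGV, which removes and returns the first argument. This assumes the search word comes first on the command line; if it comes last, as in ./perl1.pl testfile.txt hello, use pop @ARGV instead.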

while (defined(my $file = <>)) {
    chomp($file);
    my $filesize = length $file;

I get the impression that you think that $file is actually the file name. You said you tried -s $file, which would have worked, if $file had contained the file name. However, the while loop reads from the input file handle <> and assigns the lines of the file to $file. If you want to access the file name, you probably want $ARGV. And you only want to do this once, after the while loop:

my $filesize = -s $ARGV;

Keep in mind that if you use more than one file, $ARGV will change as it refers to the name of the file currently being read with <>. (Technically <ARGV>)
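To illustrate how $ARGV changes as <> moves from one file to the next, here is a minimal sketch. The helper name stats_per_file and the shape of the hash it returns are assumptions made for this example, not part of the original code, and the //= operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: read all given files through <> and record,
# per file name, the size in bytes and the number of lines.
sub stats_per_file {
    my @files = @_;
    return {} unless @files;       # with an empty @ARGV, <> would read STDIN

    local @ARGV = @files;          # <> reads the files listed in @ARGV
    my %stats;
    while (my $line = <>) {
        # $ARGV holds the name of the file currently being read
        $stats{$ARGV}{size} //= -s $ARGV;   # look up the size once per file
        $stats{$ARGV}{lines}++;
    }
    return \%stats;
}
```

Calling stats_per_file('a.txt', 'b.txt') would return a hash reference keyed by file name. Note that an empty file never enters the loop, so it gets no entry in this sketch.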

The while loop itself should probably use a different variable name:

while (my $line = <>)

Note that you do not technically need to use defined here.

Also, length returns the number of characters in a string. If you use it on a file name, it returns the number of characters in the file name. It has nothing to do with file size.

if (m/$search/){
    my $wordcount++;
}

This pattern match applies to the default variable $_. What you want is $file =~ m/..../. Also, do you want meta characters to be active in the regex? You might want, for example, to allow plural s with /apples?/. Meta characters can change the meaning of your pattern match, however, so if you just want to match literal strings, use the \Q ... \E escape to disable meta characters.
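The difference between an active and a quoted pattern can be seen in a small sketch. The helper line_matches and its third parameter are hypothetical names introduced only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $line matches $search, else 0.
# With $literal true, \Q...\E quotes regex metacharacters in $search.
sub line_matches {
    my ($line, $search, $literal) = @_;
    if ($literal) {
        return ($line =~ /\Q$search\E/) ? 1 : 0;
    }
    return ($line =~ /$search/) ? 1 : 0;
}

# As a regex, "a.c" matches "abc" because . matches any character;
# quoted with \Q...\E it only matches the literal text "a.c".
print line_matches("abc", "a.c", 0), "\n";   # metacharacters active
print line_matches("abc", "a.c", 1), "\n";   # literal match only
```

Which behaviour is right depends on whether the search term is meant to be a pattern (as with /apples?/) or plain text typed by a user.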

Another thing, you use my here to declare a new variable (which shadows the previously declared variable). This variable only has the scope of the surrounding if block, so it is quite pointless. Remove my.

Yet another thing is that this match only matches once per line, so you miss out on multiple matches. What you want is probably this:

$wordcount += () = $line =~ /\Q$search\E/g;

Note the use of the global /g modifier, which makes the regex match as many times as possible (and not just once). Assigning the list of matches to the empty list () and evaluating that assignment in scalar context yields the number of matches.
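The counting idiom can be tried in isolation. This is a small sketch; count_matches is a hypothetical helper name, and it assumes a non-empty search string.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping literal occurrences
# of $search in $line.
sub count_matches {
    my ($line, $search) = @_;
    # List assignment to () in scalar context yields the number of
    # elements the /g match produced on the right-hand side.
    my $count = () = $line =~ /\Q$search\E/g;
    return $count;
}

print count_matches("hello hello world hello", "hello"), "\n";
```

Matches do not overlap: each new match starts where the previous one ended, so "aa" is found twice in "aaaa", not three times.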

$linecount = 1;
$linecount++;

This sets the count to 2. No matter how many lines are in your file, this will never be more than 2. You want to remove the assignment.
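Putting the review points together, the whole task might be sketched as follows. This is one possible arrangement under stated assumptions, not the asker's final program: the summarize helper is a hypothetical name, it opens the file explicitly instead of reading through <>, it treats the search term as literal text, and it counts a "word" as any run of non-whitespace characters.

```perl
use strict;
use warnings;

# Hypothetical helper: gather size, line count, word count, and
# search-term count for a single file.
sub summarize {
    my ($file, $search) = @_;

    my $size = -s $file;    # size in bytes, taken from the file itself
    my %s = (size => $size, lines => 0, words => 0, matches => 0);

    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        $s{lines}++;                                   # no reset inside the loop
        $s{words}   += () = $line =~ /\S+/g;           # runs of non-whitespace
        $s{matches} += () = $line =~ /\Q$search\E/g;   # every literal occurrence
    }
    close $fh;

    return \%s;
}
```

A caller could then print the fields, for example print "Number of line(s): $s->{lines}\n"; after my $s = summarize($file, $search);.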

score 0 · Accepted Answer

Do you know how pattern matching works in Perl? Here is what I would do:

foreach my $match ($line =~ /\w+/gi)   # "my" is required under use strict
{
  chomp($match);

  if($match eq $search)
  {
     $wordcount++;
  }
}

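I replaced "$file" with "$line" because the original name was a bit confusing. The chomp is not really necessary here: \w never matches a newline, and you have already chomped the line earlier. The pattern \w+ (equivalently [\w]+) matches a sequence of one or more "word" characters. In list context with the "g" (global) flag, the match returns every such sequence in $line rather than only the first, and foreach assigns each one to $match in turn. The "i" flag requests case-insensitive matching; it has no effect on \w+, so you can drop it. Then, if the match equals our search variable, we increment the word count. Note that eq is case-sensitive; compare lc($match) with lc($search) if you want to ignore case.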
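The word-by-word comparison counts whole words only, which differs from counting substring matches. A minimal sketch of that approach follows; count_word and its $ignore_case parameter are hypothetical names added for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many whole words in $line equal $search.
# With $ignore_case true, both sides are lowercased before comparing.
sub count_word {
    my ($line, $search, $ignore_case) = @_;
    my @words = $line =~ /\w+/g;       # every run of word characters
    if ($ignore_case) {
        $search = lc $search;
        @words  = map { lc } @words;
    }
    return scalar grep { $_ eq $search } @words;
}

# "helloworld" is a different word, so it is not counted here,
# although a substring match for "hello" would find it.
print count_word("Hello hello helloworld", "hello", 0), "\n";
print count_word("Hello hello helloworld", "hello", 1), "\n";
```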
