perl - Counting occurrences of a string in a file

Question

I wrote a Perl script to count the occurrences of a character in a file. This is what I have so far:

#!/usr/bin/perl -w

use warnings;
no warnings ('uninitialized', 'substr');

my $lines_ref;
my @lines;
my $count;

sub countModule()
{
my $file = "/test";
open my $fh, "<",$file or die "could not open $file: $!";
my @contents = $fh;
my @filtered = grep (/\// ,@contents);
return \@filtered;
}

@lines = countModule();
#@lines = $lines_ref;
$count = @lines;
print "###########\n $count \n###########\n";

My test file looks like this:

10.0.0.1/24
192.168.10.0/24
172.16.30.1/24

I am basically trying to count the number of instances of "/".

This is the output I get:

###########
 1
###########

I get 1 instead of 3, which is the actual number of occurrences. I am still learning Perl, so any help would be appreciated. Thanks!

score 2 · Accepted Answer

Here are a few points about your code:

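You should always use strict at the top of your program, and use no warnings only in a limited scope and for a specific reason. There is no general reason why a working Perl program should need warnings disabled globally.
Declare your variables close to their first point of use. The style of declaring everything at the top of the file is unnecessary and is a legacy of C.
Never use prototypes in your code. They exist for very special purposes and should not appear in the vast majority of Perl code. sub countModule() { ... } insists that countModule may never be called with any parameters, which is neither necessary nor useful. The definition should be just sub countModule { ... }
Well done for using a lexical filehandle, the three-argument form of open, and putting $! in your die string.
my @contents = $fh sets @contents to a single-element list containing just the filehandle. To read the whole file into the array you need my @contents = <$fh>
If you use a different delimiter you can avoid escaping the slash in the regular expression. To do that you need to use the m operator explicitly, for example my @filtered = grep m|/|, @contents
You return an array reference but assign the returned value to an array, so @lines = countModule() sets @lines to a single-element list containing just the array reference. That is why your count is 1. You should either return a list with return @filtered, or dereference the return value on assignment with @lines = @{ countModule() }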
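
The last point is the one that produces the count of 1. As a small illustration (the sub names as_ref and as_list and the sample data are made up for this sketch and are not part of the original script), compare assigning an array reference to an array with dereferencing it first:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of the test file
my @sample = ("10.0.0.1/24\n", "192.168.10.0/24\n", "172.16.30.1/24\n");

sub as_ref  { return [ grep m|/|, @sample ] }   # returns a single array reference
sub as_list { return grep m|/|, @sample }       # returns the matching lines themselves

my @wrong = as_ref();          # one element: the reference itself
my @deref = @{ as_ref() };     # dereferenced: the three matching lines
my @list  = as_list();         # the three matching lines

print scalar(@wrong), " ", scalar(@deref), " ", scalar(@list), "\n";
```

With this sample data the script prints 1 3 3. Note the parentheses in @{ as_ref() }: without them, Perl treats the bareword inside the braces as the name of an array variable rather than as a call to the sub.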

If all you need to do is print the number of lines in the file that contain a slash character, you could write something like this:

use strict;
use warnings;

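sub countModule {
  my $file = '/test';   # declared here so the die message can name the file
  open my $fh, '<', $file or die "Could not open $file: $!";
  return [ grep m|/|, <$fh> ];   # reference to the lines that contain a slash
}

my $lines = countModule;
my $count = @$lines;    # an array in scalar context yields its element count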
print "###########\n $count \n###########\n";

score 1 · Accepted Answer

Close, but there are a few problems:

use strict;
use warnings;

sub countModule
{
    my $file = "/test";
    open my $fh, "<",$file or die "could not open $file: $!";
    my @contents = <$fh>;  # The <> brackets are used to read from $fh.
    my @filtered = grep (/\// ,@contents);
    return @filtered;      # Remove the reference.
}

my @lines = countModule();
my $count = scalar @lines;  # 'scalar' is not required, but lends clarity.
print "###########\n $count \n###########\n";

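Every change I made to your code is marked with a # comment explaining what was done.

Now, in list context your subroutine will return the filtered lines. In scalar context it will return a count of how many lines passed the filter.

You did also mention finding occurrences of a character (even though everything in your script is line-oriented). Perhaps your counting sub could look like this: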
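
As a rough sketch of that context behaviour (filtered_lines and its in-memory data are hypothetical stand-ins for countModule and the real file), the same sub can be called in both contexts:

```perl
use strict;
use warnings;

# Hypothetical stand-in for countModule that filters in-memory lines
sub filtered_lines {
    my @contents = ("10.0.0.1/24\n", "no slash here\n", "172.16.30.1/24\n");
    my @filtered = grep { m{/} } @contents;
    return @filtered;           # evaluated in the caller's context
}

my @lines = filtered_lines();   # list context: the matching lines
my $count = filtered_lines();   # scalar context: how many lines matched

print scalar(@lines), " lines, count $count\n";
```

This works because the sub returns a named array. Returning a bare list expression such as a comma-separated list would behave differently in scalar context.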

sub file_tallies{
    my $file = '/test';
    open my $fh, '<', $file or die $!;
    my $count;
    my $lines;
    while( <$fh> ) {
        $lines++;
        $count += $_ =~ tr[\/][\/];
    }
    return ( $lines, $count );
}

my( $line_count, $slash_count ) = file_tallies();
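
To try the tr counting approach without touching /test, here is a minimal sketch that reads from an in-memory filehandle instead. The sample text and variable names are assumptions made for illustration only:

```perl
use strict;
use warnings;

# Hypothetical sample text; an in-memory filehandle stands in for /test
my $text = "10.0.0.1/24\n192.168.10.0/24\n172.16.30.1/24\n";
open my $fh, '<', \$text or die "could not open in-memory file: $!";

my ($line_count, $slash_count) = (0, 0);
while (my $line = <$fh>) {
    $line_count++;
    # tr with an empty replacement list counts matches and leaves $line unchanged
    $slash_count += ($line =~ tr{/}{});
}
close $fh;

print "$line_count lines, $slash_count slashes\n";
```

Initializing both counters to 0 means the sub-free version still prints sensible numbers when the input is empty.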

score 0 · Accepted Answer

In list context,

return \@filtered;

returns a list containing one element: a reference to the named array @filtered. Perhaps you wanted to return the list itself:

return @filtered;

score -1 · Accepted Answer

Here is some simpler code:

sub countMatches {
    my ($file, $c) = @_;   # Pass parameters
    local $/;   # Localizing $/ already leaves it undefined, so <$fh> slurps the whole file
    open my $fh, "<",$file or die "could not open $file: $!";
    my $s = <$fh>;  # The <> brackets are used to read from $fh.
    close $fh;
    my $ptn = quotemeta($c);  # So we can match strings like ".*" verbatim
    my @hits = $s =~ m/($ptn)/g;
    0 + @hits
}

print countMatches ("/test", '/') . "\n";

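This code pushes a little beyond very basic Perl, but not by much. Key points:

By undefining $/ you can read the input into a single string. If you are counting occurrences of a string in a file, rather than the lines that contain that string, this is usually easier.
m/(...)/g finds all the hits, but if you want to count a string like "." you need to quote the metacharacters in it.
Storing the result in an array makes m// evaluate in list context, so it returns every match.
Adding 0 to the array puts it in numeric (scalar) context, which gives the number of items in it.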
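
A common alternative to the temporary array is to assign the match to an empty list and then to a scalar. The sketch below assumes a hypothetical helper count_in_string that works on a string already in memory; it is a variation on the approach above, not the answer's own code:

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of $needle in $haystack
sub count_in_string {
    my ($haystack, $needle) = @_;
    # \Q...\E quotes metacharacters, like quotemeta does;
    # "= () =" forces list context, and the scalar receives the match count
    my $n = () = $haystack =~ /\Q$needle\E/g;
    return $n;
}

print count_in_string("10.0.0.1/24\n192.168.10.0/24\n", '/'), "\n";
print count_in_string("10.0.0.1/24\n192.168.10.0/24\n", '.'), "\n";
```

For this sample string the two calls print 2 and 6. Without the \Q...\E quoting, the second call would treat "." as "any character" and count far more than the literal dots.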
