perl - Perl - Count occurrences of specific words in each line of a file

Question

I've done a lot of searching and found nothing like what I want. Perl newbie here.

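I have a text file that is already neatly organized into lines of data. Say the two strings I'm interested in are "hello" and "goodbye". I want to write a quick Perl script that looks at the first line and counts how many times "hello" and "goodbye" occur. Then it moves on to the next line, counts again, and adds to the previous counts, so at the end of the script I can print the total count of each string in the file. The line-by-line approach matters because I want to keep several counts: the number of lines where both words appear, the number of lines that contain only one of the words and not the other, the number of lines that contain "hello" once but "goodbye" several times, and so on.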

So far, I'm thinking:

#!/usr/bin/perl
use strict; use warnings;

# die etc. (saving time by not including it here)

my $word_a = "hello";
my $word_b = "goodbye";
my $single_both = 0; # Number of lines where both words appear only once.
my $unique_hello = 0; # Number of lines where only hello appears, goodbye doesn't.
my $unique_goodbye = 0; # Number of lines where goodbye appears, hello doesn't.
my $one_hello_multiple_goodbye = 0; # Number of lines where hello appears once and goodbye appears multiple times.
my $one_goodbye_multiple_hello = 0; # Number of lines where goodbye appears once and hello appears multiple times.
my $multiple_both = 0; # Number of lines where both goodbye and hello appear multiple times.

while (my $line = <>) {

    # Magic happens here

};

# then the results for each of those variables can be printed at the end.

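As I said, I'm a newbie. I'm confused about how to count the occurrences in each line. Once I know that, I'm sure I can work out all the different conditions listed above. Should I use arrays? Hashes? Or, given what I want, am I approaching this from completely the wrong direction? I need to count the lines that meet the different conditions I listed as comments after those variables. Any help is greatly appreciated!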

score 6 · Accepted Answer

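You can count the occurrences of a word with a regular expression; for example, $hello = () = $line =~ /hello/g; counts how many times hello occurs in $line. (How does it work? Assigning the match to the empty list () puts the /g match in list context, so it returns every match, and a list assignment evaluated in scalar context returns the number of elements on its right-hand side, which is then stored in $hello.)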
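A minimal sketch of the same idiom wrapped in a helper sub. The name count_word is hypothetical, and the \b word boundaries and /i flag are assumptions added here so that, for example, "othello" is not counted as "hello" and "Hello" is; drop them if plain substring matches are what you want.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count whole-word, case-insensitive occurrences of $word in $line.
sub count_word {
    my ($line, $word) = @_;
    # Assigning the match to () puts m//g in list context, so it returns
    # every match; the list assignment in scalar context then yields the
    # number of elements on its right-hand side.
    my $count = () = $line =~ /\b\Q$word\E\b/gi;
    return $count;
}

print count_word("Hello world, hello again, othello", "hello"), "\n";   # prints 2
```

\Q...\E quotes any regex metacharacters in $word, so the sub can be called with arbitrary search strings.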

perl -n -E '$hello = () = /hello/g; $goodbye = () = /goodbye/g; say "line $.: hello - $hello, goodbye - $goodbye"; $hello_total += $hello; $goodbye_total += $goodbye;}{say "total: hello - $hello_total, goodbye - $goodbye_total";' input.txt

Output for a sample file:

line 1: hello - 0, goodbye - 0
line 2: hello - 1, goodbye - 0
line 3: hello - 1, goodbye - 1
line 4: hello - 3, goodbye - 0
line 5: hello - 0, goodbye - 0
line 6: hello - 1, goodbye - 1
line 7: hello - 0, goodbye - 0
total: hello - 6, goodbye - 2
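The same counting idiom can fill in the "Magic happens here" part of the question's loop. This is a sketch that reuses the question's variable names; the sample lines, the \b word boundaries, and the in-memory filehandle (used only so the example runs on its own) are assumptions. In a real script you would loop over <> instead of $fh.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($total_hello, $total_goodbye) = (0, 0);
my $single_both                = 0;  # both words exactly once
my $unique_hello               = 0;  # hello present, goodbye absent
my $unique_goodbye             = 0;  # goodbye present, hello absent
my $one_hello_multiple_goodbye = 0;
my $one_goodbye_multiple_hello = 0;
my $multiple_both              = 0;

# Sample input (assumption); replace $fh with <> to read files or STDIN.
my $text = <<'END';
hello there
hello and goodbye
hello hello hello
goodbye goodbye, hello
nothing here
goodbye
END
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

while (my $line = <$fh>) {
    my $h = () = $line =~ /\bhello\b/g;    # occurrences of hello on this line
    my $g = () = $line =~ /\bgoodbye\b/g;  # occurrences of goodbye on this line
    $total_hello   += $h;
    $total_goodbye += $g;

    # Lines containing neither word match no branch and are not counted.
    if    ($h == 1 && $g == 1) { $single_both++ }
    elsif ($h >  0 && $g == 0) { $unique_hello++ }
    elsif ($g >  0 && $h == 0) { $unique_goodbye++ }
    elsif ($h == 1 && $g >  1) { $one_hello_multiple_goodbye++ }
    elsif ($g == 1 && $h >  1) { $one_goodbye_multiple_hello++ }
    elsif ($h >  1 && $g >  1) { $multiple_both++ }
}
close $fh;

print "hello total: $total_hello, goodbye total: $total_goodbye\n";
print "single_both: $single_both, unique_hello: $unique_hello, ",
      "unique_goodbye: $unique_goodbye\n";
print "one_hello_multiple_goodbye: $one_hello_multiple_goodbye, ",
      "one_goodbye_multiple_hello: $one_goodbye_multiple_hello, ",
      "multiple_both: $multiple_both\n";
```

With this sample input the script reports 6 hellos and 4 goodbyes; the line "nothing here" falls through every branch, so it is counted in no category.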

score 0 · Accepted Answer

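Perl has a binding operator, =~, for testing whether a string matches a pattern. You can combine it with two if statements to count, across all lines, how many lines contain each word (a line with several hellos is still counted once):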

# only gathers counts of lines containing each word
my ($hello_cnt, $goodbye_cnt) = (0, 0);
while (my $line = <STDIN>) {
   $hello_cnt++   if $line =~ /hello/;
   $goodbye_cnt++ if $line =~ /goodbye/;
}
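Because a match in boolean context only says whether a word appears, these two tests already cover the question's presence-only categories (both words on a line, only one of them). A minimal sketch, where the sub name tally_presence and the sample lines are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count lines by which of the two words they contain.
sub tally_presence {
    my @lines = @_;
    my %count = (both => 0, only_hello => 0, only_goodbye => 0, neither => 0);
    for my $line (@lines) {
        my $has_hello   = $line =~ /hello/;    # true if hello appears at least once
        my $has_goodbye = $line =~ /goodbye/;  # true if goodbye appears at least once
        if    ($has_hello && $has_goodbye) { $count{both}++ }
        elsif ($has_hello)                 { $count{only_hello}++ }
        elsif ($has_goodbye)               { $count{only_goodbye}++ }
        else                               { $count{neither}++ }
    }
    return %count;
}

my %c = tally_presence("hello", "hello goodbye hello", "goodbye", "nope");
print "$_: $c{$_}\n" for sort keys %c;
```

Categories that depend on "once" versus "many times" still need per-line occurrence counts rather than a yes/no match.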

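But it seems you want to reason about your input line by line. You could maintain all of those variables ($unique_hello, $unique_goodbye, etc.), but that looks like a lot of extra work to me. What you can do instead is hash each word to its total count: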

my %seen;
while (my $line = <STDIN>) {
   chomp $line;                    # remove trailing \n

   # split ' ' splits on runs of whitespace and skips leading whitespace;
   # lc folds case so "Hello" and "hello" share one count
   $seen{lc $_}++ for split ' ', $line;
}

Now you have a hash with this structure:

{ 
  word1 => cnt1,
  word2 => cnt2,
  etc ...
}

Now you can print the totals:

print "Hello seen " . ($seen{hello} // 0) . " times\n";   # // 0 avoids an undef warning if hello never appears
# etc ...
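If six separate scalars feel like too much bookkeeping, one alternative (a swapped-in technique, not this answer's own code) is to build the word-count hash per line, bucket each word's count as none, one or many, and use the pair of buckets as a key in a second hash. Every combination is then counted without declaring a variable for it. The bucket sub and the sample lines are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map an occurrence count to a bucket label.
sub bucket {
    my ($n) = @_;
    return $n == 0 ? 'none' : $n == 1 ? 'one' : 'many';
}

my %lines_by_pattern;   # e.g. "hello=one goodbye=many" => number of lines
my @sample = ("hello goodbye goodbye", "hello", "hello hello goodbye goodbye", "hello");

for my $line (@sample) {
    my %seen;                                  # word counts for this line only
    $seen{lc $_}++ for split ' ', $line;
    my $key = sprintf 'hello=%s goodbye=%s',
        bucket($seen{hello} // 0), bucket($seen{goodbye} // 0);
    $lines_by_pattern{$key}++;
}

print "$_: $lines_by_pattern{$_}\n" for sort keys %lines_by_pattern;
```

Because it splits on whitespace, this approach treats "hello," (with punctuation) as a different word from "hello"; the regex counting idiom with \b word boundaries does not have that limitation.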

I'll leave the line-by-line analysis to you; I hope this is a good starting point.
