perl - Using a Perl hash to store the line numbers and occurrence count of each word

Question

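I am reading a file word by word (the file contains lines of words) and storing each word in a hash. I want to store the number of occurrences as well as the lines on which each word was found. (Note: I will sort the hash by the word itself, as shown in the code.)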

Here is what I have (not working). Assume the @words array correctly holds the words, with no special characters and all lowercase:

my %wordlist;
my $line = 0;

foreach my $word (@words) {
  $line++;

  if (exists $wordlist{$word}) {
      $wordlist{$word} += 1;
      $wordlist{$line} = $wordlist{$line} . ", $line";
  }
  else {
      $wordlist{$word} = 1;
      $wordlist{$line} = "$line";
  }  
}

Later I try to print $wordlist{$line} as a string, inside a loop that contains:

printf "%${length}s: %4d times, on lines %s\n", $key, $wordlist{$key}, $wordlist{$line};

When I run it, I get the error:

Use of uninitialized value in printf at ./wc.pl line 105, <FILE> line 20.
someWord:    2 time(s), line(s)

where line 20 is the exit statement.

score 0 · Accepted Answer

You can try the example below; it should give you a good starting point and a base to modify.

use strict;
use warnings;

my @words = <>;
my %wordlist;
my $line = 0;

foreach my $word (@words) {
        chomp($word);
        push (@{$wordlist{$word}}, ++$line);
}

foreach my $word (keys %wordlist){
        my $count = @{$wordlist{$word}};
        my $lines = join (', ',@{$wordlist{$word}});
        printf ("%-10s: %4d times, on lines %s\n", $word, $count, $lines);
}

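This example relies on Perl's autovivification to create the data structure on the fly where it does not yet exist. In essence, each word it reads pushes the current line number onto the array stored under that word's key in the hash. If the word has never been seen before, autovivification creates the key in the hash and, likewise, the array as its value.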
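To see autovivification in isolation, here is a minimal sketch. The word and line numbers are made up for illustration; only the push idiom comes from the example above.

```perl
use strict;
use warnings;

my %wordlist;    # starts out empty

# No key 'cat' exists yet: push creates the key and an anonymous array.
push @{ $wordlist{cat} }, 1;
# The second push reuses the array created by the first.
push @{ $wordlist{cat} }, 8;

print exists $wordlist{cat} ? "cat exists\n" : "cat missing\n";
print ref( $wordlist{cat} ), "\n";    # ARRAY
print "@{ $wordlist{cat} }\n";        # 1 8
```

Because the array springs into existence on first use, no exists check or explicit initialisation is needed before the push.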

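For the output, the word is available because it is the hash key; the number of times it was seen is the number of line numbers in the array held in the hash value; and a string listing the lines can be built by joining those numbers with join.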
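The two derived values can be sketched on their own as follows. The single-entry hash is hypothetical sample data; the count comes from evaluating the array in scalar context, and the line list from join.

```perl
use strict;
use warnings;

# Hypothetical sample data: word => array of line numbers.
my %wordlist = ( stair => [ 3, 5 ] );

# Assigning an array to a scalar yields its element count.
my $count = @{ $wordlist{stair} };
# join builds the comma-separated list of line numbers.
my $lines = join ', ', @{ $wordlist{stair} };

printf "%-10s: %4d times, on lines %s\n", 'stair', $count, $lines;
```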

We can then print these values with printf. So a word list of

cat
house
stair
chari
stair
mouse
stool
cat
hat

will produce the output

mouse     :    1 times, on lines 6
cat       :    2 times, on lines 1, 8
hat       :    1 times, on lines 9
stool     :    1 times, on lines 7
chari     :    1 times, on lines 4
stair     :    2 times, on lines 3, 5
house     :    1 times, on lines 2
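The order of the output above is simply whatever keys returns, and Perl does not guarantee any particular hash order. Since the question asks for the words sorted, one possible variation is to sort the keys before printing. This is a sketch with a small hypothetical data set, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical sample data in the same shape as %wordlist.
my %wordlist = (
    mouse => [6],
    cat   => [ 1, 8 ],
    hat   => [9],
);

# sort without a comparison block orders the keys as strings.
my @sorted = sort keys %wordlist;

foreach my $word (@sorted) {
    my $count = @{ $wordlist{$word} };
    my $lines = join ', ', @{ $wordlist{$word} };
    printf "%-10s: %4d times, on lines %s\n", $word, $count, $lines;
}
```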

score 0 · Accepted Answer

$wordlist{$line}   # Line data for each line

should be

$wordline{$word}   # Line data for each word

Formatting output before it is time to output it is generally bad practice. This is no exception.

if (exists $wordlist{$word}) {
    ++$wordlist{$word};
    push @{ $wordline{$word} }, $line;
}
else {
    ++$wordlist{$word};
    push @{ $wordline{$word} }, $line;
}

which of course simplifies to

++$wordlist{$word};
push @{ $wordline{$word} }, $line;

In the printf, you would use

join(', ', @{ $wordline{$word} })

But $wordlist{$word} is just the number of elements in @{ $wordline{$word} }, so it isn't needed at all. Just use

0+@{ $wordline{$word} }

instead of

$wordlist{$word}
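The 0+ prefix matters because an array placed in a list, such as printf's arguments, flattens into its elements; adding zero imposes numeric (scalar) context and yields the element count instead. A small sketch with hypothetical sample data:

```perl
use strict;
use warnings;

my %wordline = ( cat => [ 1, 8 ] );    # hypothetical sample data

# In list context the array flattens into its elements.
my @flattened = ( @{ $wordline{cat} } );
# 0+ forces scalar context, so the list holds only the count.
my @counted = ( 0+@{ $wordline{cat} } );

print "flattened: @flattened\n";    # flattened: 1 8
print "counted: @counted\n";        # counted: 2
```

scalar(@{ $wordline{cat} }) would be an equivalent way to get the count.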

So you end up with

use strict;
use warnings;

use List::Util qw( max );

my %wordlines;
while (<>) {
   chomp;
   push @{ $wordlines{$_} }, $.;
}

my $max_len_p1 = 1 + max map length, keys %wordlines;
my $max_count_len = max map length(0+@$_), values %wordlines;
my $format = "%-${max_len_p1}s %${max_count_len}d times, on lines %s\n";

for my $word (
   sort { @{ $wordlines{$b} } <=> @{ $wordlines{$a} } || $a cmp $b }
      keys %wordlines
) {
   printf($format,
      "$word:",
      0+@{ $wordlines{$word} },
      join(', ', @{ $wordlines{$word} }),
   );
}

Input:

cat
house
stair
chari
stair
mouse
stool
cat
hat

Output:

cat:   2 times, on lines 1, 8
stair: 2 times, on lines 3, 5
chari: 1 times, on lines 4
hat:   1 times, on lines 9
house: 1 times, on lines 2
mouse: 1 times, on lines 6
stool: 1 times, on lines 7
