Another Perl attempt:
#!/usr/bin/perl -w
use strict;
use File::Slurp;
use Tie::File;
# Usage:
#
# $ perl WordCount.pl <Files>
#
# Example:
#
# $ perl WordCount.pl *.text
#
# Counts words in all files given as arguments.
# The words are taken from the file "WordList".
# The output is appended to the file "WordCount.out" in the format implied in the
# following example:
#
# File,Word1,Word2,Word3,...
# File1,0,5,3,...
# File2,6,3,4,...
# .
# .
# .
#
### Configuration
my $CaseSensitive = 1; # 0 or 1
my $OutputSeparator = ","; # another option might be "\t" (TAB)
my $RemoveHyphenation = 0; # 0 or 1. Careful, may be too greedy.
###
my @WordList = read_file("WordList");
chomp @WordList;
tie (my @Output, 'Tie::File', "WordCount.out");
push (@Output, join ($OutputSeparator, "File", @WordList));
for my $InFile (@ARGV)
{ my $Text = read_file($InFile);
if ($RemoveHyphenation) { $Text =~ s/-\n//g; };
my %Count;
for my $Word (@WordList)
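{ # Count whole-word matches: \Q...\E treats the word literally (no regex
# metacharacters), and "() =" counts the matches without rewriting $Text.
if ($CaseSensitive)
{ $Count{$Word} = () = $Text =~ /\b\Q$Word\E\b/g; }
else
{ $Count{$Word} = () = $Text =~ /\b\Q$Word\E\b/gi; }; };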
my $OutputLine = "$InFile";
for my $Word (@WordList)
{ if ($Count{$Word})
{ $OutputLine .= $OutputSeparator . $Count{$Word}; }
else
{ $OutputLine .= $OutputSeparator . "0"; }; };
push (@Output, $OutputLine); };
untie @Output;
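The counting step can also be pulled out into a small subroutine, which makes it easier to test on its own. This is a minimal sketch, assuming the same whole-word semantics as the script; `count_word` is a hypothetical helper name, not part of any module. It counts with the `() =` idiom instead of a no-op substitution.

```perl
use strict;
use warnings;

# Hypothetical helper: count whole-word occurrences of $word in $text.
# \Q...\E quotes regex metacharacters, so an entry such as "a.b" is
# matched literally rather than as a pattern. Note that \b only sees
# boundaries next to word characters, so entries that begin or end with
# punctuation (e.g. "C++") may not be counted as expected.
sub count_word {
    my ($text, $word, $case_sensitive) = @_;
    my $re = $case_sensitive ? qr/\b\Q$word\E\b/ : qr/\b\Q$word\E\b/i;
    # Assigning the match list to () in scalar context yields the
    # number of matches; $text itself is left unchanged.
    my $count = () = $text =~ /$re/g;
    return $count;
}

# Example usage
my $sample = "Linux linux LINUX";
print count_word($sample, "linux", 1), "\n";   # 1
print count_word($sample, "linux", 0), "\n";   # 3
```

Because the text is only matched, never rewritten, the same `$Text` can be scanned for every word in the list without any risk of one substitution affecting the next.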
When I put your question into the file wc-test
and Robert Gamble's answer into wc-ans-test,
the output file looks like this:
File,linux,frequencies,science,words
wc-ans-test,2,2,2,12
wc-test,1,3,1,3
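This is a comma-separated values (CSV) file (but you can change the separator in the script). It should be readable by any spreadsheet application. For plotting, I'd recommend gnuplot, which is fully scriptable, so you can tweak the output independently of the input data.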
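If you would rather post-process the results in Perl than in a spreadsheet, the format is simple enough to read back directly. A minimal sketch, assuming no field contains the separator; `parse_counts` is a hypothetical helper name, and for arbitrary CSV a real parser such as Text::CSV would be the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: parse lines in the WordCount.out format
# ("File,Word1,Word2,..." header, then "FileName,count1,count2,...")
# into a hash of hashes, $counts->{$file}{$word} = count.
# Assumes no field contains the separator.
sub parse_counts {
    my ($sep, @lines) = @_;
    chomp @lines;
    my $header = shift @lines;
    my (undef, @words) = split /\Q$sep\E/, $header;
    my %counts;
    for my $line (@lines) {
        # The script appends, so each extra run adds another header line.
        next if $line eq $header;
        my ($file, @n) = split /\Q$sep\E/, $line;
        @{ $counts{$file} }{@words} = @n;
    }
    return \%counts;
}

# Example usage with the output shown above
my $counts = parse_counts(",",
    "File,linux,frequencies,science,words\n",
    "wc-ans-test,2,2,2,12\n",
    "wc-test,1,3,1,3\n");
print "$_: $counts->{$_}{words}\n" for sort keys %$counts;
```

Skipping repeated header lines matters here because the script appends to WordCount.out on every run rather than overwriting it.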