我安装了Wordnet::Similarity和Wordnet::QueryData作为计算这些模块附带的信息内容分数和概率的简单方法。但我被困在这个基本问题上:给定一个单词,打印 n 个与之相似的单词——迭代同义词集并做join
.
使用wn
命令并将其与大量的管道一起使用tr
,sort | uniq
我可以获得所有单词:
wn cat -synsn | grep -v Sense | tr '=' ' ' | tr '>' ' ' | tr '\t' ' ' | tr ',' '\n' | sort | uniq
输出
8 senses of cat
adult female
adult male
African tea
Arabian tea
big cat
bozo
cat
cat
CAT
Caterpillar
cat-o'-nine-tails
computed axial tomography
computed tomography
computerized axial tomography
computerized tomography
CT
excitant
felid
feline
gossip
gossiper
gossipmonger
guy
hombre
kat
khat
man
newsmonger
qat
quat
rumormonger
rumourmonger
stimulant
stimulant drug
Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun cat
tracked vehicle
true cat
whip
woman
X-radiation
X-raying
但它有点讨厌,需要进一步清理。
我的脚本如下所示,我想要得到的是 cat#n1...8 中的所有单词。
脚本
use WordNet::QueryData;
my $wn = WordNet::QueryData->new( noload => 1);
print "Senses: ", join(", ", $wn->querySense("cat#n")), "\n";
print "Synset: ", join(", ", $wn->querySense("cat", "syns")), "\n";
print "Hyponyms: ", join(", ", $wn->querySense("cat#n#1", "hypo")), "\n";
输出:
Senses: cat#n#1, cat#n#2, cat#n#3, cat#n#4, cat#n#5, cat#n#6, cat#n#7, cat#n#8
Synset: cat#n, cat#v
Hyponyms: domestic_cat#n#1, wildcat#n#3
脚本
use WordNet::QueryData;
my $wn = WordNet::QueryData->new;
foreach $word (qw/cat#n/) {
@senses = $wn->querySense($word);
foreach $wps (@senses) {
@gloss = $wn -> querySense($wps, "syns");
print "$wps : @gloss\n";
}
}
输出:
cat#n#1 : cat#n#1 true_cat#n#1
cat#n#2 : guy#n#1 cat#n#2 hombre#n#1 bozo#n#2
cat#n#3 : cat#n#3
cat#n#4 : kat#n#1 khat#n#1 qat#n#1 quat#n#1 cat#n#4 Arabian_tea#n#1 African_tea#n#1
cat#n#5 : cat-o'-nine-tails#n#1 cat#n#5
cat#n#6 : Caterpillar#n#2 cat#n#6
cat#n#7 : big_cat#n#1 cat#n#7
cat#n#8 : computerized_tomography#n#1 computed_tomography#n#1 CT#n#2 computerized_axial_tomography#n#1 computed_axial_tomography#n#1 CAT#n#8
PS 我以前从未写过 perl,但从早上起就一直在研究 perl 脚本——现在可以理解基本的东西了。只需要知道是否有使用 api 文档的更简洁的方法来执行此操作 - 无法从 api 或用户组档案中找出。
更新:
我想我会解决:
wn cat -synsn | sed '1,6d' |sed 's/Sense [[:digit:]]//g' | sed 's/[[:space:]]*=> //' | sed '/^$/d'
sed 岩石!