
I have read a file and split its contents into an array of words:

file1_contents = File.read("spam1.txt", mode: "rb")  # reads the whole file and closes it
file1 = file1_contents.split(' ')
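
`String#split` with a single-space argument splits on runs of any whitespace (spaces, tabs, newlines) and ignores leading whitespace, so line breaks in the file also act as separators. A minimal sketch, using a made-up inline string in place of `spam1.txt`:

```ruby
# Hypothetical sample text standing in for the contents of spam1.txt
sample = "  Dear friend,\nplease transfer   the funds\ttoday "

# split(' ') splits on any run of whitespace and drops
# leading/trailing whitespace
words = sample.split(' ')
p words
# => ["Dear", "friend,", "please", "transfer", "the", "funds", "today"]
```

Note that punctuation stays attached to the word (`"friend,"`), so a token such as `"funds,"` would not match `"funds"` in a later lookup.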

I can count the frequency of each word with a hash, and sort the words by frequency:

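freqs1 = Hash.new(0)                            # missing keys default to a count of 0
file1.each { |word| freqs1[word] += 1 }
freqs1 = freqs1.sort_by { |word, freq| freq }   # now an Array of [word, freq] pairs
freqs1.reverse!                                 # most frequent word first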
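
A self-contained sketch of the same counting and sorting on a made-up word list. It sorts by negated count instead of calling `reverse!`, and shows that on Ruby 2.7+ `Enumerable#tally` builds the same count hash in one call:

```ruby
# Hypothetical word list standing in for file1
words = %w[money fund money Iraq fund money hello]

# Count occurrences with a default-0 hash, as in the question
freqs = Hash.new(0)
words.each { |word| freqs[word] += 1 }

# Sort by descending count; sort_by on a Hash returns an Array
# of [word, count] pairs, not a Hash
sorted = freqs.sort_by { |_word, count| -count }

# Ruby 2.7+: tally produces the same word => count hash
tallied = words.tally

sorted.each { |word, count| puts "#{word} #{count}" }
```

`sort_by` is not guaranteed to be stable, so words with equal counts (here `Iraq` and `hello`) may come out in either order.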

I can also print the results for the user like this:

freqs1.each { |word, freq| puts "#{word} #{freq}" }

I want to show the user a message if the array file1 or the hash freqs1 contains certain words multiple times.

I had a (bad) idea to iterate over the freqs1 hash and show the user an appropriate message:

freqs1.each{|word,freq|
    if ((word == ('business' || 'fund' || 'funds' || 'account' ||'transfer' || 'money')) && freq > 2)  || (word == 'Iraq' && freq > 1 )  then
      puts "File 1 is suspected as spam mail - suspicious word frequency"
    else
      puts "File 1 does not appear to be spam email"
    end
}

However, this seems silly to me, because it prints a message for every single entry in the hash.
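
Besides printing once per entry, the condition in that loop has a subtler problem: `||` returns its first truthy operand, so the parenthesised chain of strings evaluates to just `'business'`. A short demonstration:

```ruby
# `||` returns its first truthy operand, so the whole chain
# collapses to the single string 'business'
candidate = ('business' || 'fund' || 'funds' || 'account' || 'transfer' || 'money')
p candidate                     # => "business"

# The comparison therefore only ever matches 'business'
p 'fund' == candidate           # => false

# Checking membership in a list tests every listed word instead
bad_words = %w[business fund funds account transfer money]
p bad_words.include?('fund')    # => true
```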

How can I show the user a specific message if words such as business, fund, funds, account, etc. appear more than twice?

Thanks for your help.


2 Answers


If you just want to improve that last statement, try this (untested, but it should work):

bad_words = %w{business fund funds account transfer money}
is_spam = freqs1.any? do |word, freq| 
  (freq > 2 && bad_words.include?(word)) || (word == 'Iraq' && freq > 1)
end

if is_spam
  puts "File 1 is suspected as spam mail - suspicious word frequency"
else
  puts "File 1 does not appear to be spam email"
end

Enumerable#any? does most of the work for you, and pulling the bad words out into their own list makes the code easier to read.
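
`Enumerable#any?` stops at the first element for which the block returns true, so the check ends as soon as one suspicious word is found and only one message is printed. A self-contained sketch of the same check; the `spam?` method name and the sample data are hypothetical:

```ruby
BAD_WORDS = %w[business fund funds account transfer money].freeze

# Hypothetical helper wrapping the answer's check. freqs can be the
# original Hash or the sorted Array of [word, freq] pairs.
def spam?(freqs)
  freqs.any? do |word, freq|
    (freq > 2 && BAD_WORDS.include?(word)) || (word == 'Iraq' && freq > 1)
  end
end

spammy = { 'money' => 3, 'hello' => 5 }   # 'money' appears more than twice
clean  = [['money', 2], ['Iraq', 1]]      # both at or below their limits

puts spam?(spammy)   # prints "true"
puts spam?(clean)    # prints "false"
```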

answered 2013-10-30T21:20:05.350

I would do something like this:

word_filter = [
 {count: 2, words: ['business','fund','funds','account','transfer','money']},
 {count: 1, words: ['iraq']}
]

alert        = "File 1 is suspected as spam mail - suspicious word frequency"
calm_message = "File 1 does not appear to be spam email"

grouped_words = file1.group_by{|x|x}.map{|x,array|[x,array.size]}

appears_to_be_spam = grouped_words.any?{ |word,count|
  word_filter.any? do |filter|
    filter[:words].include?(word.downcase) &&  count > filter[:count]
  end
}

puts appears_to_be_spam ? alert : calm_message
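
One possible variation: the code above downcases each word only when comparing it, after grouping, so `"Iraq"` and `"IRAQ"` are counted separately and neither exceeds the limit. A sketch that normalises case before counting; the `suspicious?` helper and the sample words are hypothetical:

```ruby
WORD_FILTER = [
  { count: 2, words: %w[business fund funds account transfer money] },
  { count: 1, words: %w[iraq] }
].freeze

# Hypothetical helper: downcase *before* grouping so that
# differently-cased copies of a word are counted together
def suspicious?(words)
  counts = words.map(&:downcase).group_by { |w| w }.map { |w, group| [w, group.size] }
  counts.any? do |word, count|
    WORD_FILTER.any? { |filter| filter[:words].include?(word) && count > filter[:count] }
  end
end

sample = %w[Iraq hello IRAQ]
puts suspicious?(sample)   # prints "true": "iraq" appears twice, above its limit of 1
```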

answered 2013-10-30T21:24:15.723