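I have a method that searches the lines of a text file and stores them in a hash, based on a list of words.
The method does two simple things:
If a line matches a word, it uses regular expressions to store the line in a "found" category; otherwise it stores the result in an "unfound" category.
My problem is with the "unfound" part: every line ends up uncategorized. What I need is for the uncategorized transactions to be only the lines that contain none of the words in the list.
Here is my word list: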
words_to_check = ['BUILDING','LAKE','TREE']
Here is the path to my text file:
path_to_file = "/Users/name/Desktop/path_to_file"
Sample file contents:
07/08/2013,"BUILDING",,100.00
07/08/2013,"LAKE",,50.00
07/08/2013,"TREE",,5.50
07/08/2013,"CAT",,10.50
07/08/2013,"DOG",,-19.87
Here is the method that builds the hash:
def build_hash(path_to_file, words_to_check)
  trans_info = {
    :found => {},
    :unfound => {}
  }
  found = trans_info[:found]
  unfound = trans_info[:unfound]
  words_to_check.each do |word|
    found[word] = []
    unfound[:unfound] = []
    File.foreach(path_to_file) do |line|
      if line.include?(word)
        date = /(?<Month>\d{1,2})\D(?<Day>\d{2})\D(?<Year>\d{4})/.match(line).to_s
        transaction = /(?<transaction>)#{word}/.match(line).to_s
        amount = /-+(?<dollars>\d+)\.(?<cents>\d+)/.match(line).to_s.to_f.round(2)
        # found word on list now push to array with hash keys
        found[word] << {
          date: date,
          transaction: transaction,
          amount: amount
        }
      else
        date = /(?<Month>\d{1,2})\D(?<Day>\d{2})\D(?<Year>\d{4})/.match(line).to_s
        transaction = /(?<Middle>)".*"/.match(line).to_s
        amount = /-*(?<dollars>\d+)\.(?<cents>\d+)/.match(line).to_s.to_f.round(2)
        # push to unfound part of hash
        unfound[:unfound] << {
          date: date,
          transaction: transaction,
          amount: amount
        }
      end
    end
  end
  # found and unfound key/values will be returned
  return trans_info
end
If you run it, you'll see 'BUILDING', 'LAKE', 'TREE', 'CAT', and 'DOG' in :unfound. Only 'CAT' and 'DOG' should be in :unfound.
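This may look like a simple else or conditional-logic problem, but I have researched it and considered other data structures, and I still can't figure it out. Any suggestions or new ideas are much appreciated!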
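
One possible direction, as a rough sketch rather than a definitive fix: in build_hash the else branch runs for every (word, line) pair in which the word is absent, so a line containing 'BUILDING' is still pushed to :unfound while 'LAKE' or 'TREE' is being checked. Looping over the lines on the outside and testing each line against the whole list would classify it exactly once. The name `classify_lines` and its "any enumerable of lines" argument are hypothetical, and the simplified `-?` amount pattern is an assumption that differs from the original regexes.

```ruby
# Sketch only: `classify_lines` is a hypothetical name, not part of the original code.
# It takes any enumerable of lines so it can be fed an Array or File.foreach(path).
def classify_lines(lines, words_to_check)
  # Same shape as the original hash: found[word] => [...], unfound[:unfound] => [...]
  trans_info = { found: {}, unfound: { unfound: [] } }
  words_to_check.each { |word| trans_info[:found][word] = [] }

  lines.each do |line|
    # First listed word contained in this line, or nil when none is.
    word = words_to_check.find { |w| line.include?(w) }

    entry = {
      date: line[/\d{1,2}\D\d{2}\D\d{4}/].to_s,
      # For unlisted lines, keep the quoted field as the original regex did.
      transaction: word || line[/".*"/].to_s,
      # Assumption: an optional leading minus, then digits, a dot, and digits.
      amount: line[/-?\d+\.\d+/].to_s.to_f.round(2)
    }

    if word
      trans_info[:found][word] << entry
    else
      trans_info[:unfound][:unfound] << entry
    end
  end

  trans_info
end
```

With the variables from the question, this could be called as classify_lines(File.foreach(path_to_file), words_to_check), since File.foreach returns an enumerator when no block is given.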