
I have a method that searches the lines of a text file and stores them in a hash based on a list of words.

The method does two simple things:

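If a line matches, it uses regular expressions to extract the fields and stores them in the "found" category; otherwise it stores the result in the "unfound" category.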

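My problem is with the "unfound" part: every line ends up there. What I need is for the unfound transactions to be only the lines that don't contain any word from the list.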
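Put another way, the requirement is a single test against the whole list: a line is "found" if it contains any of the words, and "unfound" only if it contains none of them. A minimal sketch; `found_line?` is a hypothetical helper name, not part of the original code:

```ruby
words_to_check = ['BUILDING', 'LAKE', 'TREE']

# Hypothetical helper: true if the line contains ANY word from the list.
# Only lines for which this is false belong in :unfound.
def found_line?(line, words)
  words.any? { |word| line.include?(word) }
end

found_line?('07/08/2013,"LAKE",,50.00', words_to_check) # => true
found_line?('07/08/2013,"CAT",,10.50', words_to_check)  # => false
```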

Here is my word list:

words_to_check = ['BUILDING','LAKE','TREE']

Here is the path to my text file:

path_to_file = "/Users/name/Desktop/path_to_file" 

Sample file contents:

07/08/2013,"BUILDING",,100.00
07/08/2013,"LAKE",,50.00
07/08/2013,"TREE",,5.50
07/08/2013,"CAT",,10.50
07/08/2013,"DOG",,-19.87
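Because the sample lines look like ordinary comma-separated values, one option (an alternative to the regexes in the method below, not what the question does) is to let Ruby's CSV library split each line into fields. A minimal sketch, assuming every line has the same four columns as the sample (date, quoted description, empty column, amount):

```ruby
require 'csv'

# Assumption: each line is date,"DESCRIPTION",,amount as in the sample.
# CSV.parse_line returns the fields as strings, with quotes removed and
# the empty third column as nil.
line = '07/08/2013,"DOG",,-19.87'
date, description, _unused, amount = CSV.parse_line(line)

puts date         # 07/08/2013
puts description  # DOG
puts amount.to_f  # -19.87
```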

Here is the method that builds the hash:

def build_hash(path_to_file, words_to_check)
  trans_info = {
    :found => {},
    :unfound => {}
  }

  found = trans_info[:found]
  unfound = trans_info[:unfound]

  words_to_check.each do |word|
    found[word] = []
    unfound[:unfound] = []

    File.foreach(path_to_file) do |line|
      if line.include?(word)
        date = /(?<Month>\d{1,2})\D(?<Day>\d{2})\D(?<Year>\d{4})/.match(line).to_s
        transaction = /(?<transaction>)#{word}/.match(line).to_s
        amount = /-+(?<dollars>\d+)\.(?<cents>\d+)/.match(line).to_s.to_f.round(2)

        # word found in the list: push to the array with hash keys
        found[word] << {
          date: date,
          transaction: transaction,
          amount: amount
        }
      else
        date = /(?<Month>\d{1,2})\D(?<Day>\d{2})\D(?<Year>\d{4})/.match(line).to_s
        transaction = /(?<Middle>)".*"/.match(line).to_s
        amount = /-*(?<dollars>\d+)\.(?<cents>\d+)/.match(line).to_s.to_f.round(2)

        # push to the unfound part of the hash
        unfound[:unfound] << {
          date: date,
          transaction: transaction,
          amount: amount
        }
      end
    end
  end
  # found and unfound key/values will be returned
  return trans_info
end

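If you run it, you'll see 'BUILDING', 'LAKE', 'TREE', 'CAT', and 'DOG' in :unfound. Only 'CAT' and 'DOG' should be in :unfound.

This may look like a simple else or conditional-logic issue, but I've researched it and considered other data structures and still can't figure it out. Any advice or new ideas are much appreciated!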


1 Answer


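This has to do with how you've set up your loops. Because you check each word independently, you are essentially requiring a line to match every word in the list in order to stay out of the :unfound category.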

As an example, look at the first line of the data file.

07/08/2013,"BUILDING",,100.00

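On the first pass through the words_to_check.each loop, you compare that line with the first word in the list, BUILDING. That is obviously a match, so the line is added to the :found category. But there are still two words to compare. On the second pass, you compare the same line with the word LAKE, so the match fails and the line is added to the :unfound category. The same thing then happens with the word TREE. Only now does the program finally move on to the next line.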
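The per-word placement of that first line can be reproduced in isolation. This sketch only mimics the question's loop order for a single line (it does not touch a file); `placements` is an illustrative name:

```ruby
words_to_check = ['BUILDING', 'LAKE', 'TREE']
line = '07/08/2013,"BUILDING",,100.00'

# One comparison per word, as in the question's outer loop: every word
# that does NOT appear in the line files it under :unfound once more.
placements = words_to_check.map do |word|
  line.include?(word) ? :found : :unfound
end

p placements # prints [:found, :unfound, :unfound]
```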

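Because the file loop is inside the word loop, you also end up reading the file once per word. Since reading a file is slow, I would reverse the order of these loops; that is, I would put the word loop on the inside.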

You probably want your loop structure to look more like this:

File.foreach(path_to_file) do |line|
  line_does_match = false # assume that we start without a match
  words_to_check.each do |word| # check the current line against all words
    if line.include? word
      line_does_match = true # record that we have a match
      break # stop the words_to_check.each loop
    end
  end
  # Now that we've determined whether the line matches ANY of the 
  # words in the list we can deal with it accordingly.
  if line_does_match
    # add it to the :found list
  else
    # add it to the :unfound list
  end
end
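Filling in the two placeholders gives a complete method. A minimal sketch, not the asker's or answerer's exact code: it reads the file once, uses `Enumerable#find` (which does the same job as the flag-and-`break` loop above but also returns the matching word), and flattens `:unfound` into a plain array rather than the question's nested `unfound[:unfound]`. It assumes lines look like the sample, and `DATE_RE`/`AMOUNT_RE` are hypothetical names. The amount pattern uses `-?` (optional minus); the question's `-+` requires at least one minus sign, so positive amounts such as `100.00` would come out as `0.0`.

```ruby
require 'tempfile'

# Assumed line format: MM/DD/YYYY,"DESCRIPTION",,amount
DATE_RE   = %r{\d{1,2}/\d{2}/\d{4}}
AMOUNT_RE = /-?\d+\.\d+/ # optional minus sign, unlike the question's -+

def build_hash(path_to_file, words_to_check)
  found = words_to_check.to_h { |word| [word, []] }
  unfound = []

  File.foreach(path_to_file) do |line| # the file is read only once
    # first listed word contained in the line, or nil if none matches
    word = words_to_check.find { |w| line.include?(w) }
    record = {
      date: line[DATE_RE].to_s,
      transaction: word || line[/"(.*?)"/, 1].to_s, # text between the quotes
      amount: line[AMOUNT_RE].to_f.round(2)
    }
    if word
      found[word] << record
    else
      unfound << record
    end
  end

  { found: found, unfound: unfound }
end

# Usage with the sample data written to a temporary file
Tempfile.create('transactions') do |f|
  f.write(<<~DATA)
    07/08/2013,"BUILDING",,100.00
    07/08/2013,"LAKE",,50.00
    07/08/2013,"TREE",,5.50
    07/08/2013,"CAT",,10.50
    07/08/2013,"DOG",,-19.87
  DATA
  f.flush
  result = build_hash(f.path, ['BUILDING', 'LAKE', 'TREE'])
  p result[:unfound].map { |r| r[:transaction] } # prints ["CAT", "DOG"]
end
```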
answered 2013-10-08T22:59:36.367