
I am using Ruby to try to parse a text file of the following form...

AAB eel bbc 
ABA did eye non pap mom ere bob nun eve pip gig dad nan ana gog aha
    mum sis ada ava ewe pop tit gag tat bub pup
    eke ele hah huh pep sos tot wow aba ala
    bib dud tnt 
ABB all see off too ill add lee ass err xii ann fee vii inn egg odd bee dee goo
    woo cnn pee fcc tee wee ebb edd gee ott ree vee ell orr rcc att boo cee cii
    coo kee moo mss soo doo faa hee icc iss itt kii loo mee nee nuu ogg opp pii
    tll upp voo zee

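I need to be able to search by the first column, e.g. "AAB", and then search all of the values associated with that key. I tried importing the text file into a hash of arrays, but I could never get more than the first value stored. I have no preference for how the file is searched, whether that means storing the data in some data structure or searching the text file every time; I just need to be able to do it. I don't know how to go about this, and any help would be greatly appreciated. Thanks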

-amc25114

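This will read your dictionary file. I stored the contents in a string and then turned it into a StringIO object, which lets me pretend it is a file. You can use File.readlines to read directly from the file itself: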

require 'pp'
require 'stringio'

text = 'AAB eel bbc 
ABA did eye non pap mom ere bob nun eve pip gig dad nan ana gog aha
    mum sis ada ava ewe pop tit gag tat bub pup
    eke ele hah huh pep sos tot wow aba ala
    bib dud tnt 
ABB all see off too ill add lee ass err xii ann fee vii inn egg odd bee dee goo
    woo cnn pee fcc tee wee ebb edd gee ott ree vee ell orr rcc att boo cee cii
    coo kee moo mss soo doo faa hee icc iss itt kii loo mee nee nuu ogg opp pii
    tll upp voo zee
'

file = StringIO.new(text)

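# Group each key line with the indented lines that follow it, then
# flatten every group into a single [key, values] pair.
dictionary = Hash[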
  file.readlines.slice_before(/^\S/).map{ |ary| 
    key, *values = ary.map(&:strip).join(' ').split(' ')
    [key, values] 
  }
]
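The grouping step does most of the work: slice_before(/^\S/) starts a new group at every line whose first character is not whitespace, so indented continuation lines stay with the key line above them. A small sketch that prints those intermediate groups for a shortened, made-up sample:

```ruby
require 'stringio'

# Shortened, made-up sample in the same layout as the question's file.
sample = StringIO.new("AAB eel bbc\nABA did eye\n    mum sis\nABB all see\n")

# A new group starts at each line whose first character is not whitespace;
# indented lines stay in the group of the key line above them.
groups = sample.readlines.slice_before(/^\S/).to_a

groups.each { |group| p group }
```

Joining and re-splitting each group is what turns it into a single [key, values] pair.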

dictionary is a hash that looks like this:

{
  "AAB"=>[
    "eel", "bbc"
  ],
  "ABA"=>[
    "did", "eye", "non", "pap", "mom", "ere", "bob", "nun", "eve", "pip",
    "gig", "dad", "nan", "ana", "gog", "aha", "mum", "sis", "ada", "ava",
    "ewe", "pop", "tit", "gag", "tat", "bub", "pup", "eke", "ele", "hah",
    "huh", "pep", "sos", "tot", "wow", "aba", "ala", "bib", "dud", "tnt"
  ],
  "ABB"=>[
    "all", "see", "off", "too", "ill", "add", "lee", "ass", "err", "xii",
    "ann", "fee", "vii", "inn", "egg", "odd", "bee", "dee", "goo", "woo",
    "cnn", "pee", "fcc", "tee", "wee", "ebb", "edd", "gee", "ott", "ree",
    "vee", "ell", "orr", "rcc", "att", "boo", "cee", "cii", "coo", "kee",
    "moo", "mss", "soo", "doo", "faa", "hee", "icc", "iss", "itt", "kii",
    "loo", "mee", "nee", "nuu", "ogg", "opp", "pii", "tll", "upp", "voo", "zee"
  ]
}
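Reading straight from the file on disk with File.readlines works the same way. A minimal sketch, assuming the data lives in a file whose path you pass in; load_dictionary is a hypothetical helper name, and the demo writes a throwaway Tempfile only so the example runs on its own:

```ruby
require 'tempfile'

# Hypothetical helper: a line starting with a non-space character begins a
# new key; indented lines continue the previous key's list of values.
def load_dictionary(path)
  File.readlines(path).slice_before(/^\S/).map { |lines|
    key, *values = lines.map(&:strip).join(' ').split(' ')
    [key, values]
  }.to_h
end

# Demo only: write a small sample to a temporary file, then parse it.
file = Tempfile.new('dict')
file.write("AAB eel bbc\nABA did eye\n    non pap\n")
file.close
dictionary = load_dictionary(file.path)
p dictionary
file.unlink # delete the temporary file
```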

You can look up a key with:

dictionary['AAB']
=> ["eel", "bbc"]

and search within the array using include?:

dictionary['AAB'].include?('eel')
=> true
dictionary['AAB'].include?('foo')
=> false
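One thing to watch: looking up a key that is not in the file returns nil, so chaining include? on it raises NoMethodError. A minimal sketch that guards against this with Hash#fetch and an empty-array default; listed? is a hypothetical helper name, and the one-entry hash stands in for the full dictionary:

```ruby
# Small stand-in for the parsed dictionary.
dictionary = { 'AAB' => ['eel', 'bbc'] }

# Hypothetical helper: true if word is listed under key, and false (rather
# than a NoMethodError on nil) when the key is missing entirely.
def listed?(dictionary, key, word)
  dictionary.fetch(key, []).include?(word)
end

p listed?(dictionary, 'AAB', 'eel')
p listed?(dictionary, 'XYZ', 'eel')
```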
answered 2013-02-20T03:41:33.740
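# Reads the data from ARGF (files named on the command line, or stdin).
class A

  def initialize
    # key appears on the left-hand side so it is a local variable outside
    # the block and keeps its value from one line to the next.
    @h, key = readlines.inject({}) do |m, s|
      a = s.split
      # A line that starts with a non-space character begins a new key.
      m[key = a.shift] = [] if s =~ /^[^\s]/
      # Append the remaining words to the current key's array.
      m[key] += a
      m
    end
  end

  def lookup k, v # not sure what you really want to do here
    # Prints the key, the value, and its index in that key's array (nil if absent).
    p [k, v, (@h[k].index v)]
  end

  self # the class body evaluates to A, so .new can be chained below
end.new.lookup 'ABA', 'wow'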
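The same line-by-line idea can also be written with the running key kept in an ordinary variable outside the block and each_with_object in place of inject. A sketch under that assumption; it reads a made-up StringIO sample instead of ARGF so it runs on its own, and the names current_key and table are illustrative:

```ruby
require 'stringio'

# Made-up sample in the question's layout.
input = StringIO.new("ABA did eye\n    wow aba\nABB all see\n")

current_key = nil # key from the most recent unindented line
table = input.each_line.each_with_object({}) do |line, h|
  words = line.split
  if line =~ /\A\S/ # unindented line: its first word is a new key
    current_key = words.shift
    h[current_key] = []
  end
  h[current_key].concat(words) if current_key # append the remaining words
end

# Same report as lookup above: key, word, and the word's index (or nil).
p ['ABA', 'wow', table['ABA'].index('wow')]
```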
answered 2013-02-19T23:34:58.017

My 2 cents:

recent_key = ""
results = Hash.new
# File.foreach reads one line at a time and closes the file when done.
File.foreach("/path_to_file_here") do |line|
  key = line[/[A-Z]+/]          # an uppercase word starts a new entry
  recent_key = key if key
  line.scan(/[a-z]+/).each do |val|
    (results[recent_key.to_sym] ||= []) << val # create the array on first use
  end
end
puts results

This will give you the following output:

{
:AAB=>["eel", "bbc"],
:ABA=>["did", "eye", "non", "pap", "mom", "ere", "bob", "nun", "eve", "pip", "gig", "dad", "nan", "ana", "gog", "aha", "mum", "sis", "ada", "ava", "ewe", "pop", "tit", "gag", "tat", "bub", "pup", "eke", "ele", "hah", "huh", "pep", "sos", "tot", "wow", "aba", "ala", "bib", "dud", "tnt"],
:ABB=>["all", "see", "off", "too", "ill", "add", "lee", "ass", "err", "xii", "ann", "fee", "vii", "inn", "egg", "odd", "bee", "dee", "goo", "woo", "cnn", "pee", "fcc", "tee", "wee", "ebb", "edd", "gee", "ott", "ree", "vee", "ell", "orr", "rcc", "att", "boo", "cee", "cii", "coo", "kee", "moo", "mss", "soo", "doo", "faa", "hee", "icc", "iss", "itt", "kii", "loo", "mee", "nee", "nuu", "ogg", "opp", "pii", "tll", "upp", "voo", "zee"]
}
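Two details of this version are worth knowing. The keys are stored as symbols, so a lookup needs results[:AAB]; results['AAB'] returns nil. And line[/[A-Z]+/] matches an uppercase run anywhere on the line, so a continuation line that ever contained capitals would be taken as a new key. A sketch that anchors the key pattern to the start of the line with \A, run against a made-up StringIO sample:

```ruby
require 'stringio'

# Made-up sample; the indented second line continues the AAB entry.
input = StringIO.new("AAB eel bbc\n    tnt\nABA did eye\n")

results = {}
recent_key = nil
input.each_line do |line|
  key = line[/\A[A-Z]+/]  # only an uppercase run at the very start is a key
  recent_key = key.to_sym if key
  next unless recent_key  # skip anything before the first key line
  (results[recent_key] ||= []).concat(line.scan(/[a-z]+/))
end

p results[:AAB]  # symbol key
p results['AAB'] # string key: not found, so nil
```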
answered 2013-02-19T23:54:58.923