Using Ruby to find a word or phrase in a text file, capture it, skip a line, then read lines until a blank line (repeat)

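This is a variation on a previous post that was answered with a regular expression; I wanted to see whether it can be done without one. Here is a sample of the text: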

  MATCH ME 1234

3940393  $100.00   FORTY THOUSAND THIEVES
3455     $ 00.10   ONLY 1% OF THE THIEVES

GOBBLEY GOOK: 344959904       3948820   333333333

MATCH ME

3940321  $110.00   FORTY THOUSAND RICHER PEOPLE
3        $ 00.11   ONLY 1% OF THE RICHER PEOPLE

The output I want looks like this:

MATCH ME,1234,3940393,$100.00,FORTY THOUSAND THIEVES
MATCH ME,1234,3455,$00.10,ONLY 1% OF THE THIEVES
MATCH ME,,3940321,$110.00,FORTY THOUSAND RICHER PEOPLE
MATCH ME,,3,$00.11,ONLY 1% OF THE RICHER PEOPLE

The code below only gets me part of the way. It finds MATCH ME, but returns only:

MATCH ME,1234,3940393 ,$100.00,FORTY THOUSAND THIEVES
MATCH ME,1234,3940393 ,$100.00,FORTY THOUSAND THIEVES
MATCH ME,not here,3940321 ,$110.00,FORTY THOUSAND RICHER PEOPLE

I'm sure my nested-if approach is wrong, but I need help finding an alternative:

def is_numeric?(object)
  true if Float(object) rescue false
end


def is_match_me_line?(object)
  true if object == "MATCH ME" rescue false
end

def load_file
  raw_records = []
  infile = File.open("match_me.txt", "r")
  while line = infile.gets
    possible_match_me = line[0,18]
    match_me_words = line[4,8]

    if is_match_me_line?(match_me_words)
      possible_match_me_number_present = possible_match_me[13,4]
      if is_numeric?(possible_match_me_number_present)
        fis_match_me_number = possible_match_me_number_present
      else
        fis_match_me_number = "not here"
      end

      line = infile.gets
      line = infile.gets

      account = line[0,8]
      amount = line[9,7]
      description = line[19,40]
      record = [match_me_words, fis_match_me_number, account, amount, description]
      raw_records << record
      puts raw_records.map {|record| record*','}
    end
  end
end
load_file

As suggested, I'm trying the regex solution, but I'm not getting the desired output from this code:

File.open("text_2.txt", "r").each_line do |data|
  data.scan(/(MATCH ME)(.*?)\n\n((?:(?!\n\n).)*)/m).each do |m, n, lines|
    lines.each_line do |line|
      puts [m, n, *line.unpack('A9A10A*')].map(&:strip).join(',')
    end
  end
end

1 Answer

Here's mine:

data.scan(/(MATCH ME)(.*?)\n\n((?:(?!\n\n).)*)/m).each do |m, n, lines|
  lines.each_line do |line|
    puts [m, n, *line.unpack('A9A10A*')].map(&:strip).join(',')
  end  
end

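That regex is ugly, but it's still better than looking at 30 lines. (?:(?!\n\n).)* matches a run of characters and stops just before two consecutive newlines: at each position, the lookahead (?!\n\n) checks that a blank line does not start there before . consumes one character (with the /m flag, . also matches a newline). The (?: ) makes the group non-capturing, so the individual "." matches aren't captured as well.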
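For reference, here is a self-contained sketch of that snippet run against the sample from the question. It assumes `data` holds the whole file as one string (for example via `File.read("text_2.txt")`), since the pattern has to see the blank lines between blocks; iterating with `each_line` hands it one line at a time, so `\n\n` can never match. The `SAMPLE` constant and the `extract_records` helper name are only for illustration and are not part of the original code.

```ruby
# Sample input copied from the question. The column widths matter,
# because unpack('A9A10A*') slices each data line by fixed width.
SAMPLE = <<~'TEXT'
  MATCH ME 1234

  3940393  $100.00   FORTY THOUSAND THIEVES
  3455     $ 00.10   ONLY 1% OF THE THIEVES

  GOBBLEY GOOK: 344959904       3948820   333333333

  MATCH ME

  3940321  $110.00   FORTY THOUSAND RICHER PEOPLE
  3        $ 00.11   ONLY 1% OF THE RICHER PEOPLE
TEXT

# Hypothetical helper: returns one CSV-style string per data line.
def extract_records(data)
  # Group 1: the literal "MATCH ME"
  # Group 2: the rest of that line (the optional number)
  # Group 3: everything after the blank line, up to the next blank line
  pattern = /(MATCH ME)(.*?)\n\n((?:(?!\n\n).)*)/m

  data.scan(pattern).flat_map do |label, number, block|
    block.each_line.map do |line|
      # A9 = account column, A10 = amount column, A* = description
      [label, number, *line.unpack('A9A10A*')].map(&:strip).join(',')
    end
  end
end

puts extract_records(SAMPLE)
```

Note that the amount column keeps its internal space (`$ 00.10`), because `strip` only trims the ends; if the space-free form from the desired output is needed, the amount field could be passed through `delete(' ')` as well.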

answered 2012-05-15T03:22:27.017