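I'm using the prawn gem to read a computer-generated 60-page PDF report that contains financial and demographic data on dozens of people. My challenge is that, while scanning each line, I want to capture the name and special ID (which sit on the same line) together with the subsequent lines that belong to that person. Using Ruby's String#scan method, I've been able to capture the financials this way, with each matching line returned as: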
[<invoice no.>, <service type>, <modifier (if any)>, <service_date>, <units>, <amount>]
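I've tried associating the ID with the following lines of financial data and then switching it whenever the ID changes, but with no success. Am I going about this backwards? I have very little experience with regular expressions (and with programming in general).
Here is the code that works for the financial data only: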
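require 'pdf-reader'

# file is assumed to be the path to the PDF report
PDF::Reader.new(file).pages.each do |page|
  # Each match yields [invoice no., service type, modifier, service date, units, amount]
  page.raw_content.scan(/^\(\s(\d{6})\s+\d\s+(\w\d{4})\s+(0580|TT|1C|1C\s+1F)?\s+(\d+\/\d+\/\d+)\s+\d+\/\d+\/\d+\s+(\d+\.\d+)\s+(\d+\.\d+)/) do |line|
    line.each { |x| x.strip! unless x.nil? } # strip captures in place; the modifier may be nil
    puts line.join(' ')
    Cycle.check_details(line)
  end
end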
Here is a sample of what puts page.raw_content produces (the real lines contain a lot of whitespace):
(REG LOC CLIENT SERVICE NAME BIRTH DATE RECIPIENT ID PRIOR AUTHORIZATION #)'
(xx xxx xxxxx xxxxxxx LANNISTER, JAIME xx/xx/xxxx xxxx <special ID>)'
(DIAGNOSIS CODES: 887.0)'
( )'
( INV # LINE # PROCEDURE CODE REVENUE CD FROM DT THRU DT UNITS AMOUNT)'
( <inv num> 1 <service_code> <modifier> xx/xx/13 xx/xx/13 4.00 65.60)'
( <inv num> 2 <service_code> <modifier> xx/xx/13 xx/xx/13 2.50 41.00)'
( <inv num> 3 <service_code> <modifier> xx/xx/13 xx/xx/13 4.00 65.60)'
( <inv num> 4 <service_code> <modifier> xx/xx/13 xx/xx/13 4.00 65.60)'
( <inv num> 5 <service_code> <modifier> xx/xx/13 xx/xx/13 4.00 65.60)'
( <inv num> 6 <service_code> <modifier> xx/xx/13 xx/xx/13 4.00 65.60)'
( <inv num> 7 <service_code> <modifier> xx/xx/13 xx/xx/13 4.00 65.60)'
( CLAIM TOTAL 434.60 CLAIM ACCOUNT REF. xxxxxxxxxxxxxxxSUP)'
(REG LOC CLIENT SERVICE NAME BIRTH DATE RECIPIENT ID PRIOR AUTHORIZATION #)'
(xx xxx xxxxx xxxxxxx LANNISTER, JOFFREY xx/xx/xxxx xxxx <special ID>)'
(DIAGNOSIS CODES: 259.0)'
( )'
( INV # LINE # PROCEDURE CODE REVENUE CD FROM DT THRU DT UNITS AMOUNT)'
( <inv num> 1 <service_code> <modifier> xx/xx/13 xx/xx/13 4.00 65.60)'
( <inv num> 2 <service_code> <modifier> xx/xx/13 xx/xx/13 2.50 41.00)'
( <inv num> 3 <service_code> <modifier> xx/xx/13 xx/xx/13 4.00 65.60)'
( <inv num> 4 <service_code> <modifier> xx/xx/13 xx/xx/13 4.00 65.60)'
( <inv num> 5 <service_code> <modifier> xx/xx/13 xx/xx/13 4.00 65.60)'
( <inv num> 6 <service_code> <modifier> xx/xx/13 xx/xx/13 4.00 65.60)'
( <inv num> 7 <service_code> <modifier> xx/xx/13 xx/xx/13 4.00 65.60)'
( CLAIM TOTAL 434.60 CLAIM ACCOUNT REF. xxxxxxxxxxxxxxxSUP)'
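To make the "carry the ID forward until it changes" idea concrete, here is a minimal sketch of what I am after. It walks the text one line at a time instead of scanning the whole page at once, remembers the last person line it saw, and files every financial line under that person. PERSON_LINE, group_lines_by_person and the sample values (ID0001, the invoice numbers, the dates) are hypothetical names and data made up for illustration; in particular, the name/ID pattern is only guessed from the sample above and assumes the ID is the last whitespace-free token on the line. DETAIL_LINE is the same pattern as in my code above.

```ruby
# Hypothetical layout: NAME is "LAST, FIRST", followed by a birth date,
# one more field, and the ID as the last token before the closing )'
PERSON_LINE = %r{^\(.*?([A-Z]+,\s+[A-Z]+)\s+\d+/\d+/\d+\s+\S+\s+(\S+)\)'}

# Same pattern as the financial-line regex in the code above.
DETAIL_LINE = /^\(\s(\d{6})\s+\d\s+(\w\d{4})\s+(0580|TT|1C|1C\s+1F)?\s+(\d+\/\d+\/\d+)\s+\d+\/\d+\/\d+\s+(\d+\.\d+)\s+(\d+\.\d+)/

# Walk the text line by line, remembering the most recent person seen,
# and attach each financial line to that person.
def group_lines_by_person(text)
  people = {}
  current_id = nil
  text.each_line do |line|
    if (m = PERSON_LINE.match(line))
      current_id = m[2]
      people[current_id] ||= { name: m[1], details: [] }
    elsif current_id && (m = DETAIL_LINE.match(line))
      # The modifier capture is nil when the line has no modifier.
      people[current_id][:details] << m.captures.map { |c| c&.strip }
    end
  end
  people
end

# Made-up sample in the shape of the report (not real data).
sample = <<~TEXT
  (REG LOC CLIENT SERVICE NAME BIRTH DATE RECIPIENT ID PRIOR AUTHORIZATION #)'
  (01 002 00003 0000004 LANNISTER, JAIME 01/02/1970 0001 ID0001)'
  ( 123456 1 A1234 TT 01/02/13 01/02/13 4.00 65.60)'
  ( 123456 2 A1234  01/03/13 01/03/13 2.50 41.00)'
  (01 002 00003 0000004 LANNISTER, JOFFREY 03/04/2000 0001 ID0002)'
  ( 654321 1 B4321 1C 02/02/13 02/02/13 4.00 65.60)'
TEXT

group_lines_by_person(sample).each do |id, person|
  puts "#{id} #{person[:name]}: #{person[:details].length} line(s)"
end
```

With the real report, the text passed in would presumably be the raw_content of every page joined together, so that a person whose lines continue on the next page keeps the same ID. I have not verified that the name/ID pattern holds for the actual file.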