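Comparing two files is an old problem. 40 years ago, in the days of punched cards, we already had to solve it to print the bills for the items sold each day. One file was the customer file (the primary file); the second was a deck of cards punched from the delivery notes (the secondary file). Each record (card) in this secondary file contained a customer number and an item number. Both files were sorted by customer number, and the algorithm was called matching. It consists of reading one record from each file, comparing the common key, and selecting one of three possible cases: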
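- primary key < secondary key: skip this customer (normal: the customer file holds more customers than there are sales today)
  read the next primary record
- primary key = secondary key: print the bill
  read the next customer record
  read and print items from the secondary file until the customer number changes
- primary key > secondary key: a typo in the secondary file, or a new customer not yet added to the customer file
  print an error message (not a valid customer)
  read the next secondary record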
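The three cases above can be sketched for the billing scenario, where one customer may have several delivery records. This is only an illustration under stated assumptions: the data, the method name `bill`, and the constant `HIGH_KEY` are hypothetical, both arrays are assumed to be sorted by customer number, and the keys are assumed to be numeric so that they can be compared with `Float::INFINITY`.

```ruby
# Hypothetical data: both "files" are arrays sorted by customer number.
customers  = [[1, 'Adams'], [2, 'Baker'], [4, 'Clark']]           # primary file
deliveries = [[2, 'item-17'], [2, 'item-42'],
              [3, 'item-05'], [4, 'item-99']]                     # secondary file

HIGH_KEY = Float::INFINITY # assumed sentinel standing in for EOF

def bill(customers, deliveries)
  lines = []
  ci = 0 # index of the current primary record
  di = 0 # index of the current secondary record
  while ci < customers.length || di < deliveries.length
    ckey = ci < customers.length ? customers[ci][0] : HIGH_KEY
    dkey = di < deliveries.length ? deliveries[di][0] : HIGH_KEY
    if ckey < dkey
      # customer without sales today: skip
      ci += 1
    elsif ckey == dkey
      lines << "Bill for #{customers[ci][1]} (customer #{ckey})"
      # print items until the customer number changes
      while di < deliveries.length && deliveries[di][0] == ckey
        lines << "  #{deliveries[di][1]}"
        di += 1
      end
      ci += 1
    else
      # delivery record for an unknown customer
      lines << "Error: #{dkey} is not a valid customer"
      di += 1
    end
  end
  lines
end

puts bill(customers, deliveries)
```
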
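The read loop continues as long as there are records to read, that is, as long as at least one of the two files has not reached EOF (end of file). The core of a larger matching module I wrote in Ruby is: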
def matching(p_actionSmaller, p_actionEqual, p_actionGreater)
  read_primary
  read_secondary
  while ! @eof_primary || ! @eof_secondary
    case
    when @primary_key < @secondary_key
      p_actionSmaller.call(self)
      read_primary
    when @primary_key == @secondary_key
      p_actionEqual.call(self)
      read_primary
      read_secondary
    when @primary_key > @secondary_key
      p_actionGreater.call(self)
      read_secondary
    end
  end
end
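The rest of that module is not shown here, so the following is only a guess at how such a method might be wired up and called. The class name `Matcher`, its constructor, the `attr_reader` accessors, and the three lambdas are all hypothetical; keys are assumed to be numeric so that `Float::INFINITY` can serve as the end-of-file sentinel.

```ruby
# Hypothetical wrapper class; only the shape of `matching` comes from the text.
class Matcher
  HIGH_VALUE = Float::INFINITY # assumed EOF sentinel, larger than any real key

  attr_reader :primary_key, :secondary_key

  def initialize(primary_keys, secondary_keys)
    @primary = primary_keys.sort
    @secondary = secondary_keys.sort
    @p_index = -1
    @s_index = -1
    @eof_primary = false
    @eof_secondary = false
  end

  def read_primary
    @p_index += 1
    @eof_primary = @p_index >= @primary.length
    @primary_key = @eof_primary ? HIGH_VALUE : @primary[@p_index]
  end

  def read_secondary
    @s_index += 1
    @eof_secondary = @s_index >= @secondary.length
    @secondary_key = @eof_secondary ? HIGH_VALUE : @secondary[@s_index]
  end

  def matching(p_actionSmaller, p_actionEqual, p_actionGreater)
    read_primary
    read_secondary
    while !@eof_primary || !@eof_secondary
      case
      when @primary_key < @secondary_key
        p_actionSmaller.call(self)
        read_primary
      when @primary_key == @secondary_key
        p_actionEqual.call(self)
        read_primary
        read_secondary
      when @primary_key > @secondary_key
        p_actionGreater.call(self)
        read_secondary
      end
    end
  end
end

# Each callback receives the matcher and records what it saw.
log = []
only_primary   = lambda { |m| log << [:only_primary, m.primary_key] }
both           = lambda { |m| log << [:both, m.primary_key] }
only_secondary = lambda { |m| log << [:only_secondary, m.secondary_key] }

Matcher.new([2, 3, 4], [1, 2, 3, 5]).matching(only_primary, both, only_secondary)
p log
```

Passing the three actions as callables keeps the read loop generic: the same loop can print bills, report differences, or merge files, depending on what the lambdas do.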
Here is a simplified version adapted to your array problem:
# input "files" :
x = [ [2,'a2','b20'], [3, 'a3', 'b3'], [4,'a4','b4'] ]
y = [[1,'a1','b1'], [2,'a2','b2' ], [3, 'a30', 'b3'], [5, 'a5', 'b5']]
puts '--- input --- :'
print 'x='; p x
print 'y='; p y
xh = Hash.new
yh = Hash.new
# converted to hash for easy extraction of data :
x.each do |a|
  key, *value = a
  xh[key] = value
end
y.each do |a|
  key, *value = a
  yh[key] = value
end
puts '--- as hash --- :'
print 'xh='; p xh
print 'yh='; p yh
# sort keys for matching both "files" on the same key :
@xkeys = xh.keys.sort
@ykeys = yh.keys.sort
print '@xkeys='; p @xkeys
print '@ykeys='; p @ykeys
# simplified algorithm, where EOF is replaced by HIGH_VALUE :
@x_index = -1
@y_index = -1
HIGH_VALUE = 255 # sentinel: must be greater than every real key
def read_primary
  @x_index += 1 # read next record
  # The primary key is extracted from the record.
  # At EOF it is replaced by HIGH_VALUE, usually x'FFFFFF'
  @primary_key = @xkeys[@x_index] || HIGH_VALUE
  # @xkeys[@x_index] returns nil if key does not exist, nil || H returns H
end
def read_secondary
  @y_index += 1
  @secondary_key = @ykeys[@y_index] || HIGH_VALUE
end
puts '--- matching --- :'
read_primary
read_secondary
while @x_index < @xkeys.length || @y_index < @ykeys.length
  case
  when @primary_key < @secondary_key
    puts "case < : #{@primary_key} < #{@secondary_key}"
    puts "x #{xh[@primary_key].inspect} has no equivalent in y"
    read_primary
  when @primary_key == @secondary_key
    puts "case = : #{@primary_key} = #{@secondary_key}"
    puts "compare #{xh[@primary_key].inspect} with #{yh[@primary_key].inspect}"
    read_primary
    read_secondary
  when @primary_key > @secondary_key
    puts "case > : #{@primary_key} > #{@secondary_key}"
    puts "y #{yh[@secondary_key].inspect} has no equivalent in x"
    read_secondary
  end
end
Execution:
$ ruby -w t.rb
--- input --- :
x=[[2, "a2", "b20"], [3, "a3", "b3"], [4, "a4", "b4"]]
y=[[1, "a1", "b1"], [2, "a2", "b2"], [3, "a30", "b3"], [5, "a5", "b5"]]
--- as hash --- :
xh={2=>["a2", "b20"], 3=>["a3", "b3"], 4=>["a4", "b4"]}
yh={5=>["a5", "b5"], 1=>["a1", "b1"], 2=>["a2", "b2"], 3=>["a30", "b3"]}
@xkeys=[2, 3, 4]
@ykeys=[1, 2, 3, 5]
--- matching --- :
case > : 2 > 1
y ["a1", "b1"] has no equivalent in x
case = : 2 = 2
compare ["a2", "b20"] with ["a2", "b2"]
case = : 3 = 3
compare ["a3", "b3"] with ["a30", "b3"]
case < : 4 < 5
x ["a4", "b4"] has no equivalent in y
case > : 255 > 5
y ["a5", "b5"] has no equivalent in x
I leave the presentation of the differences to you.
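For what it is worth, one possible presentation is sketched below. It is only a starting point under stated assumptions: the method name `diff_report` and the wording of the report lines are hypothetical, keys are assumed to be numeric and unique within each array, and `Float::INFINITY` replaces the fixed high value as the EOF sentinel.

```ruby
x = [[2, 'a2', 'b20'], [3, 'a3', 'b3'], [4, 'a4', 'b4']]
y = [[1, 'a1', 'b1'], [2, 'a2', 'b2'], [3, 'a30', 'b3'], [5, 'a5', 'b5']]

# Walks both key lists in sorted order and returns one line per difference.
def diff_report(x, y)
  xh = {}
  yh = {}
  x.each { |key, *value| xh[key] = value }
  y.each { |key, *value| yh[key] = value }
  xkeys = xh.keys.sort
  ykeys = yh.keys.sort
  report = []
  xi = 0
  yi = 0
  while xi < xkeys.length || yi < ykeys.length
    xk = xkeys[xi] || Float::INFINITY # sentinel once x is exhausted
    yk = ykeys[yi] || Float::INFINITY # sentinel once y is exhausted
    if xk < yk
      report << "#{xk}: only in x #{xh[xk].inspect}"
      xi += 1
    elsif xk > yk
      report << "#{yk}: only in y #{yh[yk].inspect}"
      yi += 1
    else
      # same key on both sides: compare the records field by field
      xh[xk].zip(yh[yk]).each_with_index do |(xv, yv), field|
        if xv != yv
          report << "#{xk}: field #{field} differs, x=#{xv.inspect} y=#{yv.inspect}"
        end
      end
      xi += 1
      yi += 1
    end
  end
  report
end

puts diff_report(x, y)
```
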
HTH