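I am trying to compare two arrays of hashes that have very similar hash structures (identical keys that are always present) and return the deltas between the two. Specifically, I would like to capture the following: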

  • the hashes in array1 that do not exist in array2
  • the hashes in array2 that do not exist in array1
  • the hashes that appear in both data sets

Normally this can be achieved by simply doing the following:

deltas_old_new = (array1-array2)
deltas_new_old = (array2-array1)

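The problem for me (and it has turned into a 2-3 hour struggle!) is that I need to identify the deltas based on the values of 3 keys within each hash ('id', 'ref', 'name'). The values of these 3 keys are what actually make up a unique entry in my data, but I must retain the other key/value pairs of each hash (e.g. 'extra', plus many other key/value pairs not shown for brevity).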
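To illustrate why the plain subtraction is not enough, here is a minimal sketch with hypothetical one-element arrays (not the real data) whose 'extra' values differ on purpose:

```ruby
# Hypothetical illustration data: same id/ref/name, different 'extra'.
a = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'x' }]
b = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'y' }]

# Array#- compares whole hashes, so the differing 'extra' value makes
# the entry look like a delta even though it identifies the same record.
whole_hash_delta = a - b

# Comparing only the three identifying keys shows it is the same entry.
keys = %w[id ref name]
same_identity = a.first.values_at(*keys) == b.first.values_at(*keys)
```

Here whole_hash_delta still contains the entry, while same_identity is true, which is exactly the mismatch I need to get around.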

Sample data:

array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
          {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

Expected outcome (3 separate arrays of hashes):

Objects containing data in array1 but not in array2:

[{'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
 {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

Objects containing data in array2 but not in array1:

[{'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
 {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
 {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

Objects containing data in both array1 and array2:

[{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
 {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'}]

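I have made countless attempts: iterating over the arrays and comparing them, using Hash#keep_if based on the 3 keys, and merging both data sets into a single array and then trying to de-duplicate it against array1. I keep coming up empty-handed. Thank you in advance for your time and assistance!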


3 Answers

For this type of problem it is generally easiest to work with indices.

Code

def keepers(array1, array2, keys)
  a1 = make_hash(array1, keys)
  a2 = make_hash(array2, keys)
  common_keys_of_a1_and_a2 = a1.keys & a2.keys
  [keeper_idx(array1, a1, common_keys_of_a1_and_a2),
   keeper_idx(array2, a2, common_keys_of_a1_and_a2)]
end

def make_hash(arr, keys)
  arr.each_with_index.with_object({}) do |(g,i),h|
    (h[g.values_at(*keys)] ||= []) << i
  end
end

def keeper_idx(arr, a, common_keys_of_a1_and_a2)
  arr.size.times.to_a - a.values_at(*common_keys_of_a1_and_a2).flatten
end

Example

array1 =
  [{'id' =>  '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
   {'id' =>  '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
   {'id' =>  '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
   {'id' =>  '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 8'},
   {'id' =>  '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

array2 =
  [{'id' =>  '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
   {'id' =>  '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
   {'id' =>  '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
   {'id' =>  '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
   {'id' =>  '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 12'},
   {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

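Note that these two arrays differ slightly from those given in the question. The question did not specify whether an array could contain two hashes that have the same values for the specified keys. I therefore added one such hash to each array to show that this case is handled.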

keys = ['id', 'ref', 'name']

idx1, idx2 = keepers(array1, array2, keys)
  #=> [[1, 4], [2, 3, 4, 5]]

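idx1 (idx2) holds the indices of the elements of array1 (array2) that remain after the matches have been removed. (array1 and array2 themselves are not modified, however.)

It follows that the two arrays map to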

array1.values_at(*idx1)
  #=> [{"id"=> "2", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"},
  #    {"id"=> "7", "ref"=>"1007", "name"=>"OR", "extra"=>"Not Sorted On 11"}]

array2.values_at(*idx2)
  #=> [{"id"=> "8", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"},
  #    {"id"=> "5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 10"},
  #    {"id"=> "5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 12"},
  #    {"id"=>"12", "ref"=>"1012", "name"=>"TX", "extra"=>"Not Sorted On 85"}]

The indices of the hashes that are removed are given as follows.

array1.size.times.to_a - idx1
  #=> [0, 2, 3]
array2.size.times.to_a - idx2
  #=> [0, 1]
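
The same indices can also be mapped back to the hashes themselves, which gives the "present in both" result the question asks for. The following is a minimal sketch that restates the index idea in compact form; index_by_keys is a hypothetical helper name, not one of the methods defined earlier.

```ruby
# Hypothetical helper: maps each key tuple to the indices where it occurs.
def index_by_keys(arr, keys)
  arr.each_with_index.with_object({}) do |(h, i), idx|
    (idx[h.values_at(*keys)] ||= []) << i
  end
end

keys = %w[id ref name]

array1 =
  [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
   {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
   {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
   {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 8'},
   {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

array2 =
  [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
   {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
   {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
   {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
   {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 12'},
   {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

i1 = index_by_keys(array1, keys)
i2 = index_by_keys(array2, keys)
common = i1.keys & i2.keys

# Hashes whose key tuple occurs in both arrays, taken from each side.
in_both_from_array1 = array1.values_at(*i1.values_at(*common).flatten)
in_both_from_array2 = array2.values_at(*i2.values_at(*common).flatten)
```

Because the two sides can carry different 'extra' values for the same key tuple, the sketch keeps both views rather than picking one; which side to report is a decision for the caller.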

Explanation

The steps are as follows.

a1 = make_hash(array1, keys)
  #=> {["1", "1001", "CA"]=>[0], ["2", "1002", "NY"]=>[1],
  #    ["3", "1003", "WA"]=>[2, 3], ["7", "1007", "OR"]=>[4]}    
a2 = make_hash(array2, keys)
  #=> {["1", "1001", "CA"]=>[0], ["3", "1003", "WA"]=>[1],
  #    ["8", "1002", "NY"]=>[2], ["5", "1005", "MT"]=>[3, 4],
  #    ["12", "1012", "TX"]=>[5]}
common_keys_of_a1_and_a2 = a1.keys & a2.keys
  #=> [["1", "1001", "CA"], ["3", "1003", "WA"]]
keeper_idx(array1, a1, common_keys_of_a1_and_a2)
  #=> [1, 4] (for array1)
keeper_idx(array2, a2, common_keys_of_a1_and_a2)
  #=> [2, 3, 4, 5] (for array2)
answered 2017-08-29T23:01:24.550

See Array#- and Array#&:

array1 - array2   #data in array1 but not in array2
array2 - array1   #data in array2 but not in array1
array1 & array2   #data in both array1 and array2

Given how you have tagged this question, you can similarly use sets:

require 'set'

set1 = array1.to_set
set2 = array2.to_set

set1 - set2
set2 - set1
set1 & set2
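
Note that -, & and the set operations compare whole hashes, so they match the expected output for the sample data only because the 'extra' values happen to agree. If the comparison has to be restricted to 'id', 'ref' and 'name', one possible sketch is to put only the key tuples into sets and filter the original arrays against them. The variable names below are illustrative assumptions, and the data is the sample from the question.

```ruby
require 'set'

keys = %w[id ref name]

array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
          {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

# Sets of [id, ref, name] tuples give fast membership checks.
ids1 = array1.map { |h| h.values_at(*keys) }.to_set
ids2 = array2.map { |h| h.values_at(*keys) }.to_set

# Filter the full hashes so the other key/value pairs are retained.
only_in_array1 = array1.reject { |h| ids2.include?(h.values_at(*keys)) }
only_in_array2 = array2.reject { |h| ids1.include?(h.values_at(*keys)) }
in_both        = array1.select { |h| ids2.include?(h.values_at(*keys)) }
```

in_both is taken from array1 here; if the two arrays can disagree on the non-key values, the array2 side would have to be selected separately.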
answered 2017-07-26T20:24:18.487

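It's not very pretty, but it works. It creates a third array containing all the unique values from both array1 and array2, and iterates over it.

Then, since include? doesn't allow a custom matcher, we can fake it by using detect and looking for an item in the array whose custom sub-hash matches. We'll wrap that in a custom method so that we can call it with either array1 or array2 instead of writing it twice.

Finally, we loop through array3, determine whether each item came from array1, array2, or both, and add it to the corresponding output array.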

array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
          {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

# combine the arrays into 1 array that contains items in both array1 and array2 to loop through
array3 = (array1 + array2).uniq { |item| { 'id' => item['id'], 'ref' => item['ref'], 'name' => item['name'] } }

# Array#include? doesn't allow a custom matcher, so we can fake it by using Array#detect
def is_included_in(array, object)
  object_identifier = { 'id' => object['id'], 'ref' => object['ref'], 'name' => object['name'] }

  array.detect do |item|
    { 'id' => item['id'], 'ref' => item['ref'], 'name' => item['name'] } == object_identifier
  end
end

# output array initializing
array1_only = []
array2_only = []
array1_and_array2 = []

# loop through all items in both array1 and array2 and check if it was in array1 or array2
# if it was in both, add to array1_and_array2, otherwise, add it to the output array that
# corresponds to the input array
array3.each do |item|
  in_array1 = is_included_in(array1, item)
  in_array2 = is_included_in(array2, item)

  if in_array1 && in_array2
    array1_and_array2.push item
  elsif in_array1
    array1_only.push item
  else
    array2_only.push item
  end
end


puts array1_only.inspect        # => [{"id"=>"2", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"}, {"id"=>"7", "ref"=>"1007", "name"=>"OR", "extra"=>"Not Sorted On 11"}]
puts array2_only.inspect        # => [{"id"=>"8", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"}, {"id"=>"5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 10"}, {"id"=>"12", "ref"=>"1012", "name"=>"TX", "extra"=>"Not Sorted On 85"}]
puts array1_and_array2.inspect  # => [{"id"=>"1", "ref"=>"1001", "name"=>"CA", "extra"=>"Not Sorted On 5"}, {"id"=>"3", "ref"=>"1003", "name"=>"WA", "extra"=>"Not Sorted On 9"}]
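
The same custom-matcher idea can be written with the key list passed in rather than hard-coded. This is only a sketch: split_by_identity is a hypothetical name, and unlike the array3 version it does not de-duplicate, so repeated key tuples within one array are all kept.

```ruby
# Hypothetical helper: splits two arrays of hashes by the given identifying keys.
def split_by_identity(array1, array2, keys)
  # Two hashes are "the same entry" when their identifying values agree.
  same = ->(a, b) { a.values_at(*keys) == b.values_at(*keys) }

  in_both, only1 = array1.partition { |a| array2.any? { |b| same.call(a, b) } }
  only2 = array2.reject { |b| array1.any? { |a| same.call(a, b) } }

  [only1, only2, in_both]
end

array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
          {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

only1, only2, in_both = split_by_identity(array1, array2, %w[id ref name])
```

Like the detect-based version, this scans one array for every element of the other, so it is fine for small inputs but grows quadratically with array size.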
answered 2017-07-26T20:47:35.160