ruby - How to sort and remove duplicates in an array?

Question

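I have to compare two CSV files exported from an e-commerce site. The files are always similar, except that the newer one contains a different number of items, because the catalog changes every week.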

Example CSV file:

sku_code, description, price, url
001, product one, 100, www.something.com/1
002, product two, 150, www.something.com/2

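By comparing two files extracted on different dates, I want to produce one list of discontinued products and another list of newly added products.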

My key should be sku_code, which is unique within the catalog.

I have been using this code from Stack Overflow:

#old file
f1 = IO.readlines("oldfeed.csv").map(&:chomp)
#new file
f2 = IO.readlines("newfeed.csv").map(&:chomp)

#find new products
File.open("new_products.txt","w"){ |f| f.write((f2-f1).join("\n")) }

#find old products
File.open("deleted_products.txt","w"){ |f| f.write((f1-f2).join("\n")) }

My problem

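It works well except in one case: when one of the fields after sku_code changes (for example, the price), the product is treated as "new", even though for my purposes it is the same product.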
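A minimal illustration of that case, assuming two hypothetical in-memory rows in place of the feed files (header omitted): because Array#- compares whole strings, a price change alone makes the same SKU show up in both the "new" and the "deleted" output.

```ruby
# Hypothetical stand-ins for IO.readlines("oldfeed.csv") / ("newfeed.csv"), header omitted.
old_lines = ["001, product one, 100, www.something.com/1",
             "002, product two, 150, www.something.com/2"]
new_lines = ["001, product one, 120, www.something.com/1", # only the price changed
             "002, product two, 150, www.something.com/2"]

# Array#- compares entire lines, so the repriced row counts as different.
added   = new_lines - old_lines
removed = old_lines - new_lines

puts added   # the 001 row with price 120
puts removed # the 001 row with price 100
```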

What is the smartest way to compare only the sku_code instead of the whole line?

score 2 · Accepted Answer

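There is no need to use the CSV library, since you are not interested in the actual values (other than sku_code). I put each line into a hash with its sku_code as the key, compare the sku_codes, and then retrieve the full lines from those hashes.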

#old file
f1 = IO.readlines("oldfeed.csv").map(&:chomp)
f1_hash = f1[1..-1].inject(Hash.new) {|hash,line| hash[line[/^\d+/]] = line; hash}
#new file
f2 = IO.readlines("newfeed.csv").map(&:chomp)
f2_hash = f2[1..-1].inject(Hash.new) {|hash,line| hash[line[/^\d+/]] = line; hash}

#find new products
new_product_keys = f2_hash.keys - f1_hash.keys
new_products = new_product_keys.map {|sku_code| f2_hash[sku_code] }

#find old products
old_product_keys = f1_hash.keys - f2_hash.keys
old_products = old_product_keys.map {|sku_code| f1_hash[sku_code] }

# write new products to file
File.open("new_products.txt","w") do |f|
  f.write "#{f2.first}\n"
  f.write new_products.join("\n")
end

#write old products to file
File.open("deleted_products.txt","w") do |f|
  f.write "#{f1.first}\n"
  f.write old_products.join("\n")
end

The first line of each CSV file contains only the column names, so I skip it (f1[1..-1]) and add it back when writing the output file (f.write "#{f1.first}\n").
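The same keyed comparison can be sketched without touching the filesystem. This is a hedged variant of the answer's code, not the answer itself: `index_by_sku` is a hypothetical helper, the sample feeds are made up, `each_with_object` replaces `inject`, and lines without a leading digit run are skipped (a guard the original does not have).

```ruby
# Hypothetical helper: builds a Hash of sku_code => full line, skipping the header row.
# Like the answer, it assumes sku_code is the leading run of digits on each line.
def index_by_sku(lines)
  lines.drop(1).each_with_object({}) do |line, hash|
    sku = line[/^\d+/]
    next unless sku # skip blank or malformed lines (not in the original answer)
    hash[sku] = line
  end
end

# Hypothetical feeds standing in for IO.readlines(...).map(&:chomp).
old_feed = ["sku_code, description, price, url",
            "001, product one, 100, www.something.com/1",
            "002, product two, 150, www.something.com/2"]
new_feed = ["sku_code, description, price, url",
            "001, product one, 120, www.something.com/1",
            "003, product three, 90, www.something.com/3"]

old_index = index_by_sku(old_feed)
new_index = index_by_sku(new_feed)

# Keys only in the new feed are added; keys only in the old feed are discontinued.
new_products = (new_index.keys - old_index.keys).map { |sku| new_index[sku] }
old_products = (old_index.keys - new_index.keys).map { |sku| old_index[sku] }

# Re-attach the header, as the answer does when writing its output files.
puts [new_feed.first, *new_products]
puts [old_feed.first, *old_products]
```

The repriced SKU 001 appears in neither list, because only the keys are compared.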

Tested with two made-up CSV files.

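Edit: old_products was accidentally computed from new_product_keys; that was a typo. Thanks to those who tried to edit my answer (the edits were unfortunately rejected).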

score 0 · Accepted Answer

require 'csv'
#I'm really hungover
DOA = 'oldfeed.csv'
DOB = 'newfeed.csv'
#^this is where your files are located

DOC = 'finished_product.csv'
#this little guy here is the csv file that will hold the unique values
#you don't need to create this file, ruby will make it for you


holder_1 = CSV.read(DOA)
holder_2 = CSV.read(DOB)
#we just read both csv files into arrays of rows
#way too early to be up
#row 0 is the header, so the first Sku_code ('001') is in row 1:
#holder_1[1][0] == "001"
#holder_1[2][0] == "002"

This should get you started: you need two while loops and an if statement. Do you need more information, or is this enough to go on?
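The answer does not show its loops, so the following is only one possible reading of the "two while loops and an if statement" hint. It assumes rows shaped like those `CSV.read` returns (parsed here from hypothetical strings with `CSV.parse`) and a hypothetical `missing_rows` helper: for each row of one file, the inner loop scans the other file for the same sku_code, and the `if` records a match.

```ruby
require 'csv'

# Hypothetical feed contents; CSV.read(DOA) returns the same kind of array of rows.
holder_1 = CSV.parse("sku_code, description, price, url\n001, product one, 100, www.something.com/1\n002, product two, 150, www.something.com/2\n")
holder_2 = CSV.parse("sku_code, description, price, url\n001, product one, 120, www.something.com/1\n003, product three, 90, www.something.com/3\n")

# Hypothetical helper: rows of `from` whose sku_code (column 0) never appears in `against`.
# Row 0 is the header, so both loops start at index 1.
def missing_rows(from, against)
  result = []
  i = 1
  while i < from.length
    found = false
    j = 1
    while j < against.length
      if from[i][0] == against[j][0]
        found = true
        break # leaves only the inner loop
      end
      j += 1
    end
    result << from[i] unless found
    i += 1
  end
  result
end

discontinued = missing_rows(holder_1, holder_2)
added        = missing_rows(holder_2, holder_1)
p discontinued.map(&:first)
p added.map(&:first)
```

The nested loops compare every pair of rows, so the work grows with the product of the two file sizes; a hash keyed on sku_code avoids that.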

If you want a CSV file that shows your results, using the csv gem makes that easier.
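A minimal sketch of writing results with the csv library, assuming a hypothetical header and result rows. It uses `CSV.generate` to build the text in memory; `CSV.open(DOC, "w")` accepts the same `csv << row` calls and writes to the file instead.

```ruby
require 'csv'

# Hypothetical header and result rows (e.g. discontinued products found earlier).
header = ["sku_code", "description", "price", "url"]
discontinued = [["002", "product two", "150", "www.something.com/2"]]

# Each `csv << row` appends one correctly quoted CSV line.
output = CSV.generate do |csv|
  csv << header
  discontinued.each { |row| csv << row }
end

puts output
```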

score 0 · Accepted Answer

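Assuming you don't have serious performance concerns, I think you should aim for the least amount of code. Even if performance is a concern, I would start with the simplest approach and then refine it as needed.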

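I think using the CSV gem is a good idea, since it is one less thing you have to write code for. That said, here is another way to approach the problem. Note that the diff function below works with arrays or hashes and doesn't depend on how the key is defined. It uses an array internally for key lookup, but switching it to a hash is straightforward.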

l1a = "001, product one, 100, www.something.com/1"
l2 = "002, product two, 150, www.something.com/2"
l1b = "001, product one, 120, www.something.com/1"
l3 = "003, product three, 100, www.something.com/1"
l4 = "004, product four, 100, www.something.com/1"

file_old = [l1a, l2, l3]
file_new = [l1b, l2, l4]

sku = -> (record) do
  record.split(',')[0]
end

def diff(set1, set2, keyproc)
  set2_keys = set2.collect {|e| keyproc.call(e)}
  set1.reject {|e| set2_keys.include?(keyproc.call(e))}
end

puts diff(file_old, file_new, sku)
# => "003, product three, 100, www.something.com/1"
puts diff(file_new, file_old, sku)
# => "004, product four, 100, www.something.com/1"
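The switch to a hash mentioned above could look like the following sketch. `diff_hashed` and `whole_line` are hypothetical names: the first stores set2's keys in a Hash so each lookup is a `key?` check instead of a linear `Array#include?` scan; the second passes a different key function to show that the function does not care how the key is defined (keying on the whole line reproduces the behaviour from the question).

```ruby
# Sample records modelled on the answer's example.
l1a = "001, product one, 100, www.something.com/1"
l2  = "002, product two, 150, www.something.com/2"
l1b = "001, product one, 120, www.something.com/1"
l3  = "003, product three, 100, www.something.com/1"
l4  = "004, product four, 100, www.something.com/1"
file_old = [l1a, l2, l3]
file_new = [l1b, l2, l4]

sku        = ->(record) { record.split(',')[0] } # key on sku_code only
whole_line = ->(record) { record }               # key on the entire line

# Hypothetical hash-backed variant of diff: records of set1 whose key is absent from set2.
def diff_hashed(set1, set2, keyproc)
  set2_keys = set2.each_with_object({}) { |e, seen| seen[keyproc.call(e)] = true }
  set1.reject { |e| set2_keys.key?(keyproc.call(e)) }
end

puts diff_hashed(file_old, file_new, sku)        # only the 003 record
puts diff_hashed(file_new, file_old, sku)        # only the 004 record
puts diff_hashed(file_old, file_new, whole_line) # 001 (old price) and 003
```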
