2

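I have a "master" file with a number of columns: 1 2 3 4 5. I also have a few other files, each with fewer rows than the master file and with columns: 1 6. I'd like to merge these files, matching on the column 1 field, and add column 6 to the master. I've seen some python/UNIX solutions, but I'd prefer to use ruby/fastercsv if it's a good fit. I would appreciate any help getting started.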

4

3 Answers

2

FasterCSV is now the default CSV implementation in Ruby 1.9. This code is untested, but it should work.

require 'csv'
master = CSV.read('master.csv') # Reads in master
master.each { |row| row.push('') } # Adds another column to all rows
Dir.glob('*.csv').each do |path| # Goes through all csv files
  # Skips the master file and any output left over from an earlier run
  next if ['master.csv', 'output.csv'].include?(path)
  CSV.foreach(path) do |line| # Goes through each line of the file
    temp = master.assoc(line[0]) # Finds the matching row in master
    temp[-1] = line[1] if temp # Updates last column if a row is found
  end
end

# Opens the output csv file for writing; the block form closes it when done
CSV.open('output.csv', 'wb') do |csv|
  master.each { |row| csv << row } # Writes the modified master rows
end
answered 2011-10-30T20:18:46.260
1
$ cat j4.csv
how, now, brown, cow, f1
now, is, the, time, f2
one, two, three, four, five
xhow, now, brown, cow, f1
xnow, is, the, time, f2
xone, two, three, four, five
$ cat j4a.csv
how, b
one, d
$ cat hj.rb
require 'pp'
require 'rubygems'
require 'fastercsv'

pp(
  FasterCSV.read('j4a.csv').inject(
    FasterCSV.read('j4.csv').inject({}) do |m, e|
      m[e[0]] = e
      m
    end) do |m, e|
    k = e[0]
    m[k] << e.last if m[k]
    m
  end.values)
$ ruby hj.rb
[["now", " is", " the", " time", " f2"],
 ["xhow", " now", " brown", " cow", " f1"],
 ["xone", " two", " three", " four", " five"],
 ["how", " now", " brown", " cow", " f1", " b"],
 ["one", " two", " three", " four", " five", " d"],
 ["xnow", " is", " the", " time", " f2"]]

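This works by mapping your master file into a hash keyed on the first column, and then it just looks up the keys from your other files. As written, the code appends the last column when the keys match. Since you have multiple non-master files, you could adapt the concept by replacing FasterCSV.read('j4a.csv') with a method that reads each file and concatenates them all into a single array of arrays, or you could just save the result of the inner inject (the master hash) and apply the other files in a loop.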
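The second option (keep the master hash and loop over the other files) might look roughly like the sketch below. It uses the standard `csv` library rather than the `fastercsv` gem, and inline strings stand in for the files. The helper name `merge_by_first_column` and the sample data are illustrative assumptions, not part of the original script.

```ruby
require 'csv'

# Hypothetical helper: indexes the master rows by their first column, then
# appends the last column of every matching row from each of the other files.
# The master rows are modified in place, so their original order is kept.
def merge_by_first_column(master_rows, other_files_rows)
  index = master_rows.each_with_object({}) { |row, h| h[row[0]] = row }
  other_files_rows.each do |rows|
    rows.each do |row|
      target = index[row[0]]
      target << row.last if target # rows with no match in master are ignored
    end
  end
  master_rows
end

# Inline data standing in for the master file and two smaller files.
master = CSV.parse("how,now,brown,cow,f1\nnow,is,the,time,f2\none,two,three,four,five\n")
others = [
  CSV.parse("how,b\n"),
  CSV.parse("one,d\n")
]

merged = merge_by_first_column(master, others)
merged.each { |row| puts row.to_csv }
```

With real files, `CSV.read(path)` would replace the `CSV.parse` calls. Note that if two of the smaller files contain the same key, the matching master row gets a value appended from each of them.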

answered 2011-10-30T19:49:40.917
0
temp = master.assoc(line[0]) 

The line above is very slow: Array#assoc scans master linearly for every lookup, so the overall complexity is at least O(n^2).

I would use the following procedure:

  1. For the 1 6 csv, convert it into one big hash with column 1 as the key and column 6 as the value, named 1_to_6_hash
  2. Loop through the 1 2 3 4 5 csv row by row, setting row[6] = 1_to_6_hash[row[1]]

This reduces the complexity significantly, to O(n).
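A minimal sketch of those two steps, assuming the standard `csv` library and inline sample data in place of the real files. Since `1_to_6_hash` is not a valid Ruby identifier, the sketch calls the hash `col1_to_col6`. The helper name `add_sixth_column` is likewise an assumption, and the column numbers from the steps become zero-based indexes (column 1 is `row[0]`, column 6 is index 5).

```ruby
require 'csv'

# Hypothetical helper: builds a {column 1 => column 6} lookup from the rows of
# the small file(s), then fills the sixth column of each master row from it.
def add_sixth_column(master_rows, small_rows)
  # Step 1: one pass over the small rows to build the hash.
  col1_to_col6 = small_rows.each_with_object({}) { |(key, value), h| h[key] = value }

  # Step 2: one pass over the master rows, with a constant-time lookup per row.
  master_rows.map do |row|
    merged = row.dup
    merged[5] = col1_to_col6[row[0]] # index 5 is the sixth column; nil if no match
    merged
  end
end

# Inline data standing in for the master file and one smaller file.
master = CSV.parse("a,2,3,4,5\nb,2,3,4,5\nc,2,3,4,5\n")
small  = CSV.parse("a,x\nc,z\n")

result = add_sixth_column(master, small)
result.each { |row| puts row.to_csv }
```

For several small files, their rows could be concatenated before being passed in; a key that appears more than once then keeps the value seen last.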

answered 2018-09-08T14:53:40.450