
I have two CSV files, "users" and "enrollments":

001.csv:

user_id,user_name,state
12345,test_account,active

002.csv:

course_id,user_id,state
67890,12345,active

I need to create a file like active_enrollments.csv:

course_id,user_name
67890,test_account

How can I parse these files to produce active_enrollments.csv without looping over the files multiple times?

Here is what I have so far, but I'm getting a lot of duplicates:

require 'csv'

CSV.open("active-enrollments.csv", "wb") do |csv|
  csv << ["course_id", "user_name", "user_id","course_name", "status"]
end
Dir["csvs/*.csv"].each do |file|
  #puts file
  CSV.foreach(file, :headers => true) do |row|
    if row['user_id'] && row['course_id'] #finds enrollment csvs
      if row['state'] == "active" #checks for active enrollments
        state = row['state']
        course_id = row['course_id']
        user_id = row['user_id']
        Dir["csvs/*.csv"].each do |files|
          CSV.foreach(files, :headers => true) do |user|
            if user['user_name']
              if user_id == user['user_id']
                user_name = user['user_name']
                Dir["csvs/*.csv"].each do |file|
                  CSV.foreach(file, :headers => true) do |courses|
                    if course_id == courses['course_id']
                      course_name = courses['course_name']
                      CSV.open("active-enrollments.csv", "a") do |csv|
                        csv << [course_id, user_name, user_id, course_name, state]
                      end
                    end
                  end
                end
              end
            end
          end
        end
      end
    end
  end
end

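I know this is simple, but I can't seem to get it working without looping over the files multiple times and producing lots of duplicates.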


3 Answers


Instead of using a database or a bunch of full-blown models, I'd suggest using simple hashes as lookups.

The following is untested, and I've left out all the filtering (such as the checks for state == "active").

Separate the user CSVs from the enrollment CSVs by file name, then iterate over the user CSVs once to build a lookup keyed by user_id:

require 'csv'

users_csvs = Dir['csvs/users-*.csv']
enrollment_csvs = Dir['csvs/enrollment-*.csv']

users = {} 
users_csvs.each do |user_file|
  CSV.foreach(user_file, :headers => true) do |row|
    # Put in whatever data you will need later
    users[row['user_id']] = {:user_name => row['user_name'], :state => row['state']}
  end
end

consolidated_csv = []
enrollment_csvs.each do |enrollment_file|
  CSV.foreach(enrollment_file, :headers => true) do |row|
    user_id = row['user_id']
    if user = users[user_id]
      # Put in whatever you want from the two objects
      # The enrollment row has no user_name; take it from the user lookup
      consolidated_csv << {:course_id => row['course_id'], :user_name => user[:user_name]}
    end
  end
end

CSV.open("active-enrollments.csv", "wb") do |csv|
  csv << ['course_id', 'user_name']
  consolidated_csv.each { |row| csv << [row[:course_id], row[:user_name]] }
end
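For comparison, here is a self-contained sketch of the same hash-lookup idea with the omitted filters added back (only active users and active enrollments are written). One assumption that is mine, not the answer's: files are told apart by their header row rather than by name, since the question's files are called 001.csv and 002.csv. `csv_headers` and `write_active_enrollments` are hypothetical helper names.

```ruby
require 'csv'

# Hypothetical helper: read only the header row of a CSV file.
def csv_headers(path)
  File.open(path) { |f| CSV.parse_line(f.gets.to_s) } || []
end

# Hypothetical helper: one pass over the user files to build a
# user_id => user_name lookup, then one pass over the enrollment files.
# Files are classified by header row (assumption), not by file name.
def write_active_enrollments(paths, out_path)
  user_files = paths.select { |p| csv_headers(p).include?('user_name') }
  enrollment_files = paths.select do |p|
    headers = csv_headers(p)
    headers.include?('course_id') && headers.include?('user_id')
  end

  # Only active users go into the lookup.
  active_users = {}
  user_files.each do |path|
    CSV.foreach(path, headers: true) do |row|
      active_users[row['user_id']] = row['user_name'] if row['state'] == 'active'
    end
  end

  CSV.open(out_path, 'w') do |out|
    out << %w[course_id user_name]
    enrollment_files.each do |path|
      CSV.foreach(path, headers: true) do |row|
        next unless row['state'] == 'active'     # active enrollments only
        user_name = active_users[row['user_id']] # hash lookup, no rescan
        out << [row['course_id'], user_name] if user_name
      end
    end
  end
end
```

With the question's layout it would be called as `write_active_enrollments(Dir['csvs/*.csv'], 'active_enrollments.csv')`. Keeping the output file outside csvs/ avoids a later run reading it back in as input. Each input file is opened once for its header and once for its rows, so there are no nested loops and no duplicate output lines.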
answered 2013-07-09T05:15:26.100

It might be easier to use SQLite: pull the data out of the CSV files, store it in a temporary database, then query the database to produce the final output.

answered 2013-07-09T03:37:33.097

Here is some sample code showing how to do it with a simple SQLite database and the Sequel ORM:

require 'csv'
require 'sequel'

DB = Sequel.sqlite(File.dirname(__FILE__) + '/temp.db')

# user_id,user_name,state
# 12345,test_account,active
DB.create_table :csv1 do
  primary_key :id
  Integer :user_id
  String :user_name
  String :state
end

TABLE_001 = DB[:csv1]
CSV.foreach('001.csv', :headers => :first_row) do |row|
  TABLE_001.insert(
    :user_id   => row['user_id'],
    :user_name => row['user_name'],
    :state     => row['state']
  )
end

# course_id,user_id,state
# 67890,12345,active
DB.create_table :csv2 do
  primary_key :id
  Integer :course_id
  Integer :user_id
  String :state
end

# I need to create one file like active_enrollments.csv:
#
#     course_id,user_name
#     67890,test_account
TABLE_002 = DB[:csv2]
CSV.foreach('002.csv', :headers => :first_row) do |row|
  TABLE_002.insert(
    :course_id => row['course_id'],
    :user_id   => row['user_id'],
    :state     => row['state']
  )
end

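CSV.open('active_enrollments.csv', 'w') do |csv_out|
  TABLE_001.each do |row_001|
    # Write one line per enrollment of this user; users with no
    # enrollment row are skipped instead of raising on nil.
    TABLE_002.where(:user_id => row_001[:user_id]).each do |row_002|
      csv_out << [row_002[:course_id], row_001[:user_name]]
    end
  end
end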

After running it, "active_enrollments.csv" contains:

67890,test_account

This approach scales well, since the matching is done with database queries instead of by rereading the CSV files.

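Running it twice will raise an error, because Sequel will try to create the tables again in the existing database. Delete the temp.db file between runs, or add exception handlers around the two create_table blocks.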
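Following the "delete the file" suggestion, a minimal sketch is to remove temp.db before connecting, so every run starts from empty tables. `reset_db_file` is a hypothetical helper name; the path is the one the code above passes to Sequel.sqlite.

```ruby
# Hypothetical helper: delete the SQLite file left over from a previous
# run, so the create_table blocks start from an empty database.
def reset_db_file(db_path)
  File.delete(db_path) if File.exist?(db_path)
end

# Intended use, before the Sequel connection is opened:
#   db_path = File.dirname(__FILE__) + '/temp.db'
#   reset_db_file(db_path)
#   DB = Sequel.sqlite(db_path)
```

If the database is only scratch space, calling `Sequel.sqlite` with no file name opens an in-memory SQLite database that disappears when the script exits; Sequel's `create_table!`, which drops an existing table before creating it, is another option.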

answered 2013-07-09T06:51:31.287