I have a database with thousands of records:

Code  | Name  | Price
00106 | Water | 9.99
00107 | Onion | 8.99

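It is encoded in a GES file as follows:

  • 00F marks a column header
  • 00I marks an inserted row

There are other markers as well (00D to delete a row, 00U to update one).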

00F
0101
02Code
031
00F
0102
02Name
031
00F
0103
02Price
030
00I
0100106
02Water
030999
00I
0100107
02Onion
030899

I want to create an importer that processes this file and pushes it into my database. So I started implementing:

class Importer
  CONN = ActiveRecord::Base.connection
  F = "00F"
  I = "00I"

  def extract_to_database(collection)
    add       = true
    tmp       = []
    type      = F
    inserts   = []

    collection.each_with_index do |line, i|
      _type    = line.strip
      _changed = [F,I].include? _type

      if _changed && i > 0
        case type
        when F then @f << tmp
        when I
          group_id = Group.find_by(code: tmp[1]).id
          inserts.push "(group_id,'#{tmp[2]}','#{tmp[3]}')"
        end

        tmp  = []
        type = _type
      end

      tmp << line
    end
    sql = "INSERT INTO products (`group_id`, `name`, `price`) VALUES #{inserts.join(", ")}"
    CONN.execute sql
  end
end

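There is one problem: I would like to refactor this using functional programming.

I will also have to look up other models by code and put their ids into the corresponding some_model_id columns of the products table, which complicates the whole process. As it stands, importing this data takes me several hours.

Maybe Ruby is not the best choice for this.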

1 Answer

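There is nothing here that Ruby can't handle. It's not clear how "functional programming" would help with this, as this is a classic state-machine problem with some simple data transformation going on.

Example scaffold: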

class SomethingImporter
  FIELD_MARKER = "00F"
  INSERT_MARKER = "00I"

  COLUMNS = %w[ group_id name price ]

  # Performs the insert into a given model. This should probably be a class
  # method on the model itself.
  def bulk_insert(model, rows)
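    # Quote each column name and start the statement; placeholders follow.
    column_list = COLUMNS.map { |c| "`#{c}`" }.join(',')
    sql = [
      "INSERT INTO `#{model.table_name}` (#{column_list}) VALUES "
    ]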

    # Append the placeholders: (?,?,?),(?,?,?),...
    sql[0] += ([ '(%s)' % ([ '?' ] * COLUMNS.length).join(',') ] * rows.length).join(',')

    sql += rows.flatten

    model.connection.execute(model.send(:sanitize_sql, sql))
  end

  # Resolve a group code to a group_id value, and cache the result so that
  # subsequent look-ups for the same code do not hit the database again.
  def group_id(group_code)
    @find_group ||= { }

    # This tests if any value has been cached for this code, including one
    # that might be nil.
    if (@find_group.key?(group_code))
      return @find_group[group_code]
    end

    group = Group.find_by(code: group_code)

    @find_group[group_code] = group && group.id
  end

  # Call this with the actual collection, lines stripped, and with any header
  # lines removed (e.g. collection.shift)
  def extract_rows(collection)
    state = nil
    rows = [ ]
    row = [ ]

    collection.each do |line|
      case (line)
      when FIELD_MARKER
        # Indicates field data to follow
        state = :field
      when INSERT_MARKER
        case (state)
        when :insert
          rows << [ row[0], row[1], (row[2].sub(/^0+/, '').to_f / 100) ]
        end

        state = :insert
        row = [ ]
      else
        case (state)
        when :field
          # Presumably you'd pay attention to the data here and establish
          # a mapping table.
        when :insert
          row << line.sub(/^\d\d/, '')
          # puts row.inspect
        end
      end
    end

    case (state)
    when :insert
      rows << [ row[0], row[1], (row[2].sub(/^0+/, '').to_f / 100) ]
    end

    rows
  end
end


data = <<END
00F
0101
02Code
031
00F
0102
02Name
031
00F
0103
02Price
030
00I
0100106
02Water
030999
00I
0100107
02Onion
030899
END

importer = SomethingImporter.new

puts importer.extract_rows(data.split(/\n/)).inspect

With your data, this example produces the following output:

[["00106", "Water", 9.99], ["00107", "Onion", 8.99]]

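When writing code like this, be sure to expose intermediate results so that you can test what is happening. Your implementation takes the data and dumps it straight into the database in one go, which makes it very hard to tell where things went wrong if it doesn't work. This version is composed of several methods, each with a more specific purpose.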
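As a rough illustration of what testable intermediate steps can look like, here is a minimal sketch of two small helpers. The names `split_field` and `parse_price` are hypothetical and not part of the scaffold above. The sketch assumes, based only on the sample data, that the first two digits of a data line are the field number and that prices are stored as zero-padded cents.

```ruby
# Hypothetical helpers for illustration only.

# Split a data line such as "030999" into its two-digit field number
# and the remaining value.
def split_field(line)
  [line[0, 2], line[2..-1]]
end

# Convert a zero-padded price in cents ("0999") into a Float (9.99).
# Assumes the value is always an integer number of cents.
def parse_price(raw)
  raw.to_i / 100.0
end

p split_field("030999")   # => ["03", "0999"]
p parse_price("0999")     # => 9.99
```

Because each step returns a plain value, it can be checked on its own before anything touches the database.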

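It's not clear from your original example why you're resolving group_id, since your sample output has nothing to do with it, but as an example I've included a method that resolves the codes and keeps them cached, avoiding repeated look-ups of the same thing. For a larger-scale import you'd probably load in many rows, extract the distinct group_id values, load them all at once, and remap them before inserting.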
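The batch approach could be sketched roughly as follows. The helper `remap_group_codes` and the `fetch_ids` callable are hypothetical names, and the ids in `GROUP_IDS` as well as the third sample row are made up for the example. The lookup is done in memory here so the snippet runs on its own; in a Rails app it would presumably be a single query against the groups table.

```ruby
# Hypothetical helper: replaces the code in each row's first column
# with the matching id. `fetch_ids` is any callable that takes an array
# of codes and returns a Hash of code => id; it is called once per batch.
def remap_group_codes(rows, fetch_ids)
  codes = rows.map(&:first).uniq
  ids   = fetch_ids.call(codes)
  rows.map { |code, *rest| [ids[code], *rest] }
end

# In-memory stand-in for the database lookup (ids are invented).
# With ActiveRecord this might be something like:
#   ->(codes) { Group.where(code: codes).pluck(:code, :id).to_h }
GROUP_IDS = { "00106" => 1, "00107" => 2 }
fetch_ids = ->(codes) { GROUP_IDS.select { |code, _| codes.include?(code) } }

rows = [
  ["00106", "Water", 9.99],
  ["00107", "Onion", 8.99],
  ["00106", "Ice",   1.5]
]

p remap_group_codes(rows, fetch_ids)
# => [[1, "Water", 9.99], [2, "Onion", 8.99], [1, "Ice", 1.5]]
```

Codes with no match end up with a nil id in this sketch, so a real importer would need to decide whether to skip or reject such rows.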

Answered 2013-08-07T15:06:48.460