kiba-etl - How do I pass parameters into an ETL job?

Question

I am building an ETL that will run against different sources, selected by a variable.

How can I execute my job (from a rake task)

Kiba.run(Kiba.parse(IO.read(etl_file),etl_file))

and pass in parameters that my etl_file can then use for its source?

source MySourceClass(variable_from_rake_task)

score 3 · Accepted Answer

Author of Kiba here.

EDIT: the solution below still applies, but if you need more flexibility, you can use Kiba.parse with a block. See https://github.com/thbar/kiba/wiki/Considerations-for-running-Kiba-jobs-programmatically-(from-Sidekiq,-Faktory,-Rake,-...) for a detailed explanation.

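Since you are using a Rake task (rather than calling Kiba from a parallel environment such as Resque or Sidekiq), what you can do right now is leverage ENV variables, like this: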

CUSTOMER_IDS=10,11,12 bundle exec kiba etl/upsert-customers.etl

Or, if you are using a rake task you wrote yourself, you can do this:

task :upsert_customers => :environment do
  ENV['CUSTOMER_IDS'] = [10, 11, 12].join(',')
  etl_file = 'etl/upsert-customers.etl'
  Kiba.run(Kiba.parse(IO.read(etl_file),etl_file))
end

Then in upsert-customers.etl:

# quick parsing
ids = ENV['CUSTOMER_IDS'].split(',').map { |c| Integer(c) }

source Customers, ids: ids

As I said above, this only works in command-line mode, where ENV can be safely leveraged.
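The ENV round trip described above can be sketched in plain Ruby, without Kiba itself. This is only an illustration: `parse_id_list` is a hypothetical helper name, not part of Kiba, and the choice to fail early on a missing or malformed value is an assumption about what you would want.

```ruby
# Hypothetical helper (not part of Kiba): turn a comma-separated
# environment value such as "10,11,12" into an array of Integers.
def parse_id_list(raw, name: 'CUSTOMER_IDS')
  # Fail early with a clear message when the variable is missing or blank.
  raise ArgumentError, "#{name} is not set" if raw.nil? || raw.strip.empty?

  # Integer() raises ArgumentError on non-numeric input, unlike String#to_i,
  # which would silently turn "abc" into 0.
  raw.split(',').map { |id| Integer(id.strip, 10) }
end

# Caller side (for example the rake task): encode the ids into ENV.
ENV['CUSTOMER_IDS'] = [10, 11, 12].join(',')

# ETL side: decode them again before declaring the source.
ids = parse_id_list(ENV['CUSTOMER_IDS'])
```

In the .etl file, the resulting `ids` array would then be handed to the source in the same way as in the answer (`source Customers, ids: ids`).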

For parallel execution, please do track https://github.com/thbar/kiba/issues/18, since I am going to work on it.

Let me know if this properly answers your need!

score 0 · Accepted Answer

It looks like this is being tracked at https://github.com/thbar/kiba/issues/18 and has already been asked in "Pass Parameters to Kiba run Method".
