Firstly - Thibaut, thank you for Kiba. It goes toe-to-toe with 'enterprise' grade ETL tools and has never let me down.

I'm busy building an ETL pipeline that takes a number of rows and reduces them down into a single summary row. I get the feeling that this should be a simple thing, but I'm a little stumped on how to approach this problem.

We have a number of CDRs from a voice switch, and need to condense them under some simple criteria into a handful of summary records. So the problem is: I have many thousands of records coming in from a Source, and need to transform them into only a few records based on some reduce criteria.

Kiba is really simple when there's a one-to-one Source -> Destination ETL, or even a one-to-many Source -> Destination with the new enumerable exploder in v3, but I don't see a clear path to many-to-one ETL pipelines.

Any suggestions or guidance would be greatly appreciated.

1 Answer

Glad you find Kiba useful! There are various solutions to this use case.

I'm making some assumptions here (if these are incorrect, solutions still exist but will be different, e.g. involving boundary detection & external storage):

  • You are working with finite batches (rather than a continuous stream of updates).
  • The handful of summary records you are referring to can be held in memory.

My advice here is to leverage Kiba v3's ability to yield records in a transform's close method (described in more depth in this article):

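class InMemoryReduceTransform
  attr_reader :buffer, :summarize_cb

  # summarize_cb: a callable (lambda/proc) that takes the buffered rows
  # and returns an enumerable of summary rows
  def initialize(summarize_cb:)
    @buffer = []
    @summarize_cb = summarize_cb
  end

  def process(row)
    buffer << row
    nil # do not forward the row to the rest of the pipeline
  end

  def close
    # summarize_cb is an attribute holding a callable, so invoke it with #call
    summarize_cb.call(buffer).each do |row|
      yield row
    end
  end
end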

In essence, you'll just stack up the input rows, until the source is out of data, at which point the close method will be called, and then you summarise the data you have and yield N summary rows.
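To make the buffering and summarising steps concrete, here is a minimal sketch that drives a transform of the same shape by hand, without Kiba itself. The row fields (`:trunk`, `:duration`), the `SUMMARIZE_BY_TRUNK` lambda and the `run_reduce` helper are all hypothetical names chosen for illustration; your actual reduce criteria will differ.

```ruby
# Transform of the same shape as above; the callback is invoked via #call.
class InMemoryReduceTransform
  def initialize(summarize_cb:)
    @buffer = []
    @summarize_cb = summarize_cb
  end

  def process(row)
    @buffer << row
    nil # swallow the row; nothing is forwarded until close
  end

  def close
    @summarize_cb.call(@buffer).each { |row| yield row }
  end
end

# Hypothetical reduce criteria: one summary row per trunk,
# with a call count and the total duration in seconds.
SUMMARIZE_BY_TRUNK = lambda do |rows|
  rows.group_by { |row| row[:trunk] }.map do |trunk, calls|
    {
      trunk: trunk,
      calls: calls.size,
      total_seconds: calls.sum { |call| call[:duration] }
    }
  end
end

# Hypothetical helper standing in for the pipeline: feed every row to
# process, then collect whatever close yields.
def run_reduce(rows)
  transform = InMemoryReduceTransform.new(summarize_cb: SUMMARIZE_BY_TRUNK)
  rows.each { |row| transform.process(row) }
  summaries = []
  transform.close { |row| summaries << row }
  summaries
end

cdrs = [
  { trunk: "A", duration: 10 },
  { trunk: "B", duration: 5 },
  { trunk: "A", duration: 20 }
]
summaries = run_reduce(cdrs)
```

In a real job the transform would be declared in the Kiba pipeline instead of being driven by `run_reduce`, and the yielded summary rows would continue on to any later transforms and the destination.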

Note: this is a simplistic implementation to put you on the right track. The next iteration of Kiba Pro will include a more scalable & generic version of this (with vendor support). Please reach out if you are interested in it!

Let me know if this properly answers your question!

Answered 2020-03-31T12:56:01.937