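I am running a pipeline that loops over all the tables in INFORMATION_SCHEMA.TABLES and copies each one to Azure Data Lake Store. My question is: if any tables fail to copy, how do I rerun the pipeline for only the failed tables?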
The best approach I have found is to code your process as follows:
0. Yes, find the root cause of the failure first, and identify whether something is wrong with the pipeline or whether it is a “feature” of a dependency that you have to code around.
1. Be idempotent. If your process ensures a clean state as its very first step, similar to the undo in the Command design pattern (but more naive), then it can safely be re-executed.
* With #1 in place, you can safely use “retry” on your pipeline activities, with sufficient time between retries.
* This approach is compatible with both ADFv1 and ADFv2.
2. With ADFv2 you have more options and can build more complex logic to handle errors:
* Wrap the failing activity in an until-success loop, and be sure to put a bound on the number of executions.
* You can add more activities inside the loop to handle the failure: log, notify, or resolve known failure conditions caused by externalities outside your control.
3. You can also communicate asynchronously with future executions of the process by saving each success to a central store. A later execution then checks whether the activity has already succeeded and, if so, skips it.
* This is powerful for more generalized pipelines, since you can choose where to begin.
4. The last resort I know of (and I would love to learn new ways to handle this) is manual re-execution of the failed activities.
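To make points 1 to 3 concrete, here is a minimal sketch of the control flow in Python. It is not ADF code: in ADF this logic would live in pipeline activities rather than in a script. The callables `copy_one` and `clean_target`, and the JSON file used as the "central store", are hypothetical stand-ins for whatever your copy step, cleanup step, and tracking table actually are.

```python
from __future__ import annotations

import json
import time
from pathlib import Path
from typing import Callable, Iterable


def load_done(store: Path) -> set[str]:
    """Return the names of tables that an earlier run recorded as copied."""
    if store.exists():
        return set(json.loads(store.read_text()))
    return set()


def save_done(store: Path, done: set[str]) -> None:
    """Persist the set of successfully copied tables (hypothetical store)."""
    store.write_text(json.dumps(sorted(done)))


def copy_tables(
    tables: Iterable[str],
    copy_one: Callable[[str], None],      # hypothetical: copies one table
    clean_target: Callable[[str], None],  # hypothetical: removes partial output
    store: Path,
    max_attempts: int = 3,
    wait_seconds: float = 0.0,
) -> list[str]:
    """Copy every table not yet marked done; return the ones still failing."""
    done = load_done(store)
    failed: list[str] = []
    for table in tables:
        if table in done:
            continue  # point 3: an earlier execution already succeeded
        # point 2: bounded "until success" loop
        for attempt in range(1, max_attempts + 1):
            try:
                clean_target(table)  # point 1: start from a clean state
                copy_one(table)
            except Exception:
                if attempt == max_attempts:
                    failed.append(table)  # give up; log or notify here
                else:
                    time.sleep(wait_seconds)  # wait before retrying
            else:
                done.add(table)
                save_done(store, done)  # record success immediately
                break
    return failed
```

Running `copy_tables` a second time with the same store only touches the tables that are not yet recorded, which is the "rerun only the failed tables" behaviour asked about in the question.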
Hope this helps, J
Answered 2018-03-27T15:28:26.923