
My problem is to create a replication pipeline that copies tables and data from MySQL RDS to Redshift, and I cannot use any managed service. In addition, any new updates in RDS should also be replicated into the Redshift tables.

After looking into many possible solutions, I came up with the following steps:

  1. Create flat files/CSVs from MySQL RDS and save them in S3.
  2. Use Redshift's COPY command to load the data into a staging table, and finally save it into the main table (a rough sketch of this step follows the list).
  3. Now, for the update part: each time, I push a new CSV to S3 and repeat step 2.
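
For step 2, this is roughly what I have in mind; the cluster endpoint, table name, S3 path and IAM role below are just placeholders:

```python
# Rough sketch of step 2: load the latest CSV extract from S3 into a staging
# table with Redshift's COPY command. All names and credentials are placeholders.
import psycopg2

conn = psycopg2.connect(host="my-cluster.example.redshift.amazonaws.com",
                        port=5439, dbname="analytics",
                        user="etl_user", password="...")

with conn.cursor() as cur:
    cur.execute("""
        COPY stage_orders
        FROM 's3://my-bucket/exports/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
        CSV GZIP;
    """)
conn.commit()
```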

So, I just want to confirm whether the above approach is feasible. In particular, each time an update happens, will the old data be deleted entirely and replaced with the new data, or is it possible to update only the necessary records? If so, how?

Any help would be appreciated. Thanks in advance.


1 Answer


Yes, the above strategy is not just fine, it's good. I use it in a production system and it works great, though you have to be careful and craft the strategy so that it solves your use case effectively and efficiently.

Here are a few points on what I mean by effectively and efficiently:

  1. Make sure you have an efficient way to identify the records that need to be pushed to Redshift, i.e. find the candidate records with optimized queries that keep CPU and memory usage on the source database low.
  2. Send the identified records to Redshift in an optimized way, which includes minimizing the data size, e.g. compress the CSV files with gzip so that they take up minimal S3 storage and save network bandwidth.
  3. Try to run the Redshift COPY commands in a way that they execute in parallel, for example by splitting the extract into multiple files so that a single COPY can load them in parallel (a rough sketch of these points follows the list).
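
To make points 1–3 concrete, here is a rough sketch of the extract side. It assumes the source tables carry an indexed updated_at column and that a watermark from the previous run is stored somewhere; the host, credentials, bucket and table/column names are all placeholders:

```python
# Hypothetical sketch of points 1-3: pull only rows changed since the last run
# (assumes an indexed updated_at column), write them as gzipped CSV parts so a
# single COPY can load them in parallel, and upload the parts to S3.
import csv
import gzip

import boto3
import pymysql

LAST_RUN = "2020-04-12 00:00:00"     # watermark persisted from the previous run
ROWS_PER_FILE = 500_000              # smaller parts -> more parallelism in COPY

conn = pymysql.connect(host="my-rds-instance.example.rds.amazonaws.com",
                       user="etl_user", password="...", database="shop")
s3 = boto3.client("s3")

with conn.cursor() as cur:
    # Point 1: an indexed watermark query keeps the extract cheap on the RDS side.
    cur.execute(
        "SELECT order_id, customer_id, amount, updated_at "
        "FROM orders WHERE updated_at > %s", (LAST_RUN,))

    part = 0
    while True:
        rows = cur.fetchmany(ROWS_PER_FILE)
        if not rows:
            break
        # Point 2: gzip the CSV so it takes less S3 storage and network bandwidth.
        filename = f"orders_part_{part:04d}.csv.gz"
        with gzip.open(filename, "wt", newline="") as f:
            csv.writer(f).writerows(rows)
        # Point 3: several parts under one prefix let
        # COPY ... FROM 's3://my-bucket/exports/orders_part_' load them in parallel.
        s3.upload_file(filename, "my-bucket", f"exports/{filename}")
        part += 1
```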
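
On the update part of your question: you do not have to wipe and reload the whole table. A common pattern is to merge the staging table into the main table in one transaction, so only the affected rows are replaced. A minimal sketch, assuming the staging table was just loaded via COPY (your step 2), order_id is the primary key, and all names are placeholders:

```python
# Minimal sketch of a staging-table merge: replace only the rows that changed
# instead of reloading the whole target table. All names are placeholders.
import psycopg2

conn = psycopg2.connect(host="my-cluster.example.redshift.amazonaws.com",
                        port=5439, dbname="analytics",
                        user="etl_user", password="...")

with conn.cursor() as cur:
    # Delete the old versions of any rows present in the new batch...
    cur.execute("""
        DELETE FROM orders
        USING stage_orders
        WHERE orders.order_id = stage_orders.order_id;
    """)
    # ...then insert the fresh versions (covers both updates and new rows).
    cur.execute("INSERT INTO orders SELECT * FROM stage_orders;")
    # Clear the staging table for the next batch.
    cur.execute("DELETE FROM stage_orders;")

# All three statements run in one transaction, so readers never see a
# half-merged table.
conn.commit()
```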

Hope this helps.

answered 2020-04-13T05:05:22.853