Yes, the above strategy is not just fine, it's good. I use it in a production system and it works great, though you have to be careful to craft the strategy so that it solves your use case effectively and efficiently.

Here are a few points on what I mean by effectively and efficiently:
- Make sure you have an efficient way to identify the records to be pushed to Redshift, i.e. select the candidate records with queries that are optimized for CPU and memory usage (see the first sketch after this list).
- Make sure to send the identified records to Redshift in an optimized way, which includes minimizing data size so that you use the least storage and network bandwidth. For example, gzip the CSV files so that they take minimal space in S3 and save network bandwidth (see the second sketch below).
- Try to run the Redshift COPY commands in a way that they execute in parallel, e.g. by splitting the data into multiple files so that a single COPY can load them across slices (see the last sketch below).
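For the first point, a minimal sketch of incremental extraction, assuming the source is PostgreSQL, the table has an indexed `updated_at` column, and you persist a watermark between runs (the DSN, table, and column names are placeholders):

```python
import psycopg2

# Hypothetical connection string and table/column names; adjust for your source DB.
SOURCE_DSN = "host=source-db dbname=app user=etl"

def fetch_changed_rows(last_watermark):
    """Yield only rows changed since the last run; an index on updated_at
    keeps the scan cheap in CPU and memory."""
    with psycopg2.connect(SOURCE_DSN) as conn:
        with conn.cursor(name="etl_cursor") as cur:  # server-side cursor streams rows
            cur.execute(
                "SELECT id, payload, updated_at FROM orders "
                "WHERE updated_at > %s ORDER BY updated_at",
                (last_watermark,),
            )
            while True:
                batch = cur.fetchmany(10000)
                if not batch:
                    break
                yield from batch
```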
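For the second point, a rough sketch of writing gzipped CSV parts and uploading them to S3 with boto3 (the bucket name and key prefix are placeholders):

```python
import csv
import gzip
import boto3

def write_and_upload(rows, part, bucket="my-etl-bucket", key_prefix="extract"):
    """Write rows to a gzipped CSV locally, then upload it to S3.
    Gzipped CSVs cut both S3 storage and network transfer."""
    path = f"/tmp/part{part}.csv.gz"
    with gzip.open(path, "wt", newline="") as f:
        csv.writer(f).writerows(rows)
    boto3.client("s3").upload_file(path, bucket, f"{key_prefix}/part{part}.csv.gz")
```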
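And for the last point, Redshift parallelizes a single COPY across slices when the S3 prefix (or a manifest) covers multiple files, so split the data into several parts and issue one COPY over the prefix rather than one COPY per file. A sketch, assuming psycopg2 and placeholder cluster, table, and IAM role values:

```python
import psycopg2

# Hypothetical Redshift endpoint, staging table, and IAM role ARN.
REDSHIFT_DSN = "host=my-cluster.abc123.us-east-1.redshift.amazonaws.com port=5439 dbname=dw user=etl"

COPY_SQL = """
    COPY orders_staging
    FROM 's3://my-etl-bucket/extract/part'  -- prefix matching all gzipped parts
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    CSV GZIP;
"""

def load_to_redshift():
    # One COPY over the whole prefix lets Redshift load the parts in parallel.
    with psycopg2.connect(REDSHIFT_DSN) as conn:
        with conn.cursor() as cur:
            cur.execute(COPY_SQL)
```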
Hope this will help.