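I have a process that iterates over input and sends the data to an AWS Firehose delivery stream, which I have configured to load into a Redshift table I created. One problem is that rows are sometimes duplicated, because the process occasionally has to re-evaluate data. For example: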
Event_date, event_id, event_cost
2015-06-25, 123, 3
2015-06-25, 123, 4
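If the later row for an event_id is always the corrected one, duplicates within a single batch could be collapsed before they are sent to Firehose. This is an assumption: the sample has no timestamp or version column to confirm which row is newer. It also matters for the staging-table approach, because that approach only replaces rows already in the target, and a batch that contains both rows for event_id 123 would insert both. A minimal Python sketch using the sample rows above; `latest_per_event` is a hypothetical helper, not part of any AWS API:

```python
import csv
import io

# The sample batch from above, including the spaces after the commas.
SAMPLE = """Event_date, event_id, event_cost
2015-06-25, 123, 3
2015-06-25, 123, 4
"""

def latest_per_event(rows):
    """Keep only the last row seen for each event_id.

    Assumes rows arrive in order, so a later row supersedes an earlier one.
    """
    latest = {}
    for row in rows:
        latest[row["event_id"]] = row  # overwrite: the later row wins
    return list(latest.values())

# skipinitialspace=True strips the blank after each comma (header included).
rows = list(csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True))
deduped = latest_per_event(rows)
print(deduped)
```

Using a dict keyed on event_id keeps one row per key in a single pass. If "latest" really means a larger version or timestamp value, the comparison would have to be made explicit instead of relying on arrival order.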
http://docs.aws.amazon.com/redshift/latest/dg/t_updating-inserting-using-staging-tables-.html
Following that guide, I want to replace the old rows with the new values, something like:
-- load the new batch into the staging table (authorization clause, e.g. IAM_ROLE, omitted)
copy event_table_staging from '<s3 location>' CSV;
begin transaction;
delete from event_table
using event_table_staging
where event_table.event_id = event_table_staging.event_id;
insert into event_table
select * from event_table_staging;
end transaction;
delete from event_table_staging;
select * from event_table_staging;
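The staging-table merge above can be tried locally before running it on Redshift. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for Redshift: the COPY from S3 is replaced by plain INSERTs, and because SQLite has no DELETE ... USING, the delete matches on event_id with an IN subquery instead. The row values come from the sample above; `merge_from_staging` is a hypothetical helper name:

```python
import sqlite3

# In-memory SQLite stands in for Redshift; the schema mirrors the question's columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_table (event_date TEXT, event_id INTEGER, cost INTEGER)")
conn.execute("CREATE TABLE event_table_staging (event_date TEXT, event_id INTEGER, cost INTEGER)")

# Existing row in the target, and the re-evaluated row arriving in staging.
conn.execute("INSERT INTO event_table VALUES ('2015-06-25', 123, 3)")
conn.execute("INSERT INTO event_table_staging VALUES ('2015-06-25', 123, 4)")

def merge_from_staging(conn):
    """Delete target rows that have a match in staging, then append all staging rows."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "DELETE FROM event_table "
            "WHERE event_id IN (SELECT event_id FROM event_table_staging)"
        )
        conn.execute("INSERT INTO event_table SELECT * FROM event_table_staging")
        conn.execute("DELETE FROM event_table_staging")  # empty staging for the next batch

merge_from_staging(conn)
print(conn.execute("SELECT * FROM event_table").fetchall())
```

Running the delete and insert in one transaction means a failure between the two statements cannot leave event 123 missing from the target.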
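Is it possible to do something like the following instead, i.e. have COPY update existing rows and insert only the new ones in a single step?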
Redshift columns: event_date,event_id,cost
copy event_table from <s3>
(update event_table
set event_date = c_source.event_date, cost = c_source.cost
from <s3 source> as c_source
where c_source.event_id = event_table.event_id)
CSV
copy event_table from <s3>
(insert into event_table
select c_source.event_date,c_source.event_id,c_source.cost from <s3 source> as c_source left outer join event_table on c_source.event_id = event_table.event_id where event_table.event_id is NULL)
CSV
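For comparison, the two steps these hypothetical clauses describe can also be run against a staging table after a plain load. The first step updates rows whose event_id already exists. The second inserts source rows with no match, using a left outer join anti-join. A minimal sketch, assuming Python's built-in sqlite3 module as a local stand-in for Redshift. The staging rows are made-up example data, and `upsert_from_staging` is a hypothetical helper. The UPDATE uses correlated subqueries so it works on any SQLite version; Redshift itself accepts UPDATE ... FROM:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_table (event_date TEXT, event_id INTEGER, cost INTEGER)")
conn.execute("CREATE TABLE event_table_staging (event_date TEXT, event_id INTEGER, cost INTEGER)")

conn.execute("INSERT INTO event_table VALUES ('2015-06-25', 123, 3)")
# Made-up batch: one changed row (123) and one brand-new row (456).
conn.executemany(
    "INSERT INTO event_table_staging VALUES (?, ?, ?)",
    [("2015-06-25", 123, 4), ("2015-06-26", 456, 7)],
)

def upsert_from_staging(conn):
    with conn:  # both steps commit together or not at all
        # Step 1: update target rows that have a matching event_id in staging.
        conn.execute(
            """
            UPDATE event_table
            SET event_date = (SELECT c_source.event_date FROM event_table_staging AS c_source
                              WHERE c_source.event_id = event_table.event_id),
                cost       = (SELECT c_source.cost FROM event_table_staging AS c_source
                              WHERE c_source.event_id = event_table.event_id)
            WHERE event_id IN (SELECT event_id FROM event_table_staging)
            """
        )
        # Step 2: insert staging rows with no match in the target (anti-join).
        conn.execute(
            """
            INSERT INTO event_table (event_date, event_id, cost)
            SELECT c_source.event_date, c_source.event_id, c_source.cost
            FROM event_table_staging AS c_source
            LEFT OUTER JOIN event_table ON c_source.event_id = event_table.event_id
            WHERE event_table.event_id IS NULL
            """
        )

upsert_from_staging(conn)
print(conn.execute("SELECT * FROM event_table ORDER BY event_id").fetchall())
```

The order matters: the update runs first, so by the time the anti-join runs, event 123 already exists in the target and only 456 is inserted.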