postgresql - Amazon Redshift: how to COPY from S3 and set a job_id

Question

Amazon Redshift can load table data from S3 objects with the COPY command. Is there a way to use COPY and also set an additional "col=CONSTANT" on every inserted row?

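I want to set a job_id (which is not in the source data) on every copied row. It would be a shame to have to run a few million inserts just to give each row a job attribute, when COPY already gets me 99% of the way there with much better performance.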

Is there perhaps a smarter solution?

score 12 · Accepted Answer

If you want all rows added by a single COPY command to share the same job_id value, you can COPY the data into a staging table, add the job_id column to that table, and then insert everything from the staging table into the final table:

CREATE TABLE destination_staging (LIKE destination);
-- job_id is not in the source files, so drop it before loading
ALTER TABLE destination_staging DROP COLUMN job_id;
COPY destination_staging FROM 's3://data/destination/(...)' (...);
-- ADD COLUMN appends job_id at the end, so SELECT * below only lines up if job_id is also the last column of destination
ALTER TABLE destination_staging ADD COLUMN job_id INT DEFAULT 42;
INSERT INTO destination SELECT * FROM destination_staging ORDER BY sortkey_column;
DROP TABLE destination_staging;
ANALYZE destination;
VACUUM destination;

ANALYZE and VACUUM are not required, but they are highly recommended: ANALYZE refreshes the statistics used by the query planner, and VACUUM moves the new rows into their correct sorted positions.
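A variant of the same staging approach, sketched under assumptions: instead of adding job_id to the staging table and relying on SELECT *, name the columns explicitly and select the job_id as a literal. This removes the dependence on job_id being the last column of destination. The columns col_a and col_b are hypothetical, the value 42 is just the example constant from the answer, and the S3 path and COPY options stay elided as above.

```sql
-- Hypothetical schema: destination(col_a, col_b, job_id); adapt the column lists to your table.
-- A TEMP table is private to the session and dropped automatically when the session ends.
CREATE TEMP TABLE destination_staging (LIKE destination);

-- The source files do not contain job_id, so the staging table must not have it either.
ALTER TABLE destination_staging DROP COLUMN job_id;

-- Bulk-load the raw data from S3 (path and options elided).
COPY destination_staging FROM 's3://data/destination/(...)' (...);

-- Tag every loaded row with the same constant job_id in a single set-based INSERT.
INSERT INTO destination (col_a, col_b, job_id)
SELECT col_a, col_b, 42
FROM destination_staging;

DROP TABLE destination_staging;
```

Naming the columns on both sides of the INSERT makes the mapping explicit, so the statement keeps working regardless of where job_id sits in destination, and the whole tagging step is still one INSERT ... SELECT rather than millions of single-row inserts.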

score 0 · Accepted Answer

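There does not seem to be an option for pre- or post-processing within the COPY command itself. Your best option therefore seems to be to preprocess the files you intend to COPY into Redshift, adding the jobid to them, and then load them into Redshift.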
