
Amazon Redshift provides the ability to load table data from S3 objects using the COPY command. Is there a way to use the COPY command while also setting an extra "col=CONSTANT" for every inserted row?

I want to set a job_id (which is not in the source data) on every copied row. Since COPY gets me 99% of the way there with much better performance, it would be a shame to have to perform millions of individual INSERTs just so that every row carries a job attribute.

Perhaps there is a smarter solution?


2 Answers


If you want all the rows added in a single COPY command to have the same value of job_id, you can COPY the data into a staging table, add a job_id column to that table, and then insert all the data from the staging table into the final table, like:

CREATE TABLE destination_staging (LIKE destination);
ALTER TABLE destination_staging DROP COLUMN job_id;
COPY destination_staging FROM 's3://data/destination/(...)' (...);
ALTER TABLE destination_staging ADD COLUMN job_id INT DEFAULT 42;
INSERT INTO destination SELECT * FROM destination_staging ORDER BY sortkey_column;
DROP TABLE destination_staging;
ANALYZE destination;
VACUUM destination;

ANALYZE and VACUUM are not strictly necessary, but they are highly recommended: they refresh the query planner's statistics and re-sort the newly inserted data into the correct positions.

Answered on 2013-05-30T12:31:36.673

There seems to be no option for pre- or post-processing within the COPY command itself. So your best bet is to pre-process the files you intend to COPY into Redshift, adding the jobid to each row, and then load them into Redshift.
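The pre-processing step could be sketched in Python along these lines: append the job id as a trailing column to every row of a delimited file before uploading it to S3 (the function name and the placeholder job id 42 are illustrative, not from the original answer):

```python
import csv

def add_job_id(src_path, dst_path, job_id):
    """Copy a CSV file, appending job_id as an extra trailing column on every row."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow(row + [str(job_id)])
```

After rewriting the files, upload them to S3 and run the usual COPY, with job_id declared as the last column of the target table (or listed last in the COPY column list).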

Answered on 2013-07-07T06:51:39.080