In AWS Redshift, I want to add a sort key to a table that has already been created. Is there a command that can add a column and use it as the sort key?

8 Answers

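As Yaniv Kessler said, you can't add or change the distkey and sort key after a table has been created; you have to recreate the table and copy all the data into the new one. You can use the following SQL pattern to recreate a table with the new design.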

ALTER TABLE test_table RENAME TO old_test_table;
CREATE TABLE new_test_table([new table columns]);
INSERT INTO new_test_table (SELECT * FROM old_test_table);
ALTER TABLE new_test_table RENAME TO test_table;
DROP TABLE old_test_table;

In my experience, this SQL is useful not only for changing the distkey and sortkey but also for setting the encoding (compression) type.
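Here is a sketch of the same deep-copy pattern with the new design spelled out, including per-column encodings. It assumes a hypothetical test_table with columns id, customer_id, created_at and note. The column names, types, encodings and keys are illustrative and do not come from the answer. This variant creates test_table directly after the rename instead of creating new_test_table and renaming it.

```sql
-- Hypothetical columns and encodings; replace them with your own design.
ALTER TABLE test_table RENAME TO old_test_table;

CREATE TABLE test_table (
    id          BIGINT       ENCODE az64,
    customer_id INTEGER      ENCODE az64,
    created_at  TIMESTAMP    ENCODE raw,   -- leading sort key column left uncompressed
    note        VARCHAR(256) ENCODE zstd
)
DISTKEY (customer_id)
COMPOUND SORTKEY (created_at, id);

-- List the columns explicitly so a column-order change cannot silently misalign data.
INSERT INTO test_table (id, customer_id, created_at, note)
SELECT id, customer_id, created_at, note
FROM old_test_table;

DROP TABLE old_test_table;
```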

answered 2013-11-21T07:27:38.583

To add to Yaniv's answer, the ideal approach is probably the CREATE TABLE AS command, which lets you specify the distkey and sortkey explicitly, i.e.:

CREATE TABLE test_table_with_dist 
distkey(field) 
sortkey(sortfield) 
AS 
select * from test_table;

More examples:

http://docs.aws.amazon.com/redshift/latest/dg/r_CTAS_examples.html

EDIT

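I noticed that this approach does not preserve the encodings. Redshift only applies encodings automatically during a COPY statement. If this is a persistent table, you should redefine the table and specify the encodings yourself.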

create table test_table_with_dist(
    field1 varchar encode raw distkey,
    field2 timestamp encode delta sortkey);

insert into test_table_with_dist select * from test_table;

You can find out which encodings to use by running: analyze compression test_table;
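One way to confirm the encoding loss described above is to compare the column definitions of both tables in the PG_TABLE_DEF catalog view. This is a sketch that uses the table names from this answer. PG_TABLE_DEF only returns tables in schemas on the current search_path.

```sql
-- Suggested encodings for the existing table.
ANALYZE COMPRESSION test_table;

-- Encodings and keys actually in effect on the old and new tables.
SELECT tablename, "column", type, encoding, distkey, sortkey
FROM pg_table_def
WHERE tablename IN ('test_table', 'test_table_with_dist')
ORDER BY tablename, "column";
```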

answered 2015-04-15T17:30:31.267

Update:

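Amazon Redshift now lets users add and change the sort keys of existing Redshift tables without recreating the table. The new capability makes it simpler to maintain the optimal sort order in Redshift as query patterns evolve, preserving high performance without interrupting access to the table.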

Source: https://aws.amazon.com/about-aws/whats-new/2019/11/amazon-redshift-supports-changeing-table-sort-keys-dynamically/

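Original answer: at the moment I don't think this is possible (hopefully that will change in the future). When I have run into this in the past, I created a new table and copied the data from the old table into it.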

From http://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE.html:

ADD [ COLUMN ] column_name Adds a column with the specified name to the table. You can add only one column in each ALTER TABLE statement.

You can't add a column that is the distribution key (DISTKEY) or a sort key (SORTKEY) of the table.

You can't use an ALTER TABLE ADD COLUMN command to modify the following table and column attributes:

UNIQUE

PRIMARY KEY

REFERENCES (foreign key)

IDENTITY

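The maximum column name length is 127 characters; longer names are truncated to 127 characters. The maximum number of columns you can define in a single table is 1,600.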

answered 2013-07-27T16:40:40.360

AWS now lets you add sort keys and dist keys without recreating the table:

To add a sort key (or change the sort key):

ALTER TABLE data.engagements_bot_free_raw ALTER SORTKEY (id)

To change or add a dist key:

ALTER TABLE data.engagements_bot_free_raw ALTER DISTKEY id

Interestingly, the parentheses are mandatory for SORTKEY but not for DISTKEY.

You still can't change a table's encoding in place; that still requires the workaround of recreating the table.
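Applied to the original question (add a column and use it as the sort key), this takes separate statements, because ADD COLUMN cannot itself declare a sort key. Here is a minimal sketch that reuses the table name from this answer. It assumes a hypothetical new column event_date, filled from a hypothetical existing TIMESTAMP column created_at.

```sql
-- Hypothetical column names: event_date (new) and created_at (existing TIMESTAMP).
ALTER TABLE data.engagements_bot_free_raw ADD COLUMN event_date DATE;

-- Backfill the new column so the sort key has values to sort on.
UPDATE data.engagements_bot_free_raw
SET event_date = TRUNC(created_at);

-- Make the new column the sort key; Redshift adjusts the data layout behind the scenes.
ALTER TABLE data.engagements_bot_free_raw ALTER SORTKEY (event_date);
```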

answered 2019-10-30T22:37:03.653

I used this approach to add sort key columns to my table table_transactions. It is more or less the same approach, just with fewer commands.

alter table table_transactions rename to table_transactions_backup;
create table table_transactions compound sortkey(key1, key2, key3, key4) as select * from table_transactions_backup;
drop table table_transactions_backup;
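If other sessions query the table during the swap, you can wrap the three statements above in a single transaction so the rename, copy and drop commit together. This is a sketch that assumes running these DDL statements inside a transaction block is acceptable in your environment. key1 to key4 are the answer's placeholder column names.

```sql
BEGIN;

-- Keep the original data under a backup name.
ALTER TABLE table_transactions RENAME TO table_transactions_backup;

-- Recreate the table with the desired compound sort key.
CREATE TABLE table_transactions
COMPOUND SORTKEY (key1, key2, key3, key4)
AS SELECT * FROM table_transactions_backup;

DROP TABLE table_transactions_backup;

COMMIT;
```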
answered 2018-01-24T06:10:46.013

A bit late to this question.
I've found that using 1=1 is the best way to create a table and copy data into it in Redshift, for example: CREATE TABLE NEWTABLE AS SELECT * FROM OLDTABLE WHERE 1=1;

You can then drop OLDTABLE after verifying that the data has been copied.

(If you replace 1=1 with 1=2, only the structure is copied, which is useful for creating temporary tables.)
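A simple row-count comparison is one way to verify the copy before dropping OLDTABLE. The sketch below also shows the 1=2 structure-only variant. OLDTABLE and NEWTABLE are the answer's placeholders; NEWTABLE_STAGING is a hypothetical name.

```sql
-- Full copy: WHERE 1=1 matches every row.
CREATE TABLE NEWTABLE AS SELECT * FROM OLDTABLE WHERE 1=1;

-- Structure-only copy: WHERE 1=2 matches no rows.
CREATE TABLE NEWTABLE_STAGING AS SELECT * FROM OLDTABLE WHERE 1=2;

-- Verify the copy before dropping the original.
SELECT 'OLDTABLE' AS source, COUNT(*) AS row_count FROM OLDTABLE
UNION ALL
SELECT 'NEWTABLE', COUNT(*) FROM NEWTABLE;

-- Only once the two counts match:
DROP TABLE OLDTABLE;
```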

answered 2019-10-31T03:10:18.523

It is now possible to change the sort key:

Amazon Redshift now supports changing table sort keys dynamically

Amazon Redshift now enables users to add and change sort keys of existing Redshift tables without having to re-create the table. The new capability simplifies user experience in maintaining the optimal sort order in Redshift to achieve high performance as their query patterns evolve and do it without interrupting the access to the tables.

Customers when creating Redshift tables can optionally specify one or more table columns as sort keys. The sort keys are used to maintain the sort order of the Redshift tables and allows the query engine to achieve high performance by reducing the amount of data to read from disk and to save on storage with better compression. Currently Redshift customers who desire to change the sort keys after the initial table creation will need to re-create the table with new sort key definitions.

With the new ALTER SORT KEY command, users can dynamically change the Redshift table sort keys as needed. Redshift will take care of adjusting data layout behind the scenes and table remains available for users to query. Users can modify sort keys for a given table as many times as needed and they can alter sort keys for multiple tables simultaneously.

For more information ALTER SORT KEY, please refer to the documentation.

Documentation

As for the documentation itself:

ALTER DISTKEY column_name or ALTER DISTSTYLE KEY DISTKEY column_name A clause that changes the column used as the distribution key of a table. Consider the following:

VACUUM and ALTER DISTKEY cannot run concurrently on the same table.

If VACUUM is already running, then ALTER DISTKEY returns an error.

If ALTER DISTKEY is running, then background vacuum doesn't start on a table.

If ALTER DISTKEY is running, then foreground vacuum returns an error.

You can only run one ALTER DISTKEY command on a table at a time.

The ALTER DISTKEY command is not supported for tables with interleaved sort keys.

When specifying DISTSTYLE KEY, the data is distributed by the values in the DISTKEY column. For more information about DISTSTYLE, see CREATE TABLE.
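For illustration, here are the two spellings of the clause listed above, followed by a query that inspects the resulting distribution style in SVV_TABLE_INFO. The table sales and its customer_id column are hypothetical. As far as I know, SVV_TABLE_INFO does not list empty tables.

```sql
-- Hypothetical table "sales" and column "customer_id"; run one form or the other.
ALTER TABLE sales ALTER DISTKEY customer_id;

-- Long form of the same clause:
-- ALTER TABLE sales ALTER DISTSTYLE KEY DISTKEY customer_id;

-- Inspect the distribution style, e.g. KEY(customer_id).
SELECT "table", diststyle
FROM svv_table_info
WHERE "table" = 'sales';
```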

ALTER [COMPOUND] SORTKEY ( column_name [,...] ) A clause that changes or adds the sort key used for a table. Consider the following:

You can define a maximum of 400 columns for a sort key per table.

You can only alter a compound sort key. You can't alter an interleaved sort key.

When data is loaded into a table, the data is loaded in the order of the sort key. When you alter the sort key, Amazon Redshift reorders the data. For more information about SORTKEY, see CREATE TABLE.
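This sketch changes a table to a multi-column compound sort key and then checks the result in SVV_TABLE_INFO. There, sortkey1 is the first sort key column and sortkey_num is the number of sort key columns. The table and column names are hypothetical.

```sql
-- Hypothetical table "sales" with columns "sale_date" and "customer_id".
ALTER TABLE sales ALTER COMPOUND SORTKEY (sale_date, customer_id);

-- Check the new sort key definition.
SELECT "table", sortkey1, sortkey_num
FROM svv_table_info
WHERE "table" = 'sales';
```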
answered 2019-12-12T12:25:16.803

According to the updated documentation, you can now change the sort key type with the following command:

ALTER [COMPOUND] SORTKEY ( column_name [,...] )

For reference (https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE.html):

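  • "You can alter an interleaved sort key to a compound sort key or no sort key. However, you can't alter a compound sort key to an interleaved sort key."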
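Following that quote, here is how you might convert a hypothetical table events that currently has an interleaved sort key. ALTER SORTKEY NONE is, to my knowledge, the form the current ALTER TABLE reference lists for removing a sort key; check it against your cluster version before relying on it.

```sql
-- Hypothetical table "events", created with INTERLEAVED SORTKEY (event_type, event_time).
-- Convert it to a compound sort key:
ALTER TABLE events ALTER COMPOUND SORTKEY (event_time, event_type);

-- Or remove the sort key altogether:
-- ALTER TABLE events ALTER SORTKEY NONE;
```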
answered 2021-11-23T00:30:36.753