
After reading the Amazon Redshift documentation, I ran a VACUUM on a 400GB table that had never been vacuumed before, in an attempt to improve query performance. Unfortunately, the VACUUM caused the table to grow to 1.7TB (!!) and brought the cluster's disk usage to 100%. I then tried to stop the VACUUM by running a CANCEL query in the superuser queue (which you enter by running "set query_group='superuser';"), but although the query didn't raise an error, it had no effect on the vacuum query, which keeps running.
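
For reference, the cancel attempt went roughly like this (a sketch; the PID shown is a placeholder for whatever stv_recents actually reports):

set query_group='superuser';                                      -- switch the session to the superuser queue
select pid, trim(query) from stv_recents where status='Running';  -- find the PID of the running VACUUM
cancel 18764;                                                     -- placeholder PID; returned no error, but the VACUUM kept running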

What can I do?


4 Answers


I have stopped vacuums several times; maybe that option wasn't available back then.
Run the following query, and it will give you the process ID of the VACUUM query.

select * from stv_recents where status='Running';

Once you have the process ID, you can run the following query to terminate that process.

select pg_terminate_backend( pid );
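
If several queries are running, a narrower filter (a sketch using the pid, duration, query, and status columns of stv_recents) makes it easier to pick out the vacuum's PID before terminating it:

select pid, duration, trim(query) as query
from stv_recents
where status = 'Running' and query ilike 'vacuum%';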

answered 2016-09-08T15:24:39.963

Apparently, there is nothing you can do about it at the moment. I was on the phone with Amazon support for an hour; they have no tool to stop a vacuum operation. They opened a ticket about CANCEL queries silently ignoring VACUUM queries.

They suggested that I take a snapshot of the cluster (which should normally take only a few minutes if you have made previous snapshots) and then reboot the cluster. That sort of worked, meaning the vacuum stopped and some disk space was freed (600GB), but the table was still more than twice its original size. Since vacuuming it again was too risky, I resorted to creating a deep copy of it, which should produce a vacuumed copy of the table. (You can read about deep copies here - http://docs.aws.amazon.com/redshift/latest/dg/performing-a-deep-copy.html.)
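
For completeness, a deep copy along the lines of the linked page looks roughly like this (a sketch; my_table and my_table_copy are placeholder names, and CREATE TABLE ... LIKE is only one of the variants the doc describes):

create table my_table_copy (like my_table);          -- copies the column definitions, dist key and sort key
insert into my_table_copy (select * from my_table);  -- rows are written freshly sorted, like a new load
drop table my_table;
alter table my_table_copy rename to my_table;

Note that the copy temporarily needs as much free space as the original data, so on a nearly full cluster you may have to resize or unload to S3 first.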

answered 2014-07-16T12:37:11.820

Tip: run this query (taken from here) to see which tables you should vacuum.

Note: this only helps if you want to know which tables are large and how much you can gain by vacuuming each of them.

select trim(pgdb.datname) as Database,
       trim(a.name) as Table,
       ((b.mbytes / part.total::decimal) * 100)::decimal(5,2) as pct_of_total,
       b.mbytes,
       b.unsorted_mbytes
from stv_tbl_perm a
join pg_database as pgdb on pgdb.oid = a.db_id
join (select tbl, sum(decode(unsorted, 1, 1, 0)) as unsorted_mbytes, count(*) as mbytes
      from stv_blocklist
      group by tbl) b on a.id = b.tbl
join (select sum(capacity) as total
      from stv_partitions
      where part_begin = 0) as part on 1 = 1
where a.slice = 0
order by 3 desc, db_id, name;

Then run VACUUM on the tables with a high unsorted_mbytes value: VACUUM your_table;
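
Depending on whether the space is mostly unsorted rows or deleted rows, a lighter variant may be enough (a sketch; your_table is a placeholder and the TO ... PERCENT threshold is optional):

vacuum sort only your_table;           -- re-sort rows without reclaiming deleted space
vacuum delete only your_table;         -- reclaim space from deleted rows without re-sorting
vacuum full your_table to 99 percent;  -- both phases, with a 99 percent sort threshold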

answered 2016-09-20T09:02:12.917
  1. Vacuuming should be scheduled regularly; if a table is vacuumed daily, the vacuum should finish quickly and have no noticeable side effects.
  2. In the situation you describe, it is safer to resize the cluster to a larger configuration first, run the vacuum, and then scale back down to the original configuration. Keep in mind that free disk space is critical for computation on a Redshift cluster: when free disk space gets low, all read/write operations on the cluster become very slow (see the disk-usage check below).
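
A quick way to watch disk usage before and after the resize and vacuum (a sketch against stv_partitions, using the same part_begin = 0 convention as the sizing query in the answer above):

select sum(used)::float / sum(capacity) * 100 as pct_disk_used
from stv_partitions
where part_begin = 0;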
answered 2014-10-29T00:04:07.157