
In Redshift, the STL_QUERY table stores the queries that were run over the last 5 days. I'm trying to find a way to keep more than 5 days' worth of records. Here are some things that I've considered:

  1. Is there a Redshift setting for this? It would appear not.
  2. Could I use a trigger? Triggers are not available in Redshift, so this is a no-go.
  3. Could I create an AWS Data Pipeline job to periodically "scrape" the STL_QUERY table? I could, so this is an option. Unfortunately, I would have to give the pipeline an EC2 instance to run the work on. It seems like a waste to have an instance sitting around just to scrape this table once a day.
  4. Could I use an Amazon Simple Workflow (SWF) job to scrape the table? I could, but it suffers from the same issue as 3.

Are there any other options or ideas that I'm missing? I would prefer an option that does not involve dedicating an EC2 instance, even if it means paying for an additional service (provided that it's cheaper than the EC2 instance I would have used in its stead).


1 Answer


Keep it simple and do it all in Redshift.

First, use "CREATE TABLE ... AS" to save all of the current history into a permanent table.

CREATE TABLE admin.query_history AS SELECT * FROM stl_query;

Second, schedule a job on a machine you control that uses psql to run this statement every day.

INSERT INTO admin.query_history SELECT * FROM stl_query WHERE query > (SELECT MAX(query) FROM admin.query_history);

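Done. :)

Notes:

  • You'll need the 8.x version of psql if you don't have it set up already.
  • Even if your job doesn't run for a few days, stl_query keeps enough history that you'll be covered.
  • As per your comment, it might be safer to use starttime instead of query as the criterion.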
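
As a rough sketch of that last note, the daily statement could filter on starttime instead of query. This assumes the admin.query_history table created above; the COALESCE fallback date is an arbitrary placeholder added here so the statement also works while the history table is still empty.

```sql
-- Append only the rows that started after the newest row already archived.
-- The '1970-01-01' fallback is an assumed placeholder for an empty history table.
INSERT INTO admin.query_history
SELECT *
FROM stl_query
WHERE starttime > (
    SELECT COALESCE(MAX(starttime), '1970-01-01'::timestamp)
    FROM admin.query_history
);
```

One trade-off to be aware of: a row whose starttime exactly equals the current maximum would be skipped by the strict comparison, so this is a starting point rather than a guaranteed gap-free archive.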
answered 2013-11-08T14:25:43.663