sql - PostgreSQL: selecting from a table with 5 million rows

Question

I have a table with about 5 million rows:

CREATE TABLE audit_log
(
  event_time timestamp with time zone NOT NULL DEFAULT now(),
  action smallint, -- 1:modify, 2:create, 3:delete, 4:security, 9:other
  level smallint NOT NULL DEFAULT 20, -- 10:minor, 20:info, 30:warning, 40:error
  component_id character varying(150),
  CONSTRAINT audit_log_pk PRIMARY KEY (audit_log_id)
)
WITH (
  OIDS=FALSE
);

I need to get all component IDs using something like SELECT component_id FROM audit_log GROUP BY component_id, and the query takes about 20 seconds to complete. How can I optimize it?

Update:

I have an index on component_id:

CREATE INDEX audit_log_component_id_idx
  ON audit_log
  USING btree
  (component_id COLLATE pg_catalog."default");

UPD 2: Well, I know one solution is to move the component names into a separate table, but I was hoping for a simpler solution. Thanks, guys.

score 1 · Accepted Answer

Create an index on the column component_id.

Since it is the only column used in the query, the information can then be read directly from the index.
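To check whether the query is actually answered from the index, the plan can be inspected. The following is a minimal sketch, assuming PostgreSQL 9.2 or later (the first version with index-only scans) and the audit_log_component_id_idx index from the question. Whether the planner picks an Index Only Scan depends on the visibility map being up to date, which is why the sketch vacuums first. Even then the whole index still has to be read, so the gain may be modest.

```sql
-- Refresh the visibility map and planner statistics;
-- index-only scans rely on the visibility map being current.
VACUUM ANALYZE audit_log;

-- Run the query from the question and show the actual plan, timing and buffer usage.
-- Look for "Index Only Scan using audit_log_component_id_idx" in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT component_id
FROM audit_log
GROUP BY component_id;
```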

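You may also want to consider moving the components (currently strings) into a separate table and referencing them by an ID of integer or similar type.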
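The separate-table approach could look roughly like this. It is only a sketch: the components table, its id column, and the component_ref column are hypothetical names chosen for illustration, not part of the original schema. The UPDATE rewrites every row of audit_log and would need to be planned accordingly on a 5-million-row table.

```sql
-- Hypothetical lookup table: one row per distinct component name.
CREATE TABLE components
(
  id serial PRIMARY KEY,
  component_id character varying(150) NOT NULL UNIQUE
);

-- Populate it once from the existing audit data.
INSERT INTO components (component_id)
SELECT DISTINCT component_id
FROM audit_log
WHERE component_id IS NOT NULL;

-- Hypothetical integer reference column on audit_log.
ALTER TABLE audit_log
  ADD COLUMN component_ref integer REFERENCES components (id);

-- Backfill the reference for existing rows.
UPDATE audit_log a
SET component_ref = c.id
FROM components c
WHERE c.component_id = a.component_id;

-- Listing all components now touches only the small lookup table.
SELECT component_id FROM components;
```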

score 0 · Accepted Answer

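Create a nonclustered index on (component_id) for your table, or define nonclustered indexes on all the fields you use in the WHERE clause. Then compare the execution time or look at the execution plan. The goal is to turn every scan into a seek operation.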

score 0 · Accepted Answer

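If you have a list of valid component IDs in another table and only want to check whether they exist in the audit table, optionally subject to some conditions, then you can: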

select
  component_id
from
  components
where
  exists (
    select null
    from   audit_log
    where  audit_log.component_id = components.component_id)

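If the number of distinct component_id values is significantly smaller than the number of rows in audit_log, and audit_log.component_id is indexed, this will perform much better.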
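When no such components table exists, the same "few distinct values, indexed column" situation can be exploited with a recursive CTE that emulates a loose index scan. This is a different technique from the EXISTS query above, shown here only as a sketch. It assumes the btree index on audit_log.component_id from the question. Unlike the GROUP BY query, it does not return a row for NULL component_id values.

```sql
WITH RECURSIVE t AS (
  -- Start with the smallest component_id (one index probe).
  (SELECT component_id
   FROM audit_log
   ORDER BY component_id
   LIMIT 1)
  UNION ALL
  -- Repeatedly jump to the next larger value (one index probe per distinct value).
  SELECT (SELECT component_id
          FROM audit_log
          WHERE component_id > t.component_id
          ORDER BY component_id
          LIMIT 1)
  FROM t
  WHERE t.component_id IS NOT NULL
)
SELECT component_id
FROM t
WHERE component_id IS NOT NULL;
```

The cost grows with the number of distinct values rather than with the number of rows, so it only pays off when distinct values are few.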
