postgresql - Improving the speed of GROUP BY on a lookup table in Postgres

Question

I have a join table with the following structure:

CREATE TABLE adjectives_friends
(
  adjective_id integer,
  friend_id integer
)
WITH (
  OIDS=FALSE
);
ALTER TABLE adjectives_friends
  OWNER TO rails;


CREATE UNIQUE INDEX index_adjectives_friends_on_adjective_id_and_friend_id
  ON adjectives_friends
  USING btree
  (adjective_id , friend_id );

CREATE UNIQUE INDEX index_adjectives_friends_on_friend_id_and_adjective_id
  ON adjectives_friends
  USING btree
  (friend_id , adjective_id );
ALTER TABLE adjectives_friends CLUSTER ON index_adjectives_friends_on_friend_id_and_adjective_id;

The table contains about 50 million rows.

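The adjectives table is a lookup table with about 150 entries. What I want to do is find the friends that most closely match a list of adjectives. Assume the maximum number of adjectives a friend can have is 10. So I tried this query: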

SELECT count(friend_id) count, friend_id
  FROM adjectives_friends
  where adjective_id in (1,2,3,4,5,6,7,8,9,10)
  group by friend_id
  order by count desc
  limit 100

This takes about 10 seconds on my development machine, with this query plan:

"Limit  (cost=831652.00..831652.25 rows=100 width=4)"
"  ->  Sort  (cost=831652.00..831888.59 rows=94634 width=4)"
"        Sort Key: (count(friend_id))"
"        ->  GroupAggregate  (cost=804185.31..828035.16 rows=94634 width=4)"
"              ->  Sort  (cost=804185.31..811819.81 rows=3053801 width=4)"
"                    Sort Key: friend_id"
"                    ->  Bitmap Heap Scan on adjectives_friends  (cost=85958.72..350003.24 rows=3053801 width=4)"
"                          Recheck Cond: (adjective_id = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))"
"                          ->  Bitmap Index Scan on index_adjectives_friends_on_adjective_id_and_friend_id  (cost=0.00..85195.26 rows=3053801 width=0)"
"                                Index Cond: (adjective_id = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))"

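The sort is what's killing me, but I don't know a good way to avoid it. The counts can't be precomputed, because the adjectives to select are completely arbitrary and there are more than (150 choose 10) combinations. Right now, I think the best option is to compute the 100 best results when a friend is created, save them, and then refresh them every n time interval. That would be acceptable, because the adjectives are not expected to change that often and I don't need the exact 100 best results. But if I could get the query down to around 1 - 2 seconds, none of that would be necessary. Any suggestions?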

score 1 · Accepted Answer

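I don't think you're going to do much better with that query plan. I'm taking your word for it that the counts can't be precomputed.

I think your best options are

Table tuning
Server tuning
Faster hardware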
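
As one possible illustration of the "server tuning" item: the plan in the question sorts roughly 3 million rows ahead of the GroupAggregate. The sketch below assumes that the sort is spilling to disk, or that the planner passed over a hash aggregate because work_mem is small; neither is confirmed by the question. The 256MB value is an arbitrary placeholder, not a recommendation, and the parenthesized EXPLAIN options need PostgreSQL 9.0 or later.

```sql
-- Assumption: 256MB is illustrative only; size it to the RAM actually available.
-- SET affects the current session only.
SET work_mem = '256MB';

-- EXPLAIN (ANALYZE, BUFFERS) executes the query and reports actual timings,
-- the sort method used (in memory or external merge on disk), and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(friend_id) AS count, friend_id
  FROM adjectives_friends
 WHERE adjective_id IN (1,2,3,4,5,6,7,8,9,10)
 GROUP BY friend_id
 ORDER BY count DESC
 LIMIT 100;

-- Return to the configured default for this session.
RESET work_mem;
```

If the plan switches from Sort plus GroupAggregate to HashAggregate, or the reported sort method changes from an external merge to an in-memory sort, the setting is probably worth a closer look. If neither changes, memory for sorting is likely not the bottleneck.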

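If you can use smallint instead of integer, your table and indexes will be narrower, more rows will fit on a page, and your query should run faster. But smallint is a 2-byte integer, with a range of -32768 to +32767. If you need more id numbers than that, smallint won't work.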
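
A rough sketch of how the smallint change might be tried and measured. It assumes only adjective_id (about 150 values) can be narrowed: with 50 million rows and at most 10 adjectives per friend there must be at least 5 million friends, so friend_id does not fit in smallint. The type change rewrites the table and rebuilds its indexes while holding an exclusive lock, so it is safer to try on a copy first. PostgreSQL pads rows to alignment boundaries, so the saving may be smaller than the raw byte counts suggest, which is why the sizes are checked before and after.

```sql
-- Record the sizes before the change.
SELECT pg_size_pretty(pg_relation_size('adjectives_friends'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('adjectives_friends')) AS heap_plus_indexes;

-- Narrow only adjective_id; this rewrites the table and rebuilds both indexes.
ALTER TABLE adjectives_friends
  ALTER COLUMN adjective_id TYPE smallint;

-- Refresh planner statistics after the rewrite.
ANALYZE adjectives_friends;

-- Record the sizes again and compare with the first result.
SELECT pg_size_pretty(pg_relation_size('adjectives_friends'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('adjectives_friends')) AS heap_plus_indexes;
```

If the two results are nearly identical, the narrower column is not buying anything on this table and the change can be skipped.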
