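TL;DR: We are facing a problem where a full table scan is executed on a remote Oracle DB instead of an index being used.
Setup:
Postgres 12.3 runs in a Docker container extended with the Oracle Basic client and connects to a remote Oracle DB, version 19c. The table being accessed has 2M entries. The installed oracle_fdw is version 2.30.
Problem:
It seems that a SELECT on the foreign table does not use the foreign table's index. We want to select data from the foreign table based on data in a local table. We tried different approaches, such as joins and subselects, but the index on the foreign table was never used. We then tried using a function to produce immutable data, and that did work: for this single id, the statement returns in 12 ms and the explain plan shows that the index is used.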
CREATE FUNCTION f_single()
RETURNS text LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT id FROM local_table';
SELECT r.* FROM remote_table r WHERE r.id IN (SELECT f_single());
"Insert on another_local_table (cost=10000.00..10010.00 rows=1 width=5981) (actual time=11.855..11.855 rows=0 loops=1)"
" -> Foreign Scan on remote_table r (cost=10000.00..10010.00 rows=1 width=5981) (actual time=11.095..11.793 rows=1 loops=1)"
" Output: r.id"
" Oracle query: SELECT /*fcb71071ce9258eac9244f42c3067c30*/ r3."ID" FROM "REMOTE_TABLE" r3 WHERE (r3."ID" = '2351923')"
" Oracle plan: SELECT STATEMENT"
" Oracle plan: TABLE ACCESS BY INDEX ROWID REMOTE_TABLE "
" Oracle plan: INDEX UNIQUE SCAN PK_REMOTE_TABLE (condition "R3"."ID"='2351923')"
"Planning Time: 5.128 ms"
"Execution Time: 11.998 ms"
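But this does not actually solve our problem, because it stops working as soon as the function returns multiple rows, as seen here: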
CREATE FUNCTION f_multi()
RETURNS setof text LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT id FROM local_table';
SELECT r.* FROM remote_table r WHERE r.id IN (SELECT f_multi());
"Insert on another_local_table (cost=10022.26..20451397.84 rows=1000 width=5981) (actual time=264112.346..264112.346 rows=0 loops=1)"
" -> Hash Join (cost=10022.26..20451397.84 rows=1000 width=5981) (actual time=17482.841..264112.267 rows=1 loops=1)"
" Output: r.id"
" Inner Unique: true"
" Hash Cond: ((r.id)::text = (f_multi()))"
" -> Foreign Scan on remote_table r (cost=10000.00..20446000.00 rows=2043600 width=5981) (actual time=319.042..263161.299 rows=1981851 loops=1)"
" Output: r.id"
" Oracle query: SELECT /*ceeb047d793530c693667f5f6fada4d8*/ r3."ID" FROM "REMOTE_TABLE" r3"
" Oracle plan: SELECT STATEMENT"
" Oracle plan: TABLE ACCESS FULL REMOTE_TABLE "
" -> Hash (cost=19.77..19.77 rows=200 width=32) (actual time=419.881..419.881 rows=1 loops=1)"
" Output: (f_multi())"
" Buckets: 1024 Batches: 1 Memory Usage: 9kB"
" -> HashAggregate (cost=17.77..19.77 rows=200 width=32) (actual time=419.878..419.878 rows=1 loops=1)"
" Output: (f_multi())"
" Group Key: f_multi()"
" -> ProjectSet (cost=0.00..5.27 rows=1000 width=32) (actual time=419.867..419.870 rows=1 loops=1)"
" Output: f_multi()"
" -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=419.744..419.745 rows=1 loops=1)"
"Planning Time: 4.804 ms"
"JIT:"
" Functions: 11"
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 1.896 ms, Inlining 3.663 ms, Optimization 82.373 ms, Emission 333.437 ms, Total 421.368 ms"
"Execution Time: 264114.529 ms"
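In this case, even when only a single row is involved, the statement takes about 4 minutes to return. The explain plan shows that a full table scan is executed on the Oracle side.
Why is the index not used? What can we do to force the index to be used?
If more information about the setup or the tables is needed, we will update the question.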
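For reference, plans of the shape shown above (including the "Oracle query" and "Oracle plan" lines, which oracle_fdw prints under EXPLAIN VERBOSE) can be reproduced with a statement along these lines. This is a sketch under assumptions: the top "Insert on another_local_table" node suggests the SELECT was wrapped in an INSERT, and the column layout of another_local_table is assumed to match remote_table.

```sql
-- Sketch only: assumes another_local_table has the same columns as remote_table.
-- EXPLAIN ANALYZE really executes the INSERT, so run it in a transaction
-- and roll back to leave another_local_table unchanged.
BEGIN;

EXPLAIN (ANALYZE, VERBOSE)
INSERT INTO another_local_table
SELECT r.*
FROM remote_table r
WHERE r.id IN (SELECT f_multi());

ROLLBACK;
```

Without ANALYZE the statement is only planned, which is enough to see whether the Oracle query carries a WHERE condition and avoids the 4-minute run.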
We have essentially narrowed the requirement down to the following query (the WHERE condition does not seem to be pushed down to Oracle):
SELECT r.* FROM remote_table r
INNER JOIN local_table l
ON l.id = r.id;
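One experiment that might be worth trying, offered as a sketch rather than a confirmed fix: since the single-constant case above did reach Oracle as a WHERE condition, the ids could be collected locally and spliced into the statement as literal constants via dynamic SQL. The function name fetch_remote_by_local_ids is hypothetical and not part of the original setup. Whether this oracle_fdw version actually ships the resulting IN list to Oracle is an assumption that would need to be checked by running EXPLAIN (VERBOSE) on the generated statement.

```sql
-- Hypothetical helper (not from the original setup): builds an IN list of
-- literal ids from local_table and runs the query against the foreign table.
CREATE FUNCTION fetch_remote_by_local_ids()
RETURNS SETOF remote_table
LANGUAGE plpgsql AS
$$
DECLARE
    id_list text;
BEGIN
    -- Collect the local ids as a comma-separated list of quoted literals.
    SELECT string_agg(quote_literal(id), ', ')
    INTO id_list
    FROM local_table;

    -- No local ids: return no rows instead of building an invalid "IN ()".
    IF id_list IS NULL THEN
        RETURN;
    END IF;

    -- The ids appear as constants in the statement text, like the
    -- single-id case that used the index.
    RETURN QUERY EXECUTE
        format('SELECT r.* FROM remote_table r WHERE r.id IN (%s)', id_list);
END;
$$;

SELECT * FROM fetch_remote_by_local_ids();
```

Oracle limits an IN list to 1000 expressions, so this would only suit small id sets; larger sets would have to be processed in batches.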
Any help is appreciated. Thanks!