
I'm seeing unexpected behavior when using Exposed with a DataSource (I tried both Apache DBCP and HikariCP).

Setup: a single table (test) with fields id and flag, and an index on flag.

Query: SELECT * from test where flag=1 limit 1;

When run manually, the query uses the index and is fast. When run repeatedly through Exposed, performance degrades after 9 calls: the index is no longer used (see the query plans below).

Here is the sample code:


object TestTable : IntIdTable() {
    val flag = integer("flag").index()
}

fun createNRows(n: Int) = repeat(n) {
    TestTable.insert { it[flag] = 0 }
}

fun main(args: Array<String>) {
    val ds = HikariDataSource(HikariConfig().apply {
        jdbcUrl = "jdbc:postgresql://localhost:5432/testdb"
        username = ...
        password = ...
        setDriverClassName("org.postgresql.Driver")
    })

    Database.connect(ds)


    transaction {
        // only run the first time:
        // SchemaUtils.create(TestTable)
        // createNRows(1000000) 
        println("total ${TestTable.selectAll().count()} elements")
    }

    repeat(10000) {
        transaction {
            val startedAt = System.currentTimeMillis()
            TestTable.select { TestTable.flag.eq(1) }.limit(1).toList()
            println("Query took ${System.currentTimeMillis() - startedAt}")
        }
    }
}

Output:

total 1000000 elements
Query took 6
Query took 1
Query took 1
Query took 1
Query took 1
Query took 1
Query took 1
Query took 1
Query took 0
Query took 79
Query took 64
Query took 63
Query took 62
Query took 63
....

Below are the postgres logs with EXPLAIN (ANALYZE, BUFFERS) output enabled:

This is the last fast query:

2020-03-10 23:03:00.596 CET [71012] LOG:  duration: 0.021 ms  bind S_2: 
2020-03-10 23:03:00.597 CET [71012] LOG:  duration: 0.083 ms  parse <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:00.597 CET [71012] LOG:  duration: 0.013 ms  bind <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:00.597 CET [71012] LOG:  execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:00.597 CET [71012] LOG:  duration: 0.025 ms
2020-03-10 23:03:00.597 CET [71012] LOG:  duration: 0.011 ms  bind S_3: BEGIN
2020-03-10 23:03:00.597 CET [71012] LOG:  execute S_3: BEGIN
2020-03-10 23:03:00.597 CET [71012] LOG:  duration: 0.015 ms
2020-03-10 23:03:00.598 CET [71012] LOG:  duration: 0.159 ms  bind S_4: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
2020-03-10 23:03:00.598 CET [71012] DETAIL:  parameters: $1 = '1'
2020-03-10 23:03:00.598 CET [71012] LOG:  execute S_4: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
2020-03-10 23:03:00.598 CET [71012] DETAIL:  parameters: $1 = '1'
2020-03-10 23:03:00.598 CET [71012] LOG:  duration: 0.028 ms
2020-03-10 23:03:00.598 CET [71012] LOG:  duration: 0.015 ms  plan:
    Query Text: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
    Limit  (cost=0.42..4.44 rows=1 width=8) (actual time=0.013..0.013 rows=0 loops=1)
      Buffers: shared hit=3
      ->  Index Scan using test_flag on test  (cost=0.42..4.44 rows=1 width=8) (actual time=0.012..0.012 rows=0 loops=1)
            Index Cond: (flag = 1)
            Buffers: shared hit=3
2020-03-10 23:03:00.598 CET [71012] LOG:  duration: 0.072 ms  bind S_1: COMMIT
2020-03-10 23:03:00.598 CET [71012] LOG:  execute S_1: COMMIT
2020-03-10 23:03:00.598 CET [71012] LOG:  duration: 0.017 ms
2020-03-10 23:03:00.599 CET [71012] LOG:  duration: 0.022 ms  parse <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:00.599 CET [71012] LOG:  duration: 0.007 ms  bind <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:00.599 CET [71012] LOG:  execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:00.599 CET [71012] LOG:  duration: 0.013 ms

And this is the first "slow" one:

2020-03-10 23:03:01.601 CET [71012] LOG:  duration: 0.022 ms  bind S_2: 
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.052 ms  parse <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.011 ms  bind <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:01.602 CET [71012] LOG:  execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.023 ms
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.012 ms  bind S_3: BEGIN
2020-03-10 23:03:01.602 CET [71012] LOG:  execute S_3: BEGIN
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.015 ms
2020-03-10 23:03:01.602 CET [71012] LOG:  duration: 0.192 ms  bind S_4: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
2020-03-10 23:03:01.602 CET [71012] DETAIL:  parameters: $1 = '1'
2020-03-10 23:03:01.602 CET [71012] LOG:  execute S_4: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
2020-03-10 23:03:01.602 CET [71012] DETAIL:  parameters: $1 = '1'
2020-03-10 23:03:01.678 CET [71012] LOG:  duration: 75.889 ms
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 75.868 ms  plan:
    Query Text: SELECT test.id, test.flag FROM test WHERE test.flag = $1 LIMIT 1
    Limit  (cost=0.00..0.02 rows=1 width=8) (actual time=75.864..75.864 rows=0 loops=1)
      Buffers: shared hit=96 read=4329
      ->  Seq Scan on test  (cost=0.00..16925.00 rows=1000000 width=8) (actual time=75.862..75.862 rows=0 loops=1)
            Filter: (flag = $1)
            Rows Removed by Filter: 1000000
            Buffers: shared hit=96 read=4329
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 0.054 ms  bind S_1: COMMIT
2020-03-10 23:03:01.679 CET [71012] LOG:  execute S_1: COMMIT
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 0.014 ms
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 0.025 ms  parse <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 0.004 ms  bind <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:01.679 CET [71012] LOG:  execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
2020-03-10 23:03:01.679 CET [71012] LOG:  duration: 0.009 ms

Postgres version (homebrew):

postgres (PostgreSQL) 11.5

Client versions:


dependencies {
    implementation 'org.jetbrains.exposed:exposed:0.17.7'
    implementation "org.postgresql:postgresql:42.2.8"
    implementation 'org.jetbrains.kotlin:kotlin-stdlib'
    implementation 'com.zaxxer:HikariCP:2.3.2'
}

The postgres config is the default (the logs were produced with auto_explain enabled, but the issue reproduces without it as well).
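
For reference, an auto_explain setup along these lines produces plan output like the logs above (shown for illustration only; the exact settings used are an assumption, not taken from the repository):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0; -- log the plan of every statement
SET auto_explain.log_analyze = on;     -- include actual row counts and timings
SET auto_explain.log_buffers = on;     -- include buffer usage, as in the logs above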

Here is the source for the example: https://github.com/RomanBrodetski/kotlin-exposed-issue

Observations:

  • The issue does not reproduce if .limit(1) is removed
  • The issue does not reproduce without a DataSource (i.e. Database.connect("jdbc:postgresql://localhost:5432/testdb", driver = "org.postgresql.Driver") instead of Database.connect(ds))
  • The issue does not reproduce if there are additional statements in the transaction.

2 Answers


Removing .limit(1) entirely makes the query use the index consistently. The problem is that the generic plan PostgreSQL builds for the prepared statement after a few (5) executions is bad, and the inlined limit 1 is what causes it. Making that 1 a bind variable fixes the problem. Unfortunately, I could not find a way to do this with the Exposed library: it inlines the number into the prepared statement.

For some reason the planner believes it will find a matching row almost immediately during a sequential scan, and no amount of vacuum/analyze/create statistics could change its mind. (I also tried changing the distribution of the flag values; it did not help.)

Reproducing the issue in plain SQL:

create index test_flag_partial_idx on test (flag) include (id) where flag is not null and flag = 1;

vacuum  full  analyse  test;

PREPARE select_with_limit_as_value AS SELECT test.id, test.flag FROM test WHERE test.flag IS NOT NULL AND test.flag = $1 LIMIT 1;
EXECUTE select_with_limit_as_value(1);
EXECUTE select_with_limit_as_value(1);
EXECUTE select_with_limit_as_value(1);
EXECUTE select_with_limit_as_value(1);
EXECUTE select_with_limit_as_value(1);
EXECUTE select_with_limit_as_value(1);


PREPARE select_with_limit_as_bind AS SELECT test.id, test.flag FROM test WHERE test.flag IS NOT NULL AND test.flag = $1 LIMIT $2;
EXECUTE select_with_limit_as_bind(1, 1);
EXECUTE select_with_limit_as_bind(1, 1);
EXECUTE select_with_limit_as_bind(1, 1);
EXECUTE select_with_limit_as_bind(1, 1);
EXECUTE select_with_limit_as_bind(1, 1);
EXECUTE select_with_limit_as_bind(1, 1);

The first prepared statement, with the limit as a hardcoded value, switches after a few executions to a generic plan that uses a sequential scan. The second prepared statement, with the limit as a bind variable, gets a generic plan that uses the index.

You need to either hardcode the flag parameter into the query or make the limit a bind variable.
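
As an illustration of the first option, a prepared statement with the flag hardcoded (the statement name below is invented for this sketch) has no bind parameters at all, so the planner always sees the literal flag = 1 and should keep using the index:

PREPARE select_with_flag_hardcoded AS SELECT test.id, test.flag FROM test WHERE test.flag IS NOT NULL AND test.flag = 1 LIMIT 1;
EXECUTE select_with_flag_hardcoded;
EXECUTE select_with_flag_hardcoded;
EXECUTE select_with_flag_hardcoded;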

In PostgreSQL 12 you can disable generic plans; you can change the setting right before and after the query:

set plan_cache_mode = force_custom_plan;
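
For example, wrapping one of the prepared statements above like this (a sketch only; force_custom_plan makes the planner re-plan with the actual parameter values on every execution, and auto is the default setting):

SET plan_cache_mode = force_custom_plan;
EXECUTE select_with_limit_as_value(1);
SET plan_cache_mode = auto;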

All of this was tried on PostgreSQL 12.2.

Answered on 2020-03-25T00:40:52.893

Try gathering table statistics after inserting the data; it looks like the CBO (cost-based optimizer) has too little statistical information about the table. Actually, it is not unreasonable for postgres to skip the index you created, since every value in the index is the same. So the next thing to try is to remove the index from the code, or to create a better index.
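
A minimal sketch of that first suggestion, assuming the table is named test as in the question:

-- refresh the planner statistics for the table right after the bulk insert
ANALYZE test;

-- or reclaim dead space and gather statistics in one pass
VACUUM ANALYZE test;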

In the end, this does not seem to be related to Exposed itself, but rather to PostgreSQL itself.

(I wanted to leave this as a comment, but my reputation does not allow it, so I am writing it as an answer.)

Answered on 2020-03-19T12:23:23.293