sql - Columnstore index - slow performance on offset-fetch query

Question

We have a fact table of around 35M rows in an Azure SQL Database (Premium tier). The table has a clustered columnstore index enabled to boost query performance.

We paginate through the fact table (to index the data into Elasticsearch) using code similar to the following:

SELECT *
FROM [SPENDBY].[FactInvoiceDetail]
ORDER BY Id
OFFSET 1000000 ROWS FETCH NEXT 1000 ROWS ONLY

But this query is extremely slow: after more than 10 minutes it still hasn't finished. If we switch to TOP instead, it works well and takes around 30 seconds:

SELECT TOP 1000 * 
FROM [SPENDBY].[FactInvoiceDetail]
WHERE ID > 1000000 
ORDER BY Id
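The TOP query above seeks directly to a key value instead of counting rows. Repeating it with the largest Id of each page gives keyset ("seek") pagination. Below is a minimal, self-contained sketch of such an export loop, assuming Id is a unique, positive integer; the table variables @Fact and @Page and the Amount column are hypothetical stand-ins so the script runs on its own. In the real export the client would read each page and carry @LastId forward itself.

```sql
-- Hypothetical stand-in for the fact table; note the gaps in Id.
DECLARE @Fact TABLE (Id INT PRIMARY KEY, Amount DECIMAL(10, 2));
INSERT INTO @Fact (Id, Amount)
VALUES (1, 10.00), (2, 20.00), (5, 50.00), (6, 60.00), (9, 90.00);

DECLARE @PageSize INT = 2;
DECLARE @LastId   INT = 0;   -- assumption: every Id is greater than 0
DECLARE @Page TABLE (Id INT PRIMARY KEY, Amount DECIMAL(10, 2));

WHILE 1 = 1
BEGIN
    DELETE FROM @Page;

    -- Seek past the last exported key instead of skipping rows with OFFSET.
    INSERT INTO @Page (Id, Amount)
    SELECT TOP (@PageSize) Id, Amount
    FROM @Fact
    WHERE Id > @LastId
    ORDER BY Id;

    IF @@ROWCOUNT = 0 BREAK;   -- no rows left: export finished

    -- Return this page as a result set (the indexer would consume it).
    SELECT Id, Amount FROM @Page ORDER BY Id;

    -- Remember where this page ended; the next page starts after it.
    SELECT @LastId = MAX(Id) FROM @Page;
END;
```

With the sample data this returns three pages: Ids 1 and 2, then 5 and 6, then 9. Because each page starts from a key value, the cost of a page does not grow with how far into the table the export has progressed.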

[Image: estimated execution plan for the OFFSET-FETCH query]

I don't understand whether OFFSET-FETCH queries simply perform very poorly on a clustered columnstore index.

This table also has many nonclustered B-tree indexes on foreign keys, plus one unique index on the Id column of the fact table, to boost performance.

The execution plan for the OFFSET-FETCH query:

https://pastebin.com/BM8MXQMg

score 4 · Accepted Answer

There are a few issues here.

1) The BTree index on the ordering column is not a covering index for the paging query.

2) The rows must be reconstructed from the CCI.

3) The offset is large.

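The paging query needs the BTree index on the ordering column to work out which rows to return, and if that BTree index doesn't include all the requested columns, it has to do a row lookup for each row. That's the "Nested Loops" operator in the query plan.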

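But those rows are stored in the CCI, which means each column lives in a separate data structure, and reading a single row costs one logical IO per column, per row. That's why this query is particularly expensive, and why a CCI is a poor choice for paging queries. A clustered index on the ordering column, or a nonclustered index on the ordering column that includes the remaining requested columns, would be much better.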
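A minimal sketch of the second option (a covering nonclustered index on the ordering column), assuming SQL Server 2016 or later / Azure SQL Database, where a table with a clustered columnstore index can also carry nonclustered B-tree indexes. The index name and the columns Col1, Col2, Col3 are hypothetical placeholders. The INCLUDE list must contain every column the paging query returns, so SELECT * would have to become an explicit column list for the index to cover it.

```sql
-- Hypothetical covering index: Id as the key, exported columns included.
-- Col1, Col2, Col3 are placeholders for the real column list.
CREATE UNIQUE NONCLUSTERED INDEX IX_FactInvoiceDetail_Id_Covering
    ON [SPENDBY].[FactInvoiceDetail] (Id)
    INCLUDE (Col1, Col2, Col3);

-- The paging query then selects only key + included columns,
-- so it can be answered from this index without per-row CCI lookups.
SELECT Id, Col1, Col2, Col3
FROM [SPENDBY].[FactInvoiceDetail]
ORDER BY Id
OFFSET 1000000 ROWS FETCH NEXT 1000 ROWS ONLY;
```

The trade-off is storage and write cost: the included columns are stored a second time in row-store form, alongside the columnstore copy.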

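The secondary, smaller issue here is the large offset. SQL Server must skip past the offset rows, counting them as it goes, so it will read the first N pages of the BTree leaf level just to skip those rows.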
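One way to observe the cost of skipping rows is to compare logical reads with SET STATISTICS IO. This sketch selects only Id so that both queries can, in principle, be answered from the unique Id index alone; the optimizer may still choose a different plan, and the two queries return the same rows only if the Id values have no gaps.

```sql
SET STATISTICS IO ON;

-- Positional paging: the index is read from the start and 1,000,000 rows are skipped.
SELECT Id
FROM [SPENDBY].[FactInvoiceDetail]
ORDER BY Id
OFFSET 1000000 ROWS FETCH NEXT 1000 ROWS ONLY;

-- Key-based paging: the index is entered directly at Id > 1000000.
SELECT TOP (1000) Id
FROM [SPENDBY].[FactInvoiceDetail]
WHERE Id > 1000000
ORDER BY Id;

SET STATISTICS IO OFF;
```

The logical read counts for each statement appear in the Messages output.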

score 2 · Accepted Answer

This statement:

SELECT TOP 1000 * 
FROM [SPENDBY].[FactInvoiceDetail]
WHERE ID > 1000000 
ORDER BY Id

works entirely off the (clustered?) index on the ID field (is it the primary key?), seeking directly to ID > 1000000.

The other statement sorts and then searches for the ID value that satisfies the offset of 1,000,000 rows.

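To the optimizer, OFFSET 1000000 ROWS is not equivalent to WHERE ID > 1000000; the two return the same rows only if the ID values have no gaps (and start at 1).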
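A small self-contained illustration of that difference, using a hypothetical table variable @t whose Id values have gaps (3, 4, 7 and 8 are missing):

```sql
DECLARE @t TABLE (Id INT PRIMARY KEY);
INSERT INTO @t (Id) VALUES (1), (2), (5), (6), (9), (10);

-- Positional: skip the first 4 rows (1, 2, 5, 6), returns Ids 9 and 10.
SELECT Id FROM @t ORDER BY Id OFFSET 4 ROWS FETCH NEXT 2 ROWS ONLY;

-- Value-based: first 2 Ids greater than 4, returns Ids 5 and 6.
SELECT TOP (2) Id FROM @t WHERE Id > 4 ORDER BY Id;
```

Because the results differ whenever there are gaps, the optimizer cannot rewrite one form into the other.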

score 1 · Accepted Answer

The main issue here is the large OFFSET value:

offset 1000000 rows fetch next 1000 rows only

OFFSET and FETCH work well when the OFFSET value is small. See the example below for more details:

SELECT orderid, orderdate, custid, filler
FROM dbo.Orders
ORDER BY orderdate DESC, orderid DESC
OFFSET 50 ROWS FETCH NEXT 10 ROWS ONLY;

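I have an index with the ORDER BY columns as key columns and the SELECT columns as included columns. This results in the plan below:

[Image: execution plan for the query above]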
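The index described above could be created roughly as follows; the index name is a hypothetical placeholder. The key columns follow the ORDER BY (including its direction), and the remaining SELECT columns are included, so the query is covered and needs no lookups.

```sql
-- Keys match ORDER BY orderdate DESC, orderid DESC;
-- custid and filler are included so the query is fully covered.
CREATE NONCLUSTERED INDEX IX_Orders_orderdate_orderid
    ON dbo.Orders (orderdate DESC, orderid DESC)
    INCLUDE (custid, filler);
```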

The key point to observe here is that SQL Server ends up reading OFFSET + FETCH (50 + 10) rows and only then filters down to 10 rows.

So with a large offset, even with a suitable index, you end up reading 1,000,000 + 1,000 rows, which is a huge amount.

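If you can get SQL Server to filter down to the 1,000 rows immediately after the scan, that can help your query. This might be achieved (not tested against your schema) by rewriting your query as shown below: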

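-- yourtable, col1 and col2 are placeholders for the real table and columns
WITH CLKeys AS
(
    -- Page through the narrow index on ID only
    SELECT ID
    FROM yourtable
    ORDER BY ID DESC
    OFFSET 500000 ROWS FETCH FIRST 10 ROWS ONLY
)
SELECT K.ID, O.col1, O.col2           -- the rest of the columns
FROM CLKeys AS K
CROSS APPLY (SELECT A.col1, A.col2    -- columns needed other than ID
             FROM yourtable AS A
             WHERE A.ID = K.ID) AS O  -- one lookup per returned key
ORDER BY K.ID DESC;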

References:
http://sqlmag.com/t-sql/offsetfetch-part-1#comment-25061
