mysql - Alternatives to the OR and IN operators for an indexed table

Question

The MySQL query I'm working on is as follows:

select line_item_product_code, line_item_usage_start_date, sum(line_item_unblended_cost) as sum
from test_indexing
force index(date)
where line_item_product_code in('AmazonEC2', 'AmazonRDS')
      and product_region='us-east-1'
      and line_item_usage_start_date between date('2019-08-01')
      and date('2019-08-31 00:00:00')
group by line_item_product_code, line_item_usage_start_date
order by sum;

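I have applied an index on the column line_item_usage_start_date, but when I run the query the index is not used: EXPLAIN shows type "ALL" and no key. The index stops being used when the WHERE clause contains an OR or IN operator. The column data types are:

line_item_product_code: TEXT
line_item_unblended_cost: DOUBLE
product_region: TEXT
line_item_usage_start_date: TIMESTAMP

My main goal is fast query response for a dashboard. The table has 192 columns and 9M+ rows, and is 13+ GB as a CSV. I assumed an index would solve my problem with this query. Is there an alternative to these operators, or any other solution?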

score 0 · Accepted Answer

x = 1  OR  x = 2

is turned into this by the optimizer:

x IN (1,2)

The DATE() call in date('2019-08-01') is unnecessary; the string by itself is fine. Instead of this:

and line_item_usage_start_date between date('2019-08-01')
                                   AND date('2019-08-31 00:00:00')

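I would write this "range" as a half-open interval. Unlike the original, it also covers the rest of August 31, because the original upper bound stops at 2019-08-31 00:00:00: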

and line_item_usage_start_date >= '2019-08-01'
and line_item_usage_start_date  < '2019-08-01' + INTERVAL 1 MONTH
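A quick check in the MySQL client compares the two forms, using a made-up row timestamp of noon on August 31:

```sql
-- A value at noon on Aug 31 falls outside the original BETWEEN bounds...
SELECT TIMESTAMP '2019-08-31 12:00:00'
       BETWEEN TIMESTAMP '2019-08-01 00:00:00'
           AND TIMESTAMP '2019-08-31 00:00:00' AS in_between_range;    -- 0

-- ...but inside the half-open range [2019-08-01, 2019-09-01)
SELECT TIMESTAMP '2019-08-31 12:00:00' >= '2019-08-01'
   AND TIMESTAMP '2019-08-31 12:00:00' <  '2019-08-01' + INTERVAL 1 MONTH
       AS in_half_open_range;                                         -- 1
```

If line_item_usage_start_date only ever holds midnight values (daily granularity), both forms return the same rows. The half-open form is still safer, and it needs no knowledge of how many days the month has.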

You have 3 conditions in the WHERE. Build an index with

all the = tests first, then
any IN tests, then
at most one "range"

So this is probably the optimal index:

INDEX(product_region,    -- first, because of '='
      line_item_product_code,
      line_item_usage_start_date)  -- last

EXPLAIN will probably say "Using temporary; Using filesort". These are caused by the GROUP BY and the ORDER BY. Still, a different index, focused on the GROUP BY, might eliminate one of them:

INDEX(line_item_product_code, line_item_usage_start_date) -- same order as the GROUP BY

As it turns out, my first index recommendation is definitely better, because it can handle both the = test and the GROUP BY.

Oops, there's a problem:

line_item_product_code: TEXT

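I doubt that "product_code" needs to be TEXT. Wouldn't something like VARCHAR(30) be big enough? The point is, a TEXT column cannot be put in an INDEX in full (MySQL only lets you index a prefix of it). So change that column's datatype as well.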
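Here is a sketch of that change, combined with the recommended index. product_region, the first column of that index, is also TEXT in the question, so it needs the same change. The VARCHAR lengths and the index name region_code_date are assumptions. Check the longest existing values first. Also carry over any NOT NULL or character-set attributes the columns already have, because MODIFY replaces the whole column definition:

```sql
-- 1. Make sure the assumed lengths are big enough for existing data
SELECT MAX(CHAR_LENGTH(line_item_product_code)) AS max_code_len,
       MAX(CHAR_LENGTH(product_region))         AS max_region_len
FROM test_indexing;

-- 2. Convert both TEXT columns and add the composite index in one statement.
--    VARCHAR(30)/VARCHAR(20) and the index name are illustrative assumptions;
--    re-add NOT NULL etc. here if the original columns had them.
ALTER TABLE test_indexing
    MODIFY line_item_product_code VARCHAR(30),
    MODIFY product_region         VARCHAR(20),
    ADD INDEX region_code_date (product_region,              -- '=' test
                                line_item_product_code,      -- IN test
                                line_item_usage_start_date); -- range, last
```

Changing the column types rebuilds the table, so on a 9M-row table expect the ALTER to take a while.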

More recipes: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

I have this table of 192 columns

That's quite big.

Don't use FORCE INDEX. It may help today, but it will hurt tomorrow when the data distribution changes.
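With those suggestions combined, the question's query might look like this, with no FORCE INDEX and a half-open date range. Prefixing it with EXPLAIN shows which index the optimizer actually picks. The exact plan depends on your data and MySQL version:

```sql
EXPLAIN
SELECT line_item_product_code,
       line_item_usage_start_date,
       SUM(line_item_unblended_cost) AS sum
FROM test_indexing                                 -- no FORCE INDEX
WHERE product_region = 'us-east-1'                 -- '=' test
  AND line_item_product_code IN ('AmazonEC2', 'AmazonRDS')
  AND line_item_usage_start_date >= '2019-08-01'   -- half-open range
  AND line_item_usage_start_date <  '2019-08-01' + INTERVAL 1 MONTH
GROUP BY line_item_product_code, line_item_usage_start_date
ORDER BY sum;
```

Drop the EXPLAIN keyword to run the query itself. ORDER BY sum sorts on an aggregate, so a filesort is still expected. The index can only help with the WHERE and the GROUP BY.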
