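I am evaluating log files from a research project and inserting them into a MySQL database. I now have a query in which I need to join data from other tables without having exactly matching values.
The table "logdata" holds the data of the mobile units I want to analyze, and "basepositions" holds the GPS coordinates of the base stations. Two data fields in "logdata" (d3 and d4 in the query below) record the position of the sending base station. The problem: the recorded base station positions drift slightly over time (GPS fluctuation, only a small fraction of a degree), so I have to use a BETWEEN operation to find the right entry, as shown in the query below. This is not perfect, but there are only about 100 base stations, so the cost is tolerable here.
The second join has the same problem. There I have to fetch a validity flag from another table. The issue here: both logs are written about once per second, but not in sync. So I again have to use BETWEEN and scan for the matching row within a 1-second time window.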
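To make the time-window matching concrete, here is a minimal sketch of the lookup for a single log row. It assumes the timestamps are in milliseconds (which the -500/+499 offsets in the query suggest); the unit value 'IVS01' and the timestamp 1500000000000 are made-up placeholders, not values from the real data.

```sql
-- Sketch: fetch the validity flag recorded within the 1-second window
-- around ONE logdata timestamp t (window: t - 500 ms .. t + 499 ms).
-- 'IVS01' and 1500000000000 are hypothetical placeholder values.
SELECT v.timestamp,
       v.d1 AS validity_flag
FROM validity AS v
WHERE v.unit = 'IVS01'
  AND v.logid = 12345
  AND v.timestamp BETWEEN 1500000000000 - 500
                      AND 1500000000000 + 499;
```

Since the window is 1000 ms wide and both logs are written only roughly once per second, such a lookup can return zero, one, or occasionally two rows.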
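Because of the number of rows, this second scan makes my execution time explode. I think the fuzzy, range-based matching is the problem here.
Both tables have the indexes shown in the overview below.
Is there a way to speed up the query? Because of this performance problem, my database setup currently needs 30 hours to finish and return about 20,000 rows.
I'd appreciate any help.
logdata (approx. 300,000,000 entries):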
+-----------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| unit | tinytext | YES | MUL | NULL | |
| timestamp | bigint(20) | YES | | NULL | |
| logid | int(11) | YES | | NULL | |
| d1 | bigint(20) | YES | | NULL | |
| d2 | bigint(20) | YES | | NULL | |
| d3 | bigint(20) | YES | | NULL | |
| d4 | bigint(20) | YES | | NULL | |
| d5 | bigint(20) | YES | | NULL | |
| d6 | bigint(20) | YES | | NULL | |
| d7 | bigint(20) | YES | | NULL | |
| d8 | bigint(20) | YES | | NULL | |
| d9 | bigint(20) | YES | | NULL | |
| d10 | bigint(20) | YES | | NULL | |
+-----------+---------------------+------+-----+---------+----------------+
basepositions (approx. 100 entries):
+----------------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+--------------+------+-----+---------+-------+
| ID | int(11) | NO | PRI | NULL | |
| GPSLONGITUDE | varchar(50) | YES | | NULL | |
| LOCATION | varchar(100) | YES | | NULL | |
| GPSLATITUDE | varchar(50) | YES | | NULL | |
| GPSALTITUDE | varchar(50) | YES | | NULL | |
| ISUNDERTEST | tinyint(1) | YES | | 0 | |
+----------------------------+--------------+------+-----+---------+-------+
validity (approx. 200,000,000 entries):
+-----------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| unit | tinytext | YES | MUL | NULL | |
| timestamp | bigint(20) | YES | | NULL | |
| logid | int(11) | YES | | NULL | |
| d1 | bigint(20) | YES | | NULL | |
+-----------+---------------------+------+-----+---------+----------------+
My query so far:
SELECT
logdata.unit,
logdata.timestamp,
logdata.d1,
logdata.d2,
cast(logdata.d3/10000000 as decimal(15, 10)),
cast(logdata.d4/10000000 as decimal(15, 10)),
logdata.d5,
logdata.d6,
logdata.d7,
logdata.d8,
cast(logdata.d9/10000000 as decimal(15, 10)),
cast(logdata.d10/10000000 as decimal(15, 10)),
BASEID,
validity.d1
FROM
logdata
JOIN
basepositions
ON
cast(GPSLATITUDE / 10000000 as decimal(15,10)) BETWEEN cast(d3 / 10000000 as decimal(15,10)) - 0.0001 AND cast(d3 / 10000000 as decimal(15,10)) + 0.0001
AND
cast(GPSLONGITUDE / 10000000 as decimal(15,10)) BETWEEN cast(d4 / 10000000 as decimal(15,10)) - 0.0001 AND cast(d4 / 10000000 as decimal(15,10)) + 0.0001
JOIN
validity
ON
validity.unit = logdata.unit
AND
validity.logid = 12345
AND
validity.timestamp BETWEEN logdata.timestamp - 500 AND logdata.timestamp + 499
WHERE
logdata.unit = "IVS${IVS}"
AND
logdata.logid = 111222
AND
BASEID = 012;
Indexes:
+-------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| logdata | 0 | PRIMARY | 1 | id | A | 301433830 | NULL | NULL | | BTREE | | |
| logdata | 1 | unit_logid_timestamp | 1 | unit | A | 18 | 6 | NULL | YES | BTREE | | |
| logdata | 1 | unit_logid_timestamp | 2 | logid | A | 18 | NULL | NULL | YES | BTREE | | |
| logdata | 1 | unit_logid_timestamp | 3 | timestamp | A | 301433830 | NULL | NULL | YES | BTREE | | |
+-------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
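Edit (the comment field is too small): I think the problem lies in how the join is constructed. EXPLAIN EXTENDED shows that the query optimizer joins all three tables together, which means looking at 300,000,000 * 200,000,000 * 100 rows. When I rewrite the join with "validity" as a subquery, MySQL only joins "logdata" and "basepositions". I think changing the data types may be a factor for later optimization, but first I think I have to bring the runtime down into a lower complexity class by optimizing the query plan. I am not experienced enough to know what else I can do to optimize this query further. A single query against "validity" for one timestamp returns immediately. A single query for a base station position is also very fast. I don't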
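One possible shape of the subquery rewrite mentioned in this edit, as a sketch only: the exact statement is not shown in the post, so the form below is an assumption. The unit value 'IVS01' is a hypothetical placeholder, and the base station join is left out to keep the example short.

```sql
-- Sketch of a correlated-subquery variant of the validity lookup.
-- Assumption: at most one validity row is wanted per logdata row.
-- 'IVS01' is a hypothetical placeholder for the unit value.
SELECT l.unit,
       l.timestamp,
       (SELECT v.d1
          FROM validity AS v
         WHERE v.unit = l.unit
           AND v.logid = 12345
           AND v.timestamp BETWEEN l.timestamp - 500
                               AND l.timestamp + 499
         ORDER BY v.timestamp
         LIMIT 1) AS validity_flag
FROM logdata AS l
WHERE l.unit = 'IVS01'
  AND l.logid = 111222;
```

Note that this is not strictly equivalent to the inner join in the original query: a log row without a matching validity row is kept with a NULL flag rather than dropped, and a log row with two matches yields one result row instead of two.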
Edit 2:
Here are the indexes you asked for. I obtained them with SHOW INDEX.
Indexes for "validity":
+-------------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| validity | 0 | PRIMARY | 1 | id | A | 194863653 | NULL | NULL | | BTREE | | |
| validity | 1 | unit_logid_timestamp | 1 | unit | A | 18 | 6 | NULL | YES | BTREE | | |
| validity | 1 | unit_logid_timestamp | 2 | logid | A | 18 | NULL | NULL | YES | BTREE | | |
| validity | 1 | unit_logid_timestamp | 3 | timestamp | A | 194863653 | NULL | NULL | YES | BTREE | | |
+-------------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Indexes for "basepositions":
+----------------------+------------+---------------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------------+------------+---------------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| basepositions | 0 | PRIMARY | 1 | ID | A | 109 | NULL | NULL | | BTREE | | |
+----------------------+------------+---------------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
EXPLAIN for the query above:
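+----+-------------+---------------+-------+----------------------+----------------------+---------+-------------+---------+----------+-------------+
| id | select_type | table         | type  | possible_keys        | key                  | key_len | ref         | rows    | filtered | Extra       |
+----+-------------+---------------+-------+----------------------+----------------------+---------+-------------+---------+----------+-------------+
|  1 | SIMPLE      | basepositions | const | PRIMARY              | PRIMARY              | 4       | const       |       1 |   100.00 |             |
|  1 | SIMPLE      | logdata       | ref   | unit_logid_timestamp | unit_logid_timestamp | 14      | const,const | 4150932 |   100.00 | Using where |
|  1 | SIMPLE      | validity      | ref   | unit_logid_timestamp | unit_logid_timestamp | 14      | const,const | 3294136 |   100.00 | Using where |
+----+-------------+---------------+-------+----------------------+----------------------+---------+-------------+---------+----------+-------------+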
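As a rough reading of this plan: the rows column is the optimizer's estimate of rows examined per table, and for a nested-loop join these estimates multiply. The ref value const,const on both large tables suggests that only two parts of unit_logid_timestamp (unit and logid) are used for the index lookup, with the timestamp range checked afterwards (Using where). The figures below are estimates taken from the EXPLAIN output, not measured counts.

```latex
% Estimated row combinations according to the EXPLAIN output
\[
1 \times 4\,150\,932 \times 3\,294\,136 \approx 1.37 \times 10^{13}
\]
% Upper bound if all three tables were combined without any index filtering
\[
3 \times 10^{8} \times 2 \times 10^{8} \times 10^{2} = 6 \times 10^{18}
\]
```

So the plan is far below the full cross product, but roughly 10^13 row combinations is still consistent with a runtime of many hours.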
EXPLAIN (after adding indexes for latitude/longitude):
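+----+-------------+---------------+-------+-------------------------+----------------------+---------+-------------+---------+----------+-------------+
| id | select_type | table         | type  | possible_keys           | key                  | key_len | ref         | rows    | filtered | Extra       |
+----+-------------+---------------+-------+-------------------------+----------------------+---------+-------------+---------+----------+-------------+
|  1 | SIMPLE      | basepositions | const | PRIMARY,lat_lon,lat,lon | PRIMARY              | 4       | const       |       1 |   100.00 |             |
|  1 | SIMPLE      | logdata       | ref   | unit_logid_timestamp    | unit_logid_timestamp | 14      | const,const | 4150932 |   100.00 | Using where |
|  1 | SIMPLE      | validity      | ref   | unit_logid_timestamp    | unit_logid_timestamp | 14      | const,const | 3294136 |   100.00 | Using where |
+----+-------------+---------------+-------+-------------------------+----------------------+---------+-------------+---------+----------+-------------+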