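I am evaluating log files from a research project and loading them into a MySQL database. I now have a query in which I need to join data from other tables without having exactly matching values.

The "logdata" table contains the data of the mobile units I want to analyze, and "basepositions" holds the GPS coordinates of the base stations. Two data fields in "logdata" record the position of the sending base station. The problem: a base station's reported position varies slightly over time (GPS fluctuations of a small fraction of a degree), so I have to use a BETWEEN operation to find the correct entry, as shown in the query below. This is not perfect, but there are only about 100 base stations, so the cost is tolerable here.

The second join has the same problem. There I have to fetch a validity flag from another table. The issue here: both logs are written roughly once per second, but not in sync. So I again have to use BETWEEN, this time with a 1-second time range, to find the corresponding row.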
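To make the time window concrete, here is a minimal sketch of that lookup for a single log row. It assumes the timestamps are in milliseconds (which the -500/+499 bounds for a 1-second range suggest); the unit value 'IVS1' and the timestamp in @ts are made-up placeholders, not real data.

```sql
-- Hypothetical values: replace with a real unit and a real logdata.timestamp.
SET @unit := 'IVS1';
SET @ts   := 1400000000000;  -- assumed to be milliseconds

-- Fetch the validity flag written within the same 1-second window.
SELECT validity.d1
FROM validity
WHERE validity.unit = @unit
  AND validity.logid = 12345
  AND validity.timestamp BETWEEN @ts - 500 AND @ts + 499;
```

Because the two logs are not synchronized, such a lookup may return no row, one row, or occasionally more than one row for a given timestamp.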

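Because of the number of rows, this second scan makes my execution time explode. I think the inexact (range-based) correlation is the problem here.

Both tables have the indexes shown in the overview below.

Is there a way to speed up the query? Because of this performance problem, my current database setup needs 30 hours to finish and return about 20000 rows.

I would appreciate any help.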

logdata (approx. 300.000.000 entries):

+-----------+---------------------+------+-----+---------+----------------+
| Field     | Type                | Null | Key | Default | Extra          |
+-----------+---------------------+------+-----+---------+----------------+
| id        | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| unit      | tinytext            | YES  | MUL | NULL    |                |
| timestamp | bigint(20)          | YES  |     | NULL    |                |
| logid     | int(11)             | YES  |     | NULL    |                |
| d1        | bigint(20)          | YES  |     | NULL    |                |
| d2        | bigint(20)          | YES  |     | NULL    |                |
| d3        | bigint(20)          | YES  |     | NULL    |                |
| d4        | bigint(20)          | YES  |     | NULL    |                |
| d5        | bigint(20)          | YES  |     | NULL    |                |
| d6        | bigint(20)          | YES  |     | NULL    |                |
| d7        | bigint(20)          | YES  |     | NULL    |                |
| d8        | bigint(20)          | YES  |     | NULL    |                |
| d9        | bigint(20)          | YES  |     | NULL    |                |
| d10       | bigint(20)          | YES  |     | NULL    |                |
+-----------+---------------------+------+-----+---------+----------------+

basepositions (approx. 100 entries):

+----------------------------+--------------+------+-----+---------+-------+
| Field                      | Type         | Null | Key | Default | Extra |
+----------------------------+--------------+------+-----+---------+-------+
| ID                         | int(11)      | NO   | PRI | NULL    |       |
| GPSLONGITUDE               | varchar(50)  | YES  |     | NULL    |       |
| LOCATION                   | varchar(100) | YES  |     | NULL    |       |
| GPSLATITUDE                | varchar(50)  | YES  |     | NULL    |       |
| GPSALTITUDE                | varchar(50)  | YES  |     | NULL    |       |
| ISUNDERTEST                | tinyint(1)   | YES  |     | 0       |       |
+----------------------------+--------------+------+-----+---------+-------+

validity (approx. 200.000.000 entries):

+-----------+---------------------+------+-----+---------+----------------+
| Field     | Type                | Null | Key | Default | Extra          |
+-----------+---------------------+------+-----+---------+----------------+
| id        | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| unit      | tinytext            | YES  | MUL | NULL    |                |
| timestamp | bigint(20)          | YES  |     | NULL    |                |
| logid     | int(11)             | YES  |     | NULL    |                |
| d1        | bigint(20)          | YES  |     | NULL    |                |
+-----------+---------------------+------+-----+---------+----------------+

My query so far:

SELECT
    logdata.unit,
    logdata.timestamp,
    logdata.d1,
    logdata.d2,
    cast(logdata.d3/10000000 as decimal(15, 10)),
    cast(logdata.d4/10000000 as decimal(15, 10)),
    logdata.d5,
    logdata.d6,
    logdata.d7,
    logdata.d8,
    cast(logdata.d9/10000000 as decimal(15, 10)),
    cast(logdata.d10/10000000 as decimal(15, 10)),
    BASEID,
    validity.d1
FROM
    logdata
JOIN
    basepositions
ON
    cast(GPSLATITUDE / 10000000 as decimal(15,10)) BETWEEN cast(d3 / 10000000 as decimal(15,10)) - 0.0001 AND cast(d3 / 10000000 as decimal(15,10)) + 0.0001
    AND
    cast(GPSLONGITUDE / 10000000 as decimal(15,10)) BETWEEN cast(d4 / 10000000 as decimal(15,10)) - 0.0001 AND cast(d4 / 10000000 as decimal(15,10)) + 0.0001
JOIN
    validity
ON
    validity.unit = logdata.unit 
    AND
    validity.logid = 12345
    AND
    validity.timestamp BETWEEN logdata.timestamp - 500 AND logdata.timestamp + 499

WHERE
    logdata.unit = "IVS${IVS}"
    AND
    logdata.logid = 111222
    AND 
    BASEID = 012;

Indexes:

+-------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table             | Non_unique | Key_name             | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| logdata           |          0 | PRIMARY              |            1 | id          | A         |   301433830 |     NULL | NULL   |      | BTREE      |         |               |
| logdata           |          1 | unit_logid_timestamp |            1 | unit        | A         |          18 |        6 | NULL   | YES  | BTREE      |         |               |
| logdata           |          1 | unit_logid_timestamp |            2 | logid       | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |
| logdata           |          1 | unit_logid_timestamp |            3 | timestamp   | A         |   301433830 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

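Edit (the comment field is too small): I think the problem is how the join is built. EXPLAIN EXTENDED shows that the query optimizer joins all three tables together, which means looking at 300.000.000 * 200.000.000 * 100 rows. When I rewrite the join with "validity" as a subquery, MySQL joins only "logdata" and "basepositions". I think changing the data types may be a factor for later optimization, but first I think I have to bring down the runtime class by optimizing the query plan. I am not experienced enough to know what else I can do to optimize this query further. A single query against "validity" by timestamp returns immediately. A single query for a base station position is also very fast. I don't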
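The exact subquery rewrite is not shown here. One possible form, given only as a sketch, is a correlated scalar subquery. The LIMIT 1, the alias validity_flag, and the unit value 'IVS1' are assumptions for illustration, and the join to "basepositions" is left out for brevity.

```sql
SELECT
    logdata.unit,
    logdata.timestamp,
    -- Correlated lookup: evaluated once per logdata row.
    (SELECT validity.d1
       FROM validity
      WHERE validity.unit = logdata.unit
        AND validity.logid = 12345
        AND validity.timestamp BETWEEN logdata.timestamp - 500
                                   AND logdata.timestamp + 499
      LIMIT 1) AS validity_flag   -- hypothetical alias
FROM logdata
WHERE logdata.unit = 'IVS1'       -- placeholder unit value
  AND logdata.logid = 111222;
```

This form is not strictly equivalent to the inner join: a "logdata" row without a matching "validity" row is kept with a NULL flag instead of being dropped, and a row with several matches yields only one of them.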

Edit 2:

Here are the indexes you asked for. I got them using SHOW INDEX.

Indexes for "validity":

+-------------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table                   | Non_unique | Key_name             | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| validity                |          0 | PRIMARY              |            1 | id          | A         |   194863653 |     NULL | NULL   |      | BTREE      |         |               |
| validity                |          1 | unit_logid_timestamp |            1 | unit        | A         |          18 |        6 | NULL   | YES  | BTREE      |         |               |
| validity                |          1 | unit_logid_timestamp |            2 | logid       | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |
| validity                |          1 | unit_logid_timestamp |            3 | timestamp   | A         |   194863653 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Indexes for "basepositions":

+----------------------+------------+---------------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table                | Non_unique | Key_name                              | Seq_in_index | Column_name                | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------------+------------+---------------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| basepositions        |          0 | PRIMARY                               |            1 | ID                         | A         |         109 |     NULL | NULL   |      | BTREE      |         |               |
+----------------------+------------+---------------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

EXPLAIN for the query above:

id      select_type     table           type    possible_keys           key                     key_len     ref         rows        filtered    Extra
1       SIMPLE          basepositions   const   PRIMARY                 PRIMARY                 4           const       1           100.00  
1       SIMPLE          logdata         ref     unit_logid_timestamp    unit_logid_timestamp    14          const,const 4150932     100.00      Using where
1       SIMPLE          validity        ref     unit_logid_timestamp    unit_logid_timestamp    14          const,const 3294136     100.00      Using where
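
As a rough reading of this plan: the MySQL manual suggests multiplying the values in the rows column to estimate how many row combinations a join has to examine. The statement below only does that arithmetic, once for the estimates in this plan and once for the full table sizes mentioned in the edit. These are optimizer estimates, not measured counts.

```sql
SELECT
    1 * 4150932 * 3294136       AS plan_estimate,         -- roughly 1.4e13 combinations
    300000000 * 200000000 * 100 AS full_tables_estimate;  -- 6e18 combinations
```

The plan is therefore far below the full cross product, but on the order of 10^13 combinations is still very large. The key_len of 14 with ref const,const also suggests that only the unit and logid parts of unit_logid_timestamp are used, so the BETWEEN on timestamp appears to be applied as a filter (Using where) rather than as an index range.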

EXPLAIN (after adding indexes for latitude/longitude):

id      select_type     table         type    possible_keys            key                     key_len  ref             rows    filtered    Extra
1       SIMPLE          basepositions const   PRIMARY,lat_lon,lat,lon  PRIMARY                 4        const           1       100.00
1       SIMPLE          logdata       ref     unit_logid_timestamp     unit_logid_timestamp    14       const,const     4150932 100.00      Using where
1       SIMPLE          validity      ref     unit_logid_timestamp     unit_logid_timestamp    14       const,const     3294136 100.00      Using where