
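I need to run a MySQL procedure that selects the time-series records of a single point from the table pointValues. The number of records can of course be very large, so I only need to select 200 of them (the limit) to draw a chart. I decided to divide all the records according to the following logic:

a) records / (limit/2) -> the number of rows in each group
b) take the minimum and maximum value from each group defined in a).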
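To make steps a) and b) concrete, here is a minimal in-memory sketch in Java. It is not the stored procedure itself: the class `ChunkMinMax` and its methods are hypothetical names, it assumes the values are already ordered by time, and rounding the group size up is an assumption the question does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating steps a) and b) on an in-memory array. */
final class ChunkMinMax {

    /** Step a): rows per group = records / (limit / 2), rounded up, at least 1. */
    static int rowsPerGroup(int records, int limit) {
        int groups = Math.max(1, limit / 2);
        // Ceiling division, so that no more than `groups` groups are produced.
        return Math.max(1, (records + groups - 1) / groups);
    }

    /** Step b): for each group of consecutive rows, emit {min, max}. */
    static List<double[]> minMaxPerGroup(double[] values, int rowsPerGroup) {
        List<double[]> result = new ArrayList<>();
        for (int start = 0; start < values.length; start += rowsPerGroup) {
            int end = Math.min(start + rowsPerGroup, values.length);
            double min = values[start];
            double max = values[start];
            for (int i = start + 1; i < end; i++) {
                min = Math.min(min, values[i]);
                max = Math.max(max, values[i]);
            }
            result.add(new double[] {min, max});
        }
        return result;
    }
}
```

With 15,000 records and a limit of 200, `rowsPerGroup` gives 150 rows per group, which yields 100 groups and therefore 200 chart points.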

I don't have much experience with high-performance queries, so I need some help improving the performance of this procedure.

    CREATE TABLE secChart 
    (
         id int(11) NOT NULL,
         dataPointId int(11) NOT NULL,
         dataType int(11),
         pointValue DOUBLE NOT NULL,
         ts bigint(20) NOT NULL 
    ) ENGINE=InnoDB;
    
    DROP PROCEDURE IF EXISTS dataChart;
    
    DELIMITER //
    
    CREATE PROCEDURE dataChart(iter int, step int, pointId int, setStart int, setStop int)
    BEGIN
        TRUNCATE TABLE secChart;

        SET @i = 0;
        SET @iter = iter;
        SET @pointId = pointId;

        myLoop: WHILE (@i < @iter)
        DO 
             IF @i = 0 THEN
                SET setStart = 0;
                SET setStop = step-1; 
             END IF; 
    
             IF @i > 0 THEN
                SET setStart = @i * step;
                SET setStop = setStart + (step-1);
                SET @start = setStart;
                SET @stop = setStop; 
             END IF;
    
             INSERT INTO secChart
                 (SELECT *
                  FROM pointvalues
                  WHERE dataPointId = @pointId
                    AND (pointValue = (SELECT MIN(pointValue)
                                       FROM 
                                           (SELECT *
                                            FROM flex2.pointvalues
                                            WHERE dataPointId = @pointId
                                            ORDER BY id ASC
                                            LIMIT setStart, setStop) AS b)
                         OR pointValue = (SELECT MAX(pointValue)
                                          FROM
                                              (SELECT *
                                               FROM flex2.pointvalues
                                               WHERE dataPointId = @pointId
                                               ORDER BY id ASC
                                               LIMIT setStart, setStop) AS b2))
                 ORDER BY id
                 LIMIT 0, 2);
    
         SET @i = @i + 1; 
    
         IF @i > @iter 
         THEN 
             LEAVE myLoop; 
         END IF; 
    END WHILE; 
    END//
    DELIMITER ;
    
    CALL dataChart(100, 80, 1, 0, 0);
    

For nearly 15,000 records, this takes 158 seconds...

Another option I tested:

    INSERT INTO idx
    VALUES (@start, @stop, @i, step);

    INSERT INTO stt
        (SELECT *
         FROM
             -- row with the largest pointValue in the window
             ((SELECT *
               FROM
                   (SELECT id, pointValue, ts
                    FROM flex2.pointvalues AS pv
                    WHERE pv.dataPointId = 1
                    ORDER BY id
                    LIMIT setStart, setStop) AS maxval
               ORDER BY pointValue DESC
               LIMIT 0, 1)
              UNION
              -- row with the smallest pointValue in the window
               (SELECT *
                FROM
                    (SELECT id, pointValue, ts
                     FROM flex2.pointvalues AS pv
                     WHERE pv.dataPointId = 1
                     ORDER BY id
                     LIMIT setStart, setStop) AS minval
                ORDER BY pointValue ASC
                LIMIT 0, 1)) AS selectScore);

For nearly 15,000 records, this takes 58 seconds: faster, but not fast enough.

The third idea is to select every n-th row (for example, 200 rows out of 12,000):

    SELECT COUNT(*)
    FROM flex2.pointvalues
    WHERE dataPointId = 1
      AND id IN (SELECT id
                 FROM flex2.pointvalues
                 WHERE dataPointId = 1
                   AND id BETWEEN
                               (SELECT MIN(id) FROM flex2.pointvalues
                                WHERE dataPointId = 1) AND
                               (SELECT MAX(id) FROM flex2.pointvalues
                                WHERE dataPointId = 1))
      AND id % 10 = 0;

The best outcome would be to fix the performance of idea 2. Please help!


2 Answers


I optimized my idea into the following solution:

    DROP PROCEDURE IF EXISTS chartSelection;
    DROP TABLE IF EXISTS chartSelectionTable;

    DELIMITER //

    CREATE PROCEDURE chartSelection(iter int, step int, pointId int, setStart int, setStop int)
    BEGIN
        CREATE TEMPORARY TABLE IF NOT EXISTS chartSelectionTable (
            id int(11) NOT NULL,
            pointValue double NOT NULL,
            ts bigint(20) NOT NULL
        ) ENGINE=InnoDB;

        TRUNCATE TABLE chartSelectionTable;

        SET @i = 0;
        SET @iter = iter;
        SET @pointId = pointId;

        chart: WHILE (@i < @iter) DO
            IF @i = 0 THEN
                SET setStart = 0;
                SET setStop = step - 1;
            END IF;
            IF @i > 0 THEN
                SET setStart = @i * step;
                SET setStop = setStart + (step - 1);
            END IF;

            -- Insert the rows holding the smallest and largest pointValue of the current window.
            INSERT INTO chartSelectionTable (
                SELECT id, pointValue, ts FROM (
                    SELECT * FROM pointvalues WHERE (
                        id = (SELECT id FROM (SELECT * FROM flex2.pointvalues AS pv WHERE pv.dataPointId = pointId ORDER BY ts LIMIT setStart, setStop) AS minval ORDER BY pointValue ASC LIMIT 0, 1)
                        OR
                        id = (SELECT id FROM (SELECT * FROM flex2.pointvalues AS pv WHERE pv.dataPointId = pointId ORDER BY ts LIMIT setStart, setStop) AS maxval ORDER BY pointValue DESC LIMIT 0, 1)
                    )
                ) AS b
            );

            SET @i = @i + 1;
            IF @i > @iter THEN
                LEAVE chart;
            END IF;
        END WHILE;

        SELECT * FROM chartSelectionTable;
        DROP TABLE chartSelectionTable;
    END//

    DELIMITER ;

I call it like this: CALL chartSelection(100,90,1,0,0);

But from Java (at the server level) I create and call it like this:

    import java.sql.Statement;
    // (...)
    Statement createProcedureStmt = conn.createStatement();

    createProcedureStmt.execute("CREATE PROCEDURE `chartSelection` " +
            "(iter int, step int, pointId int, setStart int, setStop int )" +
            " BEGIN " +
            " TRUNCATE TABLE chartSelectionTable;" +
            " SET @i=0;" +
            " SET @iter = iter; " +
            " SET @pointId = pointId;" +
            " chart: WHILE (@i < @iter)  DO " +
            " IF @i = 0 THEN" +
            "   SET setStart = 0;" +
            " SET setStop = step-1;" +
            " END IF;" +
            " IF @i >0 THEN" +
            " SET setStart = @i*step;" +
            " SET setStop = setStart + (step-1);" +
            " END IF;" +
            " insert into chartSelectionTable(" +
            " select id, pointValue,ts from (" +
            " select * from pointvalues where (" +
            " id = (select id from( select * from flex2.pointvalues as pv where pv.dataPointId=pointId order by ts limit setStart,setStop) as minval  order by pointValue asc limit 0,1)" +
            " or" +
            " id = (select id from( select * from flex2.pointvalues as pv where pv.dataPointId=pointId order by ts limit setStart,setStop) as maxval  order by pointValue desc limit 0,1)" +
            " )) as b " +
            " );" +
            " SET @i = @i+1;" +
            " IF @i > @iter THEN" +
            " LEAVE chart;" +
            " END IF;" +
            " END WHILE;" +
            " select * from chartSelectionTable; " +
            " END ");

    /* Call stored procedure */
    java.sql.CallableStatement stmt = conn
            .prepareCall("{call chartSelection(?,?,?,0,0)}");
    stmt.setInt(1, iteracje);
    stmt.setInt(2, step);
    stmt.setInt(3, dataPointId);

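Execution time dropped by about 50%, but it is still too long. What changed:
- a temporary table instead of a real table,
- selecting only pointValue and the timestamp instead of all columns,
- the way the rows are selected.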

How can I convert this into a loop-free solution? How do I get away from LIMIT and ORDER BY?

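---- Summary ----
1) I have 1,000,000 records with id = x.
2) I have a limit of 200 records for the chart.
3) So in Java, before I call SQL, I calculate 1,000,000 / (limit/2) = 10,000 records in each of 100 groups.
4) Main problem: select each group step by step (based on the timestamp) and collect the minimum and maximum of the group (pointValue and timestamp).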

answered 2018-07-09T12:02:26.373

First, some questions and comments.

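  • Do you want to select evenly spaced points from a time-series dataset?
  • Are you doing "candlesticks" (hence the min and max)?
  • Don't use any kind of loop; it will be slow.
  • Aim for a single SELECT (no loop) that fetches all the desired items. SQL is optimized for that.
  • Why does the table have 5 columns instead of simply 2 (for x and y, that is, ts and value)?
  • Do you want your chart to be based on time or on the index into the table? Missing data happens, so you need to chart by time.
  • Avoid OFFSET (that is, LIMIT m,n); it has to scan all the preceding rows, so it is slow.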

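Let's take a step back. Instead of going straight for candlesticks, let's first work out an approach that uses AVG rather than MIN and MAX. Once you have mastered that, the candlesticks can probably be done too.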

    SELECT FLOOR(ts / 300000) AS '5-minute-intervals',  -- see below
           AVG(value)
        FROM tbl
        WHERE ts ...   -- limit the time span
        GROUP BY 1     -- shorthand, referring to the FLOOR(..)

300000 assumes that ts is in milliseconds (as in Java). You would precompute that number from the time span, based on your "200 ..." discussion.
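One way that divisor could be precomputed is sketched below in Java. The class `BucketWidth` and its method are hypothetical, ts is assumed to be epoch milliseconds, and 100 buckets corresponds to the question's 200 chart points at two points (min and max) per bucket.

```java
/** Hypothetical helper: derive the divisor used in FLOOR(ts / width). */
final class BucketWidth {

    /**
     * Bucket width in milliseconds for the inclusive span [startMs, endMs].
     * Because FLOOR(ts / width) aligns buckets to multiples of the width
     * rather than to startMs, the span can touch one bucket more than
     * `buckets`.
     */
    static long widthMillis(long startMs, long endMs, int buckets) {
        if (endMs < startMs || buckets < 1) {
            throw new IllegalArgumentException("empty span or no buckets");
        }
        long span = endMs - startMs + 1;          // inclusive span
        return (span + buckets - 1) / buckets;    // ceiling division
    }
}
```

A span of 30,000,000 ms (500 minutes) split into 100 buckets gives 300000, the 5-minute divisor used in the query above. Rounding that raw width to a human-friendly interval is a separate step.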

That's all there is to it.

Now, for candlesticks:

    SELECT FLOOR(ts / 300000) AS '5-minute-intervals',
           MIN(value),
           MAX(value)
        FROM tbl
        WHERE ts ...   -- limit the time span
        GROUP BY 1     -- shorthand, referring to the FLOOR(..)

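Then your charting program needs to take the min and max and somehow turn them into a vertical line. If you actually want certain percentiles instead of min and max, that gets quite messy.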
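From Java, the candlestick query could be issued as a single parameterized statement instead of a stored-procedure loop. The following is a sketch under assumptions: the class `CandlestickQuery` and its `Bar` type are hypothetical, the table and column names (`pointvalues`, `dataPointId`, `pointValue`, `ts`) are taken from the question rather than from this answer's `tbl`/`value`, and the half-open time range is an illustrative choice.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper around the single-SELECT candlestick query. */
final class CandlestickQuery {

    /** One chart bar: bucket start time plus the min and max in that bucket. */
    static final class Bar {
        final long bucketStartMs;
        final double lo;
        final double hi;

        Bar(long bucketStartMs, double lo, double hi) {
            this.bucketStartMs = bucketStartMs;
            this.lo = lo;
            this.hi = hi;
        }
    }

    // Names come from the question's schema (an assumption); adjust as needed.
    static final String SQL =
        "SELECT FLOOR(ts / ?) AS bucket, MIN(pointValue) AS lo, MAX(pointValue) AS hi "
      + "FROM pointvalues "
      + "WHERE dataPointId = ? AND ts >= ? AND ts < ? "
      + "GROUP BY bucket ORDER BY bucket";

    /** Runs the query once and returns one Bar per non-empty time bucket. */
    static List<Bar> fetch(Connection conn, int dataPointId,
                           long fromMs, long toMs, long widthMs) throws SQLException {
        List<Bar> bars = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setLong(1, widthMs);
            ps.setInt(2, dataPointId);
            ps.setLong(3, fromMs);
            ps.setLong(4, toMs);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Convert the bucket number back to a timestamp for the x axis.
                    long start = rs.getLong("bucket") * widthMs;
                    bars.add(new Bar(start, rs.getDouble("lo"), rs.getDouble("hi")));
                }
            }
        }
        return bars;
    }
}
```

Buckets that contain no rows simply do not appear in the result, so gaps in the data show up as gaps on a time-based x axis.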

Getting the intervals...

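Please use human-friendly times. I really dislike x and y axes that are derived from the data but fail to use "round" numbers. (For example, when the goal is about 10 ticks, they use 143, 286, ..., 1432 instead of 100, 200, ..., 1500, having taken it to mean exactly 10 ticks.)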

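Doing this "right" involves finding the overall max and min, doing some arithmetic involving floor() and/or ceil(), and tossing in some heuristics to get "round" numbers. That can be another discussion. It is purely algorithmic; it can be implemented equally well in your programming language or in SQL.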
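As an illustration of that kind of arithmetic, here is one common heuristic sketched in Java: round the raw step up to 1, 2, 5, or 10 times a power of ten, then align the axis ends with floor and ceil. This is not necessarily the heuristic the answer has in mind, and the class `NiceTicks` and its methods are hypothetical names.

```java
/** Hypothetical helper for choosing "round" axis ticks. */
final class NiceTicks {

    /** Rounds the raw step up to 1, 2, 5 or 10 times a power of ten. */
    static double niceStep(double min, double max, int targetTicks) {
        if (!(max > min) || targetTicks < 1) {
            throw new IllegalArgumentException("need max > min and targetTicks >= 1");
        }
        double raw = (max - min) / targetTicks;
        double magnitude = Math.pow(10, Math.floor(Math.log10(raw)));
        double fraction = raw / magnitude;   // roughly in [1, 10)
        double nice;
        if (fraction <= 1) {
            nice = 1;
        } else if (fraction <= 2) {
            nice = 2;
        } else if (fraction <= 5) {
            nice = 5;
        } else {
            nice = 10;
        }
        return nice * magnitude;
    }

    /** Largest multiple of the step that is at or below min. */
    static double firstTick(double min, double step) {
        return Math.floor(min / step) * step;
    }

    /** Smallest multiple of the step that is at or above max. */
    static double lastTick(double max, double step) {
        return Math.ceil(max / step) * step;
    }
}
```

For the range 143 to 1432 with a target of about 10 ticks, this sketch picks a step of 200 and an axis running from 0 to 1600, so the tick count is only approximately the target.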

answered 2018-07-06T14:20:23.133