I have three tables and need to join their data on a common field.

Example pseudo table definitions:

barometer_log(device int, pressure float, sampleTime timestamp)

temperature_log(device int, temperature float, sampleTime timestamp)

magnitude_log(device int, magnitude float, utcTime timestamp)

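Each table will eventually contain billions of rows, but at the moment each contains around 500,000 rows.

I need to be able to combine the data from the tables (FULL JOIN) so that sampleTime is merged into one column (COALESCE), giving me rows of the form: device, sampleTime, pressure, temperature, magnitude

I need to be able to query the data by specifying a device plus a start and end date, e.g. select .... where device=1000 and sampleTime between '2011-10-11' and '2011-10-17'

I have tried different UNION ALL techniques with RIGHT and LEFT joins, as described in "MySql full join (union) and ordering on multiple date columns", but the queries take too long: either I have to stop them after they have run for hours, or they raise an error about the temp file size. What is the best way for me to query these three tables and combine their output within an acceptable time frame?

Here are the full table definitions, as suggested. Note: the device table is not included.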

magnitude_log

CREATE TABLE magnitude_log (
  device int(11) NOT NULL,
  magnitude float not NULL,
  sampleTime timestamp NOT NULL,  
  PRIMARY KEY  (device,sampleTime),
  CONSTRAINT magnitudeLog_device 
    FOREIGN KEY (device) 
      REFERENCES device (id) 
      ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

barometer_log

CREATE TABLE barometer_log (
  device int(11) NOT NULL,
  pressure float not NULL,  
  sampleTime timestamp NOT NULL,  
  PRIMARY KEY  (device,sampleTime),
  CONSTRAINT barometerLog_device 
    FOREIGN KEY (device) 
      REFERENCES device (id) 
      ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

temperature_log

CREATE TABLE temperature_log (
  device int(11) NOT NULL,
  sampleTime timestamp NOT NULL,  
  temperature float default NULL,
  PRIMARY KEY  (device,sampleTime),
  CONSTRAINT temperatureLog_device 
    FOREIGN KEY (device) 
      REFERENCES device (id) 
      ON DELETE CASCADE
)  ENGINE=InnoDB DEFAULT CHARSET=utf8;

3 Answers

First, get all (device, sampleTime) combinations from all 3 tables for the requested time period:

-------- Q --------
    SELECT device, sampleTime
    FROM magnitude_log
    WHERE device = 1000
      AND sampleTime >= '2011-10-11' 
      AND sampleTime <  '2011-10-18'
UNION
    SELECT device, sampleTime
    FROM barometer_log
    WHERE device = 1000
      AND sampleTime >= '2011-10-11' 
      AND sampleTime <  '2011-10-18'
UNION
    SELECT device, sampleTime
    FROM temperature_log
    WHERE device = 1000
      AND sampleTime >= '2011-10-11' 
      AND sampleTime <  '2011-10-18'

Then use it to LEFT JOIN the 3 tables:

SELECT
    q.device
  , q.sampleTime
  , b.pressure
  , t.temperature
  , m.magnitude
FROM 
    ( Q ) AS q
  LEFT JOIN
    ( SELECT * 
      FROM magnitude_log 
      WHERE device = 1000
        AND sampleTime >= '2011-10-11' 
        AND sampleTime <  '2011-10-18'
    ) AS m
      ON (m.device, m.sampleTime) = (q.device, q.sampleTime)
  LEFT JOIN
    ( SELECT * 
      FROM barometer_log 
      WHERE device = 1000
        AND sampleTime >= '2011-10-11' 
        AND sampleTime <  '2011-10-18'
    ) AS b
      ON (b.device, b.sampleTime) = (q.device, q.sampleTime)
  LEFT JOIN
    ( SELECT * 
      FROM temperature_log
      WHERE device = 1000
        AND sampleTime >= '2011-10-11' 
        AND sampleTime <  '2011-10-18'
    ) AS t
      ON (t.device, t.sampleTime) = (q.device, q.sampleTime)
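
For reference, here is a sketch of the two pieces combined into one statement, with Q inlined as the derived table q. It is a variant, not the exact query above: it joins the three log tables directly instead of through filtered subqueries, on the assumption that q already restricts the device and time range and that each lookup can then use PRIMARY KEY (device, sampleTime). The ORDER BY is also an addition. Treat it as untested against real data volumes.

```sql
-- Sketch only: Q is inlined as derived table q; the log tables are joined
-- directly so that each join can look rows up by (device, sampleTime).
SELECT
    q.device,
    q.sampleTime,
    b.pressure,
    t.temperature,
    m.magnitude
FROM
    (   SELECT device, sampleTime
        FROM magnitude_log
        WHERE device = 1000
          AND sampleTime >= '2011-10-11'
          AND sampleTime <  '2011-10-18'
      UNION
        SELECT device, sampleTime
        FROM barometer_log
        WHERE device = 1000
          AND sampleTime >= '2011-10-11'
          AND sampleTime <  '2011-10-18'
      UNION
        SELECT device, sampleTime
        FROM temperature_log
        WHERE device = 1000
          AND sampleTime >= '2011-10-11'
          AND sampleTime <  '2011-10-18'
    ) AS q
  LEFT JOIN magnitude_log AS m
    ON m.device = q.device AND m.sampleTime = q.sampleTime
  LEFT JOIN barometer_log AS b
    ON b.device = q.device AND b.sampleTime = q.sampleTime
  LEFT JOIN temperature_log AS t
    ON t.device = q.device AND t.sampleTime = q.sampleTime
ORDER BY q.sampleTime;
```

Running EXPLAIN on both forms should show whether the optimizer actually uses the primary key for the three joins.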

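The longer the period you ask for, the longer the query will struggle with the UNION subquery. You may consider keeping Q as a separate table, populated perhaps via triggers with the unique (device, sampleTime) combinations from the other three tables.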
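
A minimal sketch of that idea follows, assuming MySQL with InnoDB as in the question. The table name sample_times and the trigger names are made up for illustration. Only inserts are handled; deletes from the log tables would leave stale pairs behind and would need extra handling.

```sql
-- Hypothetical lookup table holding each (device, sampleTime) pair once.
CREATE TABLE sample_times (
  device int(11) NOT NULL,
  sampleTime timestamp NOT NULL,
  PRIMARY KEY (device, sampleTime)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- One-off backfill from existing rows; INSERT IGNORE skips pairs already present.
INSERT IGNORE INTO sample_times (device, sampleTime)
  SELECT device, sampleTime FROM magnitude_log;
INSERT IGNORE INTO sample_times (device, sampleTime)
  SELECT device, sampleTime FROM barometer_log;
INSERT IGNORE INTO sample_times (device, sampleTime)
  SELECT device, sampleTime FROM temperature_log;

-- Keep the table current: one AFTER INSERT trigger per log table.
CREATE TRIGGER magnitude_log_ai AFTER INSERT ON magnitude_log
FOR EACH ROW
  INSERT IGNORE INTO sample_times (device, sampleTime)
  VALUES (NEW.device, NEW.sampleTime);

CREATE TRIGGER barometer_log_ai AFTER INSERT ON barometer_log
FOR EACH ROW
  INSERT IGNORE INTO sample_times (device, sampleTime)
  VALUES (NEW.device, NEW.sampleTime);

CREATE TRIGGER temperature_log_ai AFTER INSERT ON temperature_log
FOR EACH ROW
  INSERT IGNORE INTO sample_times (device, sampleTime)
  VALUES (NEW.device, NEW.sampleTime);

-- Q then becomes a range scan on the primary key instead of a three-way UNION.
SELECT device, sampleTime
FROM sample_times
WHERE device = 1000
  AND sampleTime >= '2011-10-11'
  AND sampleTime <  '2011-10-18';
```

The trade-off is one extra write per inserted sample in exchange for skipping the UNION and its de-duplication at query time.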

answered 2011-11-29T07:21:24.593

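If you are querying a small time range across a large number of devices, you may want to consider reversing the PK index so that it becomes (timeRange, device), where timeRange is the sampleTime column.

You would then probably want a secondary index on device or on (device, timeRange).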
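
As a rough sketch, the change could look like this for one of the tables, using the sampleTime column from the question as the time part of the key. The index name device_sampleTime is hypothetical. The secondary index is added in the same statement on the assumption that the foreign key on device still needs an index starting with that column. On a large InnoDB table this rebuilds the whole table, so try it on a copy first.

```sql
-- Sketch only, shown for barometer_log; the other two tables would follow the same pattern.
ALTER TABLE barometer_log
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (sampleTime, device),            -- time-first key for "small range, many devices"
  ADD KEY device_sampleTime (device, sampleTime);  -- keeps per-device range queries indexed
```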

answered 2011-11-29T08:22:05.147

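Assuming the device table contains all of the devices, you don't really need a proper full join; you just need to left join the other tables to device and group by sample time, like so: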

SELECT
    d.id AS device,
    COALESCE(m.sampleTime, b.sampleTime, t.sampleTime) AS sampleTime,
    m.magnitude,
    b.pressure,
    t.temperature
FROM device AS d
    LEFT JOIN magnitude_log AS m ON d.id = m.device
    LEFT JOIN barometer_log AS b ON d.id = b.device
    LEFT JOIN temperature_log AS t ON d.id = t.device
WHERE d.id = 1000
GROUP BY device, sampleTime
HAVING sampleTime BETWEEN '2011-10-11' AND '2011-10-17'

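However, this may be slow, because it groups the rows before actually filtering on the time span. If a single device will not itself have millions of rows, that shouldn't be a problem. If it will, I suggest putting the sampleTime condition on each join: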

SELECT
    d.id AS device,
    COALESCE(m.sampleTime, b.sampleTime, t.sampleTime) AS sampleTime,
    m.magnitude,
    b.pressure,
    t.temperature
FROM device AS d
    LEFT JOIN magnitude_log AS m ON d.id = m.device AND m.sampleTime BETWEEN '2011-10-11' AND '2011-10-17'
    LEFT JOIN barometer_log AS b ON d.id = b.device AND b.sampleTime BETWEEN '2011-10-11' AND '2011-10-17'
    LEFT JOIN temperature_log AS t ON d.id = t.device AND t.sampleTime BETWEEN '2011-10-11' AND '2011-10-17'
WHERE d.id = 1000
GROUP BY device, sampleTime
HAVING sampleTime IS NOT NULL

Hope that helps!

answered 2011-11-29T07:26:04.680