
Temperature readings are collected from several animal cages in an animal shelter every 30 minutes and dumped into a file. A cron job processes that data and inserts it into a MySQL database. Currently all 48 temperature readings for the day are stored in a single row, which is updated as the data comes in; if no record exists yet, a new record is created holding the first temperature.

We currently have one table for cage information and one for the cage temperature readings. We have 45 cages in total and roughly 7 years of data (about 2,557 days), which gives 115,065 records in the temperature table.

We will be adding more locations and additional cages to the system, so the total number of cages will exceed 1,000, and we expect the data to grow very quickly.

Is there a more efficient way of structuring the table below to optimize read speed? The data is used to generate graphs for every cage, displayed every morning, and a cron job runs every 30 minutes to check for inadequate ventilation inside each cage.

The current temperature table is as follows:

CREATE TABLE `temperature_readings` (
  `CAGE_ID` int(10) NOT NULL DEFAULT '0',
  `INT_VALUE_0000` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0030` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0100` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0130` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0200` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0230` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0300` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0330` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0400` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0430` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0500` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0530` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0600` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0630` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0700` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0730` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0800` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0830` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0900` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_0930` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1000` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1030` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1100` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1130` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1200` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1230` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1300` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1330` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1400` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1430` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1500` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1530` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1600` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1630` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1700` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1730` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1800` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1830` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1900` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_1930` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_2000` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_2030` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_2100` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_2130` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_2200` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_2230` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_2300` decimal(5,2) DEFAULT NULL,
  `INT_VALUE_2330` decimal(5,2) DEFAULT NULL,
  PRIMARY KEY (`CAGE_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
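
The cron's update is essentially an upsert per half-hour column, something along these lines (simplified; the cage id and value here are only placeholders, the real ones come from the data file):

-- Record the 00:30 reading for one cage: create the row if the cage has
-- none yet, otherwise fill in (or overwrite) the 00:30 slot.
INSERT INTO temperature_readings (CAGE_ID, INT_VALUE_0030)
VALUES (17, 21.50)
ON DUPLICATE KEY UPDATE INT_VALUE_0030 = VALUES(INT_VALUE_0030);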

My thoughts were either to normalize the multiple temperature readings into a halfhour_read table such as

halfhour_read{
 - cage_id
 - datetime
 - temperature reading
} 

or to hash/partition temperature_readings by either cage_id or the current date, so that the table is split up.

As far as I understand, the first option would bump the number of records from 115,065 to 5,523,120 (115,065 rows × 48 readings each) and it would keep growing much faster than the current design, which I worry could become a space problem.


2 Answers


Yes, normalize your structure. Just for fun, try writing the following query against your current structure: what was the peak temperature in cage A last week?

Follow your instinct and use this structure instead:

CREATE TABLE readings (
    cage_id INT,
    dateofreading DATETIME,
    temperature DECIMAL(10,2),
    PRIMARY KEY (cage_id, dateofreading),
    INDEX (dateofreading, cage_id) -- suggested index, useful for time-based queries
)
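
With this structure, the "peak temperature in cage A last week" question above becomes a trivial indexed query, for example (the cage id and date range are made up):

-- Peak temperature for one cage over one week
SELECT MAX(temperature) AS peak_temperature
FROM readings
WHERE cage_id = 12
  AND dateofreading >= '2013-09-29'
  AND dateofreading <  '2013-10-06';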

Expected row size (data only): 4 + 8 + 4 = 16 bytes.

16 bytes × 48 readings per day × 10,000 cages × 365 days = about 2.6 GB per year. Multiply by 3 or 4 to allow for indexes if you like. Either way, don't worry about storage space.

Thanks to proper indexing, pulling data out of this table should be nearly instantaneous even if it holds billions of records. In any case, your working set (the last few weeks of data) will probably always fit in memory.

(If your requirements were something like "100,000 cages with 4,800,000 readings per day", your main concern would not be storage space but sustaining millions of inserts per day.)

To keep your working data set at a reasonable size, then yes, partition your table, or simply move older records into an archive table from time to time.
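
Archiving can be as simple as something like this (the readings_archive table and the cutoff date are just placeholders):

-- Copy everything older than the cutoff into an archive table with the
-- same structure, then remove it from the live table.
INSERT INTO readings_archive
SELECT * FROM readings WHERE dateofreading < '2013-01-01';

DELETE FROM readings WHERE dateofreading < '2013-01-01';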

Answered on 2013-10-06 at 09:41

Definitely normalize it... but you are going to need bigger disks :-)

Actually, 5 million short rows is not a lot of data; MySQL can handle far more than that. 5 million reading rows come to roughly 100 MB.

You should also consider partitioning the data by year, since historical data never changes.
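
For example, year-based range partitioning on the normalized readings table suggested above could look something like this (partition names and year boundaries are only illustrative; note that MySQL requires the partitioning column to be part of the primary key, which it already is here):

CREATE TABLE readings (
    cage_id INT NOT NULL,
    dateofreading DATETIME NOT NULL,
    temperature DECIMAL(10,2),
    PRIMARY KEY (cage_id, dateofreading),
    INDEX (dateofreading, cage_id)
)
PARTITION BY RANGE (YEAR(dateofreading)) (
    PARTITION p2012 VALUES LESS THAN (2013),
    PARTITION p2013 VALUES LESS THAN (2014),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);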

Answered on 2013-10-06 at 07:34