1

I have a table in MySQL like this:

CREATE TABLE IF NOT EXISTS `connections` (
  `src` int(10) unsigned NOT NULL,
  `sport` smallint(5) unsigned NOT NULL,
  `dst` int(10) unsigned NOT NULL,
  `dport` smallint(5) unsigned NOT NULL,
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`src`,`sport`,`dst`,`dport`,`time`),
  KEY `time` (`time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

About 2.5 million records are inserted into this table daily.

When I select records for a period of time, such as one day, it takes about 7 minutes. How can I improve this?

I'm using Ruby on Rails version 4.0.0.

My query looks like this:

connections = Connection.select('src, dst, UNIX_TIMESTAMP(time) as time')
                  .where(time: timeFrom..timeTo)
                  .order('time ASC')

After selecting from the database I have a loop like this:

connections.each do |con|
  link = getServerID(con['src'])
  link = getServerID(con['dst']) if link == 0

  @total[link].append [con['time'] * 1000, con['dst']]
end

In this loop I do a bit of processing on src and dst, then add the result to a hash. This section takes a long time and my computer crashes.


3 Answers

1

First, you should try running the SQL query directly against the database, without Rails. That helps identify the bottleneck: is the query itself slow, or is Rails? I would guess the SQL part is not the problem, but double-check that first.

I guess your biggest problem is connections.each. It loads all matching rows into your application and creates ActiveRecord models for them. Let's do some math: 2.5M entries * 1KB (just a guess, probably more) results in 2.5GB of data loaded into your memory. You might see an improvement using connections.find_each, since it loads the connections in smaller batches.

What does the getServerID method do? It is called 5M times.
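
If getServerID does an expensive lookup for every address, caching its results could cut that work dramatically, since the same addresses will repeat across millions of rows. A rough sketch (the cache and wrapper names are hypothetical, since the original method isn't shown; this assumes getServerID is deterministic per address):

```ruby
# Hypothetical sketch: memoize server-ID lookups so each distinct
# address is resolved only once instead of once per row.
SERVER_ID_CACHE = {}

def cached_server_id(addr)
  # Only fall back to the (expensive) original lookup on a cache miss.
  # Note: in Ruby 0 is truthy, so a cached 0 result is also reused.
  SERVER_ID_CACHE[addr] ||= getServerID(addr)
end
```

The loop would then call cached_server_id(con['src']) instead of getServerID(con['src']).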

I'm pretty sure you won't be able to improve this code much. It looks like either the wrong database or the wrong algorithm for the problem. Since it's unlikely you want to display 2.5M records on a website, you'd better tell us what you are actually trying to achieve.

answered 2013-10-08T09:07:07.380
0

You could try table partitioning:

http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

There is also a nice slide deck:

http://www.slideshare.net/datacharmer/mysql-partitions-tutorial
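
As an illustrative sketch only (the partition names and date boundaries are made up, and you would add one partition per day going forward), daily RANGE partitioning on this table might look like the following. UNIX_TIMESTAMP() is the one function MySQL permits for RANGE partitioning on a TIMESTAMP column, and the existing primary key already includes the time column, which partitioning requires:

```sql
-- Illustrative only: split the table into daily partitions so that a
-- one-day query only has to scan a single partition.
ALTER TABLE `connections`
  PARTITION BY RANGE (UNIX_TIMESTAMP(`time`)) (
    PARTITION p20131007 VALUES LESS THAN (UNIX_TIMESTAMP('2013-10-08 00:00:00')),
    PARTITION p20131008 VALUES LESS THAN (UNIX_TIMESTAMP('2013-10-09 00:00:00')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
  );
```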

answered 2013-10-08T07:18:27.997
0

As mentioned before, fetching 2.5 million entries takes a lot of memory/CPU power. Try fetching the records in batches.

Rails has built-in support for batch processing: http://api.rubyonrails.org/classes/ActiveRecord/Batches.html

connections.find_each do |con|
    link = getServerID(con['src'])
    link = getServerID(con['dst']) if link == 0

    @total[link].append [con['time'] * 1000, con['dst']]
end

If that doesn't solve your problem, you should think about a better approach that avoids iterating over so many records every time.

answered 2013-10-08T14:12:31.870