mysql - 存储大量日志数据的最佳方式

Question

我需要关于存储统计数据的最佳方法的建议。在 Django 上有一个项目，它有一个包含 30 000 个在线游戏的数据库（mysql）。

每场比赛都有三个统计参数：

观看次数，
播放次数，
点赞数

现在我需要每天存储这三个参数的历史数据，所以我正在考虑创建一个包含五列的单个数据库：

gameid, number of views, plays, likes, date (day-month-year data).

所以最后，每场比赛的每一天都将记录在一行中，所以在一天内该表将有 30000 行，在 10 天后将有 300000 行，在一年内将有 10 950 000 行. 我不是 DBA 方面的专家，但这告诉我，这很快就会成为性能问题。我不是在谈论 5 年后会发生什么。简单图表需要此表中收集的数据

(daily, weekly, monthly, custom range).

也许您对如何存储这些数据有更好的想法？也许 noSQL 在这种情况下会更合适？真的需要你对this.d的建议

score 5 · Accepted Answer

postgresql 中的分区非常适合大型日志。首先创建父表：

create table  game_history_log (
    gameid integer,
    views integer,
    plays integer,
    likes integer,
    log_date date
);

现在创建分区。在这种情况下，每个月一个，900 k 行，会很好：

create table game_history_log_201210 (
    check (log_date between '2012-10-01' and '2012-10-31')
) inherits (game_history_log);

create table game_history_log_201211 (
    check (log_date between '2012-11-01' and '2012-11-30')
) inherits (game_history_log);

注意每个分区中的检查约束。如果您尝试插入错误的分区：

insert into game_history_log_201210 (
    gameid, views, plays, likes, log_date
) values (1, 2, 3, 4, '2012-09-30');
ERROR:  new row for relation "game_history_log_201210" violates check constraint "game_history_log_201210_log_date_check"
DETAIL:  Failing row contains (1, 2, 3, 4, 2012-09-30).

分区的优点之一是它只会在正确的分区中搜索，无论有多少年的数据，它都会显着且一致地减少搜索大小。这里解释了搜索某个日期：

explain
select *
from game_history_log
where log_date = date '2012-10-02';
                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..30.38 rows=9 width=20)
   ->  Append  (cost=0.00..30.38 rows=9 width=20)
         ->  Seq Scan on game_history_log  (cost=0.00..0.00 rows=1 width=20)
               Filter: (log_date = '2012-10-02'::date)
         ->  Seq Scan on game_history_log_201210 game_history_log  (cost=0.00..30.38 rows=8 width=20)
               Filter: (log_date = '2012-10-02'::date)

请注意，除了父表之外，它只扫描了正确的分区。显然，您可以在分区上建立索引以避免顺序扫描。

继承分区

score 3 · Accepted Answer

11M 行并不过分，但一般索引和主键的集群将更重要（在 InnoDB 上）。我建议将 (game_id, date) 作为主键，以便对某个游戏的所有数据的查询都在连续的行中。此外，当只需要最新数据时，您可能希望保留一个单独的表格，其中仅包含排名游戏的当前值等。

score 1 · Accepted Answer

我建议根本不要使用关系数据库。静态是一种变化很快的东西，因为新数据不断出现。我相信像 HBase 这样的东西会更适合 - 因为在这里添加新记录的速度更快。

score 1 · Accepted Answer

10kk 数据的 MySQL 没有性能问题。您可以通过游戏 id 应用分区（至少需要 5.5 版本）。

我有这样的数据的 MySQL DB，目前 980kk 行没有问题。

score 0 · Accepted Answer

不要保留每一行，而是以高精度保留最近的数据，以 mdeium 精度保留中间数据，以低精度保留长期数据。这是rrdtool采用的方法，可能比 mysql 更好。

mysql - 存储大量日志数据的最佳方式

5 回答 5

Related

Reference