Here's the situation:
I have a SaaS application that is a simple RSS feed reader. I think most people know what that is: users subscribe to RSS feeds and read the items from them. Nothing new. A feed can have multiple subscribers.
I have implemented some statistics for the users, but I don't think I picked the right approach, because things keep getting slower as the number of users and feeds grows.
This is what I'm doing right now:
Every hour, get the total number of articles for each feed:
SELECT COUNT(*) FROM articles WHERE feed_id=?
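For comparison, a single grouped query would return the same totals for every feed in one pass (a sketch, not what the job currently runs; the UNIQUE KEY on (feed_id, hash) should let MySQL answer it from the index):
SELECT feed_id, COUNT(*) AS total_articles
FROM articles
GROUP BY feed_id;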
Get the previous value to calculate the delta (this part is a bit slow):
SELECT value FROM feeds_stats WHERE feed_id=? AND name='total_articles' ORDER BY date DESC LIMIT 1
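The slowness here is probably because feeds_stats only has single-column keys (see the DDL further down), so the ORDER BY date DESC needs a filesort. A composite key matching the filter and the sort would turn this into a single index lookup; a sketch (the key name is my own):
ALTER TABLE feeds_stats ADD KEY feed_name_date (feed_id, name, date);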
Insert the new value and the delta:
INSERT INTO feeds_stats (date,feed_id,name,value,delta) VALUES ('".date("Y-m-d H:i:s",$global_timestamp)."','".$feed_id."','total_articles','".$value."','".($value-$old_value)."')
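Since the job writes one such row per feed every hour anyway, these inserts could also be batched into a single multi-row statement per run; the literal values below are placeholders just to show the shape:
INSERT INTO feeds_stats (date, feed_id, name, value, delta) VALUES
('2013-01-01 12:00:00', 1, 'total_articles', 120, 3),
('2013-01-01 12:00:00', 2, 'total_articles', 87, 0),
('2013-01-01 12:00:00', 3, 'total_articles', 451, 12);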
For each user, fetch their feeds, and for each feed fetch the number of articles they have already read:
SELECT COUNT(*) FROM users_articles ua JOIN articles a ON a.id=ua.article_id WHERE a.feed_id='%s' AND ua.user_id='%s' AND ua.read=1
users_articles is a table that holds the read state of every article for every user.
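Here, too, one grouped query per user could replace the per-feed loop and return the read counts for all of that user's feeds at once (a sketch; restricting to active subscriptions is left out):
SELECT a.feed_id, COUNT(*) AS total_reads
FROM users_articles ua
JOIN articles a ON a.id = ua.article_id
WHERE ua.user_id = '%s' AND ua.read = 1
GROUP BY a.feed_id;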
Then fetch the previous value again for the delta:
SELECT value FROM users_feeds_stats WHERE user_id='?' AND feed_id='?' AND name='total_reads' ORDER BY date DESC LIMIT 1
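Same pattern as with feeds_stats: a composite key covering the WHERE and the ORDER BY would avoid the sort here as well (key name is my own):
ALTER TABLE users_feeds_stats ADD KEY user_feed_name_date (user_id, feed_id, name, date);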
And insert the new value + delta:
INSERT INTO users_feeds_stats (date,user_id,feed_id,name,value,delta) VALUES ('".date("Y-m-d H:i:s",$global_timestamp)."','".$user_id."','".$feed_id."','total_reads','".$value."','".($value-$old_value)."')
Once all of a user's feeds have been processed, the aggregation part kicks in:
This is a bit tricky, and I think there should be plenty of room for optimization here. This is the actual aggregation function in PHP:
<?php
function aggregate_user_stats($user_id = false, $feed_id = false){
    global $global_timestamp;

    // defined dimensions: one "tab" per grouping (per day, per weekday, per hour),
    // each looking 31 days back
    $feed_types[0] = array("days_back" => 31, "group_by" => "DATE_FORMAT(date, '%Y-%m-%d')");
    $feed_types[1] = array("days_back" => 31, "group_by" => "WEEKDAY(date)+1");
    $feed_types[2] = array("days_back" => 31, "group_by" => "HOUR(date)");

    // optionally restrict the run to a single user
    $where = "";
    if($user_id){
        $where = " WHERE id=".$user_id;
    }
    $feed_where = "";

    $getusers = mysql_query("SELECT id FROM users".$where) or die(__LINE__." ".mysql_error());
    while($user = mysql_fetch_assoc($getusers)){
        // optionally restrict the run to a single feed
        if($feed_id){
            $feed_where = " AND feed_id=".$feed_id;
        }
        $user_feeds = array();

        // all active subscriptions of this user
        $getfeeds = mysql_query("SELECT feed_id FROM subscriptions WHERE user_id='".$user["id"]."' AND active=1".$feed_where) or die(__LINE__." ".mysql_error());
        while($feed = mysql_fetch_assoc($getfeeds)){
            foreach($feed_types as $tab => $type){
                // per-feed article deltas plus this user's read deltas, bucketed by the tab's grouping
                $getdata = mysql_query("
                    SELECT ".$type["group_by"]." AS date, name, SUM(delta) AS delta
                    FROM feeds_stats
                    WHERE feed_id = '".$feed["feed_id"]."' AND name='total_articles' AND date > DATE_SUB(NOW(), INTERVAL ".$type["days_back"]." DAY)
                    GROUP BY name, ".$type["group_by"]."
                    UNION
                    SELECT ".$type["group_by"]." AS date, name, SUM(delta) AS delta
                    FROM users_feeds_stats
                    WHERE user_id = '".$user["id"]."' AND feed_id = '".$feed["feed_id"]."' AND name='total_reads' AND date > DATE_SUB(NOW(), INTERVAL ".$type["days_back"]." DAY)
                    GROUP BY name, ".$type["group_by"]."
                ") or die(__LINE__." ".mysql_error());

                // collect the UNION result keyed by bucket and stat name; a separate
                // variable is used so $feed stays intact for the queries below
                $data = array();
                while($stat = mysql_fetch_assoc($getdata)){
                    $data[$stat["date"]][$stat["name"]] = $stat["delta"];
                }

                if(count($data)){
                    // rewrite this feed/user/tab slice of the pre-aggregated chart table
                    db_start_trx();
                    mysql_query("DELETE FROM stats_feeds_over_time WHERE feed_id='".$feed["feed_id"]."' AND user_id='".$user["id"]."' AND tab='".$tab."'") or die(__LINE__." ".mysql_error());
                    foreach($data as $time => $keys){
                        mysql_query("REPLACE INTO stats_feeds_over_time (feed_id,user_id,tab,date,total_articles,total_reads,total_favs) VALUES ('".$feed["feed_id"]."','".$user["id"]."','".$tab."','".$time."','".$keys["total_articles"]."','".$keys["total_reads"]."','".$keys["total_favs"]."')") or die(__LINE__." ".mysql_error());
                    }
                    db_commit_trx();
                }
            }
        }
    }
}
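For scale: the inner UNION above runs once per feed and per tab. As a point of comparison, one grouped statement per user and per tab could pull the same deltas for all subscribed feeds at once; this sketch covers only the total_articles half for the daily tab (tab 0), and the total_reads side against users_feeds_stats would look the same:
SELECT s.feed_id, DATE_FORMAT(fs.date, '%Y-%m-%d') AS date, SUM(fs.delta) AS delta
FROM subscriptions s
JOIN feeds_stats fs ON fs.feed_id = s.feed_id
WHERE s.user_id = '%s' AND s.active = 1
AND fs.name = 'total_articles'
AND fs.date > DATE_SUB(NOW(), INTERVAL 31 DAY)
GROUP BY s.feed_id, DATE_FORMAT(fs.date, '%Y-%m-%d');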
A few notes:
EDIT: here is the DDL of the tables involved:
CREATE TABLE `articles` (
`id` INTEGER(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`feed_id` INTEGER(11) UNSIGNED NOT NULL,
`date` INTEGER(10) UNSIGNED NOT NULL,
`date_updated` INTEGER(11) UNSIGNED NOT NULL,
`title` VARCHAR(1000) COLLATE utf8_general_ci NOT NULL DEFAULT '',
`url` VARCHAR(2000) COLLATE utf8_general_ci NOT NULL DEFAULT '',
`author` VARCHAR(200) COLLATE utf8_general_ci NOT NULL DEFAULT '',
`hash` CHAR(32) COLLATE utf8_general_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
UNIQUE KEY `feed_id_hash` (`feed_id`, `hash`),
KEY `date` (`date`),
KEY `url` (`url`(255))
)ENGINE=InnoDB
AUTO_INCREMENT=0
CHARACTER SET 'utf8' COLLATE 'utf8_general_ci'
COMMENT='';
CREATE TABLE `users_articles` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`user_id` INTEGER(11) UNSIGNED NOT NULL,
`article_id` INTEGER(11) UNSIGNED NOT NULL,
`subscription_id` INTEGER(11) UNSIGNED NOT NULL,
`read` TINYINT(4) UNSIGNED NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `user_id` (`user_id`, `article_id`),
KEY `article_id` (`article_id`),
KEY `subscription_id` (`subscription_id`)
)ENGINE=InnoDB
CHECKSUM=1 AUTO_INCREMENT=0
CHARACTER SET 'utf8' COLLATE 'utf8_general_ci'
COMMENT='';
CREATE TABLE `feeds_stats` (
`id` INTEGER(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`feed_id` INTEGER(11) UNSIGNED NOT NULL,
`date` DATETIME NOT NULL,
`name` VARCHAR(50) COLLATE utf8_general_ci NOT NULL DEFAULT '',
`value` INTEGER(11) NOT NULL,
`delta` INTEGER(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `feed_id` (`feed_id`),
KEY `date` (`date`)
)ENGINE=InnoDB
AUTO_INCREMENT=0
CHARACTER SET 'utf8' COLLATE 'utf8_general_ci'
COMMENT='';
CREATE TABLE `users_feeds_stats` (
`id` INTEGER(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`user_id` INTEGER(11) UNSIGNED NOT NULL DEFAULT '0',
`feed_id` INTEGER(11) UNSIGNED NOT NULL,
`date` DATETIME NOT NULL,
`name` VARCHAR(50) COLLATE utf8_general_ci NOT NULL DEFAULT '',
`value` INTEGER(11) NOT NULL,
`delta` INTEGER(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `feed_id` (`feed_id`),
KEY `user_id` (`user_id`),
KEY `date` (`date`)
)ENGINE=InnoDB
AUTO_INCREMENT=0
CHARACTER SET 'utf8' COLLATE 'utf8_general_ci'
COMMENT='';
CREATE TABLE `stats_feeds_over_time` (
`feed_id` INTEGER(11) UNSIGNED NOT NULL,
`user_id` INTEGER(11) NOT NULL,
`tab` INTEGER(11) NOT NULL,
`date` VARCHAR(30) COLLATE utf8_general_ci NOT NULL DEFAULT '',
`total_articles` DOUBLE(9,2) UNSIGNED NOT NULL,
`total_reads` DOUBLE(9,2) UNSIGNED NOT NULL,
`total_favs` DOUBLE(9,2) UNSIGNED NOT NULL,
PRIMARY KEY (`feed_id`, `user_id`, `tab`, `date`)
)ENGINE=InnoDB
AUTO_INCREMENT=0
CHARACTER SET 'utf8' COLLATE 'utf8_general_ci'
COMMENT='';
At the end of the aggregation function there is a REPLACE into the table stats_feeds_over_time. That table only holds the records that will be shown on the charts, so the actual plotting does not involve any heavy queries.
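For reference, I assume the chart read is then essentially just the following (the exact query is not part of this post), which the PRIMARY KEY (feed_id, user_id, tab, date) already matches:
SELECT date, total_articles, total_reads, total_favs
FROM stats_feeds_over_time
WHERE feed_id = '%s' AND user_id = '%s' AND tab = '%s'
ORDER BY date;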
Finally, this is what the resulting chart looks like.
I would be glad if someone could point me in the right direction as to where and how this solution can be optimized, even if that means dropping MySQL for the statistics.
I have long experience with RRDTool, but the situation here is different because of the "time of day" and "day of week" aggregations.