
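I'm interning at an advertising company, and I have already implemented a tool that collects all the necessary data from Facebook and imports it into a database.

Now I'm trying to work with that data, starting with a few test cases to get some results. The tables grow by 35k rows per day, so after a month of using the tool I noticed that the query I use to get the sum of clicks for certain ads has started to slow down.

My question is whether the query I'm using could be sped up by rewriting it with a join, and if so, how.

Here is my query for the sum of clicks per ad (adgroup_id and campaign_id are the keys that link to the other tables):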

<!-- language-all: lang-sql -->
    SELECT t1.adgroup_id, t1.campaign_id, t1.creative_ids, SUM( t2.clicks ) AS clicks
    FROM adgroups t1, adgroup_stats t2
    WHERE t1.adgroup_id = t2.adgroup_id
    GROUP BY t1.creative_ids
    ORDER BY clicks DESC

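Currently the query takes 3 seconds to complete on a dedicated server, and I estimate that in 6 months, as the tables grow, it will take more than 60 seconds or so.

EDIT: Here is the EXPLAIN output for the query (this is the first time I've actually used it, so I'm not quite sure what it means):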

    id  select_type  table  type  possible_keys  key      key_len  ref                  rows    Extra
    1   SIMPLE       t2     ALL   PRIMARY        NULL     NULL     NULL                 671549  Using temporary; Using filesort
    1   SIMPLE       t1     ref   PRIMARY        PRIMARY  8        fbads.t2.adgroup_id  358     Using index

1 Answer


The EXPLAIN output shows a full table scan on `adgroup_stats` (`type: ALL`, about 670k rows), and with growth that rapid, small performance tweaks won't make a big difference in the long run. You need a different approach.

I would precompute aggregates for past periods (months, days, etc.) with a cron job, and when you need stats, merge them with the fresh results (using the query you already wrote). That way you only have to scan the fresh records, which means the query will stay fast.
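A minimal sketch of that rollup idea in MySQL, under stated assumptions: the table `adgroup_clicks_daily` is hypothetical, and so is the `stat_date` column on `adgroup_stats` (the question does not show how the stats rows are dated). Older days would need a one-off backfill before the daily job takes over.

```sql
-- Hypothetical rollup table: one row per ad group per day.
CREATE TABLE IF NOT EXISTS adgroup_clicks_daily (
    adgroup_id BIGINT NOT NULL,
    stat_date  DATE   NOT NULL,
    clicks     BIGINT NOT NULL,
    PRIMARY KEY (adgroup_id, stat_date)
);

-- Run once a day from cron: fold yesterday's raw rows into the rollup.
-- Assumes adgroup_stats has a DATE column named stat_date (not in the question).
-- Re-running the job overwrites the same rows instead of double counting.
INSERT INTO adgroup_clicks_daily (adgroup_id, stat_date, clicks)
SELECT adgroup_id, stat_date, SUM(clicks)
FROM adgroup_stats
WHERE stat_date = CURRENT_DATE - INTERVAL 1 DAY
GROUP BY adgroup_id, stat_date
ON DUPLICATE KEY UPDATE clicks = VALUES(clicks);

-- Reporting: precomputed history plus only today's raw rows.
SELECT a.creative_ids, SUM(s.clicks) AS total_clicks
FROM adgroups a
JOIN (
    SELECT adgroup_id, clicks
    FROM adgroup_clicks_daily
    UNION ALL
    SELECT adgroup_id, clicks
    FROM adgroup_stats
    WHERE stat_date >= CURRENT_DATE
) s ON s.adgroup_id = a.adgroup_id
GROUP BY a.creative_ids
ORDER BY total_clicks DESC;
```

The reporting query only touches the raw table for the current day, so an index on the assumed `stat_date` column would probably be needed to keep that part from turning back into a full scan.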

Alternatively, you can keep up-to-date counters in the adgroups table and update them on each click. I'm not sure MySQL is the right tool for this; I can recommend MongoDB, which can do very fast atomic increments on fields. It doesn't give you guarantees as strict (ACID) as a relational database does, but in this case that's not a problem: ad clicks aren't mission-critical data, and nobody is going to complain if you lose less than 0.01% of the click information.
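For comparison, the same counter idea can be sketched in plain MySQL rather than MongoDB. This is only an illustration: it uses a hypothetical side table `adgroup_click_totals` instead of a new column on `adgroups`, so that nothing has to be assumed about that table's schema, and the literal values `123` and `5` are placeholders.

```sql
-- Hypothetical counter table: one running total per ad group.
CREATE TABLE IF NOT EXISTS adgroup_click_totals (
    adgroup_id BIGINT NOT NULL PRIMARY KEY,
    clicks     BIGINT NOT NULL DEFAULT 0
);

-- Add newly imported clicks to the running total.
-- Creates the row on first sight, otherwise increments it in a single statement.
INSERT INTO adgroup_click_totals (adgroup_id, clicks)
VALUES (123, 5)
ON DUPLICATE KEY UPDATE clicks = clicks + VALUES(clicks);

-- Reading a total is then a primary-key lookup instead of an aggregate.
SELECT clicks FROM adgroup_click_totals WHERE adgroup_id = 123;
```

Because the increment happens inside one statement, concurrent importers should not overwrite each other's updates, at the cost of an extra write for every batch of clicks.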

Answered 2013-07-09T09:33:22.647