
I know this has been asked before, at least in this thread: "Is PHP sort better than MySQL ORDER BY?"

However, I'm still not sure which option is right here, because sorting on the PHP side is almost 40 times faster. This MySQL query runs in about 350-400 ms:

```sql
SELECT
    keywords AS id,                 -- keywords is an integer column
    SUM(impressions) AS impressions,
    SUM(clicks) AS clicks,
    SUM(conversions) AS conversions,
    SUM(not_ctr) AS not_ctr,
    SUM(revenue) AS revenue,
    SUM(cost) AS cost
FROM visits
WHERE campaign_id = 104
GROUP BY keywords
ORDER BY keywords DESC
```

The keywords and campaign_id columns are both indexed.

The query works over about 150k rows and returns around 1,500 rows in total. The results are then post-processed in PHP: we calculate click-through rates, conversion rates, ROI, etc., as well as totals for the whole result set.
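The post-processing step can be sketched roughly like this. It's a minimal Python illustration, not the poster's PHP code: the CTR formula follows the question, while the conversion-rate (conversions per click) and ROI ((revenue - cost) / cost) definitions are assumptions, as is returning 0.0 for a zero denominator. The point worth noting is that the totals row derives its rates from the summed columns rather than averaging per-keyword rates, which would weight every keyword equally regardless of volume.

```python
SUM_FIELDS = ("impressions", "clicks", "conversions", "not_ctr", "revenue", "cost")


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero (assumed policy)."""
    return num / den if den else 0.0


def add_metrics(row):
    """Return a copy of one aggregated row with the derived rates added."""
    out = dict(row)
    # CTR as defined in the question: clicks / (impressions - not_ctr)
    out["ctr"] = safe_div(row["clicks"], row["impressions"] - row["not_ctr"])
    # Assumption: conversion rate = conversions per click
    out["conv_rate"] = safe_div(row["conversions"], row["clicks"])
    # Assumption: ROI = (revenue - cost) / cost
    out["roi"] = safe_div(row["revenue"] - row["cost"], row["cost"])
    return out


def totals(rows):
    """Sum the raw columns over the whole result set, then derive rates from those sums."""
    summed = {f: sum(r[f] for r in rows) for f in SUM_FIELDS}
    return add_metrics(summed)


# Hypothetical aggregated rows, shaped like the GROUP BY query's output.
rows = [
    {"id": 1, "impressions": 1000, "clicks": 50, "conversions": 5,
     "not_ctr": 0, "revenue": 100.0, "cost": 50.0},
    {"id": 2, "impressions": 500, "clicks": 10, "conversions": 0,
     "not_ctr": 100, "revenue": 0.0, "cost": 20.0},
]
report = [add_metrics(r) for r in rows]
overall = totals(rows)
```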

My idea was to cache the results with PHP APC for quick retrieval. However, we need to be able to order these results by the raw columns as well as by the calculated values. If I wanted to order by click-through rate, for example, I'd have to put SUM(clicks) / (SUM(impressions) - SUM(not_ctr)) in the query, which makes it about 40 ms slower, and the initial 400 ms is already a long time.
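For reference, ordering by the derived CTR inside the query means giving the expression an alias and ordering by that alias. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs on its own; the table contents and the keywords_by_ctr helper are hypothetical. Two details carry over to MySQL: NULLIF guards against a zero denominator, and LIMIT ? OFFSET ? is equivalent to MySQL's LIMIT offset,count form. One difference does not: SQLite divides integers as integers, hence the 1.0 *, whereas MySQL's / already returns a decimal.

```python
import sqlite3

# Hypothetical schema mirroring the question's visits table (only the columns needed here).
SCHEMA = """CREATE TABLE visits (
    campaign_id INTEGER, keywords INTEGER,
    impressions INTEGER, clicks INTEGER, not_ctr INTEGER)"""

QUERY = """
SELECT keywords AS id,
       SUM(clicks) AS clicks,
       SUM(impressions) - SUM(not_ctr) AS ctr_impressions,
       -- 1.0 * forces real division in SQLite; MySQL's / already returns a decimal.
       -- NULLIF turns a zero denominator into NULL instead of dividing by zero.
       1.0 * SUM(clicks) / NULLIF(SUM(impressions) - SUM(not_ctr), 0) AS ctr
FROM visits
WHERE campaign_id = ?
GROUP BY keywords
ORDER BY ctr DESC
LIMIT ? OFFSET ?
"""


def keywords_by_ctr(conn, campaign_id, limit=200, offset=0):
    """One page of keywords for a campaign, highest click-through rate first."""
    return conn.execute(QUERY, (campaign_id, limit, offset)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?, ?, ?)",
    [
        (104, 1, 100, 5, 0),    # keyword 1, first row
        (104, 1, 100, 5, 0),    # keyword 1 total: 10 clicks / 200 impressions
        (104, 2, 100, 10, 50),  # keyword 2: 10 clicks / (100 - 50) impressions
        (105, 3, 10, 9, 0),     # different campaign, filtered out by WHERE
    ],
)
first_page = keywords_by_ctr(conn, 104)
```

Moving the sort into SQL this way doesn't make the query cheaper. Because the ordering is on an aggregate, every group still has to be computed before the LIMIT can apply.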

We also paginate these results, but adding LIMIT 0,200 doesn't noticeably affect performance.

While testing the APC approach, I executed the query, did the additional calculations, and stored the resulting array in memory, so the query only runs once, on the initial request. That worked like a charm: fetching and sorting the array from memory took only about 10 ms. However, the script's memory usage was about 25 MB. Would it be worth loading the results into a memory table and querying that table directly instead?
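The APC approach described above amounts to "compute once, then sort and slice in memory". Here is a minimal Python sketch of that pattern, assuming a plain dict stands in for APC's cache (in PHP the counterparts would be apc_store() with a TTL, apc_fetch(), and usort()). The cache key, the 300-second TTL, and the load_report loader are hypothetical.

```python
import time
from operator import itemgetter

# Plain dict standing in for APC's cache: key -> (expires_at, rows).
_CACHE = {}
CACHE_TTL = 300  # seconds; hypothetical, tune to how stale the report may get


def get_rows(campaign_id, load, now=time.monotonic):
    """Return the cached report rows, calling load() only on a miss or after expiry."""
    key = ("keyword_report", campaign_id)
    entry = _CACHE.get(key)
    if entry is not None and entry[0] > now():
        return entry[1]
    rows = load(campaign_id)  # run the GROUP BY query + derived calculations once
    _CACHE[key] = (now() + CACHE_TTL, rows)
    return rows


def sorted_page(rows, sort_key, descending=True, page_no=0, page_size=200):
    """Order by any raw or derived column, then slice out one page.

    sorted() returns a new list, so the cached rows are never reordered in place."""
    ordered = sorted(rows, key=itemgetter(sort_key), reverse=descending)
    start = page_no * page_size
    return ordered[start:start + page_size]


calls = []


def load_report(campaign_id):
    """Hypothetical loader; records each call so cache hits can be observed."""
    calls.append(campaign_id)
    return [{"id": 1, "ctr": 0.05}, {"id": 2, "ctr": 0.2}, {"id": 3, "ctr": 0.1}]


first = sorted_page(get_rows(104, load_report), "ctr", page_size=2)
second = sorted_page(get_rows(104, load_report), "ctr", page_no=1, page_size=2)
```

Sorting about 1,500 rows per request is cheap. The trade-off is the memory the cached rows occupy, which is the 25 MB cost the question is weighing.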

All of this was done on my local machine (i7, 8 GB RAM) with a default MySQL install. The production server is a 512 MB box on Rackspace that I haven't tested on yet, so please ignore the server setup if possible.

So the real question is: is it worth using memory tables, or should I just sort in PHP and ignore the RAM usage, since I can always upgrade the RAM? What other options would you consider for optimizing performance?


1 Answer


In general, you want to do sorting on the database server rather than in the application. One good reason is that the database can, in principle, sort in parallel, and it has access to indexes. Still, a general rule may not apply in every circumstance.

I'm wondering whether your indexes are actually helping you. I would recommend that you time the query:

  1. With no indexes
  2. With an index only on campaign_id
  3. With both indexes
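One way to run that three-way comparison is a small timing harness like the one below. It's a sketch that uses SQLite's in-memory engine as a stand-in for MySQL, filled with hypothetical synthetic data (about 1,500 keywords spread across two campaigns). Absolute timings and plans from SQLite won't transfer to MySQL. For the real numbers, run the same three configurations against MySQL, check each plan with MySQL's EXPLAIN, and refresh statistics with ANALYZE TABLE.

```python
import random
import sqlite3
import time

QUERY = ("SELECT keywords, SUM(impressions), SUM(clicks) FROM visits "
         "WHERE campaign_id = ? GROUP BY keywords ORDER BY keywords DESC")

# The three configurations from the list above, as CREATE INDEX statements.
CONFIGS = {
    "no indexes": [],
    "campaign_id only": ["CREATE INDEX idx_campaign ON visits (campaign_id)"],
    "both indexes": ["CREATE INDEX idx_campaign ON visits (campaign_id)",
                     "CREATE INDEX idx_keywords ON visits (keywords)"],
}


def build_table(n_rows, n_campaigns, seed=0):
    """Fill an in-memory table with synthetic (hypothetical) visit rows."""
    rng = random.Random(seed)  # same seed -> identical data for every configuration
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (campaign_id INTEGER, keywords INTEGER, "
                 "impressions INTEGER, clicks INTEGER)")
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?, ?, ?)",
        ((rng.randrange(n_campaigns), rng.randrange(1500),
          rng.randrange(100), rng.randrange(10)) for _ in range(n_rows)),
    )
    return conn


def compare(n_rows=150_000, n_campaigns=2, campaign_id=1, repeats=5):
    """Time the query under each index configuration; keep the best of `repeats` runs."""
    results = {}
    for label, statements in CONFIGS.items():
        conn = build_table(n_rows, n_campaigns)
        for stmt in statements:
            conn.execute(stmt)
        conn.execute("ANALYZE")  # give the planner statistics about the data
        best, rows = float("inf"), []
        for _ in range(repeats):
            start = time.perf_counter()
            rows = conn.execute(QUERY, (campaign_id,)).fetchall()
            best = min(best, time.perf_counter() - start)
        plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (campaign_id,)).fetchall()
        results[label] = {"seconds": best, "groups": len(rows), "plan": plan}
        conn.close()
    return results
```

Calling compare() returns, for each configuration, the best run time, the number of groups returned, and the plan the engine chose. That makes it easy to see whether an index was used at all, and whether using it actually helped.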

Indexes are not always useful. One particularly important factor is called "selectivity". If you only have two campaigns in the table, you are probably better off doing a full-table scan than searching indirectly through an index. This becomes particularly important when the table does not fit into memory, because then every row fetched through the index may require loading a page into the cache.
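To put numbers on selectivity, two ratios help. The first is distinct values divided by total rows for the indexed column. The second is the fraction of rows a particular equality lookup would return. Here is a minimal sketch computing both over a hypothetical campaign_id column with two evenly split campaigns (in MySQL, something like SELECT COUNT(DISTINCT campaign_id) / COUNT(*) FROM visits gives the first ratio). Where the crossover to a full scan lies depends on the engine and on caching, so the sketch deliberately picks no threshold.

```python
from collections import Counter


def index_selectivity(values):
    """Distinct values / total rows: near 1.0 is highly selective, near 0 is not."""
    return len(set(values)) / len(values) if values else 0.0


def match_fraction(values, target):
    """Fraction of the table an equality lookup on `target` would have to visit."""
    return Counter(values)[target] / len(values) if values else 0.0


# Hypothetical column: 150k rows, only two campaigns, split evenly.
campaign_ids = [104] * 75_000 + [105] * 75_000
```

With this data, WHERE campaign_id = 104 matches half the table. An index lookup would therefore still touch roughly half the rows, each through an extra indirection, which is the situation where a full scan tends to win.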

Finally, if this is going to be an application that expands beyond your single server, be careful. What is optimal on a single machine may not be optimal in a different environment.

answered 2013-05-21T19:40:32.843