Suppose I have the following database setup (a simplified version from what I actually have):
Table: news_posting (500,000+ entries)
| --------------------------------------------------------------|
| posting_id | name | is_active | released_date | token |
| 1 | posting_1 | 1 | 2013-01-10 | 123 |
| 2 | posting_2 | 1 | 2013-01-11 | 124 |
| 3 | posting_3 | 0 | 2013-01-12 | 125 |
| --------------------------------------------------------------|
PRIMARY posting_id
INDEX sorting ON (is_active, released_date, token)
Table: news_category (500 entries)
| ------------------------------|
| category_id | name |
| 1 | category_1 |
| 2 | category_2 |
| 3 | category_3 |
| ------------------------------|
PRIMARY category_id
Table: news_cat_match (1,000,000+ entries)
| ------------------------------|
| category_id | posting_id |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 2 | 2 |
| 3 | 2 |
| 1 | 3 |
| 2 | 3 |
| ------------------------------|
UNIQUE idx (category_id, posting_id)
My task is as follows. I must get a list of 50 latest news postings (at some offset) that are active, that are before today's date, and that are in one of the 20 or so categories that are specified in the request. Before I choose the 50 news postings to return, I must sort the appropriate news postings by token in descending order. My query is currently similar to the following:
SELECT DISTINCT posting_id
FROM news_posting np
INNER JOIN news_cat_match ncm ON (ncm.posting_id = np.posting_id AND ncm.category_id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20))
WHERE np.is_active = 1
AND np.released_date < '2013-01-28'
ORDER BY np.token DESC LIMIT 50
With just one specified category_id
the query does not involve a filesort
and is reasonably fast because it does not have to process removal of duplicate results. However, calling EXPLAIN
on the above query that has multiple category_id
's returns a table that says that there is filesort
to be done. And, the query is extremely slow on my data set.
Is there any way to optimize the table setup and/or the query?