4

I am planning the table structure and the code for a table that will have about a billion rows.

Very often I would like to run SELECT COUNT(*) FROM mytable WHERE somecol = 5. somecol is an INT and will have an index on it.

Option 1 is that I just have my one giant table and use SELECT COUNT(*) as above.

Option 2 is to have an additional table, called mytableofcounts, with only two columns, somecol and num, in which I keep a running total for each value of somecol. This table would have only a few hundred thousand rows, and somecol would be unique. Then I can run SELECT num FROM mytableofcounts WHERE somecol = 5 instead.

I would prefer option 1 because it is simpler to program and needs less storage, but my concern is that it might be slow. Would I save on processing time by going with the extra table of option 2, or is option 1 equally fast?


3 Answers

4

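If you have an index on somecol, then the database is basically implementing your second method already: it can answer the count from the index instead of the table. It still has to walk every matching index entry, so the cost grows with the number of matches rather than being a single-row lookup.

When it scans the index, there are two approaches the engine can take. It can compute the count from the index alone, or it can use the index to locate and fetch the data pages. If you have this: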

select count(anothercol)
from mytable
where somecol = 5;

Then the engine can identify the rows where somecol = 5, but it still has to read the data pages to determine whether or not anothercol is NULL.

I'm pretty sure count(*) will just scan the index and not read the data pages. If you want to be sure, use:

select count(somecol)
from mytable
where somecol = 5;
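
Rather than guessing, you can ask the engine for its query plan. Below is a minimal sketch using Python's sqlite3 module. SQLite is only a stand-in here, since the question does not name an engine; the index name idx_somecol and the sample data are made up for illustration. MySQL and PostgreSQL have their own EXPLAIN statements with different output, so treat this as a way to check, not as a statement about your engine.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (assumption: the question
# does not say which database is used).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, somecol INTEGER, anothercol TEXT)"
)
conn.execute("CREATE INDEX idx_somecol ON mytable (somecol)")  # hypothetical index name

# Sample data: somecol cycles through 0..9, anothercol is NULL for every third row.
conn.executemany(
    "INSERT INTO mytable (somecol, anothercol) VALUES (?, ?)",
    [(i % 10, None if i % 3 == 0 else "x") for i in range(1000)],
)


def plan(sql):
    """Return the human-readable detail column(s) of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_star = plan("SELECT COUNT(*) FROM mytable WHERE somecol = 5")
plan_col = plan("SELECT COUNT(anothercol) FROM mytable WHERE somecol = 5")

count_star = conn.execute(
    "SELECT COUNT(*) FROM mytable WHERE somecol = 5"
).fetchone()[0]
count_col = conn.execute(
    "SELECT COUNT(anothercol) FROM mytable WHERE somecol = 5"
).fetchone()[0]

print("COUNT(*):         ", plan_star)
print("COUNT(anothercol):", plan_col)
```

In SQLite the first plan reports a covering index, meaning the table rows are never touched, while the second one has to visit the table to see whether anothercol is NULL.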
Answered 2013-03-30T13:12:47.163
0

Option 2 is really a form of indexing. There are several types of index, and I strongly recommend reading up on them; then you can make your own decision.

I used your second option a long time ago: I counted the rows and stored the value in another table. It was much faster than option 1, especially when the data is huge, but you need to keep the stored count updated.
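
To show what "keep updating it" could look like, here is a minimal sketch that maintains the counts table from application code. It uses Python's sqlite3 module purely as a stand-in engine, and the helper names insert_row, cached_count and live_count are made up for this example. The important part is that the row and its counter change in the same transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, an assumption for this sketch
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, somecol INTEGER NOT NULL);
    CREATE INDEX idx_somecol ON mytable (somecol);
    CREATE TABLE mytableofcounts (somecol INTEGER PRIMARY KEY, num INTEGER NOT NULL);
""")


def insert_row(somecol):
    """Insert a row and bump its counter in one transaction (hypothetical helper)."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("INSERT INTO mytable (somecol) VALUES (?)", (somecol,))
        # Create the counter row on first sight of this value, then increment it.
        conn.execute(
            "INSERT OR IGNORE INTO mytableofcounts (somecol, num) VALUES (?, 0)",
            (somecol,),
        )
        conn.execute(
            "UPDATE mytableofcounts SET num = num + 1 WHERE somecol = ?",
            (somecol,),
        )


def cached_count(somecol):
    """Option 2: read the stored count with a single-row lookup."""
    row = conn.execute(
        "SELECT num FROM mytableofcounts WHERE somecol = ?", (somecol,)
    ).fetchone()
    return row[0] if row else 0


def live_count(somecol):
    """Option 1: count the matching rows directly."""
    return conn.execute(
        "SELECT COUNT(*) FROM mytable WHERE somecol = ?", (somecol,)
    ).fetchone()[0]


for value in (5, 5, 7, 5):
    insert_row(value)

print(cached_count(5), live_count(5))
```

Deletes and changes to somecol would need the matching decrement, which is left out here to keep the sketch short.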

Regards

Answered 2013-03-30T13:12:22.067
0

It depends very much on the type of application.

If you have more write operations than reads (e.g. a backend system), the first solution is easier and actually faster, since you don't have to maintain the counter on every update.

The second option is better for a frontend application, where many views need the count and running it against a billion-row table each time is not practical. With this solution you could use a trigger to keep the counter updated automatically, provided that the counter is not updated thousands of times a day.
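
As a rough illustration of the trigger idea, here is a minimal sketch using Python's sqlite3 module with SQLite trigger syntax. SQLite is an assumption made only so the example runs on its own, and the trigger names mytable_ai and mytable_ad are made up; other engines use different trigger syntax. It covers inserts and deletes only, so an update that changes somecol would need a third trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, an assumption for this sketch
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, somecol INTEGER NOT NULL);
    CREATE TABLE mytableofcounts (somecol INTEGER PRIMARY KEY, num INTEGER NOT NULL);

    -- After each insert: create the counter row if missing, then increment it.
    CREATE TRIGGER mytable_ai AFTER INSERT ON mytable
    BEGIN
        INSERT OR IGNORE INTO mytableofcounts (somecol, num) VALUES (NEW.somecol, 0);
        UPDATE mytableofcounts SET num = num + 1 WHERE somecol = NEW.somecol;
    END;

    -- After each delete: decrement the counter for the deleted row's value.
    CREATE TRIGGER mytable_ad AFTER DELETE ON mytable
    BEGIN
        UPDATE mytableofcounts SET num = num - 1 WHERE somecol = OLD.somecol;
    END;
""")

with conn:
    # The application only touches mytable; the triggers maintain the counts.
    conn.executemany(
        "INSERT INTO mytable (somecol) VALUES (?)", [(5,), (5,), (7,)]
    )
    conn.execute("DELETE FROM mytable WHERE id = ?", (1,))  # removes one somecol = 5 row

# Read back the maintained counters as {somecol: num}.
counts = dict(conn.execute("SELECT somecol, num FROM mytableofcounts"))
print(counts)
```

The trade-off is the one described above: every write to mytable now also writes to the counts table, so this pays off mainly when reads of the count far outnumber writes.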

Answered 2013-03-30T13:16:41.690