I am planning the table structure and the code for a table that will have about a billion rows.
Very often I would like to run SELECT COUNT(*) FROM mytable WHERE somecol = 5. somecol is an INT and will have an index on it.
Option 1 is to keep just the one giant table and run SELECT COUNT(*) as above.
Option 2 is to add a second table, called mytableofcounts, with only two columns, somecol and num, in which I keep a running total of the rows for each somecol value. This table would only have a few hundred thousand rows, and somecol would be unique in it. Then I can run SELECT num FROM mytableofcounts WHERE somecol = 5 instead.
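To make the two options concrete, here is a rough sketch of one way option 2 could be kept in sync. It is only an illustration under some assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for the real server, the id column and the trigger and index names are made up for the example, and triggers are just one possible way to maintain the counts (updating num from application code would be another). It does not handle an UPDATE that changes somecol.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for the real server.
conn = sqlite3.connect(":memory:")

conn.executescript("""
-- The big table from option 1 (the id column is an assumption for this sketch).
CREATE TABLE mytable (id INTEGER PRIMARY KEY, somecol INT NOT NULL);
CREATE INDEX idx_mytable_somecol ON mytable (somecol);

-- The extra table from option 2: one row per distinct somecol value.
CREATE TABLE mytableofcounts (somecol INT PRIMARY KEY, num INT NOT NULL);

-- Hypothetical triggers that keep num in step with inserts and deletes.
CREATE TRIGGER mytable_after_insert AFTER INSERT ON mytable
BEGIN
    INSERT OR IGNORE INTO mytableofcounts (somecol, num) VALUES (NEW.somecol, 0);
    UPDATE mytableofcounts SET num = num + 1 WHERE somecol = NEW.somecol;
END;

CREATE TRIGGER mytable_after_delete AFTER DELETE ON mytable
BEGIN
    UPDATE mytableofcounts SET num = num - 1 WHERE somecol = OLD.somecol;
END;
""")

# Insert four rows (three of them with somecol = 5), then delete one of those.
conn.executemany("INSERT INTO mytable (somecol) VALUES (?)", [(5,), (5,), (7,), (5,)])
conn.execute("DELETE FROM mytable WHERE id = 1")

# Option 1: count the matching rows through the index on somecol.
option1 = conn.execute(
    "SELECT COUNT(*) FROM mytable WHERE somecol = 5"
).fetchone()[0]

# Option 2: read the precomputed count with a single-row lookup.
option2 = conn.execute(
    "SELECT num FROM mytableofcounts WHERE somecol = 5"
).fetchone()[0]

print(option1, option2)
```

Both queries should return the same number. What differs is the work each one does: option 1 has to visit every index entry for the value, while option 2 reads one row but makes every insert and delete on mytable pay for an extra write.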
I would prefer option 1 because it is simpler to program and needs less storage, but my concern is that it might be slow. Would the extra table in option 2 make these counts noticeably faster, or is option 1 just as fast?