If you want a normalised approach, that would be:
wordLetters{
INT wordID,
CHAR[1] letter,
INT count,
PK(wordID, letter)
}
words{
INT wordID PK,
VARCHAR(255) word UNIQUE
}
but this approach has a serious problem in terms of performance - namely it needs a full table scan on the word table. I am going to assume that there are not too many letters and suggest this approach:
words{
INT wordID PK,
VARCHAR(255) word UNIQUE,
INT cA KEY,
INT cB KEY,
...
INT cZ KEY,
KEY (cE, cT, cA, cO, cI, cN),
...
}
A lookup query will be long but it will use indexes efficiently and it is generated by the PHP code anyways:
If the user has [a,n,t]
, fetch the available words as:
SELECT word FROM words WHERE
cA <= 1 AND cB = 0 AND cC = 0 AND ... AND cY = 0 AND cZ = 0
This query will (probably) use the "ETAOIN" index as not many words that don't need an 'E' exist.
At this point, performance depends on the choice of indexes available for the database only and you can always add more indexes as deemed useful (even at runtime).
On database indexes:
An ordinary index is just a sorted list of items with an appropriate tree built over the list, enabling efficient range lookup (get all elements from x to y).
An ordinary index is defined by its sorting order. The sorting order is: order by some column first, then on another one, then on another one... .
For example, the [E,T,A,O,I,N]
index will have all words sorted: first all words that don't need an E
, then all words that need one E
, then all words that need two E
... . The words that need the same amount of E
s will be sorted: first all words that don't need a T
, then all words that need it once, then all words that need two T
s ... . Of the words that need the same number of E
s and T
s, those that don't need an A
come first.
If the database is asked to fetch all words that don't need an E
or a T
and at most one 'X', it can use this index to fulfill the first two requirements, then check all words within the range E=0, T=0
.
The particular choice, ETAOIN
is based on the phrase ETAOIN SHRDLU which orders the twelve most frequent letters in the english language by their frequency - this means that if this index is used, it should filter out the largest possible amount of words.
You use the example RSTLNE
. This index will/may get used when the player has no R
s, or S
s. Benchmarking the lookup may tell you how much time was saved by using each particular index.
You can use an EXPLAIN EXTENDED
query to see which indexes are considered and subsequently used for each particular query and how many rows are expected to be filtered out. Ex.:
EXPLAIN EXTENDED
SELECT word FROM words
WHERE cA=0 AND cB<=1 AND cC=0 AND ...