I would like to know in which cases a single hashed column with an index should be preferred over a composite index. In my case I have two tables with approximately 1 million rows; one of them receives updated values from the other (it is a data-import routine). My environments use MySQL 5.1 and 5.5.
Example:
CREATE TABLE permanent (
ref_id_1 INT(10),
ref_id_2 INT(10),
ref_id_3 INT(10),
ref_id_4 INT(10),
ref_date DATE,
value INT(10));
CREATE TABLE import (
ref_id_1 INT(10),
ref_id_2 INT(10),
ref_id_3 INT(10),
ref_id_4 INT(10),
ref_date DATE,
value INT(10));
-- Option 1: composite unique index over all key columns
ALTER TABLE import ADD UNIQUE INDEX idx_composite(ref_id_1,ref_id_2,ref_id_3,ref_id_4,ref_date);
-- Option 2: single hashed column with its own unique index
ALTER TABLE import ADD hash_col CHAR(32);
UPDATE import SET hash_col = MD5(CONCAT(ref_id_1,ref_id_2,ref_id_3,ref_id_4,ref_date));
ALTER TABLE import ADD UNIQUE INDEX idx_hash_col(hash_col);
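A note on Option 2, as a sketch rather than something I have benchmarked: CONCAT without a separator can map different key combinations to the same string, and in MySQL it returns NULL as soon as one argument is NULL. A variant using CONCAT_WS avoids the first problem. The '|' separator is an arbitrary choice, and because CONCAT_WS skips NULL arguments, this assumes the key columns never contain NULL:

```sql
-- Without a separator, (1, 23, ...) and (12, 3, ...) both concatenate
-- to '123452012-01-01', so their hashes are identical.
SELECT MD5(CONCAT(1, 23, 4, 5, '2012-01-01'))
     = MD5(CONCAT(12, 3, 4, 5, '2012-01-01')) AS same_hash;  -- returns 1

-- Variant of the Option 2 UPDATE with a separator between the key parts.
UPDATE import
SET hash_col = MD5(CONCAT_WS('|', ref_id_1, ref_id_2, ref_id_3, ref_id_4, ref_date));
```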
Of course, the permanent table will also have a hash_col and the required indexes. The two possible update joins are then:
-- Join via columns
UPDATE permanent
INNER JOIN import
ON import.ref_id_1 = permanent.ref_id_1
AND import.ref_id_2 = permanent.ref_id_2
AND import.ref_id_3 = permanent.ref_id_3
AND import.ref_id_4 = permanent.ref_id_4
AND import.ref_date = permanent.ref_date
SET permanent.value = import.value;
-- Join via hash_col
UPDATE permanent
INNER JOIN import
ON import.hash_col = permanent.hash_col
SET permanent.value = import.value;
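To compare the two joins on real data, the execution plans and index sizes can be inspected. This is only a minimal sketch: as far as I know, MySQL 5.1 and 5.5 support EXPLAIN only for SELECT statements, so each UPDATE is rewritten here as an equivalent SELECT. It also assumes that hash_col and its index exist on both tables.

```sql
-- Plan for the join via the key columns (should use idx_composite)
EXPLAIN SELECT permanent.value, import.value
FROM permanent
INNER JOIN import
   ON import.ref_id_1 = permanent.ref_id_1
  AND import.ref_id_2 = permanent.ref_id_2
  AND import.ref_id_3 = permanent.ref_id_3
  AND import.ref_id_4 = permanent.ref_id_4
  AND import.ref_date = permanent.ref_date;

-- Plan for the join via the hash column (should use idx_hash_col)
EXPLAIN SELECT permanent.value, import.value
FROM permanent
INNER JOIN import
   ON import.hash_col = permanent.hash_col;

-- Total index size per table in bytes. index_length covers all indexes
-- of a table, so create only one of the two options before comparing.
SELECT table_name, index_length
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name IN ('permanent', 'import');
```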
So which approach should be preferred? Is there a rule of thumb such as "if you have more than X columns, use a hash instead"? Thanks in advance!
P.S. This is my first question here, so please excuse me if something is missing.