I would like to know in which cases a single hashed column with an index should be preferred over a composite index. In my case I have two tables with approximately 1 million rows; one of them receives updated values from the other (it is a data-import routine). My environments use MySQL 5.1 and 5.5.

Example:

CREATE TABLE permanent (
ref_id_1 INT(10),
ref_id_2 INT(10),
ref_id_3 INT(10),
ref_id_4 INT(10),
ref_date DATE,
value INT(10));

CREATE TABLE import (
ref_id_1 INT(10),
ref_id_2 INT(10),
ref_id_3 INT(10),
ref_id_4 INT(10),
ref_date DATE,
value INT(10));

-- Option 1: composite unique index
ALTER TABLE import ADD UNIQUE INDEX idx_composite(ref_id_1,ref_id_2,ref_id_3,ref_id_4,ref_date);
-- Option 2: single hashed column with a unique index
ALTER TABLE import ADD hash_col CHAR(32);
UPDATE import SET hash_col = MD5(CONCAT(ref_id_1,ref_id_2,ref_id_3,ref_id_4,ref_date));
ALTER TABLE import ADD UNIQUE INDEX idx_hash_col(hash_col);

Of course, the permanent table will also have a hash_col and the required indexes. The two possible update joins are:

-- Join via columns
UPDATE permanent
INNER JOIN import
ON import.ref_id_1 = permanent.ref_id_1
AND import.ref_id_2 = permanent.ref_id_2
AND import.ref_id_3 = permanent.ref_id_3
AND import.ref_id_4 = permanent.ref_id_4
AND import.ref_date = permanent.ref_date
SET permanent.value = import.value;

-- Join via hash_col
UPDATE permanent
INNER JOIN import
ON import.hash_col = permanent.hash_col
SET permanent.value = import.value;

So which approach should be preferred? Is there a rule of thumb like "if you have more than X columns, use a hash instead"? Thanks in advance!

P.S. This is my first question here, so please excuse me if something is missing.

1 Answer

Use a composite index. Comparing ten integers is faster than comparing two 32-character strings. Besides, in theory, MD5 hashes are not guaranteed to be unique (although this shouldn't be much of a practical issue).
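
A more practical uniqueness concern than MD5 itself is how the hash input is built. If the columns are joined with plain CONCAT and no separator, two different keys can produce the same string and therefore the same hash. The sketch below uses made-up literal values (they are not from the question's data) to show the effect, and CONCAT_WS with an arbitrary '|' separator as one possible way around it. Note also that CONCAT returns NULL if any argument is NULL, while CONCAT_WS skips NULL arguments, so nullable columns would need extra care in either case.

```sql
-- Illustrative values only: two different keys, (1, 23, ...) and (12, 3, ...).
-- Without a separator, both concatenate to '123452012-11-05'.
SELECT
    CONCAT(1, 23, 4, 5, '2012-11-05') AS key_a,
    CONCAT(12, 3, 4, 5, '2012-11-05') AS key_b,
    MD5(CONCAT(1, 23, 4, 5, '2012-11-05'))
      = MD5(CONCAT(12, 3, 4, 5, '2012-11-05')) AS same_hash;  -- 1: identical input, identical hash

-- With a separator the inputs differ:
-- '1|23|4|5|2012-11-05' versus '12|3|4|5|2012-11-05'.
SELECT
    CONCAT_WS('|', 1, 23, 4, 5, '2012-11-05') AS key_a,
    CONCAT_WS('|', 12, 3, 4, 5, '2012-11-05') AS key_b;
```

With the composite index this problem cannot arise, because each column is compared on its own.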

answered 2012-11-05T22:56:54.513