I have a table with 2.5 million doctors. I also have tables for accepted insurance, languages spoken, and specialties (taxonomy). The doctors table looks like this:
CREATE TABLE `doctors` (
  `doctor_id` int(10) NOT NULL AUTO_INCREMENT,
  `city_id` int(10) NOT NULL DEFAULT '0',
  `d_gender` char(1) NOT NULL DEFAULT 'U',
  `s_insurance` int(6) NOT NULL DEFAULT '0',
  `s_languages` int(6) NOT NULL DEFAULT '0',
  `s_taxonomy` int(6) NOT NULL DEFAULT '0',
  PRIMARY KEY (`doctor_id`)
) ENGINE=InnoDB;
The other information is stored as such:
CREATE TABLE `doctors_insurance` (
  `assoc_id` int(10) NOT NULL AUTO_INCREMENT,
  `doctor_id` int(10) NOT NULL DEFAULT '0',
  `insurance_id` int(10) NOT NULL DEFAULT '0',
  PRIMARY KEY (`assoc_id`)
) ENGINE=InnoDB;

CREATE TABLE `doctors_languages` (
  `assoc_id` int(10) NOT NULL AUTO_INCREMENT,
  `doctor_id` int(10) NOT NULL DEFAULT '0',
  `language_id` int(10) NOT NULL DEFAULT '0',
  PRIMARY KEY (`assoc_id`)
) ENGINE=InnoDB;

CREATE TABLE `doctors_taxonomy` (
  `assoc_id` int(10) NOT NULL AUTO_INCREMENT,
  `doctor_id` int(10) NOT NULL DEFAULT '0',
  `taxonomy_id` int(10) NOT NULL DEFAULT '0',
  PRIMARY KEY (`assoc_id`)
) ENGINE=InnoDB;
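For reference, looking up a single doctor's attributes in these association tables would be something like the following (note that as defined, none of these tables has an index on `doctor_id`, so such lookups have to scan; a secondary `KEY (doctor_id)` would be the usual fix):

```sql
-- All insurance plans a given doctor accepts
SELECT insurance_id FROM doctors_insurance WHERE doctor_id = 123;

-- All languages that doctor speaks
SELECT language_id FROM doctors_languages WHERE doctor_id = 123;

-- All specialties for that doctor
SELECT taxonomy_id FROM doctors_taxonomy WHERE doctor_id = 123;
```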
Naturally each doctor accepts various insurance plans, may speak multiple languages, and some doctors have several specialties (taxonomy). So I opted for separate tables for indexing; this way, if I need to add new indexes or drop old ones, I can simply swap out the tables rather than waiting for the long ALTER TABLE it takes to do it the old-fashioned way.
Also, because of other scaling techniques I may adopt in the future, classic JOINs make no difference to me right now, so I'm not worried about them.
Indexing by name was easy:
CREATE TABLE `indices_doctors_names` (
  `ref_id` int(10) NOT NULL AUTO_INCREMENT,
  `doctor_id` int(10) NOT NULL DEFAULT '0',
  `practice_id` int(10) NOT NULL DEFAULT '0',
  `name` varchar(120) NOT NULL DEFAULT '',
  PRIMARY KEY (`ref_id`),
  KEY `name` (`name`)
) ENGINE=InnoDB;
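For what it's worth, a name search against this table can use the `name` key only for prefix matches, along these lines:

```sql
-- Uses KEY `name` (prefix match). A leading wildcard such as
-- '%smith%' cannot use the index and falls back to a table scan.
SELECT doctor_id, practice_id, name
FROM indices_doctors_names
WHERE name LIKE 'Smith%';
```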
However, when I wanted to allow people to search by city, specialty, insurance, language, gender, and other demographics, I created this:
CREATE TABLE `indices_doctors_demos` (
  `ref_id` int(10) NOT NULL AUTO_INCREMENT,
  `doctor_id` int(10) NOT NULL DEFAULT '0',
  `city_id` int(10) NOT NULL DEFAULT '0',
  `taxonomy_id` int(6) NOT NULL DEFAULT '0',
  `insurance_id` int(6) NOT NULL DEFAULT '0',
  `language_id` int(6) NOT NULL DEFAULT '0',
  `gender_id` char(1) NOT NULL DEFAULT 'U',
  PRIMARY KEY (`ref_id`),
  KEY `index` (`city_id`,`taxonomy_id`,`insurance_id`,`language_id`,`gender_id`)
) ENGINE=InnoDB;
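A search against this table would then look something like this (filter values are just examples). Keep in mind the composite key is only fully used when the leftmost columns are all constrained with equality; skipping, say, `insurance_id` means the index only helps through `taxonomy_id`:

```sql
SELECT DISTINCT doctor_id
FROM indices_doctors_demos
WHERE city_id = 42        -- leftmost column of the composite key
  AND taxonomy_id = 17
  AND insurance_id = 5
  AND language_id = 1
  AND gender_id = 'F';
```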
The idea is that there will be an entry for each combination of specialty, insurance, and language primarily, while the other columns stay the same. This creates an obvious problem: if a doctor has 3 specialties, accepts 3 insurance providers, and speaks 3 languages, that one doctor alone needs 3 × 3 × 3 = 27 entries. So 2.5 million doctors easily balloon into far more rows.
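To make the blow-up concrete: populating this table is effectively a cross join of the three association tables per doctor, so the row count per doctor is (specialties × insurances × languages). A hypothetical population query:

```sql
-- For a doctor with 3 specialties, 3 insurances, and 3 languages,
-- the three joins multiply out to 3 * 3 * 3 = 27 rows.
INSERT INTO indices_doctors_demos
  (doctor_id, city_id, taxonomy_id, insurance_id, language_id, gender_id)
SELECT d.doctor_id, d.city_id, t.taxonomy_id,
       i.insurance_id, l.language_id, d.d_gender
FROM doctors d
JOIN doctors_taxonomy  t ON t.doctor_id = d.doctor_id
JOIN doctors_insurance i ON i.doctor_id = d.doctor_id
JOIN doctors_languages l ON l.doctor_id = d.doctor_id;
```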
There has to be a better approach, but how can it be done? Again, I'm not interested in moving to classic indexing techniques and JOINs because they will quickly become too slow; I need a method that can scale out easily.