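I have a very large database (contacts has about 3 billion entries, people has about 280 million entries, and the other tables have a negligible number of entries). Most other queries I run are really fast. However, I've run into a more complex query that is really slow, and I'm wondering whether there is any way to speed it up.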
First of all, here is my schema:
CREATE TABLE activities (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contacts (
id INTEGER PRIMARY KEY,
person1_id INTEGER NOT NULL,
person2_id INTEGER NOT NULL,
duration REAL NOT NULL, -- hours
activity_id INTEGER NOT NULL
-- FOREIGN_KEY(person1_id) REFERENCES people(id),
-- FOREIGN_KEY(person2_id) REFERENCES people(id)
);
CREATE TABLE people (
id INTEGER PRIMARY KEY,
state_id INTEGER NOT NULL,
county_id INTEGER NOT NULL,
age INTEGER NOT NULL,
gender TEXT NOT NULL, -- M or F
income INTEGER NOT NULL
-- FOREIGN_KEY(state_id) REFERENCES states(id)
);
CREATE TABLE states (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
abbreviation TEXT NOT NULL
);
CREATE INDEX activities_name_index on activities(name);
CREATE INDEX contacts_activity_id_index on contacts(activity_id);
CREATE INDEX contacts_duration_index on contacts(duration);
CREATE INDEX contacts_person1_id_index on contacts(person1_id);
CREATE INDEX contacts_person2_id_index on contacts(person2_id);
CREATE INDEX people_age_index on people(age);
CREATE INDEX people_county_id_index on people(county_id);
CREATE INDEX people_gender_index on people(gender);
CREATE INDEX people_income_index on people(income);
CREATE INDEX people_state_id_index on people(state_id);
CREATE INDEX states_abbreviation_index on states(abbreviation);
CREATE INDEX states_name_index on states(name);
Note that I have created an index on every column in the database. I don't care about the size of the database; speed is all I care about.
Here is an example of a query that, as expected, runs almost instantly:
SELECT count(*) FROM people, states WHERE people.state_id=states.id and states.abbreviation='IA';
Here is the troublesome query:
SELECT * FROM contacts WHERE rowid IN
(SELECT contacts.rowid FROM contacts, people, states
WHERE contacts.person1_id=people.id AND people.state_id=states.id AND states.name='Kansas'
INTERSECT
SELECT contacts.rowid FROM contacts, people, states
WHERE contacts.person2_id=people.id AND people.state_id=states.id AND states.name='Missouri');
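To make explicit what this query is meant to return (contacts whose person1 lives in Kansas and whose person2 lives in Missouri), here is a minimal sketch using Python's sqlite3 module on a toy in-memory database. The trimmed-down tables, the sample rows, and the join-based variant are assumptions made up for illustration; they are not taken from my real data, and I have not measured whether the join form is any faster at full scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed-down version of the schema (assumption: only the columns the
# query touches) with a handful of made-up rows.
conn.executescript("""
CREATE TABLE states (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE people (id INTEGER PRIMARY KEY, state_id INTEGER NOT NULL);
CREATE TABLE contacts (
    id INTEGER PRIMARY KEY,
    person1_id INTEGER NOT NULL,
    person2_id INTEGER NOT NULL
);
INSERT INTO states VALUES (1, 'Kansas'), (2, 'Missouri'), (3, 'Iowa');
INSERT INTO people VALUES (10, 1), (11, 1), (20, 2), (30, 3);
INSERT INTO contacts VALUES
    (100, 10, 20),  -- Kansas -> Missouri: should match
    (101, 11, 30),  -- Kansas -> Iowa
    (102, 20, 10),  -- Missouri -> Kansas (wrong direction)
    (103, 30, 20),  -- Iowa -> Missouri
    (104, 11, 20);  -- Kansas -> Missouri: should match
""")

# The query from the question, unchanged apart from whitespace.
INTERSECT_QUERY = """
SELECT * FROM contacts WHERE rowid IN
  (SELECT contacts.rowid FROM contacts, people, states
   WHERE contacts.person1_id=people.id AND people.state_id=states.id
     AND states.name='Kansas'
   INTERSECT
   SELECT contacts.rowid FROM contacts, people, states
   WHERE contacts.person2_id=people.id AND people.state_id=states.id
     AND states.name='Missouri')
"""

# Hypothetical alternative: join people and states once per side of the contact.
JOIN_QUERY = """
SELECT c.* FROM contacts AS c
  JOIN people AS p1 ON p1.id = c.person1_id
  JOIN states AS s1 ON s1.id = p1.state_id
  JOIN people AS p2 ON p2.id = c.person2_id
  JOIN states AS s2 ON s2.id = p2.state_id
WHERE s1.name = 'Kansas' AND s2.name = 'Missouri'
"""

# Neither query specifies an ORDER BY, so sort before comparing.
intersect_rows = sorted(conn.execute(INTERSECT_QUERY).fetchall())
join_rows = sorted(conn.execute(JOIN_QUERY).fetchall())
print(intersect_rows)
print(join_rows)
```

On this toy data both forms return the same two contacts (ids 100 and 104), which is the result I am after.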
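Now, what I expected was that each subquery would use every relevant index I created to speed things up. However, when I display the query plan, I see this: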
sqlite> EXPLAIN QUERY PLAN SELECT * FROM contacts WHERE rowid IN (SELECT contacts.rowid FROM contacts, people, states WHERE contacts.person1_id=people.id AND people.state_id=states.id AND states.name='Kansas' INTERSECT SELECT contacts.rowid FROM contacts, people, states WHERE contacts.person2_id=people.id AND people.state_id=states.id AND states.name='Missouri');
0|0|0|SEARCH TABLE contacts USING INTEGER PRIMARY KEY (rowid=?) (~25 rows)
0|0|0|EXECUTE LIST SUBQUERY 1
2|0|2|SEARCH TABLE states USING COVERING INDEX states_name_index (name=?) (~1 rows)
2|1|1|SEARCH TABLE people USING COVERING INDEX people_state_id_index (state_id=?) (~5569556 rows)
2|2|0|SEARCH TABLE contacts USING COVERING INDEX contacts_person1_id_index (person1_id=?) (~12 rows)
3|0|2|SEARCH TABLE states USING COVERING INDEX states_name_index (name=?) (~1 rows)
3|1|1|SEARCH TABLE people USING COVERING INDEX people_state_id_index (state_id=?) (~5569556 rows)
3|2|0|SEARCH TABLE contacts USING COVERING INDEX contacts_person2_id_index (person2_id=?) (~12 rows)
1|0|0|COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE (INTERSECT)
In fact, if I display the query plan for the first query I posted, I get this:
sqlite> EXPLAIN QUERY PLAN SELECT count(*) FROM people, states WHERE people.state_id=states.id and states.abbreviation='IA';
0|0|1|SEARCH TABLE states USING COVERING INDEX states_abbreviation_index (abbreviation=?) (~1 rows)
0|1|0|SEARCH TABLE people USING COVERING INDEX people_state_id_index (state_id=?) (~5569556 rows)
Finally, here is a query that does use one of the indexes I created, to show that they really are being used:
SELECT contacts.* FROM contacts, people, states WHERE contacts.person1_id=people.id AND people.state_id=states.id AND states.name='Iowa';
That query produces the following query plan:
sqlite> EXPLAIN QUERY PLAN SELECT contacts.* FROM contacts, people, states WHERE contacts.person1_id=people.id AND people.state_id=states.id AND states.name='Iowa';
0|0|2|SEARCH TABLE states USING COVERING INDEX states_name_index (name=?) (~1 rows)
0|1|1|SEARCH TABLE people USING COVERING INDEX people_state_id_index (state_id=?) (~5569556 rows)
0|2|0|SEARCH TABLE contacts USING INDEX contacts_person1_id_index (person1_id=?) (~12 rows)
Why is SQLite using covering indexes rather than the indexes I created? Is this the correct behavior?
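In case it helps anyone reproduce this without the full data set, here is a small sketch that builds an empty copy of the relevant tables and indexes in memory and prints the plan for the last query above. The helper name `query_plan` is my own invention for this example. Note that the wording of EXPLAIN QUERY PLAN output differs between SQLite versions, and on an empty database without statistics the planner may not pick the same plan as on my real data, so treat the output only as a starting point.

```python
import sqlite3

# Empty copy of the tables and indexes involved in the query (no data loaded).
SCHEMA = """
CREATE TABLE contacts (
    id INTEGER PRIMARY KEY,
    person1_id INTEGER NOT NULL,
    person2_id INTEGER NOT NULL,
    duration REAL NOT NULL,
    activity_id INTEGER NOT NULL
);
CREATE TABLE people (
    id INTEGER PRIMARY KEY,
    state_id INTEGER NOT NULL,
    county_id INTEGER NOT NULL,
    age INTEGER NOT NULL,
    gender TEXT NOT NULL,
    income INTEGER NOT NULL
);
CREATE TABLE states (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    abbreviation TEXT NOT NULL
);
CREATE INDEX contacts_person1_id_index ON contacts(person1_id);
CREATE INDEX contacts_person2_id_index ON contacts(person2_id);
CREATE INDEX people_state_id_index ON people(state_id);
CREATE INDEX states_name_index ON states(name);
"""

QUERY = """
SELECT contacts.* FROM contacts, people, states
WHERE contacts.person1_id = people.id
  AND people.state_id = states.id
  AND states.name = 'Iowa'
"""


def query_plan(conn, sql):
    """Return the human-readable detail text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the last column of each row.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

plan = query_plan(conn, QUERY)
for step in plan:
    print(step)
```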