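I built a web application as a tool for eliminating unnecessary data from a people table. Its main job is to filter the records of people who are validly entitled to vote. At first, when the main table held only a few rows, this was not a problem, but once the table grew to about 200K rows it became really slow (6 seconds). It will get worse, because the table will grow to as many as 6 million rows.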

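My table design is shown below. I am joining 4 region tables (province, city, district, and town), and each region table is related to the others through its own id: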

CREATE TABLE `peoples` (
    `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
    `id_prov` smallint(2) NOT NULL,
    `id_city` smallint(2) NOT NULL,
    `id_district` smallint(2) NOT NULL,
    `id_town` smallint(4) NOT NULL,
    `tps` smallint(4) NOT NULL,
    `urut_xls` varchar(20) NOT NULL,
    `nik` varchar(20) NOT NULL,
    `name` varchar(60) NOT NULL,
    `place_of_birth` varchar(60) NOT NULL,
    `birth_date` varchar(30) NOT NULL,
    `age` tinyint(3) NOT NULL DEFAULT '0',
    `sex` varchar(20) NOT NULL,
    `marital_s` varchar(20) NOT NULL,
    `address` varchar(160) NOT NULL,
    `note` varchar(60) NOT NULL,
    `m_name` tinyint(1) NOT NULL DEFAULT '0',
    `m_birthdate` tinyint(1) NOT NULL DEFAULT '0',
    `format_birthdate` tinyint(1) NOT NULL DEFAULT '0',
    `m_sex` tinyint(1) NOT NULL DEFAULT '0',
    `m_m_status` tinyint(1) NOT NULL DEFAULT '0',
    `sex_double` tinyint(1) NOT NULL DEFAULT '0',
    `id_import` bigint(10) NOT NULL,
    `id_workspace` tinyint(4) unsigned NOT NULL DEFAULT '0',
    `stat_valid` smallint(1) NOT NULL DEFAULT '0',
    `add_manual` tinyint(1) unsigned NOT NULL DEFAULT '0',
    `insert_by` varchar(12) NOT NULL,
    `update_by` varchar(12) DEFAULT NULL,
    `mark_as_duplicate` smallint(1) NOT NULL DEFAULT '0',
    `mark_as_trash` smallint(1) NOT NULL DEFAULT '0',
    `in_date_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`),
    KEY `ind_import` (`id_import`),
    KEY `ind_duplicate` (`mark_as_duplicate`),
    KEY `id_workspace` (`id_workspace`),
    KEY `tambah_manual` (`add_manual`),
    KEY `il` (`stat_valid`,`mark_as_trash`,`in_date_time`),
    KEY `region` (`id_prov`,`id_city`,`id_district`,`id_town`,`tps`),
    KEY `name` (`name`),
    KEY `place_of_birth` (`place_of_birth`),
    KEY `ind_birth` (`birth_date`(10)),
    KEY `ind_sex` (`sex`(2))
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;

town:

CREATE TABLE `town` (
    `id` smallint(4) NOT NULL,
    `id_district` smallint(2) NOT NULL,
    `id_city` smallint(2) NOT NULL,
    `id_prov` smallint(2) NOT NULL,
    `name_town` varchar(60) NOT NULL,
    `handprint` blob,
    `pps_1` varchar(60) DEFAULT NULL,
    `pps_2` varchar(60) DEFAULT NULL,
    `pps_3` varchar(60) DEFAULT NULL,
    `tpscount` smallint(2) DEFAULT NULL,
    `pps_4` varchar(60) DEFAULT NULL,
    `pps_5` varchar(60) DEFAULT NULL,
    PRIMARY KEY (`id_prov`,`id_city`,`id_district`,`id`),
    KEY `name_town` (`name_town`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

and a query like this:

SELECT `E`.`id`, `E`.`id_prov`, `E`.`id_city`, `E`.`id_district`, `E`.`id_town`, 
  `B`.`name_prov`,`C`.`name_city`,`D`.`name_district`, `A`.`name_town`,
  `E`.`tps`, `E`.`urut_xls`, `E`.`nik`,`E`.`name`,`E`.`place_of_birth`,
  `E`.`birth_date`, `E`.age, `E`.`sex`,   `E`.`marital_s`, `E`.`address`,
  `E`.`note` 
FROM peoples E
JOIN test_prov B ON  E.id_prov = B.id
JOIN test_city C ON E.id_city = C.id 
    AND (C.id_prov=B.id)
JOIN test_district D ON E.id_district = D.id 
    AND ((D.id_city = C.id) AND (D.id_prov= B.id))
JOIN test_town A ON E.id_town = A.id 
    AND ((A.id_district = D.id) 
    AND (A.id_city = C.id) 
    AND (A.id_prov = B.id)) 
    AND E.stat_valid=1 
    AND E.mark_as_trash=0

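mark_as_trash is a flag column that contains only 1 and 0; it records whether the row has been marked as deleted. stat_valid holds the filtering result: a value of 1 means the record is valid, i.e. the person is entitled to vote.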

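I looked at the EXPLAIN output, but no index is used for lookups on any column. I believe that is why the application is so slow at 200K rows. The query above shows only two conditions, but the application can also filter by name, place of birth, birth date, age range, and so on.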

How can I make this perform better?

1 Answer

Can a city be in two provinces? If not, then why do you check C.id_prov = B.id when E.id_city = C.id should already give you just one row?
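
If each id in the region tables is unique on its own, the extra parent checks are redundant and every join can match on a single column. Here is a minimal sketch under that assumption. The table and column names are those from the question, and the column list is shortened. The uniqueness of each `id` is only an assumption: the composite primary key on `town` suggests it may not hold, in which case the parent checks have to stay.

```sql
-- Assumes test_prov.id, test_city.id, test_district.id and test_town.id
-- are each unique by themselves (an assumption, not confirmed by the question).
SELECT E.id, E.nik, E.name,
       B.name_prov, C.name_city, D.name_district, A.name_town
FROM peoples E
JOIN test_prov     B ON B.id = E.id_prov
JOIN test_city     C ON C.id = E.id_city
JOIN test_district D ON D.id = E.id_district
JOIN test_town     A ON A.id = E.id_town
-- For inner joins, filtering in WHERE gives the same rows as filtering in ON.
WHERE E.stat_valid = 1
  AND E.mark_as_trash = 0;
```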

Also, it seems your query is slow because you are selecting 200K rows. Indexes will improve performance, but do you really need all the rows at once? You should use pagination (LIMIT, OFFSET).
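
A rough sketch of what pagination could look like. The page size of 50 and the ordering by `id` are assumptions for illustration. The column list is shortened, and the joins from the question would be added in the same way.

```sql
-- Page 1: the first 50 valid, non-trashed rows, in primary-key order.
SELECT E.id, E.nik, E.name
FROM peoples E
WHERE E.stat_valid = 1
  AND E.mark_as_trash = 0
ORDER BY E.id
LIMIT 50 OFFSET 0;

-- Page 3 with the same page size: skip the first 100 matching rows.
SELECT E.id, E.nik, E.name
FROM peoples E
WHERE E.stat_valid = 1
  AND E.mark_as_trash = 0
ORDER BY E.id
LIMIT 50 OFFSET 100;
```

The offset is (page number - 1) * page size. A stable ORDER BY is needed so that pages do not overlap between requests. MySQL still reads and discards the skipped rows, so very deep pages get slower as the offset grows.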

answered 2013-06-02T19:06:27.677