
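I am building a query that lists voters from the voters table (1 million records) based on their activity in the votes table (7 million records). The criteria are as follows: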

  • A general election (GE) happens only once per year, and only GEs from 2004 onward should be counted.

  • Of those GEs, only the ones in which 10% to 50% of voters voted should be counted.

Some less important information:

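  • The schema cannot be changed. The data is delivered to us as fixed-width text files, loaded by a script, and used for other purposes as well.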

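  • Only the current list of active voters and their voting history is available. In the query below I included an equation that lowers the upper threshold by 10,000 voters for each year the election lies in the past. It is not perfect, but it seems to filter out the unwanted GEs while keeping the valid ones.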

For example, if between 100,000 and 500,000 voters voted in 2005, 2006, 2007, 2009, 2010, and 2011, then I want to list only the voters who voted in those years.

A mysqlfiddle is here.

The schema is as follows:

CREATE TABLE IF NOT EXISTS `voters` (
  `CountyEMSID` varchar(9) COLLATE utf8_unicode_ci NOT NULL,
  `LastName` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
  `FirstName` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
  `MiddleInitial` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
  `NameSuffix` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `HouseNumber` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
  `HouseNumberSuffix` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
  `ApartmentNumber` varchar(15) COLLATE utf8_unicode_ci NOT NULL,
  `StreetName` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `City` varchar(40) COLLATE utf8_unicode_ci NOT NULL,
  `Zip` varchar(5) COLLATE utf8_unicode_ci NOT NULL,
  `ZipCode4` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `MailingAddress1` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `MailingAddress2` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `MailingAddress3` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `MailingAddress4` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `DOBY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `DOBM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `DOBD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `Gender` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
  `Party` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
  `Other` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
  `ED` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
  `AD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `CD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `CO` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `SD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `CC` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `JD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `RegY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `RegM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `RegD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `Status` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `VoterType` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
  `StatusChangeY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `StatusChangeM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `StatusChangeD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `LastVoted` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `Telephone` varchar(12) COLLATE utf8_unicode_ci NOT NULL,
  `County` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  KEY `districts` (`CountyEMSID`,`ED`,`AD`,`CD`,`CO`,`SD`,`CC`,`JD`),
  KEY `vsn` (`CountyEMSID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `votes` (
  `CountyEMSID` varchar(9) COLLATE utf8_unicode_ci NOT NULL,
  `County` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `AD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `ED` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
  `Party` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
  `ElectionDateY` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `ElectionDateM` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `ElectionDateD` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `ElectionType` varchar(2) COLLATE utf8_unicode_ci NOT NULL,
  `VoterType` varchar(1) COLLATE utf8_unicode_ci NOT NULL,
  KEY `CountyEMSID` (`CountyEMSID`),
  KEY `perfect` (`CountyEMSID`,`ElectionDateY`,`ElectionType`),
  KEY `CountyEMSID_2` (`CountyEMSID`,`ElectionDateY`,`ElectionType`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

So far I have the following query, which should list only the unique IDs (CountyEMSID) of voters from the votes table. It works in mysqlfiddle but hangs in phpmyadmin.

SELECT DISTINCT CountyEMSID
FROM `votes` 
WHERE ElectionDateY IN 
(
SELECT ElectionDateY
FROM `votes`
WHERE ElectionType = 'GE' 
AND ElectionDateY >= 2004 
GROUP BY ElectionDateY 
HAVING COUNT(*) < ((0.5 * (SELECT COUNT(*) FROM `voters`)) - ((YEAR(CURRENT_TIMESTAMP()) - ElectionDateY) * 10000)) 
AND COUNT(*) > (0.1 * (SELECT COUNT(*) FROM `voters`))
)
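
To see which years the HAVING clause keeps, the same bounds can be listed next to each year's GE vote count. This is only an untested diagnostic sketch; the aliases const, numvoters, NumYVotes, lower_bound, and upper_bound are names made up for illustration and do not exist in the schema.

```sql
-- Diagnostic sketch: one row per GE year from 2004 on, with the bounds
-- used in the HAVING clause above. Alias names are illustrative only.
SELECT ElectionDateY,
       COUNT(*) AS NumYVotes,
       0.1 * const.numvoters AS lower_bound,
       0.5 * const.numvoters
         - (YEAR(CURRENT_TIMESTAMP()) - ElectionDateY) * 10000 AS upper_bound
FROM `votes`
CROSS JOIN (SELECT COUNT(*) AS numvoters FROM `voters`) AS const
WHERE ElectionType = 'GE'
AND ElectionDateY >= 2004
-- numvoters is a single constant value; grouping by it only keeps
-- strict GROUP BY modes satisfied.
GROUP BY ElectionDateY, const.numvoters;
```

A year is kept when NumYVotes falls strictly between lower_bound and upper_bound.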

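I would appreciate any help optimizing this query and modifying it so that it returns all the corresponding voter information from the votes table.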


1 Answer


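MySQL optimizes in clauses with subqueries poorly. Basically, it re-runs the subquery for each row it processes. You should move the calculation into the from clause. Here is my attempt: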

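select distinct v.*
from votes v join
     -- per-year GE vote counts, computed once as a derived table
     (select electiondatey, count(*) as NumYVotes
      from votes
      where electiontype = 'GE' and electiondatey >= 2004
      group by electiondatey
     ) ey
     on v.electiondatey = ey.electiondatey cross join
     -- total number of voters, as a one-row constant
     (select count(*) as numvoters from voters) as const
where (ey.NumYVotes < 0.5 * const.numvoters - (year(now()) - ey.electiondatey) * 10000) and
      (ey.NumYVotes > 0.1 * const.numvoters)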

Note: I have not tested this, so it may contain syntax errors.
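
The question's stated goal is to list rows from the voters table, so the same derived-table idea can be wrapped once more and joined back to voters. This is an untested sketch under a few assumptions: CountyEMSID identifies the same person in both tables, a voter qualifies by voting in at least one of the qualifying GEs, and only GE rows in votes should count. The alias ids is made up for illustration.

```sql
-- Sketch only: returns voters rows for anyone with a GE vote in a qualifying year.
select vr.*
from voters vr join
     (select distinct v.countyemsid
      from votes v join
           (select electiondatey, count(*) as NumYVotes
            from votes
            where electiontype = 'GE' and electiondatey >= 2004
            group by electiondatey
           ) ey
           on v.electiondatey = ey.electiondatey cross join
           (select count(*) as numvoters from voters) as const
      where v.electiontype = 'GE'  -- assumption: only GE votes qualify a voter
        and ey.NumYVotes < 0.5 * const.numvoters - (year(now()) - ey.electiondatey) * 10000
        and ey.NumYVotes > 0.1 * const.numvoters
     ) ids
     on vr.countyemsid = ids.countyemsid;
```

The schema declares no unique key on voters.CountyEMSID, so if an ID appears more than once there, each copy is returned.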

answered 2013-01-27T20:20:32.907