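I have an unnormalized database with about 400 million rows. I have moved all the repeated data into a new, normalized database so that I can represent each value by an id.
Now I need to move all the data entries over and convert them into entries that contain those ids.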
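Conceptually, converting an entry means replacing each repeated string (project, search engine, keyword, ...) with the id it received in the normalized tables. Below is a minimal PHP sketch of that step. It assumes the lookup tables have already been loaded into `value => id` arrays. `normalizeRow()` and the sample values are hypothetical.

```php
<?php
// Hypothetical helper: translate one unnormalized row into its id-based form
// using in-memory lookup maps (column => [value => id]).
function normalizeRow(array $row, array $maps): ?array
{
    $out = [];
    foreach ($maps as $column => $map) {
        $value = $row[$column];
        if (!isset($map[$value])) {
            return null; // no id for this value; an INNER JOIN would drop the row too
        }
        $out[$column] = $map[$value];
    }
    // Columns without a lookup map (e.g. position, competition) are copied as-is.
    foreach ($row as $column => $value) {
        if (!isset($maps[$column])) {
            $out[$column] = $value;
        }
    }
    return $out;
}

// Made-up sample data, for illustration only.
$maps = [
    'awrProject'   => ['acme' => 7],
    'searchEngine' => ['google.se' => 2],
    'keyword'      => ['red shoes' => 41],
];
$row = ['awrProject' => 'acme', 'searchEngine' => 'google.se',
        'keyword' => 'red shoes', 'position' => 3, 'competition' => 'high'];
$normalized = normalizeRow($row, $maps);
// ['awrProject' => 7, 'searchEngine' => 2, 'keyword' => 41, 'position' => 3, 'competition' => 'high']
```

If a value has no id, the function returns `null`. This mirrors the way the `INNER JOIN`s in the queries below silently drop such rows.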
The problem is that 400 million rows takes a while... I need help optimizing this.
This query takes 0.4 seconds per row, so it would take months:
$q = "INSERT IGNORE INTO normalized.entry (insDate, `date`, project, keyword, url, position, competition, serachEngine)
SELECT
CURDATE() as insDate
, d.id as dateId
, p.id as projectId
, k.id as keywordId
, z.id AS urlId
, old.position
, old.competition
, s.id as searchEngineId
FROM unnormalized.bigtable old
INNER JOIN normalized.`date` d ON old.insDate = d.`date`
INNER JOIN normalized.project p ON old.awrProject = p.project
INNER JOIN normalized.searchEngine s ON old.searchEngine = s.searchEngine
INNER JOIN normalized.keyword k ON old.keyword = k.keyword
INNER JOIN normalized.urlHash z ON old.url = z.url
WHERE old.id >= ".$start." AND old.id <= ".$stop."";
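The statement above can be run over contiguous id ranges, one range per statement. The sketch below assumes `id` is the primary key of `unnormalized.bigtable`. `idRanges()`, `chunkQuery()` and the chunk size are hypothetical. Because `BETWEEN` is inclusive on both ends, consecutive ranges neither skip nor repeat an id.

```php
<?php
// Hypothetical helper: split [$minId, $maxId] into contiguous, inclusive
// ranges of at most $chunkSize ids each.
function idRanges(int $minId, int $maxId, int $chunkSize): array
{
    $ranges = [];
    for ($start = $minId; $start <= $maxId; $start += $chunkSize) {
        $ranges[] = [$start, min($start + $chunkSize - 1, $maxId)];
    }
    return $ranges;
}

// Hypothetical helper: the INSERT ... SELECT from the question, limited to one
// id range. $start and $stop are typed as int, so interpolating them is safe.
function chunkQuery(int $start, int $stop): string
{
    return "INSERT IGNORE INTO normalized.entry
    (insDate, `date`, project, keyword, url, position, competition, serachEngine)
SELECT CURDATE(), d.id, p.id, k.id, z.id, old.position, old.competition, s.id
FROM unnormalized.bigtable old
INNER JOIN normalized.`date` d ON old.insDate = d.`date`
INNER JOIN normalized.project p ON old.awrProject = p.project
INNER JOIN normalized.searchEngine s ON old.searchEngine = s.searchEngine
INNER JOIN normalized.keyword k ON old.keyword = k.keyword
INNER JOIN normalized.urlHash z ON old.url = z.url
WHERE old.id BETWEEN $start AND $stop";
}
```

For example, `idRanges(1, 10, 4)` returns `[[1, 4], [5, 8], [9, 10]]`. Each pair is then passed to `chunkQuery()` and executed with whichever MySQL client is already in use.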
If I do more of the work in PHP and split it into two queries, each entry takes only 0.07 seconds, but that would still take months:
$q = "SELECT tmp.id
, d.id as dateId
, p.id as projectId
, k.id as keywordId
, tmp.position
, tmp.competition
, s.id as searchEngineId
, tmp.url
, z.id AS urlId
FROM unnormalized.bigtable tmp
INNER JOIN normalized.`date` d ON tmp.insDate = d.`date`
INNER JOIN normalized.project p ON tmp.awrProject = p.project
INNER JOIN normalized.searchEngine s ON tmp.searchEngine = s.searchEngine
INNER JOIN normalized.keyword k ON tmp.keyword = k.keyword
INNER JOIN normalized.urlHash z ON tmp.url = z.url
WHERE tmp.id >= ".$start." AND tmp.id <= ".$stop."";
// echo $q;
$result = mysqli_query($local, $q);
if (mysqli_num_rows($result) > 0) {
    while ($row = mysqli_fetch_assoc($result)) {
        // Second query: look up this row's url id (the url is escaped first)
        $url = mysqli_real_escape_string($local, $row["url"]);
        $resultUrl = mysqli_query($local, "SELECT id FROM normalized.url WHERE url = '".$url."'");
        $rowUrl = mysqli_fetch_assoc($resultUrl);
        // One INSERT per source row
        $competition = mysqli_real_escape_string($local, $row["competition"]);
        $q = "INSERT IGNORE normalized.entry (insDate, `date`, project, keyword, url, position, competition, serachEngine) VALUES (NOW(), '".$row["dateId"]."', '".$row["projectId"]."', '".$row["keywordId"]."', '".$rowUrl["id"]."', '".$row["position"]."', '".$competition."', '".$row["searchEngineId"]."')";
        mysqli_query($local, $q);
    }
}
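The PHP loop issues one `INSERT` per source row, so every row pays for a full client/server round trip. A common alternative is to collect a batch of translated rows and send them in a single multi-row `INSERT` with placeholders. The sketch below assumes the row keys follow the aliases of the `SELECT` in the loop and that `urlId` is the url id you want to store. `buildBatchInsert()` is hypothetical.

```php
<?php
// Hypothetical helper: one multi-row INSERT with positional placeholders for a
// whole batch of translated rows, plus the flat list of values to bind.
function buildBatchInsert(array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('empty batch');
    }
    // Column order must match the order in which values are pushed below.
    $columns = ['`date`', 'project', 'keyword', 'url', 'position', 'competition', 'serachEngine'];
    $tuple = '(NOW(), ' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $sql = 'INSERT IGNORE INTO normalized.entry (insDate, ' . implode(', ', $columns) . ') VALUES '
         . implode(', ', array_fill(0, count($rows), $tuple));

    $params = [];
    foreach ($rows as $r) {
        array_push($params, $r['dateId'], $r['projectId'], $r['keywordId'], $r['urlId'],
                   $r['position'], $r['competition'], $r['searchEngineId']);
    }
    return [$sql, $params];
}
```

You can pass the result straight to PDO: `$stmt = $pdo->prepare($sql); $stmt->execute($params);`. MySQL rejects a prepared statement with more than 65,535 placeholders. With seven placeholders per row, a batch can therefore hold at most 9,362 rows.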
I don't know how to migrate this data without it taking half a year! Any help is appreciated.
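Here is the arithmetic behind those estimates. It assumes the quoted timings are per row and that rows are processed strictly one after another. `estimateDays()` is hypothetical.

```php
<?php
// Back-of-the-envelope estimate: rows processed strictly one after another,
// each costing a fixed number of seconds.
function estimateDays(int $rows, float $secondsPerRow): float
{
    return $rows * $secondsPerRow / 86400; // 86,400 seconds in a day
}

printf("0.40 s/row: %.0f days\n", estimateDays(400000000, 0.40)); // 1852 days
printf("0.07 s/row: %.0f days\n", estimateDays(400000000, 0.07)); // 324 days
```

If the 0.4 s really is per row, the first approach would take just over five years. Even the 0.07 s approach needs most of a year.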
Specs: I'm using InnoDB on an Amazon RDS server.
Edit: EXPLAIN SELECT output for the first query:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,SIMPLE,p,index,NULL,projectName,42,NULL,1346,"Using index"
1,SIMPLE,s,index,NULL,searchEngine,42,NULL,2336,"Using index; Using join buffer"
1,SIMPLE,k,index,NULL,keyword,42,NULL,128567,"Using index; Using join buffer"
1,SIMPLE,tmp,ref,"keyword_url_insDate,keyword,searchEngine,url,awrProject",keyword_url_insDate,767,func,115,"Using where"
1,SIMPLE,d,eq_ref,date,date,3,intradb.tmp.insDate,1,"Using where; Using index"
1,SIMPLE,z,ref,url,url,767,bbointradb.tmp.url,1,"Using index"
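In MySQL's `EXPLAIN` output, a `type` of `index` means the whole index is scanned, and `ALL` means a full table scan. `Using join buffer` typically means rows are matched against a buffer of rows from earlier tables rather than through an index lookup. In this plan `p`, `s` and `k` are read that way, with `possible_keys` NULL, before `tmp` is touched. The small hypothetical helper `flagScans()` below picks such rows out of EXPLAIN output saved as CSV.

```php
<?php
// Hypothetical diagnostic: return the tables that EXPLAIN reads with a full
// scan (type ALL or index) or joins through a join buffer.
function flagScans(string $csv): array
{
    $lines = array_values(array_filter(array_map('trim', explode("\n", $csv))));
    $header = str_getcsv(array_shift($lines), ',', '"', '');
    $flagged = [];
    foreach ($lines as $line) {
        $row = array_combine($header, str_getcsv($line, ',', '"', ''));
        if (in_array($row['type'], ['ALL', 'index'], true)
            || stripos($row['Extra'], 'join buffer') !== false) {
            $flagged[] = $row['table'];
        }
    }
    return $flagged;
}

$explain = <<<CSV
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,SIMPLE,p,index,NULL,projectName,42,NULL,1346,"Using index"
1,SIMPLE,s,index,NULL,searchEngine,42,NULL,2336,"Using index; Using join buffer"
1,SIMPLE,k,index,NULL,keyword,42,NULL,128567,"Using index; Using join buffer"
1,SIMPLE,tmp,ref,"keyword_url_insDate,keyword,searchEngine,url,awrProject",keyword_url_insDate,767,func,115,"Using where"
1,SIMPLE,d,eq_ref,date,date,3,intradb.tmp.insDate,1,"Using where; Using index"
1,SIMPLE,z,ref,url,url,767,bbointradb.tmp.url,1,"Using index"
CSV;

echo implode(', ', flagScans($explain)), "\n"; // p, s, k
```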
SHOW CREATE TABLE:
CREATE TABLE `rankingUrls201001` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `insDate` datetime NOT NULL,
  `keyword` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
  `searchEngine` varchar(25) COLLATE utf8_swedish_ci NOT NULL,
  `url` varchar(255) COLLATE utf8_swedish_ci NOT NULL,
  `position` int(11) NOT NULL,
  `competition` varchar(20) COLLATE utf8_swedish_ci NOT NULL,
  `awrProject` varchar(200) COLLATE utf8_swedish_ci NOT NULL,
  `server` varchar(20) COLLATE utf8_swedish_ci NOT NULL,
  `rank` varchar(40) COLLATE utf8_swedish_ci NOT NULL,
  PRIMARY KEY (`id`),
  KEY `keyword_url_insDate` (`keyword`,`url`,`insDate`),
  KEY `keyword` (`keyword`),
  KEY `searchEngine` (`searchEngine`),
  KEY `url` (`url`),
  KEY `awrProject` (`awrProject`)
) ENGINE=InnoDB AUTO_INCREMENT=2266575 DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci
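You can check the `key_len` of 767 in the EXPLAIN output against this table definition. For a `VARCHAR(n)` key part, `EXPLAIN` reports n × (maximum bytes per character) + 2 length bytes, plus 1 if the column allows NULL. MySQL's `utf8` uses up to 3 bytes per character. The helper `varcharKeyLen()` below is hypothetical.

```php
<?php
// Bytes EXPLAIN reports as key_len for one VARCHAR(n) key part:
// n characters * max bytes per character + 2 length bytes (+1 if NULL-able).
function varcharKeyLen(int $chars, int $bytesPerChar, bool $nullable = false): int
{
    return $chars * $bytesPerChar + 2 + ($nullable ? 1 : 0);
}

echo varcharKeyLen(255, 3), "\n"; // 767: keyword or url, utf8, NOT NULL
```

255 × 3 + 2 = 767, which is exactly one `utf8` `VARCHAR(255)` column. This assumes `bigtable` has the same column definitions as `rankingUrls201001`. Under that assumption, the lookup into `tmp` uses only the leading `keyword` part of `keyword_url_insDate`, not `url` or `insDate`.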