我有超过 100,000 条记录的 MPTT 树,使用lft
,rght
和parent_id
列存储在 MySQL 中。现在左/右值已损坏,而父 ID 仍然完好无损。它需要大量的查询才能在应用层修复它。有没有一种好方法可以将负担放在数据库上并让它仅使用 SQL 重新计算左/右值?
澄清一下,我需要重新计算嵌套集的数字 lft/rght 值,而不是相邻记录的 id。
(来源:mysql.com)
这是我从@Lieven 的回答中改编的内容,并结合了这里的反馈以获得更好的性能:
DROP PROCEDURE IF EXISTS tree_recover;
DELIMITER //
CREATE PROCEDURE tree_recover ()
MODIFIES SQL DATA
BEGIN
DECLARE currentId, currentParentId CHAR(36);
DECLARE currentLeft INT;
DECLARE startId INT DEFAULT 1;
# Determines the max size for MEMORY tables.
SET max_heap_table_size = 1024 * 1024 * 512;
START TRANSACTION;
# Temporary MEMORY table to do all the heavy lifting in,
# otherwise performance is simply abysmal.
CREATE TABLE `tmp_tree` (
`id` char(36) NOT NULL DEFAULT '',
`parent_id` char(36) DEFAULT NULL,
`lft` int(11) unsigned DEFAULT NULL,
`rght` int(11) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX USING HASH (`parent_id`),
INDEX USING HASH (`lft`),
INDEX USING HASH (`rght`)
) ENGINE = MEMORY
SELECT `id`,
`parent_id`,
`lft`,
`rght`
FROM `tree`;
# Leveling the playing field.
UPDATE `tmp_tree`
SET `lft` = NULL,
`rght` = NULL;
# Establishing starting numbers for all root elements.
WHILE EXISTS (SELECT * FROM `tmp_tree` WHERE `parent_id` IS NULL AND `lft` IS NULL AND `rght` IS NULL LIMIT 1) DO
UPDATE `tmp_tree`
SET `lft` = startId,
`rght` = startId + 1
WHERE `parent_id` IS NULL
AND `lft` IS NULL
AND `rght` IS NULL
LIMIT 1;
SET startId = startId + 2;
END WHILE;
# Switching the indexes for the lft/rght columns to B-Trees to speed up the next section, which uses range queries.
DROP INDEX `lft` ON `tmp_tree`;
DROP INDEX `rght` ON `tmp_tree`;
CREATE INDEX `lft` USING BTREE ON `tmp_tree` (`lft`);
CREATE INDEX `rght` USING BTREE ON `tmp_tree` (`rght`);
# Numbering all child elements
WHILE EXISTS (SELECT * FROM `tmp_tree` WHERE `lft` IS NULL LIMIT 1) DO
# Picking an unprocessed element which has a processed parent.
SELECT `tmp_tree`.`id`
INTO currentId
FROM `tmp_tree`
INNER JOIN `tmp_tree` AS `parents`
ON `tmp_tree`.`parent_id` = `parents`.`id`
WHERE `tmp_tree`.`lft` IS NULL
AND `parents`.`lft` IS NOT NULL
LIMIT 1;
# Finding the element's parent.
SELECT `parent_id`
INTO currentParentId
FROM `tmp_tree`
WHERE `id` = currentId;
# Finding the parent's lft value.
SELECT `lft`
INTO currentLeft
FROM `tmp_tree`
WHERE `id` = currentParentId;
# Shifting all elements to the right of the current element 2 to the right.
UPDATE `tmp_tree`
SET `rght` = `rght` + 2
WHERE `rght` > currentLeft;
UPDATE `tmp_tree`
SET `lft` = `lft` + 2
WHERE `lft` > currentLeft;
# Setting lft and rght values for current element.
UPDATE `tmp_tree`
SET `lft` = currentLeft + 1,
`rght` = currentLeft + 2
WHERE `id` = currentId;
END WHILE;
# Writing calculated values back to physical table.
UPDATE `tree`, `tmp_tree`
SET `tree`.`lft` = `tmp_tree`.`lft`,
`tree`.`rght` = `tmp_tree`.`rght`
WHERE `tree`.`id` = `tmp_tree`.`id`;
COMMIT;
DROP TABLE `tmp_tree`;
END//
DELIMITER ;
对一些测试数据运行良好,但它仍在我的 100,000 条记录树上运行,所以我还不能给出任何最终判断。直接在物理表上运行的幼稚脚本性能极差,至少运行数小时,更有可能运行数天。切换到临时 MEMORY 表使这个时间缩短到大约一个小时,选择正确的索引将其缩短到 10 分钟。
使用 SQL Server,以下脚本似乎对我有用。
category_id name parent lft rgt lftcalc rgtcalc
----------- -------------------- ----------- ----------- ----------- ----------- -----------
1 ELECTRONICS NULL 1 20 1 20
2 TELEVISIONS 1 2 9 2 9
3 TUBE 2 3 4 3 4
4 LCD 2 5 6 5 6
5 PLASMA 2 7 8 7 8
6 PORTABLE ELECTRONICS 1 10 19 10 19
7 MP3 PLAYERS 6 11 14 11 14
8 FLASH 7 12 13 12 13
9 CD PLAYERS 6 15 16 15 16
10 2 WAY RADIOS 6 17 18 17 18
SET NOCOUNT ON
GO
DECLARE @nested_category TABLE (
category_id INT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
parent INT,
lft INT,
rgt INT
);
DECLARE @current_Category_ID INTEGER
DECLARE @current_parent INTEGER
DECLARE @SafeGuard INTEGER
DECLARE @myLeft INTEGER
SET @SafeGuard = 100
INSERT INTO @nested_category
SELECT 1,'ELECTRONICS',NULL,NULL,NULL
UNION ALL SELECT 2,'TELEVISIONS',1,NULL,NULL
UNION ALL SELECT 3,'TUBE',2,NULL,NULL
UNION ALL SELECT 4,'LCD',2,NULL,NULL
UNION ALL SELECT 5,'PLASMA',2,NULL,NULL
UNION ALL SELECT 6,'PORTABLE ELECTRONICS',1,NULL,NULL
UNION ALL SELECT 7,'MP3 PLAYERS',6,NULL,NULL
UNION ALL SELECT 8,'FLASH',7,NULL,NULL
UNION ALL SELECT 9,'CD PLAYERS',6,NULL,NULL
UNION ALL SELECT 10,'2 WAY RADIOS',6,NULL,NULL
/* Initialize */
UPDATE @nested_category
SET lft = 1
, rgt = 2
WHERE parent IS NULL
UPDATE @nested_category
SET lft = NULL
, rgt = NULL
WHERE parent IS NOT NULL
WHILE EXISTS (SELECT * FROM @nested_category WHERE lft IS NULL) AND @SafeGuard > 0
BEGIN
SELECT @current_Category_ID = MAX(nc.category_id)
FROM @nested_category nc
INNER JOIN @nested_category nc2 ON nc2.category_id = nc.parent
WHERE nc.lft IS NULL
AND nc2.lft IS NOT NULL
SELECT @current_parent = parent
FROM @nested_category
WHERE category_id = @current_category_id
SELECT @myLeft = lft
FROM @nested_category
WHERE category_id = @current_parent
UPDATE @nested_category SET rgt = rgt + 2 WHERE rgt > @myLeft;
UPDATE @nested_category SET lft = lft + 2 WHERE lft > @myLeft;
UPDATE @nested_category SET lft = @myLeft + 1, rgt = @myLeft + 2 WHERE category_id = @current_category_id
SET @SafeGuard = @SafeGuard - 1
END
SELECT * FROM @nested_category ORDER BY category_id
SELECT COUNT(node.name), node.name, MIN(node.lft)
FROM @nested_category AS node,
@nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY
node.name
ORDER BY
3, 1
SET NOCOUNT ON
GO
DECLARE @nested_category TABLE (
category_id INT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
parent INT,
lft INT,
rgt INT,
lftcalc INT,
rgtcalc INT
);
INSERT INTO @nested_category
SELECT 1,'ELECTRONICS',NULL,1,20,NULL,NULL
UNION ALL SELECT 2,'TELEVISIONS',1,2,9,NULL,NULL
UNION ALL SELECT 3,'TUBE',2,3,4,NULL,NULL
UNION ALL SELECT 4,'LCD',2,5,6,NULL,NULL
UNION ALL SELECT 5,'PLASMA',2,7,8,NULL,NULL
UNION ALL SELECT 6,'PORTABLE ELECTRONICS',1,10,19,NULL,NULL
UNION ALL SELECT 7,'MP3 PLAYERS',6,11,14,NULL,NULL
UNION ALL SELECT 8,'FLASH',7,12,13,NULL,NULL
UNION ALL SELECT 9,'CD PLAYERS',6,15,16,NULL,NULL
UNION ALL SELECT 10,'2 WAY RADIOS',6,17,18,NULL,NULL
/* Initialize */
UPDATE @nested_category
SET lftcalc = 1
, rgtcalc = 2
WHERE parent IS NULL
DECLARE @current_Category_ID INTEGER
DECLARE @current_parent INTEGER
DECLARE @SafeGuard INTEGER
DECLARE @myRight INTEGER
DECLARE @myLeft INTEGER
SET @SafeGuard = 100
WHILE EXISTS (SELECT * FROM @nested_category WHERE lftcalc IS NULL) AND @SafeGuard > 0
BEGIN
SELECT @current_Category_ID = MAX(nc.category_id)
FROM @nested_category nc
INNER JOIN @nested_category nc2 ON nc2.category_id = nc.parent
WHERE nc.lftcalc IS NULL
AND nc2.lftcalc IS NOT NULL
SELECT @current_parent = parent
FROM @nested_category
WHERE category_id = @current_category_id
SELECT @myLeft = lftcalc
FROM @nested_category
WHERE category_id = @current_parent
UPDATE @nested_category SET rgtcalc = rgtcalc + 2 WHERE rgtcalc > @myLeft;
UPDATE @nested_category SET lftcalc = lftcalc + 2 WHERE lftcalc > @myLeft;
UPDATE @nested_category SET lftcalc = @myLeft + 1, rgtcalc = @myLeft + 2 WHERE category_id = @current_category_id
SELECT * FROM @nested_category WHERE category_id = @current_parent
SELECT * FROM @nested_category ORDER BY category_id
SET @SafeGuard = @SafeGuard - 1
END
SELECT * FROM @nested_category ORDER BY category_id
SELECT COUNT(node.name), node.name, MIN(node.lft)
FROM @nested_category AS node,
@nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY
node.name
ORDER BY
3, 1
你救我!!!我使用混合树模型,所以当这一天到来时,我的树(30000+)被破坏了。我从你的两个技术中学习,但不是恢复,只是完全重建失去了所有的排序和反向树......我认为,需要记住旧的 cat_left ......只是为了可能的完整性。所以,它可能看起来像......
如果存在则删除程序 tree_recover; 分隔符 | 创建过程 tree_recover () 修改 SQL 数据 开始 声明 currentId,currentParentId CHAR(36); 声明 currentLeft INT; DECLARE startId INT DEFAULT 1; # 确定 MEMORY 表的最大大小。 设置 max_heap_table_size = 1024 * 1024 * 512; 开始交易; # 临时 MEMORY 表完成所有繁重的工作, # 否则性能简直糟透了。 如果存在 `tmp_cat`,则删除表; 创建表`tmp_cat`( `cat_id` char(36) NOT NULL DEFAULT '', `cat_parent` char(36) 默认为 NULL, `cat_left` int(11) 无符号默认 NULL, `cat_right` int(11) 无符号默认 NULL, `cat_left_old` int(11) 无符号默认 NULL, 主键(`cat_id`), 使用哈希的索引(`cat_parent`), 使用哈希的索引(`cat_left`), 使用哈希的索引(`cat_right`), 使用哈希的索引(`cat_left_old`) ) 引擎 = 内存 选择`cat_id`, `cat_parent`, `cat_left`, `cat_right`, `cat_left` 作为 cat_left_old 来自“目录”; # 平衡比赛场地。 更新`tmp_cat` 设置`cat_left` = NULL, `cat_right` = NULL; # 为所有根元素建立起始编号。 WHILE EXISTS (SELECT * FROM `tmp_cat` WHERE `cat_parent` IS NULL AND `cat_left` IS NULL AND `cat_right` IS NULL ORDER BY cat_left_old LIMIT 1) DO 更新`tmp_cat` SET `cat_left` = startId, `cat_right` = startId + 1 `cat_parent` 为空 AND `cat_left` 为空 AND `cat_right` 为空 限制 1; SET startId = startId + 2; 结束; # 将 cat_left/rght 列的索引切换到 B-Trees 以加快下一部分,该部分使用范围查询。 DROP INDEX `cat_left` ON `tmp_cat`; 删除索引 `cat_right` ON `tmp_cat`; 删除索引 `cat_left_old` ON `tmp_cat`; 在 `tmp_cat` (`cat_left`) 上使用 BTREE 创建索引`cat_left`; 在 `tmp_cat` (`cat_right`) 上使用 BTREE 创建索引`cat_right`; 在 `tmp_cat` (`cat_left_old`) 上使用 BTREE 创建索引`cat_left_old`; # 给所有子元素编号 WHILE EXISTS (SELECT * FROM `tmp_cat` WHERE `cat_left` IS NULL ORDER BY cat_left_old LIMIT 1) DO # 选择一个未处理的元素,它有一个已处理的父元素。 选择`tmp_cat`.`cat_id` INTO currentId 来自`tmp_cat` INNER JOIN `tmp_cat` AS `parents` ON `tmp_cat`.`cat_parent` = `parents`.`cat_id` WHERE `tmp_cat`.`cat_left` 为 NULL AND `parents`.`cat_left` 不为空 ORDER BY `tmp_cat`.cat_left_old DESC 限制 1; # 查找元素的父元素。 选择`cat_parent` INTO currentParentId 来自`tmp_cat` WHERE `cat_id` = currentId; # 查找父级的 cat_left 值。 选择`cat_left` INTO 当前左 来自`tmp_cat` WHERE `cat_id` = currentParentId; # 将当前元素2的所有元素向右移动。 更新`tmp_cat` SET `cat_right` = `cat_right` + 2 WHERE `cat_right` > currentLeft; 更新`tmp_cat` SET `cat_left` = `cat_left` + 2 WHERE `cat_left` > currentLeft; # 设置当前元素的 cat_left 和 rght 值。 更新`tmp_cat` SET `cat_left` = currentLeft + 1, `cat_right` = currentLeft + 2 WHERE `cat_id` = currentId; 结束; # 将计算值写回物理表。 更新`目录`,`tmp_cat` SET `catalog`.`cat_left` = `tmp_cat`.`cat_left`, `目录`.`cat_right` = `tmp_cat`.`cat_right` WHERE `catalog`.`cat_id` = `tmp_cat`.`cat_id`; 犯罪; 如果存在 `tmp_cat`,则删除表; 结束|
在提供的所有解决方案中,我遇到了一个问题,MySQL 会提示它会Running query
持续数小时但什么都不会发生。
然后我意识到,如果我在 tmp_tree 表的第一条记录(带有 的那个parent_id = 0
)中将 lft 和 rght 值设置为 1 和 2,那么一切正常。也许程序需要更新才能自动执行此操作。