I have a column that may contain entries like this: abc.yahoo.com efg.yahoo.com hij.yahoo.com

I need to delete all the duplicates and keep only one, as I don't need the others. This would be easy if I knew the second part (e.g. yahoo.com), but my problem is that this part is not fixed. I may also have entries such as: abc.msn.com efg.msn.com hij.msn.com

I want to handle all of these cases at once. Is this possible?

2 Answers

This assumes you just want to ignore the letters before the first . in the column and then group on what remains:

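-- Keep the row with the smallest id in each group and delete the rest.
DELETE a FROM tbl a
LEFT JOIN
(
    SELECT   MIN(id) AS id
    FROM     tbl
    -- Group on everything from the first '.' onward, e.g. '.yahoo.com'.
    -- COLUMN is a reserved word in MySQL, so the placeholder is backticked.
    GROUP BY SUBSTRING(`column`, LOCATE('.', `column`))
) b ON a.id = b.id
WHERE b.id IS NULL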

Here id is the name of your primary key column, and column is the column containing the values to group on.

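This also accounts for domains such as xxx.co.uk that have two parts at the end, because everything after the first . is used as the group.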
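
To make the grouping concrete, here is a minimal Python sketch that emulates the key `SUBSTRING(column, LOCATE('.', column))` and the `MIN(id)` selection on a few made-up rows. The helper names and sample data are assumptions for illustration only, and the handling of hosts without a dot is a guess at a sensible edge case rather than a statement about MySQL.

```python
# Sketch only: emulates the grouping used by the DELETE query in plain Python.
# group_key, rows_to_keep and the sample rows are hypothetical.

def group_key(host: str) -> str:
    """Return the host from its first '.' onward, e.g. '.yahoo.com'."""
    pos = host.find(".")
    # Assumption: a host with no dot is mapped to an empty key.
    return host[pos:] if pos != -1 else ""

def rows_to_keep(rows):
    """Map each group key to the smallest id, like MIN(id) ... GROUP BY key."""
    keep = {}
    for row_id, host in rows:
        key = group_key(host)
        if key not in keep or row_id < keep[key]:
            keep[key] = row_id
    return keep

rows = [
    (1, "abc.yahoo.com"),
    (2, "efg.yahoo.com"),
    (3, "hij.msn.com"),
    (4, "abc.msn.com"),
    (5, "shop.ebay.co.uk"),
    (6, "www.ebay.co.uk"),
]

keep = rows_to_keep(rows)
survivors = sorted(keep.values())
deleted = [row_id for row_id, _ in rows if row_id not in survivors]
print(survivors, deleted)
```

Rows 1, 3 and 5 survive in this sample, one per distinct suffix. A host with more than one label before the domain (say a.b.yahoo.com) would land in its own group, since the key starts at the first dot.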

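Make sure you have a backup of your current data, or run this inside a transaction on a transactional engine such as InnoDB (you can then issue ROLLBACK; if the result is not what you need).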
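
The delete-then-roll-back workflow can be tried safely on a throwaway database. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made purely so the example is self-contained. SQLite has no multi-table DELETE with a JOIN, so the query is rewritten with NOT IN, and `substr`/`instr` replace `SUBSTRING`/`LOCATE`. The table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, host TEXT)")
conn.executemany(
    "INSERT INTO tbl (host) VALUES (?)",
    [("abc.yahoo.com",), ("efg.yahoo.com",), ("hij.msn.com",)],
)
conn.commit()  # sample data is now permanent

def count_rows(connection):
    return connection.execute("SELECT COUNT(*) FROM tbl").fetchone()[0]

before = count_rows(conn)

# sqlite3 opens a transaction implicitly before this DELETE.
conn.execute(
    """
    DELETE FROM tbl
    WHERE id NOT IN (
        SELECT MIN(id) FROM tbl
        GROUP BY substr(host, instr(host, '.'))
    )
    """
)
inside = count_rows(conn)   # row count as seen inside the open transaction

conn.rollback()             # undo the delete
after = count_rows(conn)

print(before, inside, after)
```

Inspecting the table before deciding between commit and rollback is the point of the exercise; on MySQL the equivalent would be START TRANSACTION, the DELETE, a check, then COMMIT or ROLLBACK.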

Edit: If, after deleting the duplicates, you want to replace the letters before the first . with *, you can simply use:

UPDATE tbl
SET `column` = CONCAT('*', SUBSTRING(`column`, LOCATE('.', `column`)))

Answered 2012-07-29T20:02:21.023

To delete the duplicates, you can use:

-- MySQL's multi-table DELETE needs the target alias named before FROM.
DELETE t1 FROM your_table t1
LEFT JOIN
(
    SELECT   MIN(id) AS id
    FROM     your_table
    GROUP BY SUBSTRING_INDEX(REVERSE(col), '.', 2)
) t2 ON t2.id = t1.id
WHERE t2.id IS NULL
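
To see what the `SUBSTRING_INDEX(REVERSE(col), '.', 2)` key looks like, the following Python sketch emulates `SUBSTRING_INDEX` for the positive and negative counts used here. The function is a hypothetical stand-in written for illustration, not MySQL's implementation.

```python
# Sketch only: a small emulation of MySQL's SUBSTRING_INDEX(str, delim, count).

def substring_index(s: str, delim: str, count: int) -> str:
    parts = s.split(delim)
    if count >= 0:
        # Everything before the count-th delimiter, counting from the left.
        return delim.join(parts[:count])
    # Everything after the count-th delimiter, counting from the right.
    return delim.join(parts[count:])

def reversed_key(host: str) -> str:
    """Key used in the GROUP BY: last two labels, still reversed."""
    return substring_index(host[::-1], ".", 2)

def domain(host: str) -> str:
    """Reverse the key back into readable form."""
    return reversed_key(host)[::-1]

print(reversed_key("abc.yahoo.com"))  # reversed form of the last two labels
print(domain("abc.yahoo.com"))
print(domain("www.ebay.co.uk"))
```

The reversed key works fine for grouping because it is only compared, never displayed. Under this emulation the same readable value also comes from `substring_index(host, ".", -2)`, which suggests a negative count could replace the double REVERSE.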

If you need to create a UNIQUE constraint for this, you can do the following:

1. Add another field to hold the domain value

ALTER TABLE your_table ADD COLUMN `domain` VARCHAR(100) NOT NULL DEFAULT '';

2. Update it with the correct value

UPDATE your_table set domain = REVERSE(SUBSTRING_INDEX(REVERSE(col), '.', 2));

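3. Add the unique constraint (ALTER IGNORE silently drops rows that would violate the new key; the IGNORE clause was removed in MySQL 5.7.4, so on newer versions delete the duplicates first and use a plain ALTER TABLE)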

ALTER IGNORE TABLE your_table ADD UNIQUE domain (domain);

4. Add BEFORE INSERT and BEFORE UPDATE triggers that set the domain column

DELIMITER $$

-- Trigger names must be unique, so each trigger gets its own name.
CREATE TRIGGER `your_trigger_insert` BEFORE INSERT ON `your_table` FOR EACH ROW
BEGIN
    SET NEW.domain = REVERSE(SUBSTRING_INDEX(REVERSE(NEW.col), '.', 2));
END$$

CREATE TRIGGER `your_trigger_update` BEFORE UPDATE ON `your_table` FOR EACH ROW
BEGIN
    SET NEW.domain = REVERSE(SUBSTRING_INDEX(REVERSE(NEW.col), '.', 2));
END$$

DELIMITER ;

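Note: This assumes the domain is the last 2 words separated by ".", so it does not work for domains such as ebay.co.uk. For those, you would probably need to create a stored function that returns the domain for a given host and use it instead of REVERSE(SUBSTRING_INDEX....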
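
As a rough idea of what such a function might do, here is a Python sketch of the logic (not a MySQL stored function). It takes three labels instead of two when the last two labels form a known multi-part suffix. The suffix set below is a tiny hypothetical sample; a real version would need a maintained list such as the Public Suffix List.

```python
# Sketch only: suffix-aware domain extraction.
# MULTI_PART_SUFFIXES is an illustrative, incomplete sample.
MULTI_PART_SUFFIXES = {"co.uk", "org.uk", "com.au"}

def registered_domain(host: str) -> str:
    """Return the domain part of a host, allowing for two-label suffixes."""
    labels = host.lower().split(".")
    last_two = ".".join(labels[-2:])
    # Keep one extra label when the host ends in a known multi-part suffix.
    if len(labels) >= 3 and last_two in MULTI_PART_SUFFIXES:
        return ".".join(labels[-3:])
    return last_two

for host in ("abc.yahoo.com", "www.ebay.co.uk", "shop.ebay.co.uk"):
    print(host, "->", registered_domain(host))
```

With this logic, www.ebay.co.uk and shop.ebay.co.uk both map to ebay.co.uk and would be treated as duplicates, while abc.yahoo.com still maps to yahoo.com.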

Answered 2012-07-29T20:04:46.077