我有一个大型数据集,其中一些是重复记录,可以通过两个字段中的重复项来识别。
要查找这些记录,可以使用以下查询:
SELECT * FROM supplierstuffs
GROUP BY "Supplier Code", "Cost ex Tax"
HAVING count("Description") > 1
基本上我想要做的是将“描述”的所有值组合在一起形成单行,然后用单行替换所有重复的行。
到目前为止,这是我一半的查询,它既笨拙又可怕。我的主要目标是让它发挥作用——但如果我在此过程中学习了一些 sql 中的新技巧,那根本不是一件坏事。
UPDATE supplierstuffs SET "Description" =
(SELECT array_to_string(array_accum("Description"), ', ') FROM supplierstuffs
GROUP BY "Supplier Code", "Cost ex Tax"
HAVING count("Description") > 1)
WHERE .....
这是我所得到的。我应该读什么才能更进一步?我已经阅读了几本书和很多关于这个主题的网页。但是在这种情况下,我认为我的问题不仅限于缺乏 SQL(好吧,这不是我唯一的问题),而是更多地以错误的方式解决问题。
编辑1:
'Name'; 'Supplier Code'; 'Desciption';
"7CPS PODIUM S/SLV CRICKET POLO";"7CPS";"04 -14, S - 3XL"
"7CP PODIUM CRICKET PANT ";"7CP";"08 -14, S - 2XL"
"7CPT PODIUM 3/4 SLV CRICKET POLO";"7CPT";"04 -14, S - 3XL"
"7CPL PODIUM L/SLV CRICKET POLO";"7CPL";"04 -14, S - 3XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL, XS - 2XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL, 8-16"
^^ 是我想从 vv 创建的
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"S - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"8-16"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"7CP PODIUM CRICKET PANT ";"7CP";"08 -14"
"7CP PODIUM CRICKET PANT ";"7CP";"S - 2XL"
"7CPL PODIUM L/SLV CRICKET POLO";"7CPL";"04 -14"
"7CPL PODIUM L/SLV CRICKET POLO";"7CPL";"S - 3XL"
"7CPS PODIUM S/SLV CRICKET POLO";"7CPS";"04 -14"
"7CPS PODIUM S/SLV CRICKET POLO";"7CPS";"S - 3XL"
"7CPT PODIUM 3/4 SLV CRICKET POLO";"7CPT";"04 -14"
"7CPT PODIUM 3/4 SLV CRICKET POLO";"7CPT";"S - 3XL"
^^ 注意不超过一个描述行的行需要保持不变。
到目前为止,我已经在一个新表中创建了新记录:
INSERT INTO tmptable
SELECT "Name" , "Supplier Code", array_to_string(array_accum("Description"), ', ')
FROM supplierstuffs
GROUP BY "Name", "Supplier Code", "Description"
HAVING count("Description") > 1
所以现在剩下的就是删除 cat 命令捕获的记录。看来我不能DELETE FROM
用有条款?我在想这DELETE FROM table WHERE oid IN (SELECT OID's using having clause)
会有效吗?
编辑2:
SELECT array_accum(oid)
FROM supplierstuffs
GROUP BY "Name", "Supplier Code", "Colour", "Cost ex Tax"
HAVING count("Description") > 1
返回几个 2 个 oid 的数组,所有这些都需要删除。我觉得我很接近,但又很远。提前致谢