python - How do I handle duplicates using python/mysql?

Question

I have a SQL query that returns a list of duplicates, with their ids, from my table Person:

1   hudson
43  hudson
67  hudson
34  roger
79  roger
89  kerry
403 kerry

Using a Python script, I would like to automate this kind of query, for example for the “hudson” case:

UPDATE Customer SET person_id = 1 WHERE person_id = 43;

When the number of duplicates is fixed (for example 2), I think we could do the following:

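cursor.execute(duplicates_query)  # duplicates_query: the query that returns the list of duplicates
rows = cursor.fetchmany(2)
row1 = rows[0]  # first duplicate (Python lists are 0-indexed)
row2 = rows[1]  # second duplicate
# pass the ids as parameters; names inside the SQL string are not substituted
cursor.execute('UPDATE Customer SET person_id = %s WHERE person_id = %s', (row1[0], row2[0]))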

When the number of duplicates varies, I really don't know how to do it.

Thank you very much for your help.

score 0 · Accepted Answer

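Group the rows by name and select the smallest person id from each group as the one to keep; every other id in that group is a duplicate to be remapped to it.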
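
A minimal sketch of that idea, doing the grouping in Python rather than in SQL so it works for any number of duplicates per name. It assumes the duplicates query returns (id, name) pairs in that order; `plan_merges` and `apply_merges` are hypothetical helper names, and the `%s` placeholder style is an assumption that matches the common MySQL drivers.

```python
from collections import defaultdict


def plan_merges(rows):
    """Return (keep_id, duplicate_id) pairs, one per duplicate row.

    rows: iterable of (person_id, name) pairs, e.g. from cursor.fetchall().
    The smallest id of each name is kept; all others map onto it.
    """
    ids_by_name = defaultdict(list)
    for person_id, name in rows:
        ids_by_name[name].append(person_id)

    merges = []
    for ids in ids_by_name.values():
        keep_id = min(ids)
        merges.extend((keep_id, dup) for dup in sorted(ids) if dup != keep_id)
    return merges


def apply_merges(cursor, merges):
    # One parameterized UPDATE per (keep_id, duplicate_id) pair.
    # Assumption: the driver uses %s placeholders.
    cursor.executemany(
        "UPDATE Customer SET person_id = %s WHERE person_id = %s",
        merges,
    )


# Sample data from the question.
rows = [(1, "hudson"), (43, "hudson"), (67, "hudson"),
        (34, "roger"), (79, "roger"), (89, "kerry"), (403, "kerry")]
merges = plan_merges(rows)
```

With a real connection, one would call `apply_merges(cursor, plan_merges(cursor.fetchall()))` after running the duplicates query, then commit on the connection.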

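Also consider using Python Pandas and dumping all of the data into a Pandas DataFrame, where you can then use the drop_duplicates function. I have found it well worth the effort to write my own SQL-to-h5 and SQL-to-Pandas backend code, so that I can do all of my Python work in Pandas and never have to touch SQL directly.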
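
As a rough illustration of the Pandas route, the sketch below builds the DataFrame by hand from the sample rows in the question; in practice it would be loaded from the database. The column names `id`, `name` and `keep_id` are assumptions made for this example.

```python
import pandas as pd

# Sample data from the question; column names are assumed.
df = pd.DataFrame(
    {
        "id": [1, 43, 67, 34, 79, 89, 403],
        "name": ["hudson", "hudson", "hudson", "roger", "roger", "kerry", "kerry"],
    }
)

# One row per name: sort by id so the first occurrence is the smallest id.
keepers = df.sort_values("id").drop_duplicates(subset="name", keep="first")

# For every row, the id it should be merged into (smallest id of its name).
df["keep_id"] = df.groupby("name")["id"].transform("min")

# Rows whose id differs from keep_id are the duplicates to remap.
duplicates = df[df["id"] != df["keep_id"]]
```

Each row of `duplicates` then carries the pair (`keep_id`, `id`) needed for one UPDATE on the Customer table.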
