python - Structuring dedupe results in a database

Translated from: https://stackoverflow.com/questions/45118420 2017-07-15T12:54:06.453


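I'm using the Python project dedupe to find duplicate organization names in my data. Many of the examples focus on how to process the data, but not on what to do with the results afterwards. Are there any best practices for taking the results, putting them into a database, and querying for groups of duplicate records?

So far, my idea is to structure two tables like this (using SQLAlchemy), but something about it feels off: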

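# Imports assume SQLAlchemy 1.4 or newer
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class Organization(Base):
    __tablename__ = 'organization'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    # Cluster this record was assigned to by dedupe
    cluster_id = Column(Integer, ForeignKey('duplicate_organization.cluster_id'))


class DuplicateOrganzation(Base):
    __tablename__ = 'duplicate_organization'

    id = Column(Integer, primary_key=True)
    # Note: referenced by the foreign key above, but not declared unique
    cluster_id = Column(Integer)
    name = Column(String)
    organizations = relationship("Organization")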
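
To make the goal concrete, here is a minimal sketch of one possible way to store cluster assignments and query the duplicate groups. It is not the layout above: it uses a single membership table (one row per organization, holding its cluster id and score) and the standard library's sqlite3 instead of SQLAlchemy, so that it runs on its own. The organization names, the `cluster_membership` table, the helper functions, and the shape of `clustered_dupes` (one `(record_ids, scores)` pair per cluster) are all assumptions made for illustration, not output copied from dedupe.

```python
import sqlite3

# Hypothetical organizations; in practice these come from your own data.
organizations = [
    (1, "Acme Corp"),
    (2, "ACME Corporation"),
    (3, "Globex"),
    (4, "Globex Inc."),
    (5, "Initech"),
]

# Assumed shape of the clustering output: one (record_ids, scores) pair per cluster.
clustered_dupes = [
    ((1, 2), (0.95, 0.91)),
    ((3, 4), (0.88, 0.88)),
    ((5,), (1.0,)),
]


def store_clusters(conn, organizations, clustered_dupes):
    """Create the tables and insert one membership row per clustered record."""
    conn.executescript("""
        CREATE TABLE organization (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE cluster_membership (
            organization_id INTEGER PRIMARY KEY REFERENCES organization(id),
            cluster_id      INTEGER NOT NULL,
            score           REAL
        );
        CREATE INDEX ix_membership_cluster ON cluster_membership(cluster_id);
    """)
    conn.executemany(
        "INSERT INTO organization (id, name) VALUES (?, ?)", organizations
    )
    # Use the position of each cluster in the list as its cluster id.
    rows = [
        (record_id, cluster_id, score)
        for cluster_id, (record_ids, scores) in enumerate(clustered_dupes)
        for record_id, score in zip(record_ids, scores)
    ]
    conn.executemany(
        "INSERT INTO cluster_membership (organization_id, cluster_id, score) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def duplicate_groups(conn):
    """Return {cluster_id: [names]} for clusters with more than one member."""
    query = """
        SELECT m.cluster_id, o.name
        FROM cluster_membership AS m
        JOIN organization AS o ON o.id = m.organization_id
        WHERE m.cluster_id IN (
            SELECT cluster_id
            FROM cluster_membership
            GROUP BY cluster_id
            HAVING COUNT(*) > 1
        )
        ORDER BY m.cluster_id, o.id
    """
    groups = {}
    for cluster_id, name in conn.execute(query):
        groups.setdefault(cluster_id, []).append(name)
    return groups


conn = sqlite3.connect(":memory:")
store_clusters(conn, organizations, clustered_dupes)
groups = duplicate_groups(conn)
for cluster_id, names in groups.items():
    print(cluster_id, names)
```

Keeping the cluster id on a membership row means a cluster does not need its own table unless there is something to store about it, such as a canonical name. Whether that trade-off suits a given dataset is a judgment call.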
