python - Parent/child database contains circular references

Question

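I have a keyword table in which each keyword is unique and is assigned an id. I have a second table that links the id of a parent keyword to the id of a child keyword. A keyword can have up to about 800 children, or none at all. Children can in turn be parents of more keywords (and so on...).

The problem I'm running into is that a child (or grandchild, or great-grandchild) can be a parent of the initial keyword, which creates a circular structure. I'm trying to build a tree data structure for the initial keyword with a recursive function, but the function either never finishes or exceeds Python's recursion limit of 1000 levels.

Is there a better way to design my parent/child tables to prevent this (or to run a pre-check during insertion), or is there a better way to write the recursive function so that this cannot happen? I tried limiting the depth of the recursive function, but ran into the single-level case (i.e. the child is the parent's parent). Again, my goal is to create a tree structure for the initial keyword.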

Table Keyword:
    id int(11) not null primary key auto_increment (id of keyword)
    text varchar(255) unique (keyword text e.g. "computer help desk")

Table Keyword_Relation:
    id int(11) not null primary key auto_increment (id for parent/child combo, not keyword id)
    parent int(11) (id of parent keyword)
    child int(11) (id of child keyword)

score 2 · Accepted Answer

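What you're looking to do is create a topological sort. Many methods have been published for doing this optimally; which one suits you depends on your architecture and preferred approach.

In your case, it sounds like you don't have multiple parents. The way I would handle it programmatically is to start from the leaf nodes (i.e. those with no children) and ascend the tree. While ascending, keep a set of the nodes you have encountered. If you ever encounter a node a second time, a cycle exists and a topological sort isn't possible.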
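The leaves-first idea above can be sketched without recursion. This is only an illustration, not necessarily the exact procedure the answer has in mind: it uses a variant of Kahn's algorithm run from the leaves upward, and it assumes the rows of Keyword_Relation have already been loaded into a list of (parent, child) id pairs. The function name `leaves_first_order` is hypothetical.

```python
from collections import defaultdict, deque


def leaves_first_order(relations):
    """Order keyword ids leaves-first.

    relations: iterable of (parent_id, child_id) pairs (assumed to be the
    rows of Keyword_Relation). Returns (order, leftover), where leftover
    holds the ids that could not be ordered because they sit on a cycle
    or lead into one.
    """
    parents_of = defaultdict(set)     # child id -> ids of its parents
    children_left = defaultdict(int)  # parent id -> children not yet ordered
    nodes = set()
    for parent, child in set(relations):  # set() drops duplicate rows
        nodes.update((parent, child))
        parents_of[child].add(parent)
        children_left[parent] += 1

    # Start from the leaves: keywords that have no children at all.
    queue = deque(sorted(n for n in nodes if children_left[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        # A parent becomes ready once all of its children are ordered.
        for parent in sorted(parents_of[node]):
            children_left[parent] -= 1
            if children_left[parent] == 0:
                queue.append(parent)

    leftover = nodes - set(order)
    return order, leftover
```

If `leftover` comes back empty, the relation table is free of cycles and `order` is a valid leaves-first ordering. Otherwise the ids in `leftover` are the ones to inspect.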

You won't get an infinite loop this way, but your topology may well contain a chain of more than 1000 nodes... so recursion may not be an option for you.

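EDIT: To answer your question about a "better design"... if possible, it may be advantageous to store a root node identifier. That is, given parent, child, grandchild, great-grandchild, great-great-...-grandchild:

Each row would contain not only its immediate parent id, but also the id of the root node... or of some "known good" root node.

If you do this, you can speed up the topological sort by ascending only as far as the root node, and by including only the sets that share the same root node.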
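The question also asks about a pre-check during insertion. One possible sketch, assuming the existing relation rows are available as (parent, child) id pairs: before inserting a new row, check whether the proposed parent is already reachable from the proposed child. If it is, the new row would close a loop. The function name `would_create_cycle` is hypothetical, and the walk uses an explicit stack so that deep chains do not hit the recursion limit.

```python
from collections import defaultdict


def would_create_cycle(relations, parent, child):
    """Return True if adding the row (parent, child) would close a loop.

    relations: iterable of existing (parent_id, child_id) pairs.
    """
    children_of = defaultdict(set)
    for p, c in relations:
        children_of[p].add(c)

    # Walk downward from the proposed child. Reaching the proposed parent
    # means the parent is already a descendant of the child.
    stack, seen = [child], set()
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        if node in seen:
            continue  # guards against cycles already present in the data
        seen.add(node)
        stack.extend(children_of.get(node, ()))
    return False
```

For example, with existing rows `[(1, 2), (2, 3)]`, `would_create_cycle(rows, 3, 1)` returns True because 3 is already a descendant of 1, while `would_create_cycle(rows, 1, 4)` returns False.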

score 1 · Accepted Answer

You can start at the top of the tree, and just keep track of the nodes you've already found and ignore them.

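def getchildren(node, ignore_nodes=None):
    # Use None rather than a mutable default, so the list is not shared between calls.
    if ignore_nodes is None:
        # Seed with the starting node so a descendant pointing back to it is skipped.
        ignore_nodes = [node]
    child_nodes = []
    for child in node.children():
        if child in ignore_nodes:
            continue  # already visited: skipping it breaks the cycle
        child_nodes.append(child)
        ignore_nodes.append(child)
        nodes, ignore_nodes = getchildren(child, ignore_nodes)
        child_nodes.extend(nodes)
    return child_nodes, ignore_nodes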
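The function above still recurses once per level, so a very deep chain could still exceed Python's recursion limit. As a rough sketch of the same "remember what you've seen" idea without recursion, the version below builds a nested-dict tree with an explicit stack. It assumes the relation rows are available as (parent, child) id pairs. The name `build_tree` and the nested-dict shape are illustrative choices, not something from the question.

```python
from collections import defaultdict


def build_tree(relations, root):
    """Build {root: {child: {grandchild: {...}}}} without recursion.

    relations: iterable of (parent_id, child_id) pairs.
    Each keyword appears at most once in the result, so an edge that
    points back to an already placed keyword is simply skipped.
    """
    children_of = defaultdict(list)
    for parent, child in relations:
        children_of[parent].append(child)

    tree = {root: {}}
    seen = {root}
    stack = [(root, tree[root])]  # (keyword id, dict holding its children)
    while stack:
        node, subtree = stack.pop()
        for child in children_of.get(node, ()):
            if child in seen:
                continue  # already placed elsewhere, or a cycle back up
            seen.add(child)
            subtree[child] = {}
            stack.append((child, subtree[child]))
    return tree
```

With rows `[(1, 2), (2, 3), (3, 1), (2, 2)]` and root 1, this yields `{1: {2: {3: {}}}}`: the edge from 3 back to 1 and the self-reference on 2 are both dropped.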
