Is there a way to read data from a pandas DataFrame and construct a tree using anytree?

Parent Child
A      A1
A      A2
A2     A21

I can do this with static values, as shown below. However, I would like to automate it by reading the data from a pandas DataFrame and building the tree with anytree.

>>> from anytree import Node, RenderTree
>>> A = Node("A")
>>> A1 = Node("A1", parent=A)
>>> A2 = Node("A2", parent=A)
>>> A21 = Node("A21", parent=A2)

The output is:

A
├── A1
└── A2
    └── A21

This question, and especially the accepted answer, is essentially copied from:

Read data from a file and create a tree using anytree in python

Many thanks to @Fabien N.

2 Answers

For details, see @Fabian N's answer in "Read data from a file and create a tree using anytree in python".

Below is his answer for an external file, adapted to work with a pandas DataFrame:

    from anytree import Node, RenderTree

    # Column names ('Parent', 'child') must match those in your DataFrame.
    df['Parent_child'] = df['Parent'] + ',' + df['child']  # column of comma-separated "Parent,child"

    nodes = {}   # maps node name -> Node, shared across all rows
    root = None
    for index, row in df.iterrows():
        if row['child'] == row['Parent']:
            # Root row. I modified the DataFrame by concatenating a DataFrame of
            # all the roots in my data, copied into both the parent and child
            # columns. This can be skipped by setting the roots statically, as
            # long as the assumption highlighted by @Fabien in the answer quoted
            # above still holds: the entries are ordered so that a parent node
            # was always introduced as a child of another node beforehand.
            root = Node(row['Parent'])
            nodes[root.name] = root
        else:
            line = row['Parent_child'].split(",")
            name = "".join(line[1:]).strip()
            nodes[name] = Node(name, parent=nodes[line[0]])

    # Renders the tree under the last root that was encountered.
    for pre, _, node in RenderTree(root):
        print("%s%s" % (pre, node.name))
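
The loop above expects each root to appear as a row whose Parent and child values are equal, placed before any of its descendants. A minimal sketch of one way such a DataFrame could be prepared is shown here; the `edges`, `root_names`, and `root_rows` names and the sample data are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical edge list mirroring the question's example
# (column names follow this answer: 'Parent' and 'child').
edges = pd.DataFrame({"Parent": ["A", "A", "A2"], "child": ["A1", "A2", "A21"]})

# A root is a parent that never appears in the child column.
root_names = edges.loc[~edges["Parent"].isin(edges["child"]), "Parent"].unique()

# One self-referencing row per root, placed ahead of the real edges
# so that every root exists before its children are attached.
root_rows = pd.DataFrame({"Parent": root_names, "child": root_names})
df = pd.concat([root_rows, edges], ignore_index=True)

print(df)
```

With the root rows first, the remaining rows still need to be ordered so that each parent has already been created when its children are read.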

If there is a better way to achieve the above, please post an answer and I will accept it as the solution.

Many thanks to @Fabian N.

answered 2020-09-27T04:44:03.143
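First create each node if it does not exist yet, storing its reference in a dictionary `nodes` for later use, and reassign a child's parent when necessary. The roots of the forest of trees can be derived by checking which `Parent` values are not among the `Child` values: a root is not the child of any node, so it never appears in the `Child` column.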

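import pandas as pd
from anytree import Node, RenderTree

def add_nodes(nodes, parent, child):
    # Create either node on first sight, then (re)attach child to parent.
    if parent not in nodes:
        nodes[parent] = Node(parent)
    if child not in nodes:
        nodes[child] = Node(child)
    nodes[child].parent = nodes[parent]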

data = pd.DataFrame(columns=["Parent","Child"], data=[["A","A1"],["A","A2"],["A2","A21"],["B","B1"]])
nodes = {}  # store references to created nodes 
# data.apply(lambda x: add_nodes(nodes, x["Parent"], x["Child"]), axis=1)  # 1-liner
for parent, child in zip(data["Parent"],data["Child"]):
    add_nodes(nodes, parent, child)

roots = list(data[~data["Parent"].isin(data["Child"])]["Parent"].unique())
for root in roots:         # you can skip this for roots[0], if there is no forest and just 1 tree
    for pre, _, node in RenderTree(nodes[root]):
        print("%s%s" % (pre, node.name))

Result:

A
├── A1
└── A2
    └── A21
B
└── B1

Update: to print a specific root:

root = 'A'  # change according to your use case
for pre, _, node in RenderTree(nodes[root]):
    print("%s%s" % (pre, node.name))
answered 2020-09-27T05:26:14.307