
I am trying to use the Python AnyTree module to map URL redirections into a tree without creating any duplicate nodes.

I've experimented with the code using the AnyTree docs and similar questions, e.g. "Tree with no duplicate children".

My current code is:

from anytree import Node, RenderTree

root_nodes = []

# A URL is a root if no redirection points to it.
for url in redirections:
    is_root = True
    for redir in redirections:
        if url['url'] == redir['redir_url']:
            is_root = False
    if is_root:
        root = Node(url['url'])
        root_nodes.append(root)


# Attach every redirection somewhere below each root.
for root in root_nodes:
    for redir in redirections:
        if redir['url'] == root.name:
            sub = Node(redir['redir_url'], parent=root)
        else:
            sub = next((c for c in root.children if c.name == redir['url']), None)
            if sub is None:
                sub = Node(redir['redir_url'], parent=root)
            else:
                new_node = sub
                sub = Node(redir['redir_url'], parent=new_node)

Basically, given a list of redirections like:

redirections = [
    {
        'url': "alpha.com",
        'redir_url': "beta.com",
    },
    {
        'url': "alpha.com",
        'redir_url': "charlie.com",
    },
    {
        'url': "beta.com",
        'redir_url': "charlie.com",
    },
    {
        'url': "beta.com",
        'redir_url': "delta.com",
    },
    {
        'url': "delta.com",
        'redir_url': "foxtrot.com",
    },
    {
        'url': "foxtrot.com",
        'redir_url': "golf.com",
    },
    {
        'url': "india.com",
        'redir_url': "charlie.com",
    },
    {
        'url': "india.com",
        'redir_url': "juliet.com",
    },
]

I want AnyTree to produce an output like:

alpha.com -> beta.com -> charlie.com
                      -> delta.com -> foxtrot.com -> golf.com
          -> charlie.com

india.com -> charlie.com
          -> juliet.com

Instead, it currently prints:

alpha.com
├── beta.com
│   ├── charlie.com
│   └── delta.com
├── charlie.com
├── foxtrot.com
│   └── golf.com
├── charlie.com
└── juliet.com
india.com
├── beta.com
│   ├── charlie.com
│   └── delta.com
├── charlie.com
├── foxtrot.com
│   └── golf.com
├── charlie.com
└── juliet.com

As you can see, there are lots of duplicates. Also, foxtrot.com and golf.com aren't added to the delta.com chain. Finally, india.com has many redirections that do not actually originate from that URL.

Note that the redirections array could be in any order (not necessarily the order in which the redirections occurred).

1 Answer

You need a container that knows all the nodes and links them correctly.

from anytree import Node, RenderTree

redirections = [
    {
        'url': "alpha.com",
        'redir_url': "beta.com",
    },
    {
        'url': "alpha.com",
        'redir_url': "charlie.com",
    },
    {
        'url': "beta.com",
        'redir_url': "charlie.com",
    },
    {
        'url': "beta.com",
        'redir_url': "delta.com",
    },
    {
        'url': "delta.com",
        'redir_url': "foxtrot.com",
    },
    {
        'url': "foxtrot.com",
        'redir_url': "golf.com",
    },
    {
        'url': "india.com",
        'redir_url': "charlie.com",
    },
    {
        'url': "india.com",
        'redir_url': "juliet.com",
    },
]


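class Fab:
    """Node factory that keeps exactly one Node per name."""

    def __init__(self):
        self.nodemap = {}  # name -> Node

    @property
    def roots(self):
        return [node for node in self.nodemap.values() if node.is_root]

    def create(self, name=None, childname=None):
        # `name` is the redirecting URL, `childname` the URL it redirects to.
        node = self._create(name)
        if childname is not None:
            # A Node has a single parent, so a later redirect re-parents the child.
            self._create(childname).parent = node

    def _create(self, name):
        # Return the existing Node for `name`, creating it on first use.
        nodemap = self.nodemap
        if name not in nodemap:
            node = nodemap[name] = Node(name)
        else:
            node = nodemap[name]
        return node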


f = Fab()
for redirect in redirections:
    url = redirect['url']
    redir_url = redirect['redir_url']
    f.create(url, redir_url)

for root in f.roots:
    for pre, fill, node in RenderTree(root):
        print("%s%s" % (pre, node.name))

This gives you:

alpha.com
└── beta.com
    └── delta.com
        └── foxtrot.com
            └── golf.com
india.com
├── charlie.com
└── juliet.com

I will add a generic node factory to solve this problem: https://github.com/c0fec0de/anytree/issues/122

Answered 2020-01-20T21:21:02.867