python - 如何在python中实现trie的remove功能？

Question

我已经在 python 中阅读了 trie 的以下实现： https ://stackoverflow.com/a/11016430/2225221

并尝试为其设置删除功能。基本上，即使一开始我也遇到了问题：如果你想从树中删除一个词，它可以有子“词”，或者它可以是另一个词的“子词”。

如果你用“del dict[key]”删除，你也删除了上面提到的这两种词。谁能帮助我，如何正确删除所选单词（让我们假设它在特里）

score 4 · Accepted Answer

我认为最好递归地做，代码如下：

def remove(self, word):
    self.delete(self.tries, word, 0)

def delete(self, dicts, word, i):
    if i == len(word):
        if 'end' in dicts:
            del dicts['end']
            if len(dicts) == 0:
                return True
            else:
                return False
        else:
            return False
    else:
        if word[i] in dicts and self.delete(dicts[word[i]], word, i + 1):
            if len(dicts[word[i]]) == 0:
                del dicts[word[i]]
                return True
            else:
                return False

        else:
            return False

score 3 · Accepted Answer

基本上，要从特里删除一个单词（因为它在您链接到的答案中实现），您只需要删除它的_end标记，例如：

def remove_word(trie, word):
    current_dict = trie
    for letter in word:
        current_dict = current_dict.get(letter, None)
        if current_dict is None:
            # the trie doesn't contain this word.
            break
    else:
        del current_dict[_end]

但是请注意，这并不能确保 trie 具有最小的大小。删除单词后，可能会在左侧的 trie 中存在不再被任何单词使用的分支。这不会影响数据结构的正确性，它只是意味着 trie 可能会消耗比绝对必要更多的内存。您可以通过从叶节点向后迭代并删除分支来改进这一点，直到找到一个具有多个子节点的分支。

编辑：这是一个想法，您可以如何实现一个删除函数，该函数还剔除任何不必要的分支。可能有一种更有效的方法，但这可能会让你开始：

def remove_word2(trie, word):
    current_dict = trie
    path = [current_dict]
    for letter in word:
        current_dict = current_dict.get(letter, None)
        path.append(current_dict)
        if current_dict is None:
            # the trie doesn't contain this word.
            break
    else:
        if not path[-1].get(_end, None):
            # the trie doesn't contain this word (but a prefix of it).
            return
        deleted_branches = []
        for current_dict, letter in zip(reversed(path[:-1]), reversed(word)):
            if len(current_dict[letter]) <= 1:
                deleted_branches.append((current_dict, letter))
            else:
                break
        if len(deleted_branches) > 0:
            del deleted_branches[-1][0][deleted_branches[-1][1]]
        del path[-1][_end]

本质上，它首先找到将要删除的单词的“路径”，然后向后迭代以找到可以删除的节点。然后它会删除可以删除的路径的根（这也会隐式删除_end节点）。

score 1 · Accepted Answer

def remove_a_word_util(self, word, idx, node):
    if len(word) == idx:
        node.is_end_of_word = False
        return bool(node.children)

    ch = word[idx]
    if ch not in node.children:
        return True

    flag = self.remove_a_word_util(word, idx+1, node.children[ch])
    if flag:
        return True

    node.children.pop(ch)
    return bool(node.children) or node.is_end_of_word

score 0 · Accepted Answer

处理这种结构的一种方法是通过递归。在这种情况下，递归的好处在于它会压缩到 trie 的底部，然后将返回的值通过分支向上传递。

下面的函数就是这样做的。它转到叶子并删除该_end值，以防输入单词是另一个单词的前缀。然后它传递一个布尔值 ( boo)，表示它current_dict仍在外围分支中。一旦我们达到当前 dict 有多个孩子的点，我们删除适当的分支并将 boo 设置为，False这样每个剩余的递归都不会做任何事情。

def trie_trim(term, trie=SYNONYMS, prev=0):
    # checks that we haven't hit the end of the word
    if term:
        first, rest = term[0], term[1:]
        current_length = len(trie)
        next_length, boo = trie_trim(rest, trie=trie[first], prev=current_length)

        # this statement avoids trimming excessively if the input is a prefix because
        # if the word is a prefix, the first returned value will be greater than 1
        if boo and next_length > 1:
            boo = False

        # this statement checks for the first occurrence of the current dict having more than one child
        # or it checks that we've hit the bottom without trimming anything
        elif boo and (current_length > 1 or not prev):
            del trie[first]
            boo = False

        return current_length, boo

    # when we do hit the end of the word, delete _end
    else:
        del trie[_end]
        return len(trie) + 1, True

score 0 · Accepted Answer

有点长，但我希望这有助于回答您的问题：

class Trie:
    WORD_END = "$"
    
    def __init__(self):
        self.trie = {}

    def insert(self, word):
        cur = self.trie
        for char in word:
            if char not in cur:
                cur[char] = {}
            cur = cur[char]
        cur[Trie.WORD_END] = word

    def delete(self, word):
        def _delete(word, cur_trie, i=0):
            if i == len(word):
                if Trie.WORD_END not in cur_trie:
                    raise ValueError("'%s' is not registered in the trie..." %word)
                cur_trie.pop(Trie.WORD_END)
                if len(cur_trie) > 0:
                    return False
                return True
            if word[i] not in cur_trie:
                raise ValueError("'%s' is not registered in the trie..." %word)
            cont = _delete(word, cur_trie[word[i]], i+1)
            if cont:
                cur_trie.pop(word[i])
                if Trie.WORD_END in cur_trie:
                    return False
                return True
            return False
        _delete(word, self.trie)

t = Trie()
t.insert("bar")
t.insert("baraka")
t.insert("barakalar")

t.delete("barak") # raises error as 'barak' is not a valid WORD_END although it is a valid path.
t.delete("bareka") # raises error as 'e' does not exist in the path.
t.delete("baraka") # deletes the WORD_END of 'baraka' without deleting any letter as there is 'barakalar' afterwards.
t.delete("barakalar") # deletes until the previous word (until the first Trie.WORD_END; "$" - by going backwards with recursion) in the same path (until 'baraka').

python - 如何在python中实现trie的remove功能？

5 回答 5

Related

Reference