.net - What would be a sensible way to implement a Trie in .NET?

Question

I get the concept behind a trie, but I get a little confused when it comes to implementation.

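The most obvious way I could think of to structure a Trie type is to have each Trie maintain an internal Dictionary<char, Trie>. I have in fact written one this way, and it works, but... this seems like overkill. My impression is that a trie should be lightweight, and having a separate Dictionary<char, Trie> for every node does not seem very lightweight to me.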
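For reference, the dictionary-per-node layout described above might look roughly like the following. This is only a minimal sketch, not the asker's actual code: the class name DictionaryTrie, the _isWord flag, and the Insert/Contains methods are assumptions made for illustration.

```csharp
using System.Collections.Generic;

// Baseline sketch: every node owns its own Dictionary<char, DictionaryTrie>.
// All names here are hypothetical; the question does not show this code.
public class DictionaryTrie
{
    private readonly Dictionary<char, DictionaryTrie> _children =
        new Dictionary<char, DictionaryTrie>();

    // True when some inserted word ends exactly at this node.
    private bool _isWord;

    public void Insert(string word)
    {
        DictionaryTrie node = this;
        foreach (char c in word)
        {
            // Create the child on demand, then descend into it.
            if (!node._children.TryGetValue(c, out DictionaryTrie child))
            {
                child = new DictionaryTrie();
                node._children.Add(c, child);
            }
            node = child;
        }
        node._isWord = true;
    }

    public bool Contains(string word)
    {
        DictionaryTrie node = this;
        foreach (char c in word)
        {
            if (!node._children.TryGetValue(c, out node))
                return false;
        }
        return node._isWord;
    }
}
```

Every node, including leaves with no children at all, allocates a full Dictionary here, which is the per-node cost the question is worried about.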

Is there a more appropriate way to implement this structure that I'm missing?

UPDATE: OK! Based on the very helpful input from Jon and leppie, this is what I've come up with so far:

(1) I have the Trie type, which has a private _nodes member of type Trie.INodeCollection.

(2) The Trie.INodeCollection interface has the following members:

interface INodeCollection
{
    bool TryGetNode(char key, out Trie node);
    INodeCollection Add(char key, Trie node);
    IEnumerable<Trie> GetNodes();
}

(3) There are three implementations of this interface:

class SingleNode : INodeCollection
{
    internal readonly char _key;
    internal readonly Trie _trie;

    public SingleNode(char key, Trie trie)
    { /*...*/ }

    // Add returns a SmallNodeCollection.
}

class SmallNodeCollection : INodeCollection
{
    const int MaximumSize = 8; // ?

    internal readonly List<KeyValuePair<char, Trie>> _nodes;

    public SmallNodeCollection(SingleNode node, char key, Trie trie)
    { /*...*/ }

    // Add adds to the list and returns the current instance until MaximumSize,
    // after which point it returns a LargeNodeCollection.
}

class LargeNodeCollection : INodeCollection
{
    private readonly Dictionary<char, Trie> _nodes;

    public LargeNodeCollection(SmallNodeCollection nodes, char key, Trie trie)
    { /*...*/ }

    // Add adds to the dictionary and returns the current instance.
}

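(4) When a Trie is first constructed, its _nodes member is null. The first call to Add creates a SingleNode, and subsequent calls to Add go from there, following the steps described above.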
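Steps (1) through (4) can be sketched end to end as follows. This is a hedged illustration under several assumptions, not the asker's real implementation: the outer class is renamed AdaptiveTrie, GetNodes is left out for brevity, the constructor bodies (elided as /*...*/ in the question) are guesses, and the _isWord flag and Insert/Contains methods are added only so the sketch can be exercised.

```csharp
using System.Collections.Generic;

public class AdaptiveTrie
{
    private interface INodeCollection
    {
        bool TryGetNode(char key, out AdaptiveTrie node);
        INodeCollection Add(char key, AdaptiveTrie node);
    }

    private sealed class SingleNode : INodeCollection
    {
        internal readonly char Key;
        internal readonly AdaptiveTrie Child;
        public SingleNode(char key, AdaptiveTrie child) { Key = key; Child = child; }

        public bool TryGetNode(char key, out AdaptiveTrie node)
        {
            node = key == Key ? Child : null;
            return node != null;
        }

        // A second child promotes this node to a list-backed collection.
        public INodeCollection Add(char key, AdaptiveTrie node) =>
            new SmallNodeCollection(this, key, node);
    }

    private sealed class SmallNodeCollection : INodeCollection
    {
        private const int MaximumSize = 8; // threshold from the question; untuned
        internal readonly List<KeyValuePair<char, AdaptiveTrie>> Nodes =
            new List<KeyValuePair<char, AdaptiveTrie>>(MaximumSize);

        public SmallNodeCollection(SingleNode first, char key, AdaptiveTrie node)
        {
            Nodes.Add(new KeyValuePair<char, AdaptiveTrie>(first.Key, first.Child));
            Nodes.Add(new KeyValuePair<char, AdaptiveTrie>(key, node));
        }

        public bool TryGetNode(char key, out AdaptiveTrie node)
        {
            foreach (var pair in Nodes)
                if (pair.Key == key) { node = pair.Value; return true; }
            node = null;
            return false;
        }

        // Stays a list until MaximumSize, then hands over to a dictionary.
        public INodeCollection Add(char key, AdaptiveTrie node)
        {
            if (Nodes.Count >= MaximumSize)
                return new LargeNodeCollection(this, key, node);
            Nodes.Add(new KeyValuePair<char, AdaptiveTrie>(key, node));
            return this;
        }
    }

    private sealed class LargeNodeCollection : INodeCollection
    {
        private readonly Dictionary<char, AdaptiveTrie> _nodes =
            new Dictionary<char, AdaptiveTrie>();

        public LargeNodeCollection(SmallNodeCollection small, char key, AdaptiveTrie node)
        {
            foreach (var pair in small.Nodes) _nodes.Add(pair.Key, pair.Value);
            _nodes.Add(key, node);
        }

        public bool TryGetNode(char key, out AdaptiveTrie node) =>
            _nodes.TryGetValue(key, out node);

        public INodeCollection Add(char key, AdaptiveTrie node)
        {
            _nodes.Add(key, node);
            return this;
        }
    }

    private INodeCollection _nodes; // null until the first child is added
    private bool _isWord;           // hypothetical end-of-word marker

    public void Insert(string word)
    {
        AdaptiveTrie current = this;
        foreach (char c in word)
        {
            AdaptiveTrie child = null;
            if (current._nodes == null || !current._nodes.TryGetNode(c, out child))
            {
                child = new AdaptiveTrie();
                // Add may return a different collection, so always reassign.
                current._nodes = current._nodes == null
                    ? new SingleNode(c, child) : current._nodes.Add(c, child);
            }
            current = child;
        }
        current._isWord = true;
    }

    public bool Contains(string word)
    {
        AdaptiveTrie current = this;
        foreach (char c in word)
            if (current._nodes == null || !current._nodes.TryGetNode(c, out current))
                return false;
        return current._isWord;
    }
}
```

The key point is that the caller always stores whatever Add returns, which lets a node swap its backing collection without the rest of the trie noticing.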

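Does this make sense? It feels like an improvement in the sense that it somewhat reduces the "bulkiness" of a Trie (nodes are no longer full-blown Dictionary<char, Trie> objects until they have a sufficient number of children). However, it has also become significantly more complex. Is it too convoluted? Have I taken a complicated route to achieve something that should have been straightforward?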

score 4 · Accepted Answer

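Well, you need each node to have something that effectively implements IDictionary<char, Trie>. You could write your own custom implementation that varies its internal structure based on how many subnodes it has:

For a single subnode, use just a char and a Trie
For a small number, use a List<Tuple<char, Trie>> or a LinkedList<Tuple<char,Trie>>
For a large number, use a Dictionary<char, Trie>

(Having just seen leppie's answer, I believe this is the kind of hybrid approach he is talking about.)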

score 3 · Accepted Answer

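Implementing it as a Dictionary is, to my mind, not implementing a trie; it is implementing a dictionary of dictionaries.

When I've implemented a trie, I've done it the way Damien_The_Unbeliever suggests (+1 there):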

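public class TrieNode
{
  // Number of supported characters; 26 (A-Z) is an illustrative value only.
  const int no_of_chars = 26;
  TrieNode[] Children = new TrieNode[no_of_chars];
}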

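Ideally, this requires that your trie only supports a limited subset of characters, whose size is given by no_of_chars, and that you can map input characters to output indices. For example, if you support A-Z, you would naturally map A to 0 and Z to 25.

Then, when you need to add, remove, or check for the existence of a node, you do something like this: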

public TrieNode GetNode(char c)
{
  //mapping function - could be a lookup table, or simple arithmetic
  int index = GetIndex(c);
  //TODO: deal with the situation where 'c' is not supported by the map
  return Children[index];
}

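In real cases I've seen this optimized so that AddNode, for example, takes a ref TrieNode, so that the node can be created on demand and automatically placed in the correct slot of the parent TrieNode's Children.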
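A minimal sketch of the array-backed node, including the ref-based creation trick, might look like this. It assumes an A-Z-only alphabet; ArrayTrieNode, AlphabetSize, EnsureNode, IsWord, Insert, and Contains are hypothetical names chosen for illustration, and the unsupported-character handling is just one possible policy.

```csharp
using System;

public class ArrayTrieNode
{
    // Assumption: only 'A'..'Z' is supported, so 26 slots per node.
    private const int AlphabetSize = 26;

    private readonly ArrayTrieNode[] _children = new ArrayTrieNode[AlphabetSize];
    public bool IsWord;

    // Maps a character to its slot, or -1 when the character is unsupported.
    private static int GetIndex(char c) => (c >= 'A' && c <= 'Z') ? c - 'A' : -1;

    public ArrayTrieNode GetNode(char c)
    {
        int index = GetIndex(c);
        return index < 0 ? null : _children[index];
    }

    // Takes the array slot by reference, so a missing node is created
    // and stored in the parent's array in a single step.
    private static ArrayTrieNode EnsureNode(ref ArrayTrieNode slot)
    {
        if (slot == null) slot = new ArrayTrieNode();
        return slot;
    }

    public void Insert(string word)
    {
        ArrayTrieNode node = this;
        foreach (char c in word)
        {
            int index = GetIndex(c);
            if (index < 0)
                throw new ArgumentException("Unsupported character: " + c, nameof(word));
            node = EnsureNode(ref node._children[index]);
        }
        node.IsWord = true;
    }

    public bool Contains(string word)
    {
        ArrayTrieNode node = this;
        foreach (char c in word)
        {
            node = node.GetNode(c);
            if (node == null) return false;
        }
        return node.IsWord;
    }
}
```

Lookups are a single array index per character, at the price of 26 references per node regardless of how many children actually exist.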

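You could also use a ternary search tree (TST) instead, because the memory overhead of a trie can be pretty crazy (especially if you intend to support every 16-bit char value, which means 65,536 slots per node!), and TST performance is pretty impressive (a TST also supports prefix and wildcard searches as well as Hamming searches). Equally, TSTs can natively support all Unicode characters without any mapping, since they work on greater-than/less-than/equal comparisons instead of an absolute index value.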

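I took the code from here and adapted it slightly (it was written before generics).

I think you'll be pleasantly surprised by TSTs; once I had one implemented, I steered away from tries completely.

The only tricky thing is keeping the TST balanced, a problem you don't have with tries.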
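To make the greater-than/less-than/equal idea concrete, here is a small unbalanced ternary search tree. It is a sketch written for this thread, not the adapted code the answer refers to; the names TernarySearchTree, Node, Low, Equal, High, and IsWord are assumptions, and no balancing is attempted, so insertion order affects the tree's shape.

```csharp
public class TernarySearchTree
{
    private sealed class Node
    {
        public readonly char Char;
        public Node Low, Equal, High; // smaller char, next char, larger char
        public bool IsWord;
        public Node(char c) { Char = c; }
    }

    private Node _root;

    public void Insert(string word)
    {
        if (string.IsNullOrEmpty(word)) return; // empty keys are not stored in this sketch
        _root = Insert(_root, word, 0);
    }

    private static Node Insert(Node node, string word, int i)
    {
        char c = word[i];
        if (node == null) node = new Node(c);

        if (c < node.Char) node.Low = Insert(node.Low, word, i);
        else if (c > node.Char) node.High = Insert(node.High, word, i);
        else if (i < word.Length - 1) node.Equal = Insert(node.Equal, word, i + 1);
        else node.IsWord = true;
        return node;
    }

    public bool Contains(string word)
    {
        if (string.IsNullOrEmpty(word)) return false;
        Node node = _root;
        int i = 0;
        while (node != null)
        {
            char c = word[i];
            if (c < node.Char) node = node.Low;
            else if (c > node.Char) node = node.High;
            else if (i == word.Length - 1) return node.IsWord;
            else { node = node.Equal; i++; }
        }
        return false;
    }
}
```

Each node holds one character and three references, so no character-to-index mapping is needed and any char value works as a key.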

score 3 · Accepted Answer

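If your characters come from a limited set (for example, only the uppercase Latin alphabet), then you can store a 26-element array, and each lookup is just

Trie next = store[c-'A'];

where c is the current lookup character.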

score 2 · Accepted Answer

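There are a few ways, but using a singly linked list is probably the simplest and most lightweight.

I would run some tests to see how many child nodes each node has. If there are not many (say 20 or fewer), the linked-list approach should be faster than a hash table. You could also take a hybrid approach depending on the number of child nodes.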
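One way to read the linked-list suggestion is the first-child/next-sibling layout sketched below. This is an interpretation under stated assumptions rather than the answerer's code: LinkedTrie, Node, FirstChild, NextSibling, and FindChild are hypothetical names, and the 20-child figure above is not measured here.

```csharp
public class LinkedTrie
{
    // Each node stores only its key, a flag, and two references.
    private sealed class Node
    {
        public char Key;
        public bool IsWord;
        public Node FirstChild;  // head of this node's child list
        public Node NextSibling; // next entry in the parent's child list
    }

    private readonly Node _root = new Node();

    // Linear scan over the sibling chain; cheap while chains stay short.
    private static Node FindChild(Node parent, char key)
    {
        for (Node n = parent.FirstChild; n != null; n = n.NextSibling)
            if (n.Key == key) return n;
        return null;
    }

    public void Insert(string word)
    {
        Node node = _root;
        foreach (char c in word)
        {
            Node child = FindChild(node, c);
            if (child == null)
            {
                // Prepend to the sibling list: O(1), insertion order is not kept.
                child = new Node { Key = c, NextSibling = node.FirstChild };
                node.FirstChild = child;
            }
            node = child;
        }
        node.IsWord = true;
    }

    public bool Contains(string word)
    {
        Node node = _root;
        foreach (char c in word)
        {
            node = FindChild(node, c);
            if (node == null) return false;
        }
        return node.IsWord;
    }
}
```

Whether this beats a hash table depends on the real fan-out of the data, which is why measuring the child counts first is the sensible step.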
