c - 在 C 中将单词添加到 Trie 结构中

Question

嗨，我正在尝试为英语到西班牙语词典创建一个 trie 结构。

这是我到目前为止所拥有的：

struct s_trie_node
{
    char * translation; /* NULL if node not a word */
    char * word;

    /* pointer array to child nodes */
    struct s_trie_node * children[UCHAR_MAX+1];
};

int add_word(const char * word, char * translation) { 
    /* TODO: add word to trie structure
    If word exists, append translation to existing string
    Be sure to store a copy of translation, since
    the string is reused by load_dictionary()
    */
    struct s_trie_node * curr = proot;
    char * currLetter = word;
    while (currLetter != '\0') {
        while ((curr -> children) != NULL) {
            char * currChildLetter = ((curr -> children) -> word);
            char * copyWord = word;
            while (copyWord == currChildLetter) {
                copyWord++;
                currChildLetter++;
            }
            if (currChildLetter == '\0') {
                curr = (curr -> children);
                break;
            }
            (curr -> children)++;
        }
        currLetter++
    }
}

我不知道从这里去哪里。任何帮助，将不胜感激。谢谢！

score 4 · Accepted Answer

好吧，我认为你的 add_word 函数吃得太多了。试着先把它分解成更小的问题。一旦你解决了小问题，大问题可能会变得更容易。

首先，我们必须实际创建 Trie 节点（尝试在 add_word 中执行此操作会很难看）。所以，现在让我们创建一个函数来执行此操作：

/* Allocates, initializes, and returns a new Trie node. The node will contain 
 * a copy of word and trans, rather than use them directly. The children array
 * will be initialized to all NULL's.
 */
struct s_trie_node * trie_node_create(const char * prefix, const char * trans)
{
    struct s_trie_node * n = malloc(sizeof(struct s_trie_node));
    int i;

    n->word = prefix ? strdup(prefix) : strdup("");
    n->translation = trans ? strdup(trans) : NULL;
    for (i = 0; i < UCHAR_MAX + 1; i++)
        n->children[i] = NULL;
    return n;
}

您应该注意，我们正在创建字符串的副本，而不是直接使用它们。这使这个 Trie 库的用户的生活变得轻松，并且还允许我们在不再需要它们时释放它们，而不必担心用户在其他地方使用它们。然而，这是一个重要的决定，因为这意味着我们有责任确保这些字符串在以后被释放。此外，我们正在使用 strdup，这意味着我们假设传递给我们的字符串是“干净的”（即以 NULL 字符结尾）。

无论如何，现在我们可以创建节点了。让我们继续讨论更多与 Trie 相关的问题。显然，您将需要能够找出 2 个字符串的公共前缀的长度。如果你不能做到这一点，你就不能做其他任何事情。因此，我们可以使用以下函数：

/* Returns length of common prefix of v & w. */
int match(char * v, char * w)
{
    char * start = v;
    for (; *v && *v == *w; v++, w++);
    return v - start;
}

这是非常基本的，但是是必需的。当我们将单词与前缀节点进行比较时，知道公共前缀的长度将告诉我们它是完全匹配还是部分匹配。完全匹配意味着我们只需要更新节点。部分匹配可能会导致子节点必须被“拆分”为 2，并且很可能意味着我们必须在 Trie 中走得更远。这种分裂节点的想法是至关重要的。如果列表中只有一个单词，例如。“hello”，那么将只有 2 个节点：根节点（空字符串）和根节点的唯一子节点“hello”。如果我们现在希望添加另一个与“hello”具有共同前缀的单词，例如。“hey”，我们需要将“hello”分成2个节点：“he”，根节点的子节点，“llo”，“he”的子节点。

/* Creates a new node that is a child of n. The word stored at n will be 
 * truncated after location (index into n->word), with the remaining suffix 
 * of the word belonging to the new child of n.
 */
struct s_trie_node * trie_node_split(struct s_trie_node * n, int location)
{
    struct s_trie_node * child;
    char * prefix;
    char * suffix;
    int len = strlen(n->word);

    if (location <= 0)
        return NULL;
    if (location >= len)
        return n;

    prefix = strndup(n->word, location);
    suffix = strndup(n->word + location, len - location);

    child = trie_node_create(suffix, n->translation);
    memcpy(child->children, n->children,
        sizeof(struct s_trie_node *) * UCHAR_MAX);
    free(n->word);
    n->word = prefix;
    n->translation = NULL;
    n->children[0] = child;
    n->children[1] = NULL;
    return n;
}

通过查找 2 个字符串之间的公共前缀长度、创建节点和拆分节点的能力，我们拥有了操作和遍历 Trie 所需的所有基本操作。

现在，递归通常与 Trie 结构很好地结合在一起。因此，假设给您一个 trie（根节点）和一个要在 Trie 中匹配的单词。这个词要么与我们的一个孩子共享一个共同的前缀，要么不会。如果没有，那么我们可以简单地创建一个值为该单词的新节点并将其添加到我们的子列表中。但是，如果确实如此，那么我们会遇到几种不同的情况，具体取决于公共前缀的长度。

案例1：单词与我们的孩子完全匹配（即单词相同）。在这种情况下，我们的孩子与这个词完全匹配，我们可以简单地更新翻译并停止（无需创建任何新节点）。

案例2：这个词，就其整体而言，是我们孩子的前缀。在这种情况下，我们需要将孩子分成两部分；第一个是我们的单词，第二个是之前存储在我们孩子的单词的其余部分。第一部分成为新的孩子，我们将翻译存储在其中，第二部分成为我们孩子的孩子。

案例3：我们的孩子，就其整体而言，是单词的前缀。在这种情况下，我们从 word 中删除公共前缀（将 word 缩短为仅后缀）。然后我们将单词的后缀添加到以我们的孩子为根的子树中（即递归）。

情况4：公共前缀比两个词都短。在这种情况下，我们需要先拆分孩子。前缀成为新的孩子，后缀成为孩子的孩子。然后我们从单词中删除前缀，然后将单词的其余部分添加到以我们的孩子为根的子树中（即递归）。

这就是全部4个案例。有了这个，我们现在可以轻松地编写一个函数来处理这些情况，使用递归遍历特里树。

/* Add a translation to the Trie rooted at root. */
int trie_add_word(struct s_trie_node * root, char * word, char * trans)
{
    struct s_trie_node ** n;
    int loc;

    for (n = root->children; *n; n++) {
        /* Find length of longest common prefix. */
        loc = match(word, (*n)->word);

        if (!loc) {
            continue;

        } else {
            if (loc != strlen((*n)->word))
                trie_node_split(*n, loc);

            word += loc;
            if (!*word) {
                if ((*n)->translation)
                    free((*n)->translation);
                (*n)->translation = strdup(trans);
                return 0;
            }

            return trie_add_word(*n, word, trans);
        }
    }

    /* Failed to find any children that matched. */
    if (n - root->children >= UCHAR_MAX) {
        fprintf(stderr, "Ran out of room to store children in.");
        return -1;
    }
    *n = trie_node_create(word, trans);
    return 0;
}

就是这样！我想，答案很长，但很有趣。

c - 在 C 中将单词添加到 Trie 结构中

1 回答 1

Related

Reference