8

I have a trie which I am using to do some string processing. I have a simple compiler which generates trie from some data. Once generated, my trie won't change at run time.

I am looking for an approach where I can persist the trie in a file and load it effectively. I have looked at sqllite to understand how they are persisting b-treebut their file format looks bit advanced and I may not need all of those.

It'd be helpful if someone can provide some ideas to persist and read the trie. I am programming using C.

4

4 回答 4

12

I did some research and found the following little gems online:

  1. trie.h
  2. trie.c

A working trie with serialization and deserialization. It was originally written for use in Python (there's a corresponding triemodule.c for tying it to Python), but it's pure C; you could mine it for ideas or use it as you wish.

Update:

It appears the links are no longer working. I'll keep the originals up, but here are the links in the wayback machine:

  1. trie.h
  2. trie.c
于 2010-04-03T17:53:20.390 回答
5

Assuming your entire data structure fits in memory, a recursive serialization approach is simplest. Sqllite works with data structures that won't fit in memory, so it is probably overkill to try copying their methods.

Here is example pseudocode for reading/writing a node. It works by recursively reading/writing the child nodes. It has nothing trie-specific, and should work for other tree data structures as well.

void writeNode(Node *node)
    write node data to file
    write node.numOfChildren to file
    for each child:
        writeNode(child)

Node *readNode()
    Node *node = allocateNewNode()
    read node data from file
    read node.numOfChildren from file
    for (i=0; i<node.numOfChildren; i++)
        Node *child = readNode()
        node.addChild(child)
于 2010-04-03T17:54:47.647 回答
1

If all of your nodes are the same size then you can just enumerate your nodes (root = 0) and write each of them to a file at their index. While writing them you will have to change their references to other nodes to those nodes' indexes, though. You will probably also need a NULL value. You could use -1 or you could use (root = 1) and (NULL = 0).

You will probably also be able to compress these nodes somewhat by changing their pointer fields to be smaller types.

If your nodes are different sizes then it's more complicated.

于 2010-04-05T07:46:10.230 回答
0

Once you generated a trie in memory, it's not complex to write it into a file, with mmap, it's like that we re-generate the trie in a continous memory block, or you can write each node into a file in a recursive function.

于 2021-10-29T03:05:05.250 回答