dawgdic
是一个很棒的 DAWG 库,但它有一个明显的缺点,因为它是静态的(不可更新),并且必须以按字母顺序排序的字符串构造。如果构建 DAWG 的原始数据很大(几千兆字节),则 DAWG 的初始构建涉及对大量字符串进行排序可能需要太多资源。
是否有提供内存高效结构的库,dawgdic
允许从未排序的字典进行构造?
目前,我认为没有任何库允许从未排序的字典构建 DAWG。
但是,经过大量搜索,我找到了这篇论文,“Incremental Construction of Minimal Acyclic Finite-State Automata” ,我认为它正是你想要的。或许你可以在读完这篇文章后自己制作一个库,分享给大家!
编辑:你看过这个问题吗?
我发现了一些很棒的库,它们允许从未排序的数据进行在线构建,尽管它们不是基于 DAWG:
I currently know of no C++ implementations of a DAWG which support the construction from non-sorted data, but if you're open to creating your own solution which has such a feature, Incremental Construction of Minimal Acyclic Finite-State Automata (2000) is a paper which basically lays out the algorithm behind it.
Alternatively, if you're open to porting solutions from other languages, it may be worth your while to check out MDAG, a Java implementation of the data structure. It supports both on-the-fly string addition and string removal, which is exactly what you are looking for. The code is also easy to follow, and extensively commented, so porting it should be a fairly simple task.
Disclaimer: I am the author of MDAG :) .