4

我的字典中有 300000 个单词(实际上以 txt 格式(新行分隔)保存在我的 Android 设备的 sdcard 上)。我想构建一个数据结构,该结构将花费尽可能少的时间从我的 txt 文件中插入单词(String-s)到这个数据结构中。而且这个 DS 必须非常快地检查字典(这个 DS)中是否存在单词。我已经尝试了几个内置的 DS,最快的 IMO 是 TreeSet。是否有任何其他(非内置)DS 可以更快地插入/创建 DS 并且与 TreeSet 一样用于搜索?

还有一件事是我可以通过重新排列我的 txt 文件(以正确的顺序放置单词)来“帮助”TreeSet 更快地插入。

问候

4

1 回答 1

5

Firstly, well done on experimenting to find the best structure for your application. Often people will argue without trying out various options to get real performance data.

If you want to save build time, and your words file does not change very often, the obvious build speed improvement is caching the data structure. Whatever data structure you are using, build the structure once, and then store the structure to the SD-card (rather than just storing the strings). Standard java.util structures can be stored using Serialization.

If you want fastest build time, and your word list is sorted in alphabetical order, or can be, then you could just store in a String array. Build time will be very quick again, and search time will be similar to a TreeSet (using Arrays.binarySearch()).

If you want faster lookup, you might want to check out Perfect Hashing or Tries, but these aren't in the Java standard libraries.

A trie will be much more memory efficient than either of these, which may make it quicker. (Information on finding an implementation)

I'm surprised TreeSet is faster than HashSet in your experiments, which means that you might be operating in a situation where memory allocation is expensive. Did you remember to set the initial capacity when you allocated the HashSet? Remember to avoid an expensive rehash, you need to set the initial capacity to at least number of items/0.75 (the load factor).

于 2011-06-16T11:50:42.057 回答