c - 一大串字符串中最长的子串

Question

我有一个庞大的固定文本字符串库，以及一个经常更改的输入字符串 s。我需要在最短的时间内从库中的任何字符串到 s 中找到最长的匹配子字符串，从字符串 s 的开头开始。在一个完美的世界中，我还会从库中返回下一个最长的匹配，以及下一个最好的匹配，依此类推。这不是最长的公共字符串问题 - 我不是在为库中的所有字符串寻找最长的公共字符串......我只需要尽可能快地在 s 和庞大库中的每个字符串之间找到一个成对的最佳子字符串。

score 1 · Accepted Answer

After rereading, I think the best way to do this is probably to build a trie or a prefix tree of your big library of strings, then match s against that.

This has a couple of advantages. First, it stores your big library in (at least somewhat) compressed form. Second, it more or less automatically tells you all the strings that match a given input, not just the longest one.

It also fits your use case quite well -- while it takes quite a bit of work to build a trie or (especially) a prefix tree from the input, using it afterwards is quite fast.

score 0 · Accepted Answer

提前对列表进行排序（即编译时间或之前），然后使用 bsearch

http://www.cplusplus.com/reference/clibrary/cstdlib/bsearch/

找到匹配项后，您可以在附近前后查看，以获得尽可能多的“匹配项”。

顺便说一句，bsearch 不一定是最快的，因为它通过了比较器函数，但在标准 C 库中。

c - 一大串字符串中最长的子串

2 回答 2

Related

Reference