string - Efficiently searching large list of strings

Question

I have a large list of strings which needs to be searched by the user of an iPhone/Android app. The strings are sorted alphabetically, but that isn't actually all that useful, since a string should be included in the results if the search query falls anywhere inside the string, not just the beginning. As the user is typing their search query, the search should be updated to reflect the results of what they currently entered. (e.g. if they type "cat", it should display the results for "c", "ca" and "cat" as they type it).

My current approach is the following:

I have a stack of "search results", which starts out empty. If the user types something to make the search query longer, I push the current search results onto the stack, then search through only the current search results for the new ones (it's impossible for something to be in the full string list but not the current results in this case).

If the user hits backspace, I only need to pop the search results off of the stack and restore them. This can be done nearly instantaneously.

This approach is working great for "backwards" searching (making the search query shorter) and cases where the search query is already long enough for the number of results to be low. However, it still has to search though the full list of strings in O(n) time for each of the first few letters the user types, which is quite slow.

One approach I've considered is to have a pre-compiled list of results of all possible search queries of 2 or 3 letters. The problem with this approach is that it would require 26^2 or 26^3 such lists, and would take up a pretty large amount of space.

Any other optimizations or alternate approaches you can think of?

score 4 · Accepted Answer

您应该考虑使用前缀树 (trie)来制作预先计算的列表。我不确定在子字符的基础上显示“c”、“ca”和“cat”的结果是个好主意。例如，假设用户正在搜索“吃”这个词。你的算法必须找到所有包含“e”的单词，然后是“ea”，最后是“eat”；其中大部分对用户没有用处。对于电话应用程序，如果您按单词进行操作可能会更好。可以对多词字符串进行标记，因此在“a large stake”中搜索“stake”可以正常工作，但不能搜索“take”。

score 1 · Accepted Answer

我注意到谷歌和我想象的其他人在仅按下 1 或 2 个字符时不会提供完整列表。在您的情况下，也许一个好的起点是仅在用户输入至少 3 个字符时才开始填充搜索查询结果。

对于以后的版本，如果它很重要，您可以从 Google 的做法中获得启发，并进行更复杂的处理：跟踪以前用户选择的实际条目，并按频率对这些条目进行排序。然后，每天在您的服务器上运行一个 cron 作业来填充一个小型数据库表，其中前 10 个条目以每个字母开头，如果只按下了 1 或 2 个字母，则使用这个小表中的结果而不是扫描完整的列表。

score 1 · Accepted Answer

1

您可以使用压缩后缀树

于 2012-05-03T00:35:49.477 回答

string - Efficiently searching large list of strings

3 回答 3

Related

Reference