c - 从电话数字键盘进行字母搜索

Question

我正在实现基于电话键盘的字母搜索，例如电话键盘1

当用户键入 2 时，我得到 {A, B, C} 的组合。当用户输入 23 时，我在组合中得到 {AD, AE, AF, BD, BE, BF, CD, CE, CF}，依此类推。如果我继续输入并进行组合，我会得到数千个组合，这会使搜索过程非常缓慢。所以现在我想实现一个算法来删除像CF BD CD这样的不合逻辑的组合，我的意思是逻辑上没有人的名字以这些组合开头，也许是两个没有元音的辅音。所以这样我想缩小我的搜索范围。任何人都知道这种状态机，用 C 实现？

score 4 · Accepted Answer

您可以根据正在搜索的数据集构建一组有效前缀。匹配部分输入应该很容易。

score 1 · Accepted Answer

Keep in mind that when it comes to linguistic data "illogical" is not a good proxy for "unlikely." This is particularly true when it comes to names. As an example, according to a standard definition of "consonant" in English, my last name starts with four consonants. If it were to be written after a German fashion, it would start with five. When thinking about such issues it is useful to keep in mind that:

Sounds are not letters, and letters are not sounds: in most orthographic systems, the mapping of letters to sounds is not 1:1
Many languages have unexpected syllabic nuclei: Tamazight Berber, for instance, allows syllables where the sound m plays the role of the syllabic nucleous, like the vowel generally do in English. So a Berber name can look like CCmC (where C stands for consonants) and be perfect in that language. It is not unlikely that a person of Berber origin would then use the similar orthography in English, which a naive system would rule out as "illogical"
Finally, many systems for writing foreign names and words in English use di-graphs or tri-graphs (two letter and three letter combinations) for representing the sounds of the foreign language in English: this can create what looks like illicit consontant clusters. We know that English does that (sh represents one sound, see point 1), but this is particularly true when transcribing foreign words.

So unless you know very well the orthographic rules for the names you are expecting, you are likely to rule out legitimate names using a naive system.

c - 从电话数字键盘进行字母搜索

2 回答 2

Related

Reference