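I'm trying to build a T9 system like the one on mobile phones, but driven from a keyboard. I could really use some advice on how to go about it.
I've already found a text file containing the words I want to use. I'd like the number 2 key to stand for 'abc', 3 = 'def', 4 = 'ghi', and so on. If anyone is bored, or can simply point me in the right direction, that would be much appreciated.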
Here is a brute-force T9 imitator:

    import itertools

    # Keypad digit -> letters on that key
    n2l = {2: 'abc', 3: 'def', 4: 'ghi', 5: 'jkl', 6: 'mno', 7: 'pqrs', 8: 'tuv', 9: 'wxyz'}

    with open('/usr/share/dict/words', 'r') as di:  # UNIX 250k unique word list
        all_words = {line.strip() for line in di}

    def combos(*nums):
        # Every letter string the key sequence could spell (Cartesian product)
        t = [n2l[i] for i in nums]
        return tuple(''.join(p) for p in itertools.product(*t))

    def t9(*nums):
        # str.startswith accepts a tuple, so one pass checks all prefixes
        combo = combos(*nums)
        return sorted(word for word in all_words if word.startswith(combo))

    def try_it(*nums):
        l = list(t9(*nums))
        print(' {:10} {:10,} words'.format(','.join(str(i) for i in nums), len(l)))
        if len(l) < 100:
            print(nums, 'yields:', l)

    try_it(2)
    try_it(2, 3)
    try_it(2, 3, 4)
    try_it(2, 3, 3, 4)
    try_it(2, 3, 3, 4, 5)

Prints:
2 41,618 words
2,3 4,342 words
2,3,4 296 words
2,3,3,4 105 words
2,3,3,4,5 16 words
(2, 3, 3, 4, 5) yields: ['aedile', 'aedileship', 'aedilian', 'aedilic', 'aedilitian',
'aedility', 'affiliable', 'affiliate', 'affiliation', 'bedikah', 'befilch',
'befile', 'befilleted', 'befilmed', 'befilth', 'cedilla']
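You can see that, starting from 250k words (a very large set), it takes 5 digits to converge to a manageable size.
While this code is illustrative and will get you started, you will still need two more things:
Take 2
Here is a quick attempt at weighting. I read the same large dictionary (the common Unix 'words' file) and then weight those words using The Adventures of Sherlock Holmes from Project Gutenberg. You can use any good collection of text to do this.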

    from collections import Counter
    import re
    import itertools

    all_words = Counter()
    n2l = {2: 'abc', 3: 'def', 4: 'ghi', 5: 'jkl', 6: 'mno', 7: 'pqrs', 8: 'tuv', 9: 'wxyz'}

    with open('/usr/share/dict/words', 'r') as di:  # UNIX 250k unique word list
        # Short words only; len(line) still counts the trailing newline
        all_words.update({line.strip() for line in di if len(line) < 6})

    with open('holmes.txt', 'r') as fin:  # http://www.gutenberg.org/ebooks/1661.txt.utf-8
        for line in fin:
            # Each occurrence in the text adds 1 to that word's weight
            all_words.update([word.lower() for word in re.findall(r'\b\w+\b', line)])

    def combos(*nums):
        t = [n2l[i] for i in nums]
        return tuple(''.join(p) for p in itertools.product(*t))

    def t9(*nums):
        combo = combos(*nums)
        c1 = combos(nums[0])
        # Cheap first cut on the first key, then the full prefix test
        first_cut = (word for word in all_words if word.startswith(c1))
        return (word for word in first_cut if word.startswith(combo))

    def try_it(*nums):
        s = set(t9(*nums))
        n = 10
        print('({}) produces {:,} words. Top {}:'.format(
            ','.join(str(i) for i in nums), len(s), min(n, len(s))))
        # Walk the words from heaviest to lightest, keeping those that matched
        for i, word in enumerate(
                [w for w in sorted(all_words, key=all_words.get, reverse=True) if w in s], 1):
            if i <= n:
                print('\t{:2}: "{}" -- weighted {}'.format(i, word, all_words[word]))
        print()

    try_it(2)
    try_it(2, 3)
    try_it(2, 3, 4)
    try_it(2, 3, 3, 4)
    try_it(6, 6, 8, 3)
    try_it(2, 3, 3, 4, 5)

Prints:
(2) produces 2,584 words. Top 10:
1: "and" -- weighted 3089
2: "a" -- weighted 2701
3: "as" -- weighted 864
4: "at" -- weighted 785
5: "but" -- weighted 657
6: "be" -- weighted 647
7: "all" -- weighted 411
8: "been" -- weighted 394
9: "by" -- weighted 372
10: "are" -- weighted 356
(2,3) produces 261 words. Top 10:
1: "be" -- weighted 647
2: "been" -- weighted 394
3: "before" -- weighted 166
4: "after" -- weighted 99
5: "between" -- weighted 60
6: "better" -- weighted 51
7: "behind" -- weighted 50
8: "certainly" -- weighted 45
9: "being" -- weighted 45
10: "bed" -- weighted 40
(2,3,4) produces 25 words. Top 10:
1: "behind" -- weighted 50
2: "being" -- weighted 45
3: "began" -- weighted 25
4: "beg" -- weighted 13
5: "ceiling" -- weighted 10
6: "beginning" -- weighted 7
7: "begin" -- weighted 6
8: "beggar" -- weighted 6
9: "begging" -- weighted 4
10: "begun" -- weighted 4
(2,3,3,4) produces 5 words. Top 5:
1: "additional" -- weighted 4
2: "addition" -- weighted 3
3: "addicted" -- weighted 1
4: "adding" -- weighted 1
5: "additions" -- weighted 1
(6,6,8,3) produces 11 words. Top 10:
1: "note" -- weighted 38
2: "notes" -- weighted 9
3: "move" -- weighted 5
4: "moved" -- weighted 4
5: "novel" -- weighted 4
6: "movement" -- weighted 3
7: "noted" -- weighted 2
8: "moves" -- weighted 1
9: "moud" -- weighted 1
10: "november" -- weighted 1
(2,3,3,4,5) produces 0 words. Top 0:
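
The weighted version above rescans the whole word list on every query. One possible alternative, sketched here and not part of the answer's code, is to translate each word into its key sequence once and index the words by digit prefix, so that a lookup becomes a dictionary access followed by a sort on weight. The names `l2n`, `to_digits`, `build_index` and `suggest` are hypothetical, and the sample counts are a handful of values copied by hand from the output above.

```python
from collections import Counter, defaultdict

# Same keypad layout as n2l above, inverted to letter -> digit.
n2l = {2: 'abc', 3: 'def', 4: 'ghi', 5: 'jkl',
       6: 'mno', 7: 'pqrs', 8: 'tuv', 9: 'wxyz'}
l2n = {letter: str(num) for num, chars in n2l.items() for letter in chars}

def to_digits(word):
    """Translate a word to its key sequence, or None if a character has no key."""
    try:
        return ''.join(l2n[ch] for ch in word.lower())
    except KeyError:
        return None

def build_index(weights):
    """Map every digit prefix to the list of words that start with it."""
    index = defaultdict(list)
    for word in weights:
        digits = to_digits(word)
        if digits:  # skips empty strings and words with unmapped characters
            for end in range(1, len(digits) + 1):
                index[digits[:end]].append(word)
    return index

def suggest(index, weights, digits, n=10):
    """Return up to n words for a digit prefix, heaviest first, ties alphabetical."""
    return sorted(index.get(digits, []), key=lambda w: (-weights[w], w))[:n]

# Hand-copied sample counts, only to exercise the functions.
demo_weights = Counter({'note': 38, 'move': 5, 'moved': 4, 'novel': 4, 'be': 647})
demo_index = build_index(demo_weights)
print(suggest(demo_index, demo_weights, '6683'))
# prints ['note', 'move', 'moved', 'novel']
```

The trade-off is memory: each word is stored once per prefix length, which is probably acceptable for a texting-sized dictionary but worth measuring on the full 250k list.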
using System;
using System.Collections.Generic;
using System.Linq;

public class T9Printer // placeholder name; the enclosing class declaration was not part of the post
{
    // Key -> letters. Note that this layout starts at key 0, not at the phone's key 2.
    Dictionary<char,char[]> btnDict = new Dictionary<char,char[]>()
    {
        {'0',new char[]{'A','B','C'}},
        {'1',new char[]{'D','E','F'}},
        {'2',new char[]{'G','H','I'}},
        {'3',new char[]{'J','K','L'}},
        {'4',new char[]{'M','N','O'}},
        {'5',new char[]{'P','Q'}},
        {'6',new char[]{'R','S','T'}},
        {'7',new char[]{'U','V','W'}},
        {'8',new char[]{'X','Y','Z'}},
        {'9',new char[]{'#','@','.'}}
    };

    // Prints every letter combination for the key sequence, comma-separated.
    public void PrintT9(string input)
    {
        char[] T9Suggestion = new char[input.Length];
        FillPosition(T9Suggestion, 0, input.ToArray<char>());
    }

    // Tries each letter of the key at this position, then recurses into the next one.
    void FillPosition(char[] array, int position, char[] input)
    {
        char[] alphabets = btnDict[input[position]];
        foreach (char alphabet in alphabets)
        {
            array[position] = alphabet;
            if (position == array.Length - 1)
            {
                string s = new string(array);
                Console.Write(s+",");
            }
            else
            {
                FillPosition(array, position + 1, input);
            }
        }
    }
}
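A naive approach is to generate every possible letter combination that the given digit sequence can produce. Note that these combinations are essentially the Cartesian product of N letter tuples, one tuple per digit, where N is the length of the word. To get all the combinations you can use itertools.product, for example:
    itertools.product(*(letters(d) for d in digits))
where letters is a function such that letters('1') returns 'abc' and so on, and digits is a string of digits representing a word. Then go through your word list and look for matches.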
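
As a minimal sketch of this naive approach: the code below assumes the standard phone layout from the question (2 = 'abc', 3 = 'def', ...) and not the `letters('1')` numbering used in the example above. `KEYPAD`, `candidates`, `matches` and the sample word list are hypothetical names and data added for illustration.

```python
import itertools

# Assumed keypad table: the standard phone layout described in the question.
KEYPAD = {'2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
          '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz'}

def letters(d):
    """Return the letters on the key for one digit character."""
    return KEYPAD[d]

def candidates(digits):
    """Yield every letter string the digit string could spell."""
    for combo in itertools.product(*(letters(d) for d in digits)):
        yield ''.join(combo)

def matches(digits, words):
    """Return the candidates that are real words, in sorted order."""
    word_set = set(words)  # set membership keeps each check cheap
    return sorted(c for c in candidates(digits) if c in word_set)

# Made-up word list, just to exercise the functions.
sample_words = ['good', 'home', 'gone', 'hood', 'hello', 'cat']
print(matches('4663', sample_words))
# prints ['gone', 'good', 'home', 'hood']
```

The number of candidates grows by a factor of 3 or 4 with each digit, so this only finds whole words of exactly the typed length and becomes slow for long sequences.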