I'm trying to build a T9 system, like the one on phones, but driven from the keyboard. I could really use some advice on how to do this.

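I've already found a text file containing the words I want to use. I'd like the number 2 key to act as 'abc', 3 = 'def', 4 = 'ghi', and so on. If anyone is bored, or can simply help get me started, it would be much appreciated.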

3 Answers

Here is a brute-force T9 emulator:

import itertools 

n2l={2:'abc',3:'def',4:'ghi',5:'jkl',6:'mno',7:'pqrs',8:'tuv',9:'wxyz'}

with open('/usr/share/dict/words','r') as di:  # UNIX 250k unique word list 
    all_words={line.strip() for line in di}

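def combos(*nums):
    # Letter groups for the pressed keys, e.g. (2, 3) -> ['abc', 'def']
    groups=[n2l[i] for i in nums]
    # Cartesian product of the groups: every prefix those keys could spell
    return tuple(''.join(p) for p in itertools.product(*groups))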

def t9(*nums):
    combo=combos(*nums)
    return sorted(word for word in all_words if word.startswith(combo))

def try_it(*nums):
    l=list(t9(*nums))
    print('  {:10} {:10,} words'.format(','.join(str(i) for i in nums),len(l)))
    if len(l)<100:
        print(nums,'yields:',l)

try_it(2)
try_it(2,3)
try_it(2,3,4)
try_it(2,3,3,4)
try_it(2,3,3,4,5)

Prints:

  2              41,618 words
  2,3             4,342 words
  2,3,4             296 words
  2,3,3,4           105 words
  2,3,3,4,5          16 words
(2, 3, 3, 4, 5) yields: ['aedile', 'aedileship', 'aedilian', 'aedilic', 'aedilitian', 
    'aedility', 'affiliable', 'affiliate', 'affiliation', 'bedikah', 'befilch', 
    'befile', 'befilleted', 'befilmed', 'befilth', 'cedilla']

You can see that, starting from 250k words (a very large set), it takes 5 digits to converge to a manageable size.

While this code is illustrative and can get you started, you will still need two more things:

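  1. A smaller set of words ;-) and
  2. A ranking of the more common words, to present in the T9 autocomplete area of the UI. (That is, "affiliate" or "affiliation" is more likely to be the word wanted from (2, 3, 3, 4, 5) than "aedile" or "befilth". These need to be ranked somehow...)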
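
As a side note on cost: the brute-force version expands every letter combination for the pressed keys (up to 4^n prefixes for n digits) and tests each word against all of them. A possible alternative, not part of this answer's code, is to translate each word to its digit string once and then compare digit prefixes. A minimal sketch, assuming the same standard keypad layout; `encode`, `build_index`, and `t9_prefix` are illustrative names, and the sample words are made up for the demo:

```python
# Standard keypad layout (same as n2l in the answer), inverted to letter -> digit.
N2L = {2: 'abc', 3: 'def', 4: 'ghi', 5: 'jkl',
       6: 'mno', 7: 'pqrs', 8: 'tuv', 9: 'wxyz'}
L2N = {ch: str(d) for d, chars in N2L.items() for ch in chars}
TABLE = str.maketrans(L2N)

def encode(word):
    """Return the digit string for a word, or None if it contains
    a character that is not on the keypad (apostrophes, digits, ...)."""
    w = word.lower()
    if not w or any(ch not in L2N for ch in w):
        return None
    return w.translate(TABLE)

def build_index(words):
    """Encode every usable word once; returns (digit_string, word) pairs."""
    index = []
    for w in words:
        code = encode(w)
        if code is not None:
            index.append((code, w))
    return index

def t9_prefix(index, digits):
    """Words whose digit string starts with the typed digits."""
    return sorted(w for code, w in index if code.startswith(digits))

sample = ['affiliate', 'cedilla', 'beg', 'hello', "don't"]  # illustrative only
index = build_index(sample)
print(t9_prefix(index, '23345'))  # ['affiliate', 'cedilla']
```

The index is built once, so each keypress costs one pass of cheap string comparisons rather than a fresh product over letter groups. Ranking is left out here on purpose.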

Take 2

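Here is a quick attempt at weighting. I read the same large dictionary (the common Unix 'words' file) and then weight those words using Project Gutenberg's The Adventures of Sherlock Holmes. You can use any good collection of text to do this.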

from collections import Counter
import re
import itertools 

all_words=Counter()
n2l={2:'abc',3:'def',4:'ghi',5:'jkl',6:'mno',7:'pqrs',8:'tuv',9:'wxyz'}
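with open('/usr/share/dict/words','r') as di:  # UNIX 250k unique word list 
    # Keep only short dictionary words; len(line) includes the trailing newline.
    # Each of these starts with a weight of 1.
    all_words.update({line.strip() for line in di if len(line) < 6})

with open('holmes.txt','r') as fin:   # http://www.gutenberg.org/ebooks/1661.txt.utf-8
    for line in fin:
        # Every occurrence in the text adds 1 to that word's weight
        all_words.update([word.lower() for word in re.findall(r'\b\w+\b',line)])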

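def combos(*nums):
    # Letter groups for the pressed keys, e.g. (2, 3) -> ['abc', 'def']
    groups=[n2l[i] for i in nums]
    # Cartesian product of the groups: every prefix those keys could spell
    return tuple(''.join(p) for p in itertools.product(*groups))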

def t9(*nums):
    combo=combos(*nums)
    c1=combos(nums[0])
    first_cut=(word for word in all_words if word.startswith(c1))
    return (word for word in first_cut if word.startswith(combo))

def try_it(*nums):
    s=set(t9(*nums))
    n=10
    print('({}) produces {:,} words. Top {}:'.format(','.join(str(i) for i in nums),
            len(s),min(n,len(s))))
    for i, word in enumerate(
          [w for w in sorted(all_words,key=all_words.get, reverse=True) if w in s],1):
        if i<=n:
            print ('\t{:2}:  "{}" -- weighted {}'.format(i, word, all_words[word]))

    print()        

try_it(2)
try_it(2,3)
try_it(2,3,4)
try_it(2,3,3,4)
try_it(6,6,8,3)   
try_it(2,3,3,4,5)      

Prints:

(2) produces 2,584 words. Top 10:
     1:  "and" -- weighted 3089
     2:  "a" -- weighted 2701
     3:  "as" -- weighted 864
     4:  "at" -- weighted 785
     5:  "but" -- weighted 657
     6:  "be" -- weighted 647
     7:  "all" -- weighted 411
     8:  "been" -- weighted 394
     9:  "by" -- weighted 372
    10:  "are" -- weighted 356

(2,3) produces 261 words. Top 10:
     1:  "be" -- weighted 647
     2:  "been" -- weighted 394
     3:  "before" -- weighted 166
     4:  "after" -- weighted 99
     5:  "between" -- weighted 60
     6:  "better" -- weighted 51
     7:  "behind" -- weighted 50
     8:  "certainly" -- weighted 45
     9:  "being" -- weighted 45
    10:  "bed" -- weighted 40

(2,3,4) produces 25 words. Top 10:
     1:  "behind" -- weighted 50
     2:  "being" -- weighted 45
     3:  "began" -- weighted 25
     4:  "beg" -- weighted 13
     5:  "ceiling" -- weighted 10
     6:  "beginning" -- weighted 7
     7:  "begin" -- weighted 6
     8:  "beggar" -- weighted 6
     9:  "begging" -- weighted 4
    10:  "begun" -- weighted 4

(2,3,3,4) produces 5 words. Top 5:
     1:  "additional" -- weighted 4
     2:  "addition" -- weighted 3
     3:  "addicted" -- weighted 1
     4:  "adding" -- weighted 1
     5:  "additions" -- weighted 1

(6,6,8,3) produces 11 words. Top 10:
     1:  "note" -- weighted 38
     2:  "notes" -- weighted 9
     3:  "move" -- weighted 5
     4:  "moved" -- weighted 4
     5:  "novel" -- weighted 4
     6:  "movement" -- weighted 3
     7:  "noted" -- weighted 2
     8:  "moves" -- weighted 1
     9:  "moud" -- weighted 1
    10:  "november" -- weighted 1

(2,3,3,4,5) produces 0 words. Top 0:
answered 2012-08-22T17:14:09.577

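using System;
using System.Collections.Generic;

// Enclosing class; the name is assumed, since the snippet only showed its closing brace.
public class T9Printer
{
    // Key layout as given in this answer (keys '0'-'9'). It is not the
    // standard phone keypad, where 2 = abc ... 9 = wxyz.
    Dictionary<char,char[]> btnDict = new Dictionary<char,char[]>()
    {
        {'0',new char[]{'A','B','C'}},
        {'1',new char[]{'D','E','F'}},
        {'2',new char[]{'G','H','I'}},
        {'3',new char[]{'J','K','L'}},
        {'4',new char[]{'M','N','O'}},
        {'5',new char[]{'P','Q'}},
        {'6',new char[]{'R','S','T'}},
        {'7',new char[]{'U','V','W'}},
        {'8',new char[]{'X','Y','Z'}},
        {'9',new char[]{'#','@','.'}}
    };

    // Prints every letter combination the digit string can spell.
    public void PrintT9(string input)
    {
        if (string.IsNullOrEmpty(input)) return;   // nothing to expand
        char[] T9Suggestion = new char[input.Length];
        FillPosition(T9Suggestion, 0, input.ToCharArray());
    }

    // Tries each letter of the key at `position`, then recurses to the next key.
    void FillPosition(char[] array, int position, char[] input)
    {
        char[] alphabets = btnDict[input[position]];
        foreach (char alphabet in alphabets)
        {
            array[position] = alphabet;
            if (position == array.Length - 1)
            {
                // Last position filled: one complete combination
                string s = new string(array);
                Console.Write(s+",");
            }
            else
            {
                FillPosition(array, position + 1, input);
            }
        }
    }
}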

http://coding4geeks.blogspot.com/2015/01/t9-dictionary.html

answered 2015-01-26T09:24:24.470

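A naive approach is to generate every letter combination that the given digit sequence can produce. Note that these combinations are simply the Cartesian product of N letter tuples, one per digit, where N is the length of the word. To get all the combinations you can use itertools.product, e.g.: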

itertools.product(*(letters(d) for d in digits))

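where letters is a function such that letters('1') returns 'abc' and so on, and digits is a string of digits representing a word. Then iterate over your word list and find the matches.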
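
The steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the `KEYPAD` table uses the standard phone layout (so 'abc' sits on '2' here rather than '1'), `candidates` and `matches` are hypothetical helper names, and the word list is a made-up sample standing in for a real dictionary file.

```python
import itertools

# Assumed keypad layout (standard phone keypad); swap in your own mapping.
KEYPAD = {'2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
          '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz'}

def letters(d):
    """Return the letters printed on key d."""
    return KEYPAD[d]

def candidates(digits):
    """Every letter string the digit string could spell (Cartesian product)."""
    return {''.join(p) for p in itertools.product(*(letters(d) for d in digits))}

def matches(digits, words):
    """Words from `words` that are spelled exactly by the digit string."""
    cands = candidates(digits)
    return sorted(w for w in words if w in cands)

sample_words = ['good', 'home', 'gone', 'hood', 'cat']  # illustrative only
print(matches('4663', sample_words))  # ['gone', 'good', 'home', 'hood']
```

Putting the candidates in a set keeps each membership test cheap, but the set itself grows as up to 4^N entries, so this stays practical only for short digit strings.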

answered 2012-08-22T14:15:04.130