
I'm trying to write code to help me with crossword puzzles, but I'm running into the following problems:

1. When I use the much larger text file containing my word list, I get no output. Only the small three-word list works.

2. The match tests positive for the first two strings of my test word list. I need it to test true only for entire words in my word list. [SOLVED: solution in the code below]
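For reference, a minimal Python 3 sketch of why problem 2 happens: an unanchored pattern matches any substring of a line, while wrapping it in `\b` word boundaries (as the code below does) or using `re.fullmatch` on the stripped line only accepts whole words. The sample line "dad" is taken from the lex.txt example; the partial pattern "da" is an illustrative assumption.

```python
import re

line = "dad\n"  # one entry from the word list, trailing newline included

# Unanchored: "da" is found inside "dad", so a partial word tests positive.
print(bool(re.search("da", line)))             # True

# Word boundaries: "da" is not a whole word in "dad", so there is no match.
print(bool(re.search(r"\bda\b", line)))        # False
print(bool(re.search(r"\bdad\b", line)))       # True

# Alternative: strip the newline and require the pattern to cover the whole line.
print(bool(re.fullmatch("da", line.strip())))  # False
print(bool(re.fullmatch("dad", line.strip()))) # True
```

If the search string could ever contain regex metacharacters, passing it through `re.escape` first keeps it from being interpreted as a pattern.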

lex.txt contains:

dad

add

test

I run the script like this:
./cross.py dad

[SOLVED] The code below works, but it is really slow:

#!/usr/bin/env python

import itertools, sys, re

sys.dont_write_bytecode = True
original_string=str(sys.argv[1])
lenth_of_string=len(original_string)
string_to_tuple=tuple(original_string)


with open('wordsEn.txt', 'r') as inF:
    for line in inF:
        for a in set (itertools.permutations(string_to_tuple, lenth_of_string)):
            joined_characters="".join(a)
            if re.search('\\b'+joined_characters+'\\b',line):
                print joined_characters

1 Answer


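Let's look at your code. You take the input string, generate all of its possible permutations, and then look those permutations up in the dictionary.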
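To make the cost of that approach concrete, here is a small Python 3 sketch (not part of the original code; `permutation_counts` is a hypothetical helper) that counts what `itertools.permutations` produces for one input word. The `set()` in the question's code removes duplicates caused by repeated letters, but the number of orderings generated still grows as n!.

```python
import itertools
import math

def permutation_counts(word):
    # Every ordering of the letters, including duplicates from repeated letters.
    total = sum(1 for _ in itertools.permutations(word, len(word)))
    # What set(...) in the question's code keeps after removing duplicates.
    distinct = {"".join(p) for p in itertools.permutations(word, len(word))}
    return total, len(distinct)

print(permutation_counts("dad"))  # (6, 3)
print(math.factorial(8))          # 40320 orderings for an 8-letter word
```

For an 8-letter word with no repeated letters, the question's loop therefore runs 40,320 regex searches for every single dictionary line.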

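From my point of view, the biggest hit to speed is that you regenerate the permutations of the word over and over, once for every word in the dictionary. That is very time-consuming.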
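A minimal Python 3 sketch of fixing only that issue: build the permutation set once, before reading the file, so each dictionary line costs a single set lookup. It assumes one word per line (as in lex.txt) and compares whole stripped lines instead of using `\b`; the name `find_anagrams_hoisted` is hypothetical. It still generates n! permutations up front, which is why dropping permutations altogether is the better fix.

```python
import itertools

def find_anagrams_hoisted(word, lines):
    # Built once, outside the per-line loop (the question's code rebuilt it per line).
    candidates = {"".join(p) for p in itertools.permutations(word)}
    results = []
    for line in lines:
        entry = line.strip()  # assumes one word per line
        if entry in candidates:
            results.append(entry)
    return results

print(find_anagrams_hoisted("dad", ["dad\n", "add\n", "test\n"]))  # ['dad', 'add']
```

With a real file, an open file object can be passed directly as `lines`, since iterating over it yields one line at a time.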

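Beyond that, you don't even need permutations. Two words can be "transformed" into each other by permutation exactly when they contain the same letters with the same counts. So your code can be reimplemented as follows: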

import itertools, sys, re
import time
from collections import Counter


sys.dont_write_bytecode = True
original_string=str(sys.argv[1]).strip()
lenth_of_string=len(original_string)
string_to_tuple=tuple(original_string)

def original_impl():
    # The code from the question: rebuilds every permutation for every line.
    to_return = []
    with open('wordsEn.txt', 'r') as inF:
        for line in inF:
            for a in set(itertools.permutations(string_to_tuple, lenth_of_string)):
                joined_characters="".join(a)
                if re.search('\\b'+joined_characters+'\\b',line):
                    to_return.append(joined_characters)
    return to_return

def new_impl():
    # Two words are anagrams exactly when their letter counts are equal,
    # so compare one Counter per line against the input's Counter.
    to_return = []
    stable_counter = Counter(original_string)
    with open('wordsEn.txt', 'r') as inF:
        for line in inF:
            l = line.strip()
            c = Counter(l)
            if c == stable_counter:
                to_return.append(l)
    return to_return

# Time both implementations on the same dictionary file.
t1 = time.time()
result1 = original_impl()
t2 = time.time()
result2 = new_impl()
t3 = time.time()

# Both implementations must find the same words.
assert result1 == result2

print "Original impl took ", t2 - t1, ", new impl took ", t3 - t2, "i.e. new impl is ", (t2-t1) / (t3 - t2), " faster"

For a dictionary of 100 eight-letter words, the output is:

Original impl took  42.1336319447 , new impl took  0.000784158706665 i.e. new impl is  53731.0006081  faster

With 10,000 entries in the dictionary, the time taken by the original implementation is unbearable.
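If many words will be looked up against the same large dictionary, a further option (not from the answer above) is to use each word's sorted letters as a key: two words are anagrams exactly when their sorted letters match, which is equivalent to the `Counter` comparison in `new_impl()`. A hedged Python 3 sketch, with hypothetical helper names `signature` and `build_index`, that reads the dictionary once and then answers each query with a single dict lookup:

```python
from collections import Counter, defaultdict

def signature(word):
    # Sorted letters: identical for anagrams, so it can serve as a lookup key.
    return "".join(sorted(word))

def build_index(lines):
    # Hypothetical helper: read the dictionary once, grouping words by signature.
    index = defaultdict(list)
    for line in lines:
        word = line.strip()  # assumes one word per line
        if word:
            index[signature(word)].append(word)
    return index

# Sorted-letter keys agree with the Counter comparison used in new_impl().
assert (signature("dad") == signature("add")) == (Counter("dad") == Counter("add"))

index = build_index(["dad\n", "add\n", "test\n"])
print(index.get(signature("dad"), []))   # ['dad', 'add']
print(index.get(signature("stet"), []))  # ['test']
```

Building the index costs one pass over the file; after that, each query avoids rescanning all 10,000 entries.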

answered 2013-12-27T09:57:36.077