
How can I search any given txt file for anagrams and display the anagrams for every word in that file?

So far I can read the file, extract every word, and sort the letters of each word alphabetically. I've tried building two dicts: one with the actual words from the text file as keys and their alphabetically sorted versions as values, and another, set up the same way, built from the dictionary file I have.

Using both these dictionaries I've been unable to find an efficient way to get the following output for every word in the input list:

'eerst':  steer reste trees

If I loop over every word in the given list and, for each one, loop over the whole dictionary looking for and recording the anagrams, it takes too much time and is very inefficient. If I try the following:

for x in input_list:
    if x in dictionary:
        print(dictionary[x])

I only get the first anagram of every word and nothing else. If that made any sense, any suggestions would be immensely helpful.


2 Answers


I'm not sure whether what I have in mind is what you're already doing in your code, but I can't think of anything better:

from collections import defaultdict

words = 'dog god steer reste trees dog fred steer'.split() # or words from a file

unique_words = set(words)
anagram_dict = defaultdict(list)
for word in unique_words:
    key = "".join(sorted(word))
    anagram_dict[key].append(word)

for anagram_list in anagram_dict.values():
    if len(anagram_list) > 1:
        print(*anagram_list)

This will print (in arbitrary order):

god dog
steer trees reste

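If you want the dictionary keys as well, you can make the final loop iterate over the items rather than the values of anagram_dict (and, if you like, you can also print the words that have no anagrams, like 'fred' in the example above). Note that, thanks to the set, duplicate words are not sorted more than once.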
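The items-based variant described above could be sketched roughly as follows. This is only an illustration: the helper name `build_anagram_index` is hypothetical, and sorting the groups and keys is an added assumption to make the output order deterministic. It prints every key, including words such as 'fred' that have no anagrams, in the format shown in the question.

```python
from collections import defaultdict

def build_anagram_index(words):
    """Map each sorted-letter key to the sorted list of unique words sharing it."""
    index = defaultdict(list)
    for word in set(words):  # set() so duplicate words are sorted only once
        index["".join(sorted(word))].append(word)
    # Sort each group so the result does not depend on set iteration order.
    return {key: sorted(group) for key, group in index.items()}

words = 'dog god steer reste trees dog fred steer'.split()  # or words from a file
index = build_anagram_index(words)

# Iterate over items() to get the key alongside its anagram group.
for key, group in sorted(index.items()):
    print(f"'{key}': {' '.join(group)}")
```

To show only words that actually have anagrams, keep the `len(group) > 1` check from the original loop.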

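The running time should be O(M + U*N*log(N)), where M is the number of words, U is the number of unique words, and N is their average length. Unless you're sorting an organic chemistry textbook or something else with lots of long words, that should be very close to linear in the length of the input.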

answered 2013-10-10T04:37:43.953

Here is another way to get the anagrams, using itertools.groupby:

from itertools import groupby
words = ['dog', 'god', 'steer', 'reste', 'trees', 'fred']  # example input; use your own list of words

for k, g in groupby(sorted(words, key=sorted), key=sorted):
    g = list(g)
    if len(g) > 1:
        print(g)

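The big-O complexity isn't as good as the usual dict-of-lists approach, but it's still fairly efficient, and it sounds funny when you read it out loud.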
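For comparison, here is a slightly expanded sketch of the groupby idea that also removes duplicate words and keeps the sorted-letter key for printing. The helper `sorted_letters` and the secondary sort on the word itself are assumptions added for illustration, not part of the snippet above. The extra cost relative to the dict approach comes from sorting the whole word list, which groupby needs because it only merges adjacent items with equal keys.

```python
from itertools import groupby

def sorted_letters(word):
    """Return the word's letters in alphabetical order, e.g. 'steer' -> 'eerst'."""
    return "".join(sorted(word))

words = 'dog god steer reste trees dog fred steer'.split()

# groupby only merges adjacent items, so sort by the same key first;
# set() drops duplicate words before sorting.
ordered = sorted(set(words), key=lambda w: (sorted_letters(w), w))
groups = {key: list(group) for key, group in groupby(ordered, key=sorted_letters)}

for key, group in groups.items():
    if len(group) > 1:  # skip words with no anagrams, such as 'fred'
        print(f"'{key}': {' '.join(group)}")
```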

answered 2013-10-10T05:10:23.497