python - Python Anagram Finder from a Given File

Question

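I've tried everything under the sun to solve this problem and gotten nowhere. I'm not even sure how to approach it. The instructions are as follows...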

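Your program will ask the user for the name of a file containing a list of words. The word list is formatted one word per line.
• For each word, find all the anagrams of that word (some have more than one).
• Output: report how many words have 0, 1, 2, etc. anagrams. Output the list of words that form the most anagrams (if several sets share the same maximum length, output all of them).
• You should use appropriate functional decomposition.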

Keep in mind that I've been programming for less than a month, so please keep everything as simple as possible. Thanks in advance.

score 3 · Accepted Answer

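I assume this is homework. Remember that an anagram is just a permutation (rearrangement) of a word's letters. Take it one step at a time: first learn how to compute the anagrams of a single word, then move on to many words. The following interactive session shows how to compute the anagrams of a word; you can continue from there.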

>>> # Learn how to calculate anagrams of a word
>>> 
>>> import itertools
>>> 
>>> word = 'fun'
>>> 
>>> # First attempt: anagrams are just permutations of all the characters in a word
>>> for permutation in itertools.permutations(word):
...     print(permutation)
... 
('f', 'u', 'n')
('f', 'n', 'u')
('u', 'f', 'n')
('u', 'n', 'f')
('n', 'f', 'u')
('n', 'u', 'f')
>>> 
>>> # Now, refine the above block to print actual words instead of tuples
>>> for permutation in itertools.permutations(word):
...     print(''.join(permutation))
... 
fun
fnu
ufn
unf
nfu
nuf
>>> # Note that words with repeated letters, such as 'all',
>>> # have fewer distinct anagrams:
>>> word = 'all'
>>> for permutation in itertools.permutations(word):
...     print(''.join(permutation))
... 
all
all
lal
lla
lal
lla
>>> # Note that 'all', 'lal' and 'lla' each appear twice. We need to
>>> # eliminate the duplicates. One way is to use a set:
>>> word = 'all'
>>> anagrams = set()
>>> for permutation in itertools.permutations(word):
...     anagrams.add(''.join(permutation))
... 
>>> anagrams
{'lal', 'all', 'lla'}
>>> for anagram in anagrams:
...     print(anagram)
... 
lal
all
lla
>>> # How many anagrams does the word 'all' have?
>>> # Just count using the len() function:
>>> len(anagrams)
3
>>>

For your convenience, I've pasted the session above here.
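To turn the session into a reusable function, here is a minimal sketch (the name anagrams_in_list and its word_set parameter are my own, hypothetical choices). Since most permutations are not real words, it keeps only the permutations that appear in a set of known words, such as the words read from the file. Be aware that a word of n letters has n! permutations, so this brute-force approach slows down quickly for long words (10 letters already means 3,628,800 permutations).

```python
import itertools


def anagrams_in_list(word, word_set):
    """Return the real words in word_set that are anagrams of word.

    Hypothetical helper. Brute force: build every distinct permutation
    of the letters (the set removes repeats such as the two 'all's),
    then keep only those found in word_set. The word itself is excluded.
    """
    candidates = {''.join(p) for p in itertools.permutations(word)}
    return {c for c in candidates if c in word_set and c != word}
```

For example, anagrams_in_list('top', {'top', 'pot', 'opt', 'fun'}) returns the set {'pot', 'opt'} (set order may vary).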

Update

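Now that Aaron has clarified the question, the lowest-level problem is: how do you determine whether two words are anagrams? The answer: they are anagrams when they contain exactly the same letters, each used the same number of times. The easiest way (for me) is to sort the letters of each word and compare the results.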

def normalize(word):
    word = word.strip().lower()   # sanitize: drop surrounding whitespace, ignore case
    word = ''.join(sorted(word))  # sort the letters into a canonical form
    return word

# normalize('top') ==> 'opt'
# Are 'top' and 'pot' anagrams? They are if their sorted letters are the same:
if normalize('top') == normalize('pot'):
    print('they are the same')
    # Do something
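Sorting is one way to compare letters. Another, assuming you are comfortable with collections.Counter from the standard library, is to count each letter directly. This sketch (the helper name is_anagram is hypothetical) compares the letter counts of two words:

```python
from collections import Counter


def is_anagram(a, b):
    """True if a and b use exactly the same letters the same number of times.

    Hypothetical helper. Counter tallies each letter, so this compares
    letter counts directly instead of sorting. Case and surrounding
    whitespace are ignored, matching normalize() above.
    """
    return Counter(a.strip().lower()) == Counter(b.strip().lower())
```

For example, is_anagram('aab', 'abb') is False: both words use only the letters a and b, but not the same number of times.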

Now that you know how to compare two words, let's process a list of words:

>>> import collections
>>> anagrams = collections.defaultdict(list)
>>> words = ['top', 'fun', 'dog', 'opt', 'god', 'pot']
>>> for word in words:
...     anagrams[normalize(word)].append(word)
... 
>>> anagrams
defaultdict(<class 'list'>, {'opt': ['top', 'opt', 'pot'], 'fnu': ['fun'], 'dgo': ['dog', 'god']})
>>> for k, v in anagrams.items():
...     print(k, '-', v)
... 
opt - ['top', 'opt', 'pot']
fnu - ['fun']
dgo - ['dog', 'god']

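In the session above, we use anagrams (a defaultdict, which works like a dict except that looking up a missing key creates a default value, here an empty list) to store the lists of words. The keys are the sorted letters, so anagrams['opt'] ==> ['top', 'opt', 'pot']. From there, you can tell which words have the most anagrams. The rest should be easy.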
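Putting the pieces together, here is one possible decomposition of the whole assignment. It is a sketch assuming a plain-text file with one word per line; because normalize lower-cases, words that differ only in case land in the same group. All function names are my own, hypothetical choices. A word in a group of n words has n - 1 anagrams, which is how the "0, 1, 2, etc." report is built.

```python
import collections


def normalize(word):
    """Return the word's letters, lower-cased and sorted: 'Top' -> 'opt'."""
    return ''.join(sorted(word.strip().lower()))


def read_words(filename):
    """Read one word per line, skipping blank lines and duplicates."""
    with open(filename) as f:
        words = [line.strip() for line in f if line.strip()]
    return list(dict.fromkeys(words))  # drop duplicates, keep file order


def group_anagrams(words):
    """Map each normalized key to the list of words sharing it."""
    groups = collections.defaultdict(list)
    for word in words:
        groups[normalize(word)].append(word)
    return groups


def anagram_histogram(groups):
    """Map 'number of anagrams' -> 'how many words have that many'.

    Every word in a group of size n has n - 1 anagrams (the other members).
    """
    histogram = collections.Counter()
    for members in groups.values():
        histogram[len(members) - 1] += len(members)
    return dict(sorted(histogram.items()))


def largest_groups(groups):
    """Return every group whose size equals the maximum size (ties included)."""
    if not groups:
        return []
    biggest = max(len(members) for members in groups.values())
    return [members for members in groups.values() if len(members) == biggest]


def main():
    """Ask for a file name, then print the report described in the assignment."""
    filename = input('Word list file: ')
    groups = group_anagrams(read_words(filename))
    for count, num_words in anagram_histogram(groups).items():
        print(num_words, 'word(s) have', count, 'anagram(s)')
    print('Largest anagram set(s):')
    for members in largest_groups(groups):
        print(', '.join(members))
```

Call main() to run it. With a file containing the six words from the session above, it should report 1 word with 0 anagrams, 2 words with 1, and 3 words with 2, then print top, opt, pot as the largest set.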
