I have a defaultdict with 3 levels of nesting, which will later be used for trigrams.

from collections import defaultdict
counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

Then I have a for loop that goes through a document and builds a count for each letter (as well as bicounts and tricounts):

counts[letter1][letter2][letter3] += 1
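
For context, the loop described above might look roughly like the following. This is only a sketch: the sample `text` and the `zip`-based sliding window are assumptions, since the actual loop is not shown in the question.

```python
from collections import defaultdict

# Three-level nested counter: counts[a][b][c] is how often "abc" occurs.
counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

text = "hello hi"  # hypothetical sample document

# Slide a window of three characters over the text.
for letter1, letter2, letter3 in zip(text, text[1:], text[2:]):
    counts[letter1][letter2][letter3] += 1

print(counts["h"]["e"]["l"])
```

Note that this window also counts trigrams containing spaces; filter those out if they are not wanted.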

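I want to add another level so that I can specify whether a letter is a consonant or a vowel.

I want to be able to run my bigrams and trigrams over consonants versus vowels, rather than over every letter of the alphabet, but I don't know how to do this.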

2 Answers

Assuming you need to count sequences of vowels and consonants, you can simply keep a separate map.

If you have a function is_vowel(letter) that returns True if letter is a vowel and False if it is a consonant, you can do this:

# vc_counts is a second nested defaultdict, built the same way as counts
vc_counts[is_vowel(letter1)][is_vowel(letter2)][is_vowel(letter3)] += 1
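
To make that concrete, here is a minimal runnable sketch. The `is_vowel` body, the `VOWELS` set (which treats 'y' as a consonant) and the sample `text` are assumptions for illustration, not something given in the question. Anything that is not in `VOWELS`, including spaces or punctuation, is counted as a consonant, so strip non-letters first if that matters.

```python
from collections import defaultdict

VOWELS = set("aeiou")  # assumption: 'y' is treated as a consonant here


def is_vowel(letter):
    """Return True if letter is a vowel, False otherwise."""
    return letter.lower() in VOWELS


# Same three-level structure as counts, but keyed by True/False.
vc_counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

text = "hello"  # hypothetical sample input
for letter1, letter2, letter3 in zip(text, text[1:], text[2:]):
    vc_counts[is_vowel(letter1)][is_vowel(letter2)][is_vowel(letter3)] += 1

# Number of consonant-vowel-consonant trigrams ("hel" in this sample).
print(vc_counts[False][True][False])
```

The keys at every level are just True (vowel) and False (consonant), so the whole structure has at most 8 leaf counters.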
Answered 2017-02-17T00:46:38.580

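I'm not sure exactly what you want to do, but I think the nested dict approach is not as clean as using a flat dict, where you key by the combined string (i.e. d['ab'] instead of d['a']['b']). I also put in code that checks whether each bigram/trigram is made up only of vowels, only of consonants, or a mixture.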

Code:

from collections import defaultdict


def all_ngrams(text,n):
    ngrams = [text[ind:ind+n] for ind in range(len(text)-(n-1))]
    ngrams = [ngram for ngram in ngrams if ' ' not in ngram]
    return ngrams


counts = defaultdict(int)
text = 'hi hello hi this is hii hello'
vowels = 'aeiouyAEIOUY'
consonants = 'bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ'

for n in [2,3]:
    for ngram in all_ngrams(text,n):
        if all([let in vowels for let in ngram]):
            print(ngram+' is all vowels')

        elif all([let in consonants for let in ngram]):
            print(ngram+' is all consonants')

        else:
            print(ngram+' is a mixture of vowels/consonants')

        counts[ngram] += 1

print(counts)

Output:

hi is a mixture of vowels/consonants
he is a mixture of vowels/consonants
el is a mixture of vowels/consonants
ll is all consonants
lo is a mixture of vowels/consonants
hi is a mixture of vowels/consonants
th is all consonants
hi is a mixture of vowels/consonants
is is a mixture of vowels/consonants
is is a mixture of vowels/consonants
hi is a mixture of vowels/consonants
ii is all vowels
he is a mixture of vowels/consonants
el is a mixture of vowels/consonants
ll is all consonants
lo is a mixture of vowels/consonants
hel is a mixture of vowels/consonants
ell is a mixture of vowels/consonants
llo is a mixture of vowels/consonants
thi is a mixture of vowels/consonants
his is a mixture of vowels/consonants
hii is a mixture of vowels/consonants
hel is a mixture of vowels/consonants
ell is a mixture of vowels/consonants
llo is a mixture of vowels/consonants
defaultdict(<type 'int'>, {'el': 2, 'his': 1, 'thi': 1, 'ell': 2, 'lo': 2, 'll': 2, 'ii': 1, 'hi': 4, 'llo': 2, 'th': 1, 'hel': 2, 'hii': 1, 'is': 2, 'he': 2})
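
If the goal is to count consonant/vowel patterns rather than individual letters, the same flat-dict idea can be keyed by a pattern string such as 'CVC'. The following is a sketch under that assumption. `cv_pattern` and `pattern_counts` are hypothetical names introduced here, and 'y' is treated as a vowel to match the `vowels` string above.

```python
from collections import Counter

VOWELS = set("aeiouy")  # 'y' counted as a vowel, as in the code above


def cv_pattern(ngram):
    """Map each letter to 'V' (vowel) or 'C' (anything else)."""
    return "".join("V" if ch.lower() in VOWELS else "C" for ch in ngram)


def all_ngrams(text, n):
    """Return all n-grams of text that do not contain a space."""
    ngrams = [text[i:i + n] for i in range(len(text) - (n - 1))]
    return [g for g in ngrams if " " not in g]


text = "hi hello hi this is hii hello"
pattern_counts = Counter()
for n in (2, 3):
    for ngram in all_ngrams(text, n):
        pattern_counts[cv_pattern(ngram)] += 1

print(pattern_counts["CV"], pattern_counts["CVC"])  # 8 3
```

For the sample text this gives 8 'CV' bigrams (hi, he, lo and their repeats) and 3 'CVC' trigrams (hel twice, his once).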
Answered 2017-02-17T00:45:19.903