
I'd like to get the top k most frequent words in my FreqDist, or the words whose frequency is > p.

How do I do this?

After reading the docs, I didn't find anything like a threshold or cutoff. Also, the freq() function can only be called on one bin (sample) at a time.

Of course I can write ad-hoc code like

[(x, FreqDist.freq(x)) for x in FreqDist if FreqDist.freq(x) > p]

but it doesn't look elegant.


1 Answer


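According to the FreqDist documentation you mentioned, its dict-like methods (keys(), items(), etc.) return the samples and/or their counts sorted by decreasing frequency. So you can keep only the samples whose frequency is high enough with code like this: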

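# iteritems() yields (sample, count) pairs in decreasing order of count,
# so we can stop at the first pair that is not above the threshold.
above_p = []
for (x, f) in FreqDist.iteritems():
    if f <= p:
        break
    above_p.append((x, f))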
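The early `break` relies on the sorted iteration described above, which is NLTK 2 behavior. In NLTK 3, `FreqDist` subclasses `collections.Counter`, and iterating it no longer goes in frequency order, so a filter that checks every sample is the safer choice there. A minimal sketch, assuming a plain `Counter` as a stand-in for a `FreqDist`; the function name `above_count` is hypothetical:

```python
from collections import Counter


def above_count(counts, p):
    """Return (sample, count) pairs whose count exceeds p, most frequent first.

    `counts` is any Counter-like mapping (assumed to behave like an NLTK 3
    FreqDist). Every sample is checked, so the result does not depend on
    the mapping's iteration order.
    """
    kept = [(x, f) for x, f in counts.items() if f > p]
    # Sort by descending count to mirror the order produced by the loop above.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


fdist = Counter("the cat sat on the mat the cat ran".split())
print(above_count(fdist, 1))  # [('the', 3), ('cat', 2)]
```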

Or as a one-liner:

from itertools import takewhile
# takewhile passes each (sample, count) pair to the predicate as a single argument
above_p = list(takewhile(lambda item: item[1] > p, FreqDist.iteritems()))
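The question's threshold is on relative frequency (`freq(x) > p`), while `iteritems()` yields raw counts. A minimal sketch that compares `count / total > p` and keeps the early-stopping `takewhile` idea, assuming Python 3 and a `Counter` stand-in whose `most_common()` supplies the descending order; `above_freq` is a hypothetical name:

```python
from collections import Counter
from itertools import takewhile


def above_freq(counts, p):
    """Return (sample, relative_frequency) pairs whose frequency exceeds p.

    most_common() yields pairs in descending count order, so takewhile can
    stop at the first sample whose frequency is not above the threshold.
    """
    total = sum(counts.values())  # plays the role of FreqDist.N()
    if total == 0:
        return []
    pairs = ((x, f / total) for x, f in counts.most_common())
    return list(takewhile(lambda pair: pair[1] > p, pairs))


fdist = Counter("the cat sat on the mat the cat ran".split())
print([x for x, _ in above_freq(fdist, 0.2)])  # ['the', 'cat']
```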

As for the top k:

# items() returns a list already sorted by decreasing count
top_k = FreqDist.items()[:k]

Or:

from itertools import islice
# take the first k pairs lazily instead of building the full item list
top_k = list(islice(FreqDist.iteritems(), k))
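On Python 3 with NLTK 3, `items()` is neither sorted by frequency nor sliceable, so the slicing approach does not carry over. Since NLTK 3's `FreqDist` is a `Counter` subclass, `most_common(k)` gives the top k directly. A sketch using a plain `Counter` as a stand-in, with `heapq.nlargest` shown as an equivalent for any mapping of counts:

```python
import heapq
from collections import Counter
from operator import itemgetter

fdist = Counter("the cat sat on the mat the cat ran".split())
k = 2

# Counter.most_common(k) returns the k most frequent (sample, count) pairs.
top_k = fdist.most_common(k)

# Same result for any dict of counts, without relying on Counter.
top_k_heap = heapq.nlargest(k, fdist.items(), key=itemgetter(1))

print(top_k)       # [('the', 3), ('cat', 2)]
print(top_k_heap)  # [('the', 3), ('cat', 2)]
```

`nlargest` keeps only k items in a heap, which avoids sorting the whole vocabulary when k is small.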
answered 2013-11-01T10:57:30.423