import os
import re
import sys
sys.stdout=open('f1.txt','w')
from collections import Counter
from glob import glob

def removegarbage(text):
    text=re.sub(r'\W+',' ',text)
    text=text.lower()
    return text

folderpath='d:/induvidual-articles'
counter=Counter()


filepaths = glob(os.path.join(folderpath,'*.txt'))

num_files = len(filepaths)

with open('topics.txt','r') as filehandle:
    lines = filehandle.read()
    words = removegarbage(lines).split()
    counter.update(words)


for word, count in counter.most_common():
    probability=count//num_files
    print('{}  {} {}'.format(word,count,probability))

I get a ZeroDivisionError: float division by zero on the line probability=count//num_files.

How can I fix it?

I need my output in the form: word, count, probability

Please help!




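Your num_files variable is 0.

Check whether folderpath='d:/induvidual-articles' is correct ("induvidual" is misspelled, though the actual directory on disk may be misspelled the same way).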
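One way to make a wrong path fail loudly instead of silently producing num_files == 0 is to check the folder before globbing. This is a minimal sketch; the helper name `count_txt_files` is hypothetical and not part of the question's code. `glob` returns an empty list for a folder that doesn't exist, which is exactly how the division by zero arises.

```python
import os
from glob import glob


def count_txt_files(folderpath):
    """Return the number of .txt files directly inside folderpath.

    Raises FileNotFoundError if the folder does not exist, so a misspelled
    path is reported explicitly instead of silently yielding 0 files.
    """
    if not os.path.isdir(folderpath):
        raise FileNotFoundError('folder not found: {!r}'.format(folderpath))
    return len(glob(os.path.join(folderpath, '*.txt')))


if __name__ == '__main__':
    try:
        print(count_txt_files('d:/induvidual-articles'))
    except FileNotFoundError as exc:
        print(exc)
```

If the folder exists but the count is still 0, the files probably have a different extension (for example `.TXT` on a case-sensitive file system).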

answered 2013-06-17T13:55:24.460

Check that the path exists. If it does, check that the directory contains at least one .txt file. Then move the whole for loop inside an if block:


if num_files:
    for word, count in counter.most_common():
        ...  # the loop body from the question goes here
else:
    print("No text files found!")
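Separately from the zero check, note that the question's `count//num_files` is floor division. With integer counts, any word that occurs fewer times than there are files gets a probability of 0. The sketch below assumes "probability" means count divided by the number of files, as in the question, and uses true division `/`. The function name `word_probabilities` is hypothetical.

```python
from collections import Counter


def word_probabilities(counter, num_files):
    """Return (word, count, probability) tuples, most common first.

    probability is count / num_files (true division, not floor division //).
    Returns an empty list when num_files is 0 instead of raising
    ZeroDivisionError.
    """
    if not num_files:
        return []
    return [(word, count, count / num_files)
            for word, count in counter.most_common()]


if __name__ == '__main__':
    counter = Counter('the cat and the hat'.split())
    rows = word_probabilities(counter, 4)
    if not rows:
        print('No text files found!')
    # Output format requested in the question: word, count, probability
    for word, count, probability in rows:
        print('{}  {} {}'.format(word, count, probability))
```

This is written for Python 3. In Python 2, `/` between two ints also floors, so you would need `float(count) / num_files` or `from __future__ import division`.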

answered 2013-06-17T14:03:16.067