Suppose I have a CSV file named website.csv:

facebook.com        a social network website
twitter.com         another social network website
facebook.com        a social website
facebook.com        a website
twitter.com         another network website 
youtube.com         a website like facebook but to share videos
youtube.com         a video sharing website

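I want to build a dictionary keyed by website name, where each value is a dictionary of the words in that site's descriptions: each word is a key and its frequency is the value. Each site's dictionary should also contain a "_TOTAL_" entry holding the total number of words in that site's descriptions.

Here is the code I wrote: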

def webdescription(data):
    import csv
    data = website.csv
    csvreader = csv.reader(data)
    d = defaultdict(int)
    dfinal = {}
    for line in data:
        description_list = line[1].split()
        dfinal[line[0]] = d
        for each in description_list:
            d[each] += 1
            d['_TOTAL_'] = sum(d.itervalues())
    return dfinal

The expected output should be:

{'facebook.com': {'a': 3, 'social': 2, 'network': 1, 'website': 3, '_TOTAL_': 9},
 'twitter.com': {'another': 2, 'social': 1, 'network': 2, 'website': 2, '_TOTAL_': 7},
 'youtube.com': {'a': 2, 'website': 2, 'like': 1, 'sharing': 1, 'share': 1, 'video': 1,
                 'videos': 1, 'facebook': 1, '_TOTAL_': 10}}

But I don't seem to be getting the expected output. Any help would be appreciated!

2 Answers

You are always using the same d. You should create a new object for each new line, for example:

for line in data:
    description_list = line[1].split()
    d = dfinal[line[0]] = defaultdict(int)
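
One caveat with the snippet above: a site such as facebook.com appears on several lines, so creating a fresh defaultdict on every line would discard the counts from that site's earlier lines. A minimal sketch of one way around this, assuming the rows arrive as (site, description) pairs (for example from csv.reader); the function name web_description and the sample rows are illustrative, not taken from the question:

```python
from collections import defaultdict


def web_description(rows):
    """Map each site to a dict of word counts plus a '_TOTAL_' entry.

    `rows` is any iterable of (site, description) pairs (an assumption;
    a csv.reader over a two-column file would fit).
    """
    result = {}
    for site, description in rows:
        # Reuse the counter already stored for this site, and create
        # a new one only the first time the site is seen.
        counts = result.setdefault(site, defaultdict(int))
        for word in description.split():
            counts[word] += 1
            counts['_TOTAL_'] += 1  # running total of words for this site
    return result


# Illustrative rows based on the sample data in the question.
rows = [
    ("facebook.com", "a social network website"),
    ("twitter.com", "another social network website"),
    ("facebook.com", "a social website"),
    ("facebook.com", "a website"),
]
result = web_description(rows)
print(result["facebook.com"]["_TOTAL_"])
```

Incrementing '_TOTAL_' alongside each word avoids re-summing the dictionary on every iteration, which would otherwise also count the previous '_TOTAL_' value.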
answered 2013-05-07T08:10:46.203

website.csv

facebook.com,a social network website
twitter.com,another social network website
facebook.com,a social website
facebook.com,a website
twitter.com,another network website 
youtube.com,a website like facebook but to share videos
youtube.com,a video sharing website

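>>> import csv
>>> from collections import defaultdict, Counter
>>> d = defaultdict(Counter)  # one Counter per website, created on first access
>>> with open('website.csv', newline='') as f:
        for name, desc in csv.reader(f):
            words = desc.split()
            d[name].update(words)           # add this line's word counts
            d[name]['TOTAL'] += len(words)  # running word total for the site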


>>> d
defaultdict(<class 'collections.Counter'>, {'facebook.com': Counter({'TOTAL': 9, 'a': 3, 'website': 3, 'social': 2, 'network': 1}), 'twitter.com': Counter({'TOTAL': 7, 'website': 2, 'network': 2, 'another': 2, 'social': 1}), 'youtube.com': Counter({'TOTAL': 12, 'a': 2, 'website': 2, 'sharing': 1, 'like': 1, 'videos': 1, 'share': 1, 'but': 1, 'to': 1, 'facebook': 1, 'video': 1})})
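
If plain dictionaries with a '_TOTAL_' key are preferred, as in the question's expected output, the same approach can be wrapped in a function. This is only a sketch: the name count_words, the total_key parameter, and the in-memory sample are assumptions made for illustration, and the input is assumed to be comma-separated with exactly two columns.

```python
import csv
import io
from collections import Counter, defaultdict


def count_words(lines, total_key="_TOTAL_"):
    """Return {site: {word: count, ..., total_key: total_words}}.

    `lines` is any iterable of CSV text lines, e.g. an open file.
    """
    counts = defaultdict(Counter)
    for name, desc in csv.reader(lines):
        words = desc.split()
        counts[name].update(words)
        counts[name][total_key] += len(words)
    # Convert the Counters to plain dicts for a simpler return value.
    return {name: dict(counter) for name, counter in counts.items()}


# Illustrative in-memory stand-in for a comma-separated website.csv.
sample = io.StringIO(
    "facebook.com,a social network website\n"
    "facebook.com,a website\n"
)
result = count_words(sample)
print(result["facebook.com"])
```

Note that a description containing the literal word used as total_key would collide with the total entry, so pick a key that cannot occur in the data.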
answered 2013-05-07T08:09:01.010