
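I'm trying to build a dictionary in which each key's value is itself a dictionary. The problem with the following code is that when the second if finishes, it doesn't add the new item to the inner dictionary: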

import re

dict_features = {}
def regexp_features(fileids):
    for fileid in fileids:
        if re.search(r'мерзавец|подлец', agit_corpus.raw(fileid)):
            dict_features[fileid] = {'oskorblenie':'1'}
        else:
            dict_features[fileid] = {'oskorblenie':'0'}

        if re.search(r'честны*|труд*', agit_corpus.raw(fileid)):
            dict_features[fileid] = {'samoprezentacia':'1'}
        else:
            dict_features[fileid] = {'samoprezentacia':'0'}
    return dict_features

The resulting dictionary is

{'neagitacia/20124211.txt': {'samoprezentacia': '0'}, 'agitacia/discreditacia1.txt': {'samoprezentacia': '0'}}

but I need

{'neagitacia/20124211.txt': {'oskorblenie':'1', 'samoprezentacia': '0'}, 'agitacia/discreditacia1.txt': {'oskorblenie':'0', 'samoprezentacia': '0'}}

1 Answer


You are overwriting the value for the same fileid.

In your code,

if re.search(r'мерзавец|подлец', agit_corpus.raw(fileid)):
    dict_features[fileid] = {'oskorblenie':'1'}
else:
    dict_features[fileid] = {'oskorblenie':'0'}

if re.search(r'честны*|труд*', agit_corpus.raw(fileid)):
    dict_features[fileid] = {'samoprezentacia':'1'}
else:
    dict_features[fileid] = {'samoprezentacia':'0'}

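For a given fileid, you create the first inner dict and then replace it with the one from the second if-else construct. (Both if-else constructs set the value, because either the if branch or the else branch will always be executed.)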
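A minimal sketch of the difference, using a hypothetical key 'doc1' in place of a real fileid: assigning a new dict to the same key throws away the old one, while setting keys on the existing inner dict keeps both entries.

```python
features = {}

# Assigning a brand-new dict to the same key replaces the previous value entirely
features['doc1'] = {'oskorblenie': '1'}
features['doc1'] = {'samoprezentacia': '0'}
print(features)  # {'doc1': {'samoprezentacia': '0'}}

# Creating the inner dict once and then setting keys on it keeps both entries
features = {}
features['doc1'] = {}
features['doc1']['oskorblenie'] = '1'
features['doc1']['samoprezentacia'] = '0'
print(features)  # {'doc1': {'oskorblenie': '1', 'samoprezentacia': '0'}}
```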

What you are probably looking for is a defaultdict with dict as its default factory. Something like this:

>>> from collections import defaultdict
>>> a = defaultdict(dict)
>>> a['abc']
{}
>>> a['abc']['def'] = 1
>>> a
defaultdict(<type 'dict'>, {'abc': {'def': 1}})
>>> a['abc']['fgh'] = 2
>>> a
defaultdict(<type 'dict'>, {'abc': {'fgh': 2, 'def': 1}})
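Note that in the session above, merely looking up a['abc'] inserted an empty dict for that key. If you would rather keep a plain dict, dict.setdefault gives the same nesting behaviour; this is an alternative to the defaultdict approach, not what the code below uses. A minimal sketch, reusing the example keys from the session above:

```python
a = {}

# setdefault returns the existing inner dict, or inserts {} and returns it
a.setdefault('abc', {})['def'] = 1
a.setdefault('abc', {})['fgh'] = 2
print(a)  # {'abc': {'def': 1, 'fgh': 2}}
```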

So your code could be changed to

import re
from collections import defaultdict

# Any missing fileid automatically gets an empty dict as its value
dict_features = defaultdict(dict)
def regexp_features(fileids):
    for fileid in fileids:
        # Set individual keys on the inner dict instead of replacing it
        if re.search(r'мерзавец|подлец', agit_corpus.raw(fileid)):
            dict_features[fileid]['oskorblenie'] = '1'
        else:
            dict_features[fileid]['oskorblenie'] = '0'

        if re.search(r'честны*|труд*', agit_corpus.raw(fileid)):
            dict_features[fileid]['samoprezentacia'] = '1'
        else:
            dict_features[fileid]['samoprezentacia'] = '0'
    return dict_features
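Another option, sketched here as an alternative rather than the defaultdict approach above, is to build each file's inner dict completely and store it once. Because agit_corpus isn't shown, this sketch takes the text-lookup function as a parameter; SAMPLE_TEXTS and FEATURE_PATTERNS are hypothetical names, and the sample sentences are made up for illustration. With an NLTK-style corpus reader you would presumably call regexp_features(agit_corpus.fileids(), agit_corpus.raw), assuming your reader provides those methods.

```python
import re

# Hypothetical stand-in for agit_corpus: maps fileid -> raw text
SAMPLE_TEXTS = {
    'neagitacia/20124211.txt': 'Он мерзавец и подлец.',
    'agitacia/discreditacia1.txt': 'Обычный текст без признаков.',
}

# Hypothetical table pairing each feature name with the pattern from the question
FEATURE_PATTERNS = {
    'oskorblenie': r'мерзавец|подлец',
    'samoprezentacia': r'честны*|труд*',
}

def regexp_features(fileids, get_raw):
    """Return {fileid: {feature_name: '1' or '0'}} for every fileid."""
    dict_features = {}  # a fresh dict per call, so repeated calls don't share state
    for fileid in fileids:
        text = get_raw(fileid)  # read each file once instead of once per pattern
        # Build the whole inner dict for this file, then store it in one assignment
        dict_features[fileid] = {
            name: '1' if re.search(pattern, text) else '0'
            for name, pattern in FEATURE_PATTERNS.items()
        }
    return dict_features

print(regexp_features(SAMPLE_TEXTS, SAMPLE_TEXTS.__getitem__))
```

Adding a new feature then only means adding an entry to the pattern table, and the result is a plain dict rather than a defaultdict.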
answered 2013-08-02T18:15:26.773