python - How to compare more than two keys in the same dictionary?

Question

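I am parsing 100 files that follow a similar format. From each file I build a dictionary that may have two keys or more than two, where each value is a set. In every case there is always one key whose set contains the value 'Y'. For that key, I need to remove any values that also appear under the other keys.

I asked a similar question for the case of only two keys, and it was solved: Python: How to compare values of different keys in dictionary and then delete duplicates?

The following code works when the dictionary has two keys, but not when it has more than two.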

for d, p in zip(temp_list, temp_search_list):
    temp2[d].add(p) #dictionary with delvt and pin names for cell
for test_d, test_p in temp2.items():
    if not re.search('Y', ' '.join(test_p)) :
         tp = temp2[test_d]
    else:
         temp2[test_d] = [t for t in temp2[test_d] if t not in tp]

Here is an example dictionary with three keys; depending on the file being parsed, there can be more.

temp2 = {'0.1995': set(['X7:GATE', 'X3:GATE', 'IN1']), '0.199533': set(['X4:GATE', 'X8:GATE', 'IN2']), '0.399': set(['X3:GATE', 'X5:GATE', 'X1:GATE', 'IN0', 'X4:GATE', 'Y', 'X8:GATE'])}

Expected output:

temp2
{'0.1995': set(['X7:GATE', 'X3:GATE','IN1']), '0.199533': set(['X4:GATE', 'X8:GATE', 'IN2']), '0.399': set(['X5:GATE', 'X1:GATE', 'IN0', 'Y'])}

score 1 · Accepted Answer

You can do the whole thing with just one loop that actually has to traverse the entire dataset.

from collections import defaultdict

target = None
result = defaultdict(set)
occurance_dict = defaultdict(int)
# Loop over the inputs, building the result, counting the
# number of occurrences for each value as you go and marking
# the key that contains 'Y'
for key, value in zip(temp_list, temp_search_list):
    # This is here so we don't count values twice if there
    # is more than one instance of the value for the given
    # key.  If we don't do this, if a value only exists in
    # the 'Y' set, but it occurs multiple times in the input,
    # we would still filter it out later on.
    if value not in result[key]:
        occurance_dict[value] += 1
        result[key].add(value)
    if value == 'Y':
        if target is None:
            target = key
        elif target != key:
            # A repeated 'Y' under the same key is not an error
            raise ValueError('Dataset contains more than 1 entry containing "Y"')
if target is None:
    raise ValueError('Dataset contains no entry containing "Y"')
# Filter the marked ('Y' containing) entry; if there is more than
# 1 occurrence of the given value, then it exists in another entry
# so we don't want it in the 'Y' entry
result[target] = {value for value in result[target] if occurance_dict[value] == 1}

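Yes, occurance_dict is very similar to a collections.Counter, but I would rather not iterate over the dataset twice if I don't have to (even if it happens behind the scenes), and a Counter would also count a second occurrence of a value under the same key, which we don't want.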
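For comparison, here is a minimal Python 3 sketch of the collections.Counter variant. It assumes the dictionary of sets has already been built, so it makes two passes over the data rather than one. The function name strip_shared_from_y and its marker parameter are made up for illustration.

```python
from collections import Counter

def strip_shared_from_y(groups, marker='Y'):
    """Hypothetical helper: return a copy of groups in which the set
    containing marker keeps only values found under no other key."""
    # Each set contributes a value at most once, so a count above 1
    # means the value appears under more than one key.
    counts = Counter(value for values in groups.values() for value in values)
    targets = [key for key, values in groups.items() if marker in values]
    if len(targets) != 1:
        raise ValueError('expected exactly one key containing %r' % marker)
    target = targets[0]
    result = {key: set(values) for key, values in groups.items()}
    result[target] = {value for value in groups[target] if counts[value] == 1}
    return result

temp2 = {
    '0.1995': {'X7:GATE', 'X3:GATE', 'IN1'},
    '0.199533': {'X4:GATE', 'X8:GATE', 'IN2'},
    '0.399': {'X3:GATE', 'X5:GATE', 'X1:GATE', 'IN0', 'X4:GATE', 'Y', 'X8:GATE'},
}
cleaned = strip_shared_from_y(temp2)
print(sorted(cleaned['0.399']))  # ['IN0', 'X1:GATE', 'X5:GATE', 'Y']
```

Because the sets are already deduplicated per key, the double-counting concern only applies when counting straight from the raw input lists.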

score 1 · Accepted Answer

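You need to separate the search for the Y value from the search over the rest of the data. You really want to do that while you are already building temp2, to avoid unnecessary loops: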

# temp2 is assumed to be a collections.defaultdict(set), as in the question
y_key = None
for d, p in zip(temp_list, temp_search_list):
    temp2[d].add(p)
    if p == 'Y':
        y_key = d

Next, removing the duplicate values is easiest with set.difference_update(), which alters the set in place:

y_values = temp2[y_key]
for test_d, test_p in temp2.iteritems():
    if test_d == y_key:
        continue
    y_values.difference_update(test_p)

Using your example temp2, and assuming y_key was already set while temp2 was being built, the result of the second loop is:

>>> temp2 = {'0.1995': set(['X7:GATE', 'X3:GATE', 'IN1']), '0.199533': set(['X4:GATE', 'X8:GATE', 'IN2']), '0.399': set(['X3:GATE', 'X5:GATE', 'X1:GATE', 'IN0', 'X4:GATE', 'Y', 'X8:GATE'])}
>>> y_key = '0.399'
>>> y_values = temp2[y_key]
>>> for test_d, test_p in temp2.iteritems():
...     if test_d == y_key:
...         continue
...     y_values.difference_update(test_p)
... 
>>> temp2
{'0.1995': set(['X7:GATE', 'X3:GATE', 'IN1']), '0.199533': set(['X4:GATE', 'X8:GATE', 'IN2']), '0.399': set(['X5:GATE', 'X1:GATE', 'IN0', 'Y'])}

Note how the values X3:GATE, X4:GATE and X8:GATE were removed from the 0.399 set.
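The two steps above can be combined into one function. The following is a rough Python 3 sketch, not a drop-in replacement: build_groups and the short sample lists are invented for illustration, and it relies on set.difference_update() accepting several iterables at once.

```python
from collections import defaultdict

def build_groups(keys, values, marker='Y'):
    """Hypothetical helper: group values by key, then strip from the
    marker's set everything that also appears under another key."""
    groups = defaultdict(set)
    y_key = None
    for key, value in zip(keys, values):
        groups[key].add(value)
        if value == marker:
            y_key = key  # remember which key holds the marker
    if y_key is None:
        raise ValueError('no %r found in values' % marker)
    others = [vals for key, vals in groups.items() if key != y_key]
    # One call removes the members of all other sets, in place
    groups[y_key].difference_update(*others)
    return dict(groups), y_key

# Made-up sample input, shaped like temp_list / temp_search_list
sample_keys = ['0.1995', '0.1995', '0.399', '0.399', '0.399']
sample_values = ['X3:GATE', 'IN1', 'X3:GATE', 'IN0', 'Y']
groups, y_key = build_groups(sample_keys, sample_values)
print(y_key)                  # 0.399
print(sorted(groups[y_key]))  # ['IN0', 'Y']
```

With a single key, others is empty and difference_update() is simply a no-op.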

score 0 · Accepted Answer

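I wish I could come up with a cute way to do this using list comprehensions and/or the itertools module, but I couldn't. I would start with something like the following: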

dict1 = {1: set([1,2,3,4,5]),
         2: set([3,4,5,6]),
         3: set([1,7,8,9])
        }

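# Work on copies so dict1 itself is left untouched
keys = list(dict1)
newDict = dict((k, set(v)) for k, v in dict1.items())
for i in range(len(keys)):
    # Read the already-reduced set, not the original one
    set1 = newDict[keys[i]]
    for j in range(i+1, len(keys)):
        # Drop from every later set whatever an earlier set holds
        newDict[keys[j]] -= set1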

print newDict 
# {1: set([1, 2, 3, 4, 5]), 2: set([6]), 3: set([8, 9, 7])}

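If you have huge dictionaries, this might not be super efficient.

Another idea: are the sets too long for you to build a collections.Counter? You would first go through the dictionary, pull out the members of each set and put them in the counter (this can probably be done in one line with a comprehension). Then loop through originalDict.iteritems(). Into a new dictionary, insert each key (e.g. 0.1995) with the original set as its value, filtered (using & as above, I think) so that it only contains entries whose count in the counter is > 0. Whatever you insert into the new dictionary, remove it from the counter (even if its count is > 1). At the end of the day, you still need to loop twice.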
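A minimal Python 3 sketch of that Counter idea might look like this. The name dedupe_first_wins is made up, and the result depends on the dictionary's iteration order (insertion order on Python 3.7+): the first key to see a value keeps it.

```python
from collections import Counter

def dedupe_first_wins(original):
    """Hypothetical helper: each value survives only under the first
    key (in iteration order) whose set contains it."""
    # Pass 1: put every member of every set into the counter
    counts = Counter(m for members in original.values() for m in members)
    result = {}
    # Pass 2: keep what is still in the counter, then remove it
    for key, members in original.items():
        kept = {m for m in members if counts[m] > 0}
        result[key] = kept
        for m in kept:
            del counts[m]  # removed even if its count was > 1
    return result

dict1 = {1: {1, 2, 3, 4, 5},
         2: {3, 4, 5, 6},
         3: {1, 7, 8, 9}}
result = dedupe_first_wins(dict1)
print({k: sorted(v) for k, v in result.items()})
# {1: [1, 2, 3, 4, 5], 2: [6], 3: [7, 8, 9]}
```

Since the counter is only ever tested for membership here, a plain set of already-seen values would do the same job.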

score 0 · Accepted Answer

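Seems pretty simple to me. First find the key that has a 'Y' in its set of values, then go through all the other value sets and remove their members from that one.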

temp2 = {'0.1995':  set(['X7:GATE', 'X3:GATE', 'IN1']),
         '0.199533':set(['X4:GATE', 'X8:GATE', 'IN2']),
         '0.399':   set(['X3:GATE', 'X5:GATE', 'X1:GATE', 'IN0', 'X4:GATE', 'Y', 'X8:GATE'])}

y_key = None
for k,v in temp2.iteritems():
    if 'Y' in v:
        y_key = k
        break

if y_key is None:
    print "no 'Y' found in values"
    exit()

result = {}
for k,v in temp2.iteritems():
    if k != y_key:
        temp2[y_key] -= v

print 'temp2 = {'
for k,v in temp2.iteritems():
    print '  {!r}: {!r},'.format(k,v)
print '}'

Output:

temp2 = {
  '0.1995': set(['X7:GATE', 'X3:GATE', 'IN1']),
  '0.199533': set(['X4:GATE', 'X8:GATE', 'IN2']),
  '0.399': set(['X5:GATE', 'X1:GATE', 'IN0', 'Y']),
}
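If you would rather not modify temp2 in place, the same approach can be written as a Python 3 sketch that returns a new dictionary. The helper name without_shared is hypothetical; it uses next() with a default to find the key and set().union() to collect everything held by the other keys.

```python
def without_shared(groups, marker='Y'):
    """Hypothetical helper: return a new dict in which the marker's set
    has lost every value that also appears under another key."""
    y_key = next((k for k, v in groups.items() if marker in v), None)
    if y_key is None:
        raise ValueError('no %r found in values' % marker)
    # Union of all sets except the one holding the marker
    shared = set().union(*(v for k, v in groups.items() if k != y_key))
    result = dict(groups)  # shallow copy; only the y_key entry is replaced
    result[y_key] = groups[y_key] - shared
    return result

temp2 = {
    '0.1995': {'X7:GATE', 'X3:GATE', 'IN1'},
    '0.199533': {'X4:GATE', 'X8:GATE', 'IN2'},
    '0.399': {'X3:GATE', 'X5:GATE', 'X1:GATE', 'IN0', 'X4:GATE', 'Y', 'X8:GATE'},
}
cleaned = without_shared(temp2)
print(sorted(cleaned['0.399']))  # ['IN0', 'X1:GATE', 'X5:GATE', 'Y']
print(len(temp2['0.399']))       # 7, the input is unchanged
```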
