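I want to compare the values of two variables (a dictionary and a list). The dictionary has a nested structure, so I have to iterate over all of its items. I found a simple solution, but I'm pretty sure it can be done in a better way (using Python). In short, I want to find the items in users_from_database that do not exist in users_from_client.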

My solution:

#variable containing users from client side
users_from_client = {
  "0": {
    "COL1": "whatever",
    "COL2": "val1",
    "COL3": "whatever",
  },
  "1": {
    "COL1": "whatever",
    "COL2": "val2",
    "COL3": "whatever",
  },
  "3": {
    "COL1": "whatever",
    "COL2": "val3",
    "COL3": "whatever",
  }    
} 

#variable containing users from the database
users_from_database = [
  ["val1"],
  ["val2"],
  ["val5"],
  ["val7"]
]

#This function is used to find element from the nested dictionaries(d)
def _check(element, d, pattern = 'COL2'):
  exist = False
  for k, user in d.iteritems():
    for key, item in user.iteritems():
      if key == pattern and item == element:
        exist = True
  return exist

#Finding which users should be removed from the database
to_remove = []
for user in users_from_database:
  if not _check(user[0], users_from_client):
    if user[0] not in to_remove:
      to_remove.append(user[0])

#to_remove list contains: ["val5", "val7"]

What is a better, more Pythonic way to get the same result? I probably don't need to add that I'm new to Python (I assume you can tell from the code above).

3 Answers

Just use an error-safe dictionary lookup:

def _check(element, d, pattern = 'COL2'):
    for user in d.itervalues():
        if user.get(pattern) == element:
            return True
    return False

Or as a one-liner:

def _check(element, d, pattern = 'COL2'):
    return any(user.get(pattern) == element for user in d.itervalues())

Or try doing the whole job in a single expression:

#Finding which users should be removed from the database
#(each database row is a one-element list, hence the (name,) unpacking)
to_remove = set(
    name
    for (name,) in users_from_database
    if not any(user.get('COL2') == name for user in users_from_client.itervalues())
)

assert to_remove == {"val5", "val7"}

Sets can make this more concise (and more efficient):

to_remove = set(
    user for (user,) in users_from_database
) - set(
    user.get('COL2') for user in users_from_client.itervalues()
)
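
For readers on Python 3, where `itervalues()` no longer exists, the same set-difference idea can be sketched as below. This is a minimal illustration using the question's sample data; `find_stale_users` is a hypothetical helper name, not something from the original post.

```python
# Sample data copied from the question.
users_from_client = {
    "0": {"COL1": "whatever", "COL2": "val1", "COL3": "whatever"},
    "1": {"COL1": "whatever", "COL2": "val2", "COL3": "whatever"},
    "3": {"COL1": "whatever", "COL2": "val3", "COL3": "whatever"},
}
users_from_database = [["val1"], ["val2"], ["val5"], ["val7"]]


def find_stale_users(db_rows, client_users, column="COL2"):
    """Return database values that no client record holds in `column`.

    Hypothetical helper; assumes every database row is a one-element list.
    """
    db_values = {value for (value,) in db_rows}
    # .values() replaces Python 2's .itervalues(); .get() tolerates a
    # record that lacks the column (it contributes None to the set).
    client_values = {user.get(column) for user in client_users.values()}
    return db_values - client_values


to_remove = find_stale_users(users_from_database, users_from_client)
print(sorted(to_remove))  # ['val5', 'val7']
```

Building both sets once makes each membership test a hash lookup, instead of rescanning every client record for every database row.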

Your data structure is a bit odd. Consider using:

users_from_client = [
  {
    "COL1": "whatever",
    "COL2": "val1",
    "COL3": "whatever",
  }, {
    "COL1": "whatever",
    "COL2": "val2",
    "COL3": "whatever",
  }, {
    "COL1": "whatever",
    "COL2": "val3",
    "COL3": "whatever",   
  }
] 

#variable containing users from the database
#(a set literal; set() itself accepts only a single iterable argument)
users_from_database = {
  "val1",
  "val2",
  "val5",
  "val7"
}

This reduces your code to:

to_remove = users_from_database - set(
    user.get('COL2') for user in users_from_client
)

answered 2013-04-26T19:42:33.340

For example, you could build an inverted dictionary for fast lookups and keep it in a cache.

>>> from collections import defaultdict
>>> 
>>> users_inverted = defaultdict(list)
>>> for pk, user in users_from_client.iteritems():
...  for key in user.iteritems():
...   users_inverted[key].append(int(pk))
... 
>>> users_inverted
defaultdict(<type 'list'>, {('COL3', 'whatever'): [1, 0, 3], ('COL2', 'val1'): [0], ('COL1', 'whatever'): [1, 0, 3], ('COL2', 'val2'): [1], ('COL2', 'val3'): [3]})

Looking up a user is then very fast:

>>> def _check(element, pattern = 'COL2'):
...  return bool(users_inverted[(pattern, element)])
>>> 
>>> _check('whatever', 'COL3')
True
>>> _check('whatever', 'COL333')
False

Besides the speed, you also get the list of users for each (attribute, value) pair.
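
One detail worth knowing about this approach: indexing a `defaultdict` with a missing key inserts that key, so every failed lookup grows the index by one empty list. A minimal Python 3 sketch that avoids this by using `.get()` is shown below; the function names `build_inverted_index` and `check` are hypothetical, and the data is the question's sample.

```python
from collections import defaultdict

# Sample data copied from the question.
users_from_client = {
    "0": {"COL1": "whatever", "COL2": "val1", "COL3": "whatever"},
    "1": {"COL1": "whatever", "COL2": "val2", "COL3": "whatever"},
    "3": {"COL1": "whatever", "COL2": "val3", "COL3": "whatever"},
}


def build_inverted_index(client_users):
    """Map each (column, value) pair to the list of user keys holding it."""
    index = defaultdict(list)
    for pk, user in client_users.items():
        for pair in user.items():  # pair is a (column, value) tuple
            index[pair].append(int(pk))
    return index


def check(index, element, pattern="COL2"):
    # .get() returns None for an unknown pair instead of inserting an
    # empty list, so failed lookups leave the index unchanged.
    return bool(index.get((pattern, element)))


users_inverted = build_inverted_index(users_from_client)
print(check(users_inverted, "val1"))                # True
print(check(users_inverted, "whatever", "COL333"))  # False
print(users_inverted[("COL3", "whatever")])         # [0, 1, 3]
```

The index costs one pass over the client data to build, after which each check is a single hash lookup.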

answered 2013-04-26T20:20:35.930

Well, I don't know of any super elegant way to do this, but you can make a few small improvements to your code.

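First, you aren't using k, so you might as well iterate over the values only. Second, you don't need to keep track of exist; you can return as soon as you find a match. Finally, since you are checking for a key-value pair, you can simply test whether the (key, value) tuple is contained in the items.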

def _check(element, d, pattern = 'COL2'):
  for user in d.itervalues():
    if (pattern, element) in user.items():
      return True
  return False
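
As a side note for Python 3 users, the tuple test becomes cheaper there: `dict.items()` returns a view, and `in` on that view looks the key up by hash, whereas Python 2's `items()` builds a list that is scanned linearly. A minimal sketch under that assumption, using the question's sample data and a hypothetical function name `check`:

```python
# Sample data copied from the question.
users_from_client = {
    "0": {"COL1": "whatever", "COL2": "val1", "COL3": "whatever"},
    "1": {"COL1": "whatever", "COL2": "val2", "COL3": "whatever"},
    "3": {"COL1": "whatever", "COL2": "val3", "COL3": "whatever"},
}


def check(element, d, pattern="COL2"):
    """Return True if any nested dict in `d` maps `pattern` to `element`."""
    # any() stops at the first match, which mirrors the early return
    # in the loop version.
    return any((pattern, element) in user.items() for user in d.values())


print(check("val1", users_from_client))              # True
print(check("val5", users_from_client))              # False
print(check("whatever", users_from_client, "COL3"))  # True
```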

answered 2013-04-26T19:36:34.413