
A method to check whether a pkg name already exists in another list that is populated with structures:

For example, test_pkg_list[] would contain the following:

test_pkg_list[0]: 

name = git
version = 1.0
description = git package

test_pkg_list[1]:

name = opengl
version = 1.25
description = graphics

And so on...

So my goal is to check whether the list already contains a duplicate name.

def _pkg_exists_in_list(self, list, pkg_name):
        if len(list) >= 1:
            if any(pkg_name in item for item in list):
                return True 
            else:
                return False
        else:
            return False

I pass in two arguments:

test_pkg_list = []  # Note that this list is populated over time; at first it's empty.
pkg_name = 'git'

# Call the method and pass the parameters
if self._pkg_exists_in_list(test_pkg_list, pkg_name) is False:
    pass  # No duplicates found, continue
else:
    pass  # We found a duplicate, stop.

I keep getting the following exception:

argument of type 'instance' is not iterable

2 Answers


Your code is much more complicated than it needs to be.

def _pkg_exists_in_list(self, the_list, pkg_name):
    return pkg_name in the_list

Here's why:

def _pkg_exists_in_list(self, list, pkg_name):  # don't call it list; don't shadow built-ins
        if len(list) >= 1:   # Unnecessary; an empty list evaluates to False in a Boolean context
            if any(pkg_name in item for item in list): # can just check whether an item is in a list using the `in` operator; no need to match every string against every string
                return True # Can just return the value of the expression; poor form to explicitly return True/False after an if statement
            else:
                return False
        else:
            return False

Update:

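I suppose I should point out, as was said in the comments, that item in mylist is not exactly the same as your code, any(mystring in item for item in mylist); rather, it is equivalent to the more verbose any(mystring == item for item in mylist). However, I'm guessing that you actually meant == rather than in.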
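
To make the difference concrete, here is a small sketch. The list contents are made up for illustration and assume every item is a plain string:

```python
# Illustrative data only: each item is one plain string.
mylist = ['git', 'opengl']

# `in` on a list compares each element for equality.
print('git' in mylist)                        # True
print('gi' in mylist)                         # False

# `in` on each string performs a substring test instead.
print(any('gi' in item for item in mylist))   # True
print(any('gi' == item for item in mylist))   # False
```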

Second update:

While I like Alex's idea of using dictionaries, it may not be necessary.

import re
def _pkg_exists_in_list(self, the_list, pkg_name):
    return any(re.search(r'name = ' + pkg_name, item) for item in the_list)

I suppose it's just a question of which is more efficient.
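
One caveat with the pattern above: it also matches prefixes (searching for git would match a hypothetical package named github), and any regex metacharacters in pkg_name are interpreted as part of the pattern. A stricter sketch, assuming each item is a multi-line string containing a line of the form name = <value> as in the question. The function name pkg_name_matches and the github entry are made up for illustration:

```python
import re

def pkg_name_matches(the_list, pkg_name):
    # re.escape treats pkg_name literally; ^...$ with re.MULTILINE
    # anchors the match to one whole line of the item.
    pattern = re.compile(r'^name = ' + re.escape(pkg_name) + r'\s*$', re.MULTILINE)
    return any(pattern.search(item) for item in the_list)

packages = [
    'name = git\nversion = 1.0\ndescription = git package',
    'name = github\nversion = 2.0\ndescription = hypothetical package',
]
print(pkg_name_matches(packages[1:], 'git'))   # False: 'github' is not 'git'
print(pkg_name_matches(packages, 'git'))       # True
```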

Update 2.1:

I win.

C:\Users\JJ>python -m timeit -s "p = ['''name = git\nversion = 1.0\ndescription = git package''', '''name = opengl\nversion = 1.25\ndescription = graphics''']; import re" "dictlist = []" "for item in p:" " d = {}" " for line in item.splitlines():" "  k, v = line.split('=')" "  d[k.strip()] = v.strip()" " dictlist.append(d)" "any('git' == x['name'] for x in dictlist)"
100000 loops, best of 3: 5.38 usec per loop

C:\Users\JJ>python -m timeit -s "p = ['''name = git\nversion = 1.0\ndescription = git package''', '''name = opengl\nversion = 1.25\ndescription = graphics''']; import re" "any(re.search(r'name = ' + 'git', item) for item in p)"
1000000 loops, best of 3: 1.36 usec per loop
answered 2013-01-06T07:35:18.967

I would convert your list of strings into a list of dicts, then search with something like the following:

test_pkg_list = [
"""name = git
version = 1.0
description = git package""",

"""name = opengl
version = 1.25
description = graphics"""]

dictlist = []

# Turn into a list of dictionaries
for item in test_pkg_list:
    d = {}
    for line in item.splitlines():
        k, v = line.split('=', 1)  # split on the first '=' only, so a value may itself contain '='
        d[k.strip()] = v.strip()
    dictlist.append(d)

print(dictlist)
# [
#    {'version': '1.0', 'name': 'git', 'description': 'git package'}, 
#    {'version': '1.25', 'name': 'opengl', 'description': 'graphics'}
# ]

searchname = 'git'

# Now search by name
print(any(searchname == x['name'] for x in dictlist))
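
The error in the question (argument of type 'instance' is not iterable) suggests the list may actually hold class instances rather than strings, in which case pkg_name in item fails because the instance does not support membership tests. If that is so, the same any(...) pattern works by comparing an attribute. A minimal sketch, assuming a hypothetical Package class with a name attribute (neither is shown in the question):

```python
class Package(object):
    # Hypothetical stand-in for the question's "structure"; the real class is not shown.
    def __init__(self, name, version, description):
        self.name = name
        self.version = version
        self.description = description

def pkg_exists_in_list(the_list, pkg_name):
    # Compare against the name attribute instead of testing membership in the object.
    return any(pkg_name == pkg.name for pkg in the_list)

pkgs = [Package('git', '1.0', 'git package'),
        Package('opengl', '1.25', 'graphics')]
print(pkg_exists_in_list(pkgs, 'git'))    # True
print(pkg_exists_in_list(pkgs, 'test'))   # False
print(pkg_exists_in_list([], 'git'))      # False
```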

If you don't want the hassle of converting to dicts, you can do something simpler:

>>> searchname = 'git'
>>> print(any(searchname in line for line in test_pkg_list))
True
>>> searchname = 'empty'
>>> print(any(searchname in line for line in test_pkg_list))
False
>>> searchname = 'version' # This is a problem
>>> print(any(searchname in line for line in test_pkg_list))
True

# Or to ensure it only matches the name:
>>> print(any('name = ' + searchname in line for line in test_pkg_list))
False
>>> searchname = 'git'
>>> print(any('name = ' + searchname in line for line in test_pkg_list))
True
>>> searchname = 'version'
>>> print(any('name = ' + searchname in line for line in test_pkg_list))
False

Or you can just extract the names:

for line in test_pkg_list:
    firstline = line.splitlines()[0]  # the name is on the first line of each item
    name = firstline.split('=')[1].strip()
    print(name)

As a one-liner:

>>> names = [line.splitlines()[0].split('=')[1].strip() for line in test_pkg_list]
>>> names
['git', 'opengl']

Then compare:

>>> 'git' in names
True
>>> 'test' in names
False
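
Since the list in the question grows over time, it may be cheaper to track the names seen so far in a set than to rebuild names on every check. A minimal sketch under that assumption; extract_name, add_package and seen_names are hypothetical names, not part of the original code:

```python
def extract_name(item):
    # Same extraction as above: take the value after '=' on the first line.
    return item.splitlines()[0].split('=', 1)[1].strip()

def add_package(pkg_list, seen_names, item):
    # Append item unless a package with the same name was already added.
    name = extract_name(item)
    if name in seen_names:
        return False
    seen_names.add(name)
    pkg_list.append(item)
    return True

pkg_list, seen_names = [], set()
print(add_package(pkg_list, seen_names, 'name = git\nversion = 1.0'))   # True
print(add_package(pkg_list, seen_names, 'name = git\nversion = 2.0'))   # False
print(len(pkg_list))                                                    # 1
```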

Performance is comparable to using re (about 65% of the speed):

>>> timeit.timeit("any(re.search(r'name = ' + 'git', item) for item in p)", "p = ['''name = git\nversion = 1.0\ndescription = git package''', '''name = opengl\nversion = 1.25\ndescription = graphics''']; import re")
2.338025673656987

>>> timeit.timeit("'git' in [line.splitlines()[0].split('=')[1].strip() for line in p]", "p = ['''name = git\nversion = 1.0\ndescription = git package''', '''name = opengl\nversion = 1.25\ndescription = graphics''']")
3.5689878827767245
answered 2013-01-06T08:13:04.307