python - Grouping items in a list in Python

Question

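I have 60 records, each with an "IdNo" column and a "skillsList" column ("skillsList" is a list of skills). I want to find out how many IdNos have skills in common.

How can I do this in Python? I don't know how to count the occurrences of a particular list item. Any help would be appreciated.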

>>> a = open(r"C:\Users\abc\Desktop\Book2.csv")
>>> a1 = a.read()
>>> type(a1)
<type 'str'>

Here is some of the text when I print a1:

>>> a1
'IdNo, skillsList\n1,"u\'Training\', u\'E-Learning\', u\'PowerPoint\', u\'Teaching\', u\'Accounting\', u\'Team Management\', u\'Team Building\', u\'Microsoft Excel\', u\'Microsoft Office\', u\'Financial Accounting\', u\'Microsoft Word\', u\'Customer Service\'"\n2,"u\'Telecommunications\', u\'Data Center\', u\'ISO 27001\', u\'Management\', u\'BS25999\', u\'Technology\', u\'Information Technology...\', u\'Certified PMP\\xae\', u\'Certified BS25999 Lead...\'"\n3,"u\'Market Research\', u\'Segmentation\', u\'Marketing Strategy\', u\'Consumer Behavior\', u\'Experience Working with...\'"

Thanks.

score 0 · Accepted Answer

# Each record is a dict with an 'id' and a list of 'skills'
struct = [
    {'id': 1, 'skills': ['1', '2', '3']},
    # ... one dict per record
]
for el in struct:
    if '1' in el.get('skills', []):
        print('id %s has this skill' % el.get('id'))
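To answer the question as asked (which IdNos share skills), records in this list-of-dicts layout can also be compared pairwise. A minimal sketch, assuming records shaped like `struct` above; the sample data and the helper name `shared_skills` are hypothetical, not part of the original answer:

```python
from itertools import combinations

# Hypothetical sample records in the same shape as `struct`
records = [
    {'id': 1, 'skills': ['Training', 'PowerPoint', 'Accounting']},
    {'id': 2, 'skills': ['Management', 'Technology']},
    {'id': 3, 'skills': ['PowerPoint', 'Training']},
]

def shared_skills(records):
    """Map each pair of ids to the skills both records list (pairs with none are omitted)."""
    result = {}
    for a, b in combinations(records, 2):
        common = set(a['skills']) & set(b['skills'])
        if common:
            result[(a['id'], b['id'])] = common
    return result

for (id_a, id_b), common in shared_skills(records).items():
    print(id_a, id_b, sorted(common))  # 1 3 ['PowerPoint', 'Training']
```

With 60 records this compares 60 * 59 / 2 = 1770 pairs, which is cheap; for much larger data the inverted index in the next answer scales better.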

score 0 · Accepted Answer

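You can build an inverted index of the skills: a dictionary in which each key is a skill name and each value is the set of IdNos that have that skill. This also lets you find out which IdNos share particular skills.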

The code would look something like this:

# Inverted index: skill name -> set of IdNos that have that skill.
# Assumes each line looks like: idNo,skill1,skill2,...
skills = {}
with open('filename.txt') as f:
    for line in f:
        items = [item.strip() for item in line.split(',')]
        idNo = items[0]
        skill_list = items[1:]
        for skill in skill_list:
            if skill in skills:
                skills[skill].add(idNo)
            else:
                skills[skill] = {idNo}
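The loop above assumes one plain `id,skill,skill,...` line per record. The file in the question is laid out differently: there is a header row, and all the skills sit in one quoted CSV field written as Python literals (`u'Training', u'E-Learning', ...`). A sketch under that assumption, using the standard `csv` module to respect the quotes and `ast.literal_eval` to turn the field into a list of strings; `build_skill_index` is a hypothetical helper name:

```python
import ast
import csv
import io
from collections import defaultdict

def build_skill_index(lines):
    """Build {skill: set of IdNos} from lines shaped like the question's Book2.csv."""
    skills = defaultdict(set)
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # skip the "IdNo, skillsList" header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        id_no = row[0].strip()
        # "u'Training', u'E-Learning'" -> ['Training', 'E-Learning']
        for skill in ast.literal_eval('[' + row[1] + ']'):
            skills[skill].add(id_no)
    return skills

sample = (
    'IdNo, skillsList\n'
    '1,"u\'Training\', u\'PowerPoint\'"\n'
    '2,"u\'PowerPoint\', u\'Management\'"\n'
)
index = build_skill_index(io.StringIO(sample))
print(dict(index))
```

For the real file, pass the open file object instead, e.g. `with open(r"C:\Users\abc\Desktop\Book2.csv", newline='') as f: index = build_skill_index(f)`. `literal_eval` only evaluates literals, so it is safe on untrusted text, and it also decodes escapes such as `\xae`.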

Now you have a skills dictionary that looks like this:

skills = {
    'Training': {'1', '2', '3'},
    'Powerpoint': {'1', '3', '4'},
    'E-learning': {'9', '10', '11'},
    # ...
}
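With the index in this shape, "how many IdNos share a skill" is just the size of each set. A small sketch, assuming `skills` maps skill names to sets of IdNo strings as above; the sample values are made up:

```python
# Hypothetical index in the shape described above
skills = {
    'Training': {'1', '2', '3'},
    'Powerpoint': {'1', '3', '4'},
    'E-learning': {'9'},
}

# Number of IdNos holding each skill
counts = {skill: len(ids) for skill, ids in skills.items()}

# Only the skills that at least two IdNos have in common
shared = {skill: ids for skill, ids in skills.items() if len(ids) > 1}

# Most widely shared skills first
for skill in sorted(shared, key=lambda s: counts[s], reverse=True):
    print(skill, counts[skill], sorted(shared[skill]))
```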

Now you can see that 1, 3 and 4 have the Powerpoint skill. To find the IdNos that have both the Training and the Powerpoint skill, you can do:

skills['Powerpoint'].intersection(skills['Training'])
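The same idea extends to any number of skills, and `&` is the operator form of `intersection`. A sketch; `ids_with_all` is a hypothetical helper, and the `.get(skill, set())` lookup is an added assumption so that a skill nobody lists gives an empty result instead of a `KeyError`:

```python
def ids_with_all(skills, wanted):
    """Return the IdNos that have every skill in `wanted`."""
    sets = [skills.get(skill, set()) for skill in wanted]
    if not sets:
        return set()  # no skills requested
    return set.intersection(*sets)

skills = {'Training': {'1', '2', '3'}, 'Powerpoint': {'1', '3', '4'}}
print(sorted(ids_with_all(skills, ['Powerpoint', 'Training'])))  # ['1', '3']
print(sorted(skills['Powerpoint'] & skills['Training']))          # ['1', '3']
```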

To find the IdNos that have either the Training or the Powerpoint skill, you can do:

skills['Powerpoint'].union(skills['Training'])
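Likewise for "any of these skills": `|` is the operator form of `union`, and `set().union(...)` accepts any number of sets. A sketch with the hypothetical helper `ids_with_any`, again using `.get` so that unknown skills are simply ignored:

```python
def ids_with_any(skills, wanted):
    """Return the IdNos that have at least one skill in `wanted`."""
    return set().union(*(skills.get(skill, set()) for skill in wanted))

skills = {'Training': {'1', '2', '3'}, 'Powerpoint': {'1', '3', '4'}}
print(sorted(ids_with_any(skills, ['Powerpoint', 'Training'])))  # ['1', '2', '3', '4']
print(sorted(skills['Powerpoint'] | skills['Training']))         # ['1', '2', '3', '4']
```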

score 0 · Accepted Answer

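You have to do it yourself. You can use a dictionary of skills and initialize every entry to zero. Then iterate over your records and increment a skill's count each time you see it.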
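A sketch of that counting approach, assuming the records have already been parsed into (IdNo, list of skills) pairs; the sample data is made up. The first part follows the answer literally; `collections.Counter` from the standard library does the same zero-then-increment work in one step:

```python
from collections import Counter

# Hypothetical parsed records: (IdNo, list of skills)
records = [
    ('1', ['Training', 'PowerPoint', 'Accounting']),
    ('2', ['Management', 'PowerPoint']),
    ('3', ['Training', 'PowerPoint']),
]

# Literal version: every skill starts at zero, then is incremented per occurrence
counts = {}
for _, skill_list in records:
    for skill in skill_list:
        counts[skill] = 0
for _, skill_list in records:
    for skill in skill_list:
        counts[skill] += 1

# Equivalent with Counter
counter = Counter(skill for _, skill_list in records for skill in skill_list)
assert dict(counter) == counts

print(counter.most_common(2))  # [('PowerPoint', 3), ('Training', 2)]
```

A count above 1 means that several IdNos share the skill, as long as no record lists the same skill twice.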
