python - Python 3 CSV data structure problem

Question

I have a CSV file like this:

Category    Subcategory
-----------------------
cat         panther
cat         tiger
dog         wolf
dog         heyena
cat         lion
dog         beagle

I'm trying to write a script that produces output like this (order doesn't matter):

animals = [
              [['cat'], ['panther', 'tiger', 'lion']],
              [['dog'], ['wolf', 'heyena', 'beagle']]
          ]

So far I've been able to build a list of unique categories and a list of unique subcategories:

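catlist = []     # unique categories, in order of first appearance
subcatlist = []  # unique subcategories, in order of first appearance

for p in infile:
    if p[0] not in catlist:
        catlist.append(p[0])
    if p[1] not in subcatlist:
        subcatlist.append(p[1])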

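But I'm having trouble writing the logic for "if the category 'cat' is in animals[] but 'panther' is not in 'cat', append it."

I've played around with zip() and dict() a bit, but I'm pretty much just flailing here. I'm fairly new to Python. Using Python 3.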

score 4 · Accepted Answer

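If you want to map keys to values, a dictionary is much easier to work with. defaultdict is especially handy for building one.

Assuming your infile yields each input line already split on whitespace, the following should help: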

from collections import defaultdict

animals = defaultdict(list)

for p in infile:
    animals[p[0]].append(p[1])
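
For completeness, here is a minimal end-to-end sketch of the same idea that also converts the dictionary into the nested-list shape from the question. It assumes the file looks exactly like the sample (whitespace-separated columns, a header row, and a dashed rule); SAMPLE and group_rows are names invented for this illustration, not part of the original answer.

```python
import io
from collections import defaultdict

# Stand-in for the question's file (assumption: columns are separated by
# runs of spaces, preceded by a header row and a dashed rule).
SAMPLE = """\
Category    Subcategory
-----------------------
cat         panther
cat         tiger
dog         wolf
dog         heyena
cat         lion
dog         beagle
"""

def group_rows(lines):
    """Map each category to a list of its subcategories, in file order."""
    grouped = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines, the dashed rule, and the header row.
        if len(fields) != 2 or fields[0] == "Category":
            continue
        category, subcategory = fields
        # Only append a subcategory the first time it is seen.
        if subcategory not in grouped[category]:
            grouped[category].append(subcategory)
    return grouped

grouped = group_rows(io.StringIO(SAMPLE))

# Convert to the nested-list shape shown in the question.
animals = [[[category], subcats] for category, subcats in grouped.items()]
print(animals)
```

With a real file, passing the open file object to group_rows in place of io.StringIO(SAMPLE) should work the same way, since both yield one line at a time.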

score 2 · Accepted Answer

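You might consider using sets together with a dictionary. Use the category name as the dictionary key, so that for each p in infile you call animals[p[0]].add(p[1]), assuming p[0] and p[1] are the type and the species.

The advantage is that if 'Panther' appears more than once under 'Cat', you don't have to check whether it is already in the 'Cat' collection, because the set type guarantees that its elements are unique.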

>>> from collections import defaultdict
>>> animals = defaultdict(set)
>>> animals['Cat'].add('Panther')
>>> animals
defaultdict(<class 'set'>, {'Cat': {'Panther'}})
>>> animals['Cat'].add('Lion')
>>> animals
defaultdict(<class 'set'>, {'Cat': {'Lion', 'Panther'}})
>>> animals['Cat'].add('Panther')
>>> animals
defaultdict(<class 'set'>, {'Cat': {'Lion', 'Panther'}})

Compare that with using a list:

>>> moreanimals = defaultdict(list)
>>> moreanimals['Cat'].append('Panther')
>>> moreanimals
defaultdict(<class 'list'>, {'Cat': ['Panther']})
>>> moreanimals['Cat'].append('Panther')
>>> moreanimals
defaultdict(<class 'list'>, {'Cat': ['Panther', 'Panther']})
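
Applied to file input, the set-based approach might look like the sketch below. It assumes a comma-separated file with no header row and reads it with the standard csv module; SAMPLE_CSV and animals_by_category are hypothetical names used only for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical comma-separated input; note the repeated "cat,panther" row.
SAMPLE_CSV = "cat,panther\ncat,tiger\ndog,wolf\ncat,panther\ndog,beagle\n"

animals_by_category = defaultdict(set)
for row in csv.reader(io.StringIO(SAMPLE_CSV)):
    if len(row) != 2:
        continue  # ignore blank or malformed rows
    category, subcategory = row
    # Adding an element that is already present leaves the set unchanged.
    animals_by_category[category].add(subcategory)

# Sets are unordered, so sort when a stable listing is needed.
result = {
    category: sorted(subcats)
    for category, subcats in animals_by_category.items()
}
print(result)
```

The trade-off is that a set does not remember the order in which subcategories appeared. If that order matters, the list version with an explicit membership check is the safer choice.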
