python - How do I create a two-level dictionary from a file?

Question

I have a csv file that looks like this (the real one is much bigger):

country;company1;company2;company3
finland;30;30;40
sweden;20;30;50
norway;10;20;70

What is the simplest way to read this file so that I end up with a dictionary like this (a dictionary of dictionaries):

{ 'company1': {'finland': 30, 'sweden': 20, 'norway': 10},
  'company2': {'finland': 30, 'sweden': 30, 'norway': 20},
... 
}

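I first tried to build a separate list from the first line of the file (the companies) and then create a dictionary from it. But I then ran into problems when trying to read the lines after the first one and create the inner dictionaries inside the dictionary I had already created.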

Sorry if the explanation is poor, I'm new to coding!

score 2 · Accepted Answer

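@fsimonjetz's answer is great if you are already using pandas in this project. If you are not, pulling it in just for this task is massive overkill, because we can parse and transpose the data with simple logic.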

import csv

from collections import defaultdict

output = defaultdict(dict)

with open('path/to/your.csv', newline='') as f:
    reader = csv.DictReader(f, delimiter=';')
    companies = reader.fieldnames[1:]
    for line in reader:
        country = line['country']
        for company in companies:
            output[company][country] = line[company]
            # or output.setdefault(company, {})[country] = line[company]
            # if you want 'output' to be a "normal" dict instead of defaultdict

print(dict(output))  # or just print(output) if you don't mind seeing the
                     # defaultdict repr

Output

{'company1': {'finland': '30', 'sweden': '20', 'norway': '10'}, 
 'company2': {'finland': '30', 'sweden': '30', 'norway': '20'}, 
 'company3': {'finland': '40', 'sweden': '50', 'norway': '70'}}
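
Note that the csv module yields every field as a string, whereas the question asks for integers. A minimal sketch of the same loop with an int() conversion is shown here. It assumes every column other than 'country' holds a whole number, the helper name read_two_level is made up for illustration, and the sample data is read from an in-memory io.StringIO (a stand-in for the real file) so the snippet runs as-is.

```python
import csv
import io
from collections import defaultdict

# Sample data from the question; io.StringIO stands in for the real file.
SAMPLE = """country;company1;company2;company3
finland;30;30;40
sweden;20;30;50
norway;10;20;70
"""


def read_two_level(f):
    """Return {company: {country: int_value}} from a ';'-separated file object.

    Hypothetical helper, not part of the original answer.
    """
    output = defaultdict(dict)
    reader = csv.DictReader(f, delimiter=';')
    companies = reader.fieldnames[1:]  # every column except 'country'
    for line in reader:
        country = line['country']
        for company in companies:
            # Assumption: each value is a whole number.
            # int() raises ValueError for anything else.
            output[company][country] = int(line[company])
    return dict(output)


result = read_two_level(io.StringIO(SAMPLE))
print(result['company1'])  # {'finland': 30, 'sweden': 20, 'norway': 10}
```

With a real file, the same function could be called as read_two_level(f) inside a with open(...) block. If some columns may contain decimals, float() would be the safer conversion.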

score 0 · Accepted Answer

One way is to use pandas, which is a good idea anyway if you need to work with tabular data:

>>> import pandas as pd
>>> df = pd.read_csv('path/to/your.csv', delimiter=';', index_col='country')
>>> df.to_dict()
{'company1': {'finland': 30, 'sweden': 20, 'norway': 10},
 'company2': {'finland': 30, 'sweden': 30, 'norway': 20},
 'company3': {'finland': 40, 'sweden': 50, 'norway': 70}}

score -1 · Accepted Answer

I think using an OrderedDict would help a lot. You could do it in a way similar to this:

import csv
from collections import OrderedDict

with open('file.csv') as f:
    reader = csv.reader(f, delimiter=';')
    list_companies = next(reader)  # ['country', 'company1', 'company2', ...]
    companies_dict = OrderedDict()
    for company in list_companies[1:]:  # We forget about 'country'
        companies_dict[company] = {}  # We initialize the companies' dicts in order
    for country_values in reader:  # For every line after the first one
        country = country_values[0]  # We get the country at the beginning of every line
        for countries_dict, value in zip(companies_dict.values(), country_values[1:]):
            countries_dict[country] = value  # And set the value for every company in order

    print(dict(companies_dict))
    # {'company1': {'finland': '30', 'sweden': '20', 'norway': '10'}, ...}

The zip function might be new to you. It takes two (or more) iterables and pairs up the elements at the same position as tuples. In Python 3 it returns a lazy iterator, so wrap it in list() to see the pairs. For example, list(zip(['finland', 'sweden', 'england'], [30, 30, 40])) == [('finland', 30), ('sweden', 30), ('england', 40)]
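
To make the pairing concrete, here is a small sketch that runs zip on the two lists from the example above. The variable names are only for illustration.

```python
countries = ['finland', 'sweden', 'england']
values = [30, 30, 40]

# zip is lazy in Python 3, so materialize it with list() to inspect the pairs.
pairs = list(zip(countries, values))
print(pairs)  # [('finland', 30), ('sweden', 30), ('england', 40)]

# dict() accepts the (key, value) pairs directly.
by_country = dict(zip(countries, values))
print(by_country)  # {'finland': 30, 'sweden': 30, 'england': 40}

# zip stops as soon as the shortest input is exhausted.
short = list(zip(countries, [1, 2]))
print(short)  # [('finland', 1), ('sweden', 2)]
```

The last case matters for the csv loop: a row with fewer fields than the header would silently leave some companies without a value for that country.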

This might not be exactly correct for your purpose but I believe it's a good enough approach to what you want to achieve.
