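What is the best way to take a data file that contains a header row and read that row into a named tuple, so that the data rows can be accessed by header name?

I was trying something like this: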

import csv
from collections import namedtuple

with open('data_file.txt', mode="r") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", ", ".join(i for i in reader[0]))
    next(reader)
    for row in reader:
        data = Data(*row)

The reader object is not subscriptable, so the code above raises a TypeError. What is the Pythonic way to read a file header into a named tuple?

3 Answers

Use:

Data = namedtuple("Data", next(reader))

and omit the line:

next(reader)

Combining this with an iterative version based on martineau's comment below, the example becomes, for Python 2:

import csv
from collections import namedtuple
from itertools import imap

with open("data_file.txt", mode="rb") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", next(reader))  # get names from column headers
    for data in imap(Data._make, reader):
        print data.foo
        # ...further processing of a line...

For Python 3:

import csv
from collections import namedtuple

with open("data_file.txt", newline="") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", next(reader))  # get names from column headers
    for data in map(Data._make, reader):
        print(data.foo)
        # ...further processing of a line...
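
One caveat with this approach: namedtuple raises a ValueError if a header is not a valid Python identifier (for example, it contains a space or starts with a digit). A minimal sketch of one way around that, assuming it is acceptable for such columns to get positional names, is to pass rename=True. The sample data and column names below are made up for illustration, and io.StringIO stands in for a real file.

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample data; "unit price" and "2nd" are not valid identifiers.
sample = io.StringIO("foo,unit price,2nd\n1,2.50,x\n3,4.75,y\n")

reader = csv.reader(sample)
# rename=True replaces invalid field names with _1, _2, ... (by position).
Data = namedtuple("Data", next(reader), rename=True)
rows = [Data._make(row) for row in reader]

print(Data._fields)  # ('foo', '_1', '_2')
print(rows[0].foo)   # 1 (csv yields every value as a string)
```

The renamed columns are then reachable as rows[0]._1 and rows[0]._2. If readable names matter more than convenience, cleaning the header strings before building the namedtuple may be the better choice.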
answered 2012-01-25T17:30:05.817

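Take a look at csv.DictReader. It takes the column names from the first row, which is what you are looking for, and then lets you access each column of a row by name through a dictionary.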
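
As a rough illustration of plain csv.DictReader usage, here is a small sketch. The sample data and the column names foo and bar are assumptions made for the example, and io.StringIO stands in for an open file.

```python
import csv
import io

# Hypothetical sample data; io.StringIO stands in for an open file.
sample = io.StringIO("foo,bar\n1,2\n3,4\n")

reader = csv.DictReader(sample)
print(reader.fieldnames)  # ['foo', 'bar'], taken from the first row
records = list(reader)    # one dict per data row, keyed by column name
print(records[0]["foo"])  # 1 (values are strings)
```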

If for some reason you still need to access the rows as a collections.namedtuple, it should be easy to convert the dictionaries to named tuples, like this:

import collections
import csv

with open('data_file.txt', newline='') as infile:
    reader = csv.DictReader(infile)
    Data = collections.namedtuple('Data', reader.fieldnames)
    tuples = [Data(**row) for row in reader]
answered 2012-01-25T18:05:18.523

I would suggest this approach:

import csv
from collections import namedtuple

with open("data.csv", 'r') as f:
    reader = csv.reader(f, delimiter=',')
    Row = namedtuple('Row', next(reader))
    rows = [Row(*line) for line in reader]

If you use Pandas, the solution becomes even more elegant:

import pandas as pd
from collections import namedtuple

data = pd.read_csv("data.csv")
Row = namedtuple('Row', data.columns)
rows = [Row(*row) for index, row in data.iterrows()]

In both cases, you can interact with the records by field name:

for row in rows:
    print(row.foo)
answered 2020-04-18T18:54:34.117