33

My program needs to read CSV files that may have 1, 2 or 3 columns, and it needs to modify its behaviour accordingly. Is there a simple way to check the number of columns without "consuming" a row before the iterator runs? The following code is the most elegant I could manage, but I would prefer to run the check before the for loop starts:

import csv

class CSVError(Exception):
    pass

filename = 'testfile.csv'
d = '\t'

# csv.reader needs an open file (or another iterable of lines), not a filename
with open(filename, newline='') as f:
    reader = csv.reader(f, delimiter=d)
    for row in reader:
        if reader.line_num == 1:
            fields = len(row)
        if len(row) != fields:
            raise CSVError("Number of fields should be %s: %s" % (fields, str(row)))
        if fields == 1:
            pass
        elif fields == 2:
            pass
        elif fields == 3:
            pass
        else:
            raise CSVError("Too many columns in input file.")

Edit: I should have included more information about my data. If there is only one field, it must contain a name in scientific notation. If there are two fields, the first must contain a name, and the second a linking code. If there are three fields, the additional field contains a flag which specifies whether the name is currently valid. Therefore, whether a file has 1, 2 or 3 columns, every row in it must have the same number.

4

5 Answers

30

You can use itertools.tee

itertools.tee(iterable[, n=2])
Return n independent iterators from a single iterable.

For example:

reader1, reader2 = itertools.tee(csv.reader(f, delimiter=d))
columns = len(next(reader1))
del reader1
for row in reader2:
    ...

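Note that it is important to delete the reference to reader1 once you are done with it. Otherwise tee has to keep every row in memory in case you call next(reader1) again.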
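As a self-contained illustration of the tee approach, here is a minimal sketch. The helper name read_with_column_count and the sample data are assumptions made for this example, and io.StringIO stands in for a real open file.

```python
import csv
import io
import itertools


def read_with_column_count(lines, delimiter="\t"):
    """Return (column_count, rows) without losing the first row.

    `lines` is any iterable of text lines, for example an open file.
    """
    peek, rows = itertools.tee(csv.reader(lines, delimiter=delimiter))
    first = next(peek, None)  # None if the input is empty
    del peek  # drop the extra iterator so tee stops buffering rows for it
    columns = len(first) if first is not None else 0
    return columns, rows


# Hypothetical sample data standing in for testfile.csv
sample = io.StringIO("Homo sapiens\tH1\nPan troglodytes\tP1\n")
columns, rows = read_with_column_count(sample)
all_rows = list(rows)  # still includes the first row
```

Because only one row is read from the peeking iterator before it is discarded, tee should buffer no more than that single row.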

Answered 2012-07-03T11:52:07.037
20

This also seems to work:

import csv

datafilename = 'testfile.csv'
d = '\t'

# newline='' is what the csv module recommends for files opened in Python 3
with open(datafilename, 'r', newline='') as f:
    reader = csv.reader(f, delimiter=d)
    ncol = len(next(reader))  # Read first line and count columns
    f.seek(0)                 # Go back to beginning of file
    for row in reader:
        pass  # do stuff
Answered 2012-07-03T12:06:40.347
4

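What should happen if the user gives you a CSV file with fewer columns? Are default values used instead?

If so, why not extend the rows with null values?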

reader = csv.reader(f,delimiter=d)
for row in reader:
    row += [None] * (3 - len(row))
    try:
        foo, bar, baz = row
    except ValueError:
        # Too many values to unpack: too many columns in the CSV
        raise CSVError("Too many columns in input file.")

Now bar and baz will at least be None, and the exception handler will take care of any row with more than 3 items.
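A runnable sketch of the padding idea follows. CSVError here is a stand-in for the asker's own exception class, and padded_rows and the sample data are hypothetical names chosen for illustration. Note that padding hides mixed column counts, so it only fits if short rows are acceptable.

```python
import csv
import io


class CSVError(Exception):
    """Stand-in for the asker's exception class (not part of the csv module)."""


def padded_rows(lines, delimiter="\t", width=3):
    """Yield each row padded with None up to `width` fields."""
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) > width:
            raise CSVError("Too many columns in input file.")
        yield row + [None] * (width - len(row))


# Hypothetical sample: one row with a single field, one with two
sample = io.StringIO("Homo sapiens\nPan troglodytes\tP1\n")
result = list(padded_rows(sample))
```

Each yielded row can then be unpacked into three names unconditionally, since missing fields arrive as None.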

Answered 2012-07-03T11:52:47.430
3

I would suggest a simple approach like this:

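# The handle is named f so that it does not shadow the csv module
with open('./testfile.csv', 'r') as f:
    first_line = f.readline()
    your_data = f.readlines()

# Assumes a comma delimiter and no quoted fields that contain commas
ncol = first_line.count(',') + 1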
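Counting delimiter characters can miscount when a quoted field contains the delimiter, or when the file is tab-separated as in the question. One possible variation, sketched here with a hypothetical helper count_columns and made-up sample data, is to let csv.reader parse just the first line:

```python
import csv
import io


def count_columns(first_line, delimiter="\t"):
    """Count fields in one line, honouring quoting and the chosen delimiter."""
    return len(next(csv.reader([first_line], delimiter=delimiter)))


# Hypothetical file contents; a real file would come from open(..., newline="")
handle = io.StringIO('"Smith, J."\tX1\tyes\nDoe\tX2\tno\n')
first_line = handle.readline()
ncol = count_columns(first_line)
remaining = handle.readlines()  # the rest of the file, first line excluded
```

This sketch assumes the first record fits on a single physical line, so a quoted field containing a newline would not be handled.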
Answered 2019-06-21T01:28:54.390
-1

I would restructure it as follows (provided the file is not too large):

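import csv
filename = 'testfile.csv'
d = '\t'

# Read the whole file into memory so the first row can be inspected up front
with open(filename, newline='') as f:
    rows = list(csv.reader(f, delimiter=d))
fields = len(rows[0])
for row in rows:
    if fields == 1:
        pass
    elif fields == 2:
        pass
    elif fields == 3:
        pass
    else:
        # CSVError is the asker's own exception class, assumed defined elsewhere
        raise CSVError("Too many columns in input file.")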
Answered 2012-07-03T12:09:29.193