I'm trying to open and read a CSV file that has no field names. From the research I've done, I'm fairly sure it is UTF-8 encoded. My CSV has this format:

1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 

I open and read it with the following:

def parseCSVCounter(csv_file):
    with codecs.open(csv_file, "r", "utf-8-sig", "strict", -1) as f:
        f = str(f)
        relayreader = csv.reader(f, delimiter=',')
        for row in relayreader:
            print(row)

            try:
                #row[0] = unicode(row[0], 'latin-1')
                counter(row)
                print('starting row..')

            except UnicodeDecodeError, e:
                print('something went wrong1')
                print e

            except Exception, e:
                print('something went wrong')
                print e

This produces:

Starting Command..
['<']
something went wrong
invalid literal for int() with base 10: '<'
['o']
something went wrong
invalid literal for int() with base 10: 'o'
........
starting row..
['9']
starting row..
['3']
starting row..
['8']
starting row..
['2']
starting row..
['8']
starting row..
['>']
something went wrong
invalid literal for int() with base 10: '>'

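I trimmed the output to illustrate my point. It seems to be generating field names for me automatically. I know that with csv.DictReader(fieldnames = 'foo') I can specify the field names in order. How can I make csv.reader() ignore the missing field names?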

1 Answer

There is no need to call str(f); use the file object directly:

with codecs.open(csv_file, "r", "utf-8-sig", "strict") as f:
    relayreader = csv.reader(f, delimiter=',')

You were reading the output of str(f) as if it were the CSV file. That output is a string of this form:

<open file '/path/to/file', mode 'rb' at 0x105f10d20>

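You can see this in your error output: it spells out <, o, and so on, all the way through the digits of the memory address to the closing >.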
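To make the effect concrete, here is a minimal sketch, written for Python 3 rather than the Python 2 used in the question. The string fake_repr is a made-up stand-in for what str(f) returns, and the sample data is illustrative. csv.reader accepts any iterable of strings, and iterating over a str yields one character at a time, so every character becomes its own row. The same sketch shows that csv.reader has no notion of field names: given real lines, it simply returns each row as a list.

```python
import csv
import io

# Hypothetical stand-in for the result of str(f); path and address are illustrative.
fake_repr = "<open file 'data.csv', mode 'rb' at 0x105f10d20>"

# Iterating over a str yields single characters, so each one is parsed as a "line".
char_rows = list(csv.reader(fake_repr, delimiter=','))
print(char_rows[:2])

# A file-like object yields whole lines; no header row or field names are needed.
sample = io.StringIO("1,0,0\n2,0,0\n")
line_rows = list(csv.reader(sample, delimiter=','))
print(line_rows)
```

The first print shows rows holding a single character each, which is why counter(row) ends up calling int() on '<' and 'o'. The second shows the rows you presumably wanted.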

Note that the utf-8-sig codec handles a UTF-8-encoded BOM at the start of the file, but unless you expect that BOM to be present, the plain UTF-8 codec will do just fine.
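The difference between the two codecs can be sketched as follows (Python 3 syntax; the sample bytes are made up for illustration). utf-8-sig strips a leading BOM when one is present and behaves like plain UTF-8 when it is not, whereas plain utf-8 keeps the BOM as the character U+FEFF at the start of the decoded text.

```python
import codecs

# A small CSV line preceded by the UTF-8 BOM (bytes EF BB BF); illustrative data.
raw = codecs.BOM_UTF8 + b"1,0,0\n"

with_sig = raw.decode("utf-8-sig")  # leading BOM is removed
plain = raw.decode("utf-8")         # BOM survives as U+FEFF

# Decoding BOM-less input with utf-8-sig is also fine.
no_bom = b"1,0,0\n".decode("utf-8-sig")

print(repr(with_sig))
print(repr(plain))
print(repr(no_bom))
```

With the plain codec and a BOM in the file, the first field of the first row would start with U+FEFF instead of the digit, so utf-8-sig is the safer choice when you are not sure whether the file has one.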

Answered 2013-04-17T09:56:16.250