
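I'm a novice coder and I'm having trouble parsing a CSV file with Python's csv module. The problem is that my output shows the field values in the row as None for every field except the first.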

Here is the first row of the ugly CSV file I'm trying to parse (the remaining rows follow the same format):

0,213726,NORTH FORK SLATE CREEK,CAMPGROUND,North Fork Slate Creek Campground | Idaho |      Public Lands Information Center | Recreation Search, http://www.publiclands.org/explore/site.php?plicstate=ID&id=2268,NA,NA,NA,NA,(208)839-2211,"Nez Perce National Forest  Operating Days: 305<br>Total Capacity: 25<br>

5 campsites at the confluence of Slate Creek and its North Fork. A number of trails form loops in the area. These are open to most traffic, including trail bikes.","From Slate Creek, go 8 miles east on Forest Road 354.",NA,http://www.publiclands.org/explore/reg_nat_forest.php?region=7&forest_name=Nez%20Perce%20National%20Forest,NA,NA,NA,45.6,-116.1,NA,N,0,1103,2058

Here is the code I wrote to parse the CSV file (it doesn't work properly!):

import csv

#READER SETTINGS
f_path = '/Users/foo'
f_handler = open(f_path, 'rU').read().replace('\n',' ')
my_fieldnames = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 
'col8', 'col9', 'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 
'col16', 'col17', 'col18', 'col19', 'col20', 'col21', 'col22', 'col23', 
'col24','col25']
f_reader = csv.DictReader(f_handler, fieldnames=my_fieldnames, delimiter=',', dialect=csv.excel)

#NOW I TRY TO PARSE THE CSV FILE
i = 0
for row in f_reader:
    print "my first row was %s" % row
    i = i + 1
    if i > 0:
        break

Here is the output. It says every field except the first is blank, and I have no idea why! Any suggestions would be greatly appreciated.

my first row was {'col14': None, 'col15': None, 'col16': None, 
'col17': None, 'col10': None, 'col11': None, 'col12': None, 
'col13': None, 'col18': None, 'col19': None, 'col2': None, 'col8': None, 
'col9': None, 'col6': None, 'col7': None, 'col4': None, 'col5': None, 
'col3': None, 'col1': '0', 'col25': None, 'col24': None, 
'col21': None, 'col20': None, 'col23': None, 'col22': None}

3 Answers

3

Try this:

#!/usr/bin/env python

import csv

my_fieldnames = ['col' + str(i) for i in range(1,26)]

with open('input.csv', 'rb') as csvfile:
    my_reader = csv.DictReader(csvfile, fieldnames=my_fieldnames,
                               delimiter=',', dialect=csv.excel,
                               quoting=csv.QUOTE_NONE)

    for row in my_reader:
        for k,v in row.iteritems():
            print k, v

Output for the first line of input (remember that dictionaries are unordered):

col14 None
col15 None
col16 None
col17 None
col10 NA
col11 (208)839-2211
col12 "Nez Perce National Forest  Operating Days: 305<br>Total Capacity: 25<br>
col13 None
col18 None
col19 None
col8 NA
col9 NA
col6  http://www.publiclands.org/explore/site.php?plicstate=ID&id=2268
col7 NA
col4 CAMPGROUND
col5 North Fork Slate Creek Campground | Idaho |      Public Lands Information Center | Recreation Search
col2 213726
col3 NORTH FORK SLATE CREEK
col1 0
col25 None
col24 None
col21 None
col20 None
col23 None
col22 None
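
Note that in the output above, col12 stops at the line break inside the quoted description, because quoting=csv.QUOTE_NONE tells the reader to treat quote characters as ordinary text. As a rough sketch in Python 3 syntax (the shortened sample row and the variable names are made up for illustration), leaving the quoting at its default lets the reader keep a quoted field together even when it contains a newline:

```python
import csv
import io

# Hypothetical, shortened sample: the quoted third field spans two lines.
sample = '0,213726,"Total Capacity: 25<br>\n5 campsites",NA\n'

# newline='' leaves line endings untouched so the csv module can handle them.
default_rows = list(csv.reader(io.StringIO(sample, newline='')))

# With QUOTE_NONE the quote characters are ordinary text,
# so the record is cut at the newline inside the description.
unquoted_rows = list(
    csv.reader(io.StringIO(sample, newline=''), quoting=csv.QUOTE_NONE)
)

print(len(default_rows), len(unquoted_rows))
```

With a real file in Python 3, the equivalent would be to open it with newline='' and omit the quoting argument.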
answered 2013-04-13T22:22:41.000
3

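What different software systems call CSV varies widely. Fortunately, Python's excellent csv module is very good at handling these details, so you don't have to deal with them by hand.

Let me highlight something that @metaperture's answer uses but doesn't explain: you can take most of the guesswork out of reading CSV files in Python by detecting the dialect automatically. Once you have that part nailed down, there is much less that can go wrong.

Let me give you a simple example: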

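    import csv

    # filename: path to your CSV file
    with open(filename, 'rb') as csvfile:
        # Guess the dialect (delimiter, quoting, ...) from a sample of the file
        dialect = csv.Sniffer().sniff(csvfile.read(10024))
        csvfile.seek(0)  # rewind so the reader starts at the first row
        qreader = csv.reader(csvfile, dialect)
        for cnt, item in enumerate(qreader):
            if cnt > 0:
                pass  # process your data
            else:
                pass  # the header of the csv file (field names)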
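
To make the effect of sniffing visible without a file on disk, here is a small sketch in Python 3 syntax. The semicolon-separated sample and the variable names are invented for illustration; passing `delimiters` is optional and only narrows the candidates the sniffer considers.

```python
import csv
import io

# Hypothetical semicolon-separated sample, held in memory instead of a file.
sample = "name;city;zip\nann;boise;83702\nbob;moscow;83843\n"

# Restricting the candidates via `delimiters` makes the guess more predictable.
dialect = csv.Sniffer().sniff(sample, delimiters=";,")

rows = list(csv.reader(io.StringIO(sample, newline=""), dialect))
header, records = rows[0], rows[1:]

print(dialect.delimiter, header, records)
```

The sniffer works from heuristics, so on messy input it is worth checking dialect.delimiter before trusting the parsed rows.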
answered 2013-05-07T15:57:27.263
0

When you do this:

f_handler = open(f_path, 'rU').read().replace('\n',' ')

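you are removing all the newlines, which is how the csv.excel dialect detects new rows. Since the file then has only one line, the reader will return only one row.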
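
There is a second effect worth checking here: .read() returns a string rather than a file object, and the csv readers iterate over whatever they are given. Iterating over a string yields one character at a time, so each character is parsed as its own "row", which would account for a first row holding only '0'. A minimal sketch in Python 3 syntax, assuming a shortened three-column version of the data (the names `text` and `fields` are made up for illustration):

```python
import csv

text = "0,213726,NORTH FORK"      # a string, as returned by .read()
fields = ["col1", "col2", "col3"]

# Passing the string itself: the reader sees '0', then ',', then '2', ...
first_from_string = dict(next(csv.DictReader(text, fieldnames=fields)))

# Passing an iterable of lines (a list here, normally the open file object).
first_from_lines = dict(next(csv.DictReader([text], fieldnames=fields)))

print(first_from_string)  # {'col1': '0', 'col2': None, 'col3': None}
print(first_from_lines)   # {'col1': '0', 'col2': '213726', 'col3': 'NORTH FORK'}
```

If this is what is happening, passing the open file object to DictReader instead of the result of .read() should be enough to get full rows back.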

Also, you are doing the following:

if i > 0:
    break

which terminates your for loop after the first iteration.

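As for why they are blank: the default restval is None (see http://docs.python.org/3.2/library/csv.html), so the keys probably don't match. Try leaving out the fieldnames argument, and you will probably see that your keys in this dialect are along the lines of "col2", "col3", or similar.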
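
To see what restval does in isolation, here is a small sketch in Python 3 syntax (the two sample rows and the "MISSING" marker are invented for illustration). When a row has fewer values than there are field names, DictReader fills the remaining keys with restval:

```python
import csv

lines = ["0,213726", "1"]   # hypothetical rows; the second one is too short
fields = ["col1", "col2"]

# Default: missing fields are filled with restval, which is None.
default_rows = [dict(r) for r in csv.DictReader(lines, fieldnames=fields)]

# A custom restval makes short rows easier to spot.
marked_rows = [
    dict(r) for r in csv.DictReader(lines, fieldnames=fields, restval="MISSING")
]

print(default_rows[1])  # {'col1': '1', 'col2': None}
print(marked_rows[1])   # {'col1': '1', 'col2': 'MISSING'}
```

So a row full of None values suggests the reader only found one field in that row, rather than that the data itself was empty.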

A cute little wrapper I use:

def iter_trim(dict_iter):
    # Yield each row with surrounding whitespace stripped from keys and values.
    for row in dict_iter:
        try:
            d = dict((k.strip(" \t\n\r"), v.strip(" \t\n\r")) for k, v in row.items())
            yield d
        except AttributeError:
            # A key or value was not a string (for example None in a short row)
            print "row error:"
            print row

Example usage:

import csv

def csv_iter(filename):
    csv_fp = open(filename)
    # Guess the dialect from the first 16 KB, then rewind before reading rows
    guess_dialect = csv.Sniffer().sniff(csv_fp.read(16384))
    csv_fp.seek(0)
    csv_reader = csv.DictReader(csv_fp, dialect=guess_dialect)
    return iter_trim(csv_reader)

for row in csv_iter("some-file.csv"):
    # do something...
    print row
answered 2013-05-07T15:27:46.420