4

I'm wondering how I can read a particular field from a CSV file with the following structure:

40.0070222,116.2968604,2008-10-28,[["route"], ["sublocality","political"]]
39.9759505,116.3272935,2008-10-29,[["route"], ["establishment"], ["sublocality", "political"]]

This is how I usually read CSV files:

with open('routes/stayedStoppoints', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')

The first three fields are no problem. Inside the loop

for row in spamreader:

I can access row[0], row[1] and row[2] without any trouble. The last field is the issue: I guess that csv.reader(csvfile, delimiter=',', quotechar='"') also splits on the commas inside each sub-list, so when I try to access it, it only shows me:

[["route"] 

Does anyone have a solution for handling the last field as a full list (a list of lists, in fact)

[["route"], ["sublocality","political"]]

so that I can access each category?

Thanks


3 Answers

3

Your format is close to JSON. You only need to wrap each line in brackets and quote the date. For each line l, just do the following:

lst=json.loads(re.sub('([0-9]+-[0-9]+-[0-9]+)',r'"\1"','[%s]'%(l)))

The resulting lst is:

[40.0070222, 116.2968604, u'2008-10-28', [[u'route'], [u'sublocality', u'political']]]

You need to import the JSON parser and the regular expression module:

import json
import re

Edit: You asked how to access the element containing "route". The answer is

lst[3][0][0]

"political" is at

lst[3][1][1]

If the strings ('political' and the others) might contain something that looks like a date, you should use @unutbu's solution instead.
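
To illustrate that caveat, here is a rough sketch. The line in `tricky` is made up for the example (it is not from the question's data), and the helper names `parse_with_regex` and `parse_with_split` are hypothetical. The regex quotes every date-like run, including one already inside a string, which leaves invalid JSON; splitting on the first three commas does not have that problem.

```python
import json
import re

DATE = '([0-9]+-[0-9]+-[0-9]+)'

def parse_with_regex(line):
    # Wrap the line in brackets and quote anything that looks like a date.
    return json.loads(re.sub(DATE, r'"\1"', '[%s]' % line))

def parse_with_split(line):
    # Split on the first three commas only, then parse the nested list.
    row = line.split(',', 3)
    row[3] = json.loads(row[3])
    return row

# Hypothetical line: one category string contains date-like text.
tricky = '40.0070222,116.2968604,2008-10-28,[["route"], ["2008-10-28 market"]]'

try:
    parse_with_regex(tricky)
    regex_ok = True
except ValueError:  # json raises ValueError (JSONDecodeError in Python 3)
    regex_ok = False

print(regex_ok)
print(parse_with_split(tricky)[3])
```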

answered 2013-06-16T18:36:49.253
2

Use line.split(',', 3) to split on the first 3 commas:

import json
with open(filename, 'rb') as csvfile:
    for line in csvfile:
        row = line.split(',', 3)
        row[3] = json.loads(row[3])
        print(row)

which yields

['40.0070222', '116.2968604', '2008-10-28', [[u'route'], [u'sublocality', u'political']]]
['39.9759505', '116.3272935', '2008-10-29', [[u'route'], [u'establishment'], [u'sublocality', u'political']]]
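
The snippet above appears to target Python 2 (note the u'' prefixes in the output). In Python 3, a file opened with 'rb' yields bytes, and calling line.split(',', 3) on bytes with a str separator raises a TypeError. A minimal sketch of the same idea for Python 3, assuming the data is read as text; the helper name `read_rows` is hypothetical and an in-memory sample stands in for the real file:

```python
import io
import json

def read_rows(lines):
    """Split each line on its first three commas and parse the nested list."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = line.split(',', 3)
        row[3] = json.loads(row[3])
        rows.append(row)
    return rows

# The two sample lines from the question, held in memory instead of a file.
sample = io.StringIO(
    '40.0070222,116.2968604,2008-10-28,[["route"], ["sublocality","political"]]\n'
    '39.9759505,116.3272935,2008-10-29,[["route"], ["establishment"], ["sublocality", "political"]]\n'
)

rows = read_rows(sample)
print(rows[0][3][1][1])  # political
```

With a real file, something like `with open(filename) as f: rows = read_rows(f)` should work the same way, since a text-mode file object iterates over its lines.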
answered 2013-06-16T18:18:38.850
2

This is not a valid CSV file. The csv module will not be able to read it.
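
To see what goes wrong, here is a small sketch (Python 3, with one sample line from the question held in an in-memory string rather than the real file). csv.reader splits on every comma, including the ones inside the nested list, so the fourth field is only a fragment:

```python
import csv
import io

# One sample line from the question, held in memory instead of a file.
sample = '40.0070222,116.2968604,2008-10-28,[["route"], ["sublocality","political"]]\n'

row = next(csv.reader(io.StringIO(sample), delimiter=',', quotechar='"'))

# The nested list is cut at its inner commas, so row[3] is incomplete.
print(len(row))
print(row[3])  # [["route"]
```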

If the line structure is always like this (two numbers, a date, and a nested list), you can do this:

import ast
result = []
with open('routes/stayedStoppoints') as infile:
    for line in infile:
        coord_x, coord_y, datestr, objstr = line.split(",", 3)
        result.append([float(coord_x), float(coord_y),
                      datestr, ast.literal_eval(objstr)])

Result:

>>> result
[[40.0070222, 116.2968604, '2008-10-28', [['route'], ['sublocality', 'political']]],
 [39.9759505, 116.3272935, '2008-10-29', [['route'], ['establishment'], ['sublocality', 'political']]]]
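
As a side note on the choice of parser: for the sample data, ast.literal_eval and json.loads should give the same nested list, because the last field happens to be valid in both syntaxes. They differ on other input. The single-quoted string below is a made-up example, not from the question's file; it is a valid Python literal but not valid JSON.

```python
import ast
import json

# Last field of the first sample line: valid as JSON and as a Python literal.
text = '[["route"], ["sublocality","political"]]'
same = ast.literal_eval(text) == json.loads(text)
print(same)

# Hypothetical single-quoted variant: a Python literal, but not JSON.
single = "[['route'], ['sublocality', 'political']]"
from_ast = ast.literal_eval(single)

try:
    json.loads(single)
    json_ok = True
except ValueError:
    json_ok = False

print(from_ast[1][1], json_ok)
```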
answered 2013-06-16T18:19:06.850