26

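I'm new to Python. I want to parse a CSV file so that it recognizes quoted values. For example,

1997,Ford,E350,"Super, luxurious truck"

should be split as

('1997', 'Ford', 'E350', 'Super, luxurious truck')

and not

('1997', 'Ford', 'E350', '"Super', ' luxurious truck"')

which is what I get if I use something like str.split(',').

How do I do this? Also, would it be best to store these values in an array or some other data structure? After I get the values from the CSV, I want to be able to easily select, say, any two columns and store them as another array or some other data structure.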


5 Answers

27

You should use the csv module:

import csv
reader = csv.reader(['1997,Ford,E350,"Super, luxurious truck"'], skipinitialspace=True)
for r in reader:
    print(r)

Output:

['1997', 'Ford', 'E350', 'Super, luxurious truck']
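
The question also asks how to pick out any two columns once the rows are parsed. One possible sketch, assuming the rows fit in memory: the second data row and the helper name select_columns are made up for illustration, and io.StringIO stands in for a real file.

```python
import csv
import io

# Hypothetical sample data standing in for a real file on disk.
sample = io.StringIO(
    '1997,Ford,E350,"Super, luxurious truck"\n'
    '1999,Chevy,Venture,"Extended Edition"\n'
)

# Each parsed row is a plain list of strings.
rows = list(csv.reader(sample, skipinitialspace=True))

def select_columns(rows, indices):
    """Return new rows containing only the requested column positions."""
    return [[row[i] for i in indices] for row in rows]

# Keep the first and fourth columns of every row.
year_and_description = select_columns(rows, [0, 3])
print(year_and_description)
```

A list of lists is probably enough for small files; the quoted comma survives inside the fourth column.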
answered 2012-09-06T09:21:51.613
19

The following approach works well:

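import csv

fieldnames = ['column1name', 'column2name', 'column3name']

# One empty list per column, keyed by column name.
d = {name: [] for name in fieldnames}

# On Python 3 the csv module expects text mode with newline='' (not 'rb').
with open('filename.csv', newline='') as f:
    dictReader = csv.DictReader(f, fieldnames=fieldnames, delimiter=',', quotechar='"')
    for row in dictReader:
        for key in fieldnames:
            d[key].append(row[key])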

The columns are stored in a dictionary, with the column names as keys.
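
With a column-per-key layout like this, selecting two columns comes down to two dictionary lookups. A minimal sketch, assuming hypothetical column names (year, make, model, description) and in-memory sample data instead of filename.csv:

```python
import csv
import io

# Hypothetical sample data; a real script would open a file instead.
sample = io.StringIO(
    '1997,Ford,E350,"Super, luxurious truck"\n'
    '1999,Chevy,Venture,"Extended Edition"\n'
)

# Hypothetical column names, chosen only for this illustration.
fieldnames = ['year', 'make', 'model', 'description']
columns = {name: [] for name in fieldnames}

for row in csv.DictReader(sample, fieldnames=fieldnames):
    for name in fieldnames:
        columns[name].append(row[name])

# Pair up any two columns, row by row.
pairs = list(zip(columns['make'], columns['description']))
print(pairs)
```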

answered 2012-09-10T16:45:11.623
5

You have to define the double quote as the quotechar in the csv.reader() call:

>>> with open(r'<path_to_csv_test_file>') as csv_file:
...     reader = csv.reader(csv_file, delimiter=',', quotechar='"')
...     print(next(reader))
... 
['1997', 'Ford', 'E350', 'Super, luxurious truck']
>>> 
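
For what it's worth, '"' is already the default quotechar in the csv module, so passing it explicitly mostly serves as documentation. The parameter matters more when the data uses a different quote character. A small sketch, assuming a made-up line that quotes with single quotes:

```python
import csv

# Hypothetical input that uses single quotes around the last field.
line = "1997,Ford,E350,'Super, luxurious truck'"

# With the default quotechar ('"') the single quotes are ordinary characters,
# so the embedded comma splits the field.
default_fields = next(csv.reader([line]))

# Declaring the single quote as quotechar keeps the field intact.
quoted_fields = next(csv.reader([line], quotechar="'"))

print(default_fields)
print(quoted_fields)
```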
answered 2012-09-06T09:51:06.347
3

If you don't want to use the csv module, you can use a regular expression instead. Try this:

import re
regex = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"
string = '1997,Ford,E350,"Super, luxurious truck"'
array = re.split(regex, string)

print(array[3])
"Super, luxurious truck"
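
Note that the split keeps the surrounding double quotes on the field. A possible follow-up step, sketched here under the assumption that fields never contain line breaks, strips the outer quotes and collapses doubled quotes; unquote and split_line are hypothetical helper names, and the csv module is imported only to cross-check the result.

```python
import csv
import re

# Split on commas that are followed by an even number of double quotes,
# i.e. commas that are not inside a quoted field.
SPLIT_RE = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def unquote(field):
    """Strip one pair of surrounding double quotes and unescape doubled quotes."""
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1].replace('""', '"')
    return field

def split_line(line):
    """Split a single CSV line (no embedded newlines) into unquoted fields."""
    return [unquote(f) for f in SPLIT_RE.split(line)]

line = '1997,Ford,E350,"Super, luxurious truck"'
fields = split_line(line)
print(fields)

# Cross-check against the csv module on the same line.
assert fields == next(csv.reader([line]))
```

Quoted fields that span several lines are outside what this line-by-line approach can handle; the csv module covers that case.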
answered 2014-11-12T13:18:46.537
0

The csv module is probably fine, but if you want to see and/or control how it works, here is a small coroutine-based Python solution:

def csv_parser(delimiter=','):
    field = []
    while True:
        char = (yield(''.join(field)))
        field = []

        leading_whitespace = []    
        while char and char == ' ':
            leading_whitespace.append(char)
            char = (yield)

        if char == '"' or char == "'":
            surround = char
            char = (yield)
            while True:
                if char == surround:
                    char = (yield)
                    if not char == surround:
                        break

                field.append(char)
                char = (yield)

            while not char == delimiter:
                if char == None:
                    (yield(''.join(field)))
                char = (yield)
        else:
            field = leading_whitespace
            while not char == delimiter:
                if char == None:
                    (yield(''.join(field)))
                field.append(char)
                char = (yield)

def parse_csv(csv_text):
    processor = csv_parser()
    next(processor)  # start the processor coroutine (works on Python 2.6+ and 3)

    split_result = []
    for c in list(csv_text) + [None]:
        emit = processor.send(c)
        if emit:
            split_result.append(emit)

    return split_result

print(parse_csv('1997,Ford,E350,"Super, luxurious truck"'))

Tested on Python 2.7.

answered 2018-10-08T09:19:33.117