
I want to export data from a CSV file that contains unicode strings.

I previously tried the Python script below, which only works for ASCII data and fails on unicode input:

#! /usr/bin/env python
import csv

csv.register_dialect('custom', delimiter=',',
                     doublequote=True,
                     escapechar=None,
                     quotechar='"',
                     quoting=csv.QUOTE_MINIMAL, skipinitialspace=False)

with open('input.csv') as ifile:
    data = csv.reader(ifile, dialect='custom')
    for record in data:
        for i, field in enumerate(record):
            print(" <field%s>" % i + field + "</field%s>" % i)

Traceback (most recent call last):
    for record in data:
_csv.Error: line contains NULL byte


3 Answers


Use this unicode-csv library instead:

https://github.com/jdunck/python-unicodecsv

import unicodecsv as csv

# unicodecsv decodes the bytes itself, so open the file in binary mode
with open('input.csv', 'rb') as ifile:
    rows = list(csv.reader(ifile, encoding='utf-8'))

print(rows)
answered 2013-05-15T06:09:25.357

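You can wrap csv.reader in a class that handles the decoding for you. The following is taken from the examples in the csv documentation and works for me: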

#! /usr/bin/env python
# Python 2 only: relies on the `unicode` type and on iterator .next() methods
import codecs
import csv

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self




csv.register_dialect('custom', delimiter=',',
                     doublequote=True,
                     escapechar=None,
                     quotechar='"',
                     quoting=csv.QUOTE_MINIMAL, skipinitialspace=False)

# Open in binary mode; UTF8Recoder decodes the bytes itself
with open('input.csv', 'rb') as ifile:
    data = UnicodeReader(ifile, dialect='custom')
    for record in data:
        for i, field in enumerate(record):
            print(" <field%s>" % i + field + "</field%s>" % i)

There is also a UnicodeWriter class there if you need that functionality.
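If you are on Python 3 rather than Python 2, a wrapper class like UnicodeWriter should not be needed, because the csv module works on text and the file object does the encoding. The following is only a minimal sketch of that idea, not the documentation's UnicodeWriter; the helper name `write_rows` and the sample rows are made up for illustration.

```python
import csv
import io


def write_rows(stream, rows, encoding='utf-8'):
    """Write `rows` as CSV to the binary `stream` using `encoding`.

    `write_rows` is a hypothetical helper, not part of the csv module.
    """
    # TextIOWrapper performs the encoding step that UnicodeWriter does by hand
    # in Python 2; newline='' lets csv control the line endings itself.
    text = io.TextIOWrapper(stream, encoding=encoding, newline='')
    csv.writer(text).writerows(rows)
    text.flush()
    text.detach()  # keep the underlying binary stream open for the caller


# Illustrative data only.
buffer = io.BytesIO()
write_rows(buffer, [['id', 'name'], ['1', 'Zoë']])
encoded = buffer.getvalue()
```

When writing to a real file, `open(path, 'w', newline='', encoding='utf-8')` passed directly to `csv.writer` achieves the same thing without the wrapper.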

answered 2013-05-15T07:22:00.543

It looks like you are using Python 3. Follow the first code example in the documentation:

#!/usr/bin/env python3
import csv

encoding = 'utf-16'  # set this to your file's actual character encoding

with open('input.csv', newline='', encoding=encoding) as csvfile:
    reader = csv.reader(csvfile, dialect="custom")
    for row in reader:
        print(", ".join(row))

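where the "custom" dialect is the one defined in the code in your question, and encoding is the character encoding of your file, e.g. "utf-16". If you omit the encoding parameter, the encoding returned by locale.getpreferredencoding(False) is used.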
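To see why the encoding matters here, the sketch below assumes that the input file is UTF-16, which would be consistent with the "NULL byte" error but is not confirmed by the question. It writes a small sample file (made-up data, with a temporary path instead of input.csv), checks that the raw bytes contain zero bytes, and then reads the file back with the question's dialect. The helper name `read_rows` is hypothetical.

```python
import csv
import locale
import os
import tempfile

# Same dialect settings as in the question.
csv.register_dialect('custom', delimiter=',', doublequote=True,
                     escapechar=None, quotechar='"',
                     quoting=csv.QUOTE_MINIMAL, skipinitialspace=False)


def read_rows(path, encoding):
    """Return all rows of the CSV file at `path`, decoded with `encoding`."""
    with open(path, newline='', encoding=encoding) as csvfile:
        return list(csv.reader(csvfile, dialect='custom'))


# Build a small UTF-16 sample file (illustrative data only).
fd, sample_path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
with open(sample_path, 'w', newline='', encoding='utf-16') as f:
    csv.writer(f, dialect='custom').writerows(
        [['name', 'city'], ['Zoë', 'Kraków']])

# UTF-16 stores every ASCII character together with a zero byte, which is
# what a reader that does not decode the file first trips over.
with open(sample_path, 'rb') as f:
    has_null_bytes = b'\x00' in f.read()

rows = read_rows(sample_path, 'utf-16')

# This is the encoding open() falls back to when `encoding` is omitted.
default_encoding = locale.getpreferredencoding(False)
os.remove(sample_path)

for record in rows:
    for i, field in enumerate(record):
        print(" <field%s>" % i + field + "</field%s>" % i)
```

If the real file turns out to use a different encoding, only the value passed to `read_rows` needs to change.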

answered 2013-05-15T08:07:09.430