I am running the following code in Python:

from csv import reader, writer


def my_function(file1, file2, output, xs, stringL = 'k', delim = ','):

    with open(file1, 'r') as text, open(file2, 'r') as src, open(output, 'w') as dst:
        for l in text:
            for x in xs:
                if stringL in l:
                    print("found!")

        my_reader = reader(src, delimiter = delim)
        my_writer = writer(dst, delimiter = delim)

        columnNumber = 0
        for column in zip(*my_reader):
            print(column, columnNumber)
            columnNumber += 1


if __name__ == '__main__':
    from sys import argv
    if len(argv) == 5:
        my_function(argv[1], argv[2], argv[3], argv[4])
    elif len(argv) == 6:
        my_function(argv[1], argv[2], argv[3], argv[4], argv[5])
    elif len(argv) == 7:
        my_function(argv[1], argv[2], argv[3], argv[4], argv[5], argv[6])
    else:
        print("Invalid number of arguments")
    print("Done")

file1 is a text file, for example:

a
k
k
a
k
k
a
a
a
z

a
a
a

file2 is any CSV file.

I get this error:

  File "error.py", line 16, in my_function
  for column in zip(*my_reader):
  File "/usr/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xde in position 12: invalid continuation byte

I found the same error here, along with a solution. However, I could not adapt that solution to my code. I tried several things, such as

column = unicode(column, errors = 'replace')

but it still does not work.

Could you please help me?

1 Answer

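Python 3 decodes text files to Unicode when it opens them, and here it is using UTF-8 as the default encoding (the default is taken from the locale, so it can differ between systems). However, your input file is not UTF-8, so decoding fails.

The correct encoding cannot be inferred from the error message or from your post, so you need to find out what it is and specify it when opening the file: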

open(file2, 'r', encoding='*correct encoding for file2*', newline='') as src

Also note the newline=''; see the csv.reader() documentation.
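
As a minimal sketch of the fix above: the snippet below writes a small sample file and reads its columns back with an explicit encoding. The sample data, the helper name `read_columns`, and the choice of Latin-1 are assumptions made only for illustration; the real encoding of file2 is unknown and has to be determined separately.

```python
import csv
import os
import tempfile

# Hypothetical sample data: a CSV saved as Latin-1, where the character
# U+00DE is stored as the single byte 0xDE (the byte from the traceback).
sample = "name,city\n\u00de\u00f3r,Oslo\n"
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "wb") as f:
    f.write(sample.encode("latin-1"))


def read_columns(path, encoding, delim=","):
    """Return the CSV columns as tuples, decoding the file with `encoding`."""
    # newline='' lets the csv module handle line endings itself.
    with open(path, "r", encoding=encoding, newline="") as src:
        return list(zip(*csv.reader(src, delimiter=delim)))


# Decoding the same bytes as UTF-8 fails, as in the question.
try:
    read_columns(path, "utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# With the matching encoding the columns are read without error.
columns = read_columns(path, "latin-1")
print(utf8_failed, columns)
```

Note that `unicode()` does not exist in Python 3. If the encoding really cannot be determined, `open()` also accepts `errors='replace'`, which substitutes U+FFFD for undecodable bytes instead of raising, at the cost of losing the original characters.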

Your sys.argv handling is overly verbose; just use:

if __name__ == '__main__':
    from sys import argv
    if 5 <= len(argv) <= 7:
        my_function(*argv[1:])
    else:
        print("Invalid number of arguments")
    print("Done")
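
To illustrate why the shorter version is equivalent, here is a small sketch with a stand-in for `my_function` that only returns the arguments it received. The stub body and the `main` wrapper are assumptions added for the demonstration; the point is that `*argv[1:]` passes four to six positional arguments and the defaults fill in whatever is missing.

```python
def my_function(file1, file2, output, xs, stringL="k", delim=","):
    # Stand-in for the function in the question: report what was received.
    return (file1, file2, output, xs, stringL, delim)


def main(argv):
    # argv[0] is the script name, so 5 to 7 entries mean 4 to 6 arguments.
    if 5 <= len(argv) <= 7:
        return my_function(*argv[1:])
    print("Invalid number of arguments")
    return None


four_args = main(["error.py", "a.txt", "b.csv", "out.csv", "x"])
six_args = main(["error.py", "a.txt", "b.csv", "out.csv", "x", "z", ";"])
too_few = main(["error.py", "a.txt"])
print(four_args, six_args, too_few)
```
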
answered 2013-06-30T21:48:26.703