
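I am trying to read an .xls file with Python. The file contains several non-ASCII characters (namely äöü). I have tried both openpyxl and xlrd (I had high hopes for xlrd, since it is supposed to read everything as Unicode), but neither worked.

I have found several answers that deal with encoding/decoding when printing information from an xls file, but I cannot even get that far. This script fails as soon as it tries to read the file: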

import xlrd
workbook = xlrd.open_workbook('export_data.xls')

This results in:

Traceback (most recent call last):
  File "C:\Users\Administrator\workspace\tufinderxlstoxml\tufinderxlstoxml2.py", line 2, in <module>
    workbook = xlrd.open_workbook('export_data.xls')
  File "C:\Python27_32\lib\site-packages\xlrd\__init__.py", line 435, in open_workbook
    ragged_rows=ragged_rows,
  File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 119, in open_workbook_xls
    bk.get_sheets()
  File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 705, in get_sheets
    self.get_sheet(sheetno)
  File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 696, in get_sheet
    sh.read(self)
  File "C:\Python27_32\lib\site-packages\xlrd\sheet.py", line 796, in read
    strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)
  File "C:\Python27_32\lib\site-packages\xlrd\biffh.py", line 269, in unpack_string
    return unicode(data[pos:pos+nchars], encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 55: ordinal not in range(128)
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
*** No CODEPAGE record, no encoding_override: will use 'ascii'
*** No CODEPAGE record, no encoding_override: will use 'ascii'
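
The failure can be reproduced without the workbook. This is a minimal sketch, assuming the cell text contains the byte 0x92 (the byte named in the traceback) next to ordinary letters; the sample bytes and helper names are invented for illustration. Because xlrd found no CODEPAGE record it fell back to 'ascii', and that codec rejects every byte above 0x7F:

```python
# Hypothetical sample: the real cell contents are not known.
# 0x92 is the byte reported in the traceback; 0xFC is "ü" in cp1252.
SAMPLE = b"M\xfcller\x92s"


def first_non_ascii(data):
    """Return (offset, byte value) of the first byte above 0x7F, or None."""
    for offset, value in enumerate(bytearray(data)):
        if value > 0x7F:
            return offset, value
    return None


def decodes_as(data, encoding):
    """Return True if data decodes without error using the given codec."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


print(first_non_ascii(SAMPLE))
print(decodes_as(SAMPLE, "ascii"))
```

Any single byte above 0x7F in a cell is enough to trigger the error, so the position in the traceback only identifies the first offending byte, not the only one.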

I have also tried:

workbook = xlrd.open_workbook('export_data.xls', encoding_override="utf-8")

This results in:

Traceback (most recent call last):
  File "C:\Users\Administrator\workspace\tufinderxlstoxml\tufinderxlstoxml2.py", line 2, in <module>
    workbook = xlrd.open_workbook('export_data.xls', encoding_override="utf-8")
  File "C:\Python27_32\lib\site-packages\xlrd\__init__.py", line 435, in open_workbook
    ragged_rows=ragged_rows,
  File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 119, in open_workbook_xls
    bk.get_sheets()
  File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 705, in get_sheets
    self.get_sheet(sheetno)
  File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 696, in get_sheet
    sh.read(self)
  File "C:\Python27_32\lib\site-packages\xlrd\sheet.py", line 796, in read
    strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)
  File "C:\Python27_32\lib\site-packages\xlrd\biffh.py", line 269, in unpack_string
    return unicode(data[pos:pos+nchars], encoding)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 55: invalid start byte
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
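
The second traceback fails on the same byte for a different reason: 0x92 is a UTF-8 continuation byte, so it cannot start a character. The sketch below probes a few candidate codecs against sample bytes. The sample, the candidate list and the probe helper are assumptions for illustration, not taken from the file:

```python
# Invented sample: reads "Grüße ’test’" when interpreted as cp1252.
SAMPLE = b"Gr\xfc\xdfe \x92test\x92"

CANDIDATES = ["ascii", "utf-8", "cp1252", "latin-1"]


def probe(data, encodings):
    """Map each codec name to the decoded text, or to None if decoding fails."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


results = probe(SAMPLE, CANDIDATES)
for name in CANDIDATES:
    print(name, repr(results[name]))
```

Note that latin-1 never raises, because it maps every byte to some code point; it turns 0x92 into a control character rather than a quotation mark. A codec that decodes without error is therefore not automatically the right one.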

I have also tried adding various versions of the following at the top of the script:

# -*- coding: utf-8 -*-

I am running this with Python 2.7 on a Windows Server 2008 machine.


3 Answers

1

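Thanks for the feedback, everyone!

I did eventually fix it with the encoding_override argument. I could not find Microsoft documentation on which cp code page corresponds to German characters, so I tried them all. Eventually I got to cp1251 and it worked!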

workbook = xlrd.open_workbook(path, encoding_override="cp1251")
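
Trying code pages one by one can be automated. This is a sketch under stated assumptions: open_with_fallback and its opener parameter are hypothetical helpers, not part of xlrd; only xlrd.open_workbook and its encoding_override argument come from the library. A codec that raises no error is not necessarily the right one: cp1251 is the Windows Cyrillic code page and cp1252 the Western European one, and both accept 0x92, so it is worth checking how the umlauts come out:

```python
def open_with_fallback(path, encodings, opener=None):
    """Return (encoding, workbook) for the first codec that opens the file.

    opener is a hypothetical hook for testing; by default it is
    xlrd.open_workbook, imported lazily so this helper can be defined
    even where xlrd is not installed.
    """
    if opener is None:
        import xlrd
        opener = xlrd.open_workbook
    for encoding in encodings:
        try:
            return encoding, opener(path, encoding_override=encoding)
        except UnicodeDecodeError:
            continue  # this code page cannot decode the strings; try the next
    raise ValueError("none of the candidate encodings worked: %r" % (encodings,))


# Invented sample: the bytes for "äöü" plus 0x92 in a Windows code page.
UMLAUT_BYTES = b"\xe4\xf6\xfc\x92"

as_cp1251 = UMLAUT_BYTES.decode("cp1251")  # no error, but Cyrillic letters
as_cp1252 = UMLAUT_BYTES.decode("cp1252")  # the German umlauts

print(repr(as_cp1251))
print(repr(as_cp1252))
```

With the file from the question this might be called as open_with_fallback('export_data.xls', ['cp1252', 'cp1251']). If the cells come back with Cyrillic letters where umlauts are expected, the code page is wrong even though no exception was raised.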
answered 2014-12-03T18:22:05.103
0

It is a bit late, but I would suggest trying unicodecsv to handle the encoding.

answered 2014-03-27T10:06:09.433
0

From my reading of the OOo documentation, xls uses the utf_16_le flavor of Unicode rather than utf8 (that is, little-endian, with each character stored in exactly two bytes), so try:

workbook = xlrd.open_workbook('export_data.xls', encoding_override="utf_16_le")

(See page 17 of http://www.openoffice.org/sc/excelfileformat.pdf)

answered 2013-10-13T20:32:36.027
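
Whether utf_16_le helps depends on how the strings are actually stored. The tracebacks in the question go through unpack_string, which decodes a run of bytes with a single codec, and xlrd reports a missing CODEPAGE record. That suggests (an inference, not something confirmed in this thread) that the file holds 8-bit code-page text rather than two-byte characters. The sketch below shows what happens when 8-bit text is decoded as UTF-16LE; the sample strings and the try_utf16le helper are invented:

```python
# Invented samples, encoded as 8-bit cp1252 text.
EVEN = u"M\u00fcll".encode("cp1252")        # 4 bytes
ODD = u"Gr\u00fc\u00dfe".encode("cp1252")   # 5 bytes


def try_utf16le(data):
    """Decode data as UTF-16LE, returning None if the codec rejects it."""
    try:
        return data.decode("utf_16_le")
    except UnicodeDecodeError:
        return None


garbled = try_utf16le(EVEN)   # pairs of bytes are fused into unrelated characters
failed = try_utf16le(ODD)     # an odd byte count cannot be valid UTF-16
roundtrip = try_utf16le(u"M\u00fcll".encode("utf_16_le"))  # genuine UTF-16LE input

print(repr(garbled))
print(repr(failed))
print(repr(roundtrip))
```

If the override produces text of roughly half the expected length, or fails on strings with an odd number of bytes, the data is probably single-byte and a code page such as cp1252 is the more likely fit.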