python-3.x - UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 23: invalid continuation byte

Question

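I can't get rid of this error. When reading a CSV with pandas, I keep getting "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 23: invalid continuation byte".

I have tried everything I could find online. I have converted the CSV file to several encodings, but the error still won't go away. I have also converted the file to UTF-8 using Sublime Text and Notepad.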

import tensorflow as tf
import pandas as pd

csv_path="C:\\Users\\diogo\\Transferências\\E0.csv"
dataset=pd.read_csv(csv_path,encoding="utf-8")

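I expected the dataset to be read correctly, but I always get this error. Also, when I change the encoding passed to the pandas reader, I still get the error "'utf-8' codec can't decode". Is that supposed to happen? Shouldn't the error change to a different one when I change the 'utf-8' encoding? If you know of any alternative way to read a CSV into TensorFlow, that information would also be appreciated. Thanks.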
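One detail worth checking, offered as an observation rather than a confirmed diagnosis: the ê in the folder name Transferências is character 23 of the path above, and in cp1252 that character is the single byte 0xea. The sketch below only demonstrates that coincidence with the standard library. Whether pandas actually mis-decodes the path depends on the pandas version and platform, which the question does not state.

```python
# Path copied from the question.
csv_path = "C:\\Users\\diogo\\Transferências\\E0.csv"

# Encode the path the way a Western European Windows locale (cp1252) would.
path_bytes = csv_path.encode("cp1252")
byte_at_23 = path_bytes[23]
print(hex(byte_at_23))

# Decoding those bytes as UTF-8 fails: 0xea announces a multi-byte sequence,
# but the next byte is the plain ASCII letter "n", not a continuation byte.
try:
    path_bytes.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc
    print(exc)
```

If that is what is happening, the encoding argument would have no effect on the message, which matches the behaviour described. Passing an already opened file object to pd.read_csv instead of a path string is one way to test it.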

score 2 · Accepted Answer

I eventually found out that the encoding was "cp1252", using the following code:

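with open('food.csv') as f:
    # Printing the file object shows its repr, which includes the encoding
    # Python used to open the file (no encoding was passed here).
    print(f)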

I still don't know why the encoding didn't change to "utf-8" when I saved the file with Sublime Text and Notepad.
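For reference, a minimal sketch of reading cp1252 data with an explicit encoding. Note that the encoding reported by a file object is the one Python used to open the file (normally the locale default when none is passed), not something detected from the file's contents. The sample file and its contents are made up for illustration; pd.read_csv accepts the same encoding="cp1252" argument.

```python
import csv
import os
import tempfile

# Hypothetical sample data: a header row and one row containing "ê".
sample_bytes = "name,value\nTransferências,1\n".encode("cp1252")

# Write the raw bytes to a temporary file so the example is self-contained.
fd, tmp_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as raw_file:
    raw_file.write(sample_bytes)

# Pass the encoding explicitly instead of relying on the platform default.
with open(tmp_path, encoding="cp1252", newline="") as f:
    opened_with = f.encoding  # the encoding this handle was opened with
    rows = list(csv.reader(f))

os.remove(tmp_path)
print(opened_with, rows)
```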

score 1 · Accepted Answer

This doesn't require importing any modules; you can simply reopen the file mentioned in the question and decode it yourself.

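with open('some_file.csv', 'rb') as file:  # binary mode, so read() returns bytes
    raw = file.read()
    print(raw) # should print a (probably long) bytes string, b'...'
    # decoding removes the b'' prefix; raises UnicodeDecodeError if not valid UTF-8
    print(raw.decode('utf-8'))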

score 0 · Accepted Answer


Try using

open(filepath_, 'rb')

instead of

open(filepath_)

This worked for me on Python 3.8.5.
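Opening in binary mode avoids the error because nothing is decoded at read time, but the bytes still have to be decoded at some point. Below is a minimal sketch of one way to do that, assuming the file is either UTF-8 or cp1252. The helper name decode_with_fallback and the candidate list are hypothetical, not part of any library.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes raw cleanly."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Bytes as they would come from open(filepath_, 'rb').read(); sample data is made up.
utf8_text, utf8_name = decode_with_fallback("Transferências".encode("utf-8"))
cp_text, cp_name = decode_with_fallback("Transferências".encode("cp1252"))
print(utf8_name, cp_name)
```

Trying UTF-8 first matters here: cp1252 accepts almost any byte sequence, so putting it first would silently turn UTF-8 input into mojibake.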

Answered 2021-06-02T17:55:03.680
