python - Why does `toml.load(f)` fail for this file under Windows (but not on Linux)?

Question

I have a TOML file that I want to process with this script.

This used to work fine under Linux. Under Windows (Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:23:52) [MSC v.1900 32 bit (Intel)] on win32), I get the following error:

Need to process 1 file(s)
Processing file test01.toml (1 of 1)
Traceback (most recent call last):
  File "py/process.py", line 27, in <module>
    add_text_fragment(input_dir + "/" + file)
  File "<string>", line 10, in add_text_fragment
  File "C:\Users\1\Anaconda3\lib\site-packages\toml\decoder.py", line 134, in load
    return loads(f.read(), _dict, decoder)
  File "C:\Users\1\Anaconda3\lib\encodings\cp1251.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 985: character maps to <undefined>

I assume the error occurs somewhere here:

f = open(toml_file_name, "r")
pt = toml.load(f)
f.close()

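According to Notepad++, the file in question is UTF-8 encoded.

How can I fix this?

Bounty terms

I will award this bounty to whoever shows me how to make sure that the script process.py processes the input file correctly, i.e. that execution gets past the comment starting with "If at this point pt" in addTextFragment.py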

def add_text_fragment(toml_file_name):
    f = open(toml_file_name, "r")
    pt = toml.load(f)
    f.close()

    # If at this point pt contains the data of the input file,
    # then you have attained the goal.
    if (pt["type"] == "TA"):

and the variable pt contains the data from the input file.

Your solution must run under Windows 10, Python 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32.

Note: process.py executes addTextFragment.py for every file in a particular directory.

score 2 · Accepted Answer

Just replace this line:

f = open(toml_file_name, "r")

with:

f = open(toml_file_name, "r", encoding="utf-8")

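As you can see in the error message, Python is trying to read the file using the system's default encoding (cp1251 here). If the file contains any non-ASCII characters and it works on Linux, that means the file is in a different encoding than cp1251, and the default encoding on most non-Windows systems is UTF-8.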
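To make the mismatch concrete, here is a minimal sketch that reproduces the same kind of failure without the original input file. The file name `sample.toml` and its content are made up for illustration. The character U+2018 is used because its UTF-8 encoding contains the byte 0x98, which has no mapping in cp1251.

```python
import locale
import os
import tempfile

# Encoding that open() typically falls back to when no encoding is passed.
print(locale.getpreferredencoding(False))

# Hypothetical sample content; U+2018 is encoded in UTF-8 as E2 80 98.
text = 'title = "\u2018quoted\u2019"\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.toml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading the UTF-8 bytes as cp1251 hits the unmapped byte 0x98.
    try:
        with open(path, "r", encoding="cp1251") as f:
            f.read()
        failed = False
    except UnicodeDecodeError as exc:
        failed = True
        print(exc)

    # Naming the real encoding explicitly reads the same file without error.
    with open(path, "r", encoding="utf-8") as f:
        roundtrip = f.read()
```

Passing `encoding="utf-8"` explicitly makes the result independent of the machine's locale, which is why the same script then behaves identically on Linux and Windows.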

score 1 · Accepted Answer

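It seems the decoding of your data fails when toml reads the file. As you said, the data in your TOML file is UTF-8 encoded. I would decode it manually so that nothing depends on an implicitly chosen charset.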

with open(toml_file_name, 'rb') as f:
    pt = toml.loads(f.read().decode('utf-8'))
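A possible variation on the same idea, as a sketch only: if the file was saved from Notepad++ as "UTF-8-BOM" rather than plain "UTF-8" (an assumption, since the question does not say), the `utf-8-sig` codec decodes UTF-8 and also drops a leading byte order mark. The helper name `read_utf8_text` and the sample file are hypothetical.

```python
import os
import tempfile


def read_utf8_text(path):
    """Hypothetical helper: read a file as UTF-8, dropping a BOM if present."""
    with open(path, "rb") as f:
        return f.read().decode("utf-8-sig")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bom.toml")
    # Write a UTF-8 file that starts with a byte order mark (EF BB BF).
    with open(path, "wb") as f:
        f.write(b"\xef\xbb\xbf" + 'type = "TA"\n'.encode("utf-8"))
    content = read_utf8_text(path)
```

The returned string could then be handed to the parser as `pt = toml.loads(read_utf8_text(toml_file_name))`. For files without a BOM, `utf-8-sig` decodes exactly like `utf-8`.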
