python - charmap error when encoding Japanese characters

Question

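I'm writing a program that uses the replace() function to convert specific Japanese characters in an external text file into their English (romanized) spelling, but I've run into a strange error.

I make sure to encode all the characters from the text file and store them in a variable, then run the replacement on that variable at the byte level, then decode it back into a string and write it to a new text file.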

path = input('Location: ').strip('"')
txt = ''
with open(path,'rb') as f:
    txt = f.read()

def convert(jchar,echar):
    ct = txt.replace(jchar.encode('utf-8'),echar.encode('utf-8'))
    return ct

txt = convert('ぁ','a')
txt = convert('っ','su')

with open('Translated.txt','w') as tf:   
    tf.write(txt.decode('utf-8'))

input('Done.')
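For comparison, the replacement can also be done on str rather than bytes, as long as both the read and the write use an explicit encoding. A minimal sketch, assuming the input file is UTF-8; the names KANA_TO_ROMAJI, romanize and translate_file are hypothetical, and the two mappings are the ones from the question.

```python
# Hypothetical mapping table; the two entries are taken from the question
KANA_TO_ROMAJI = {
    'ぁ': 'a',
    'っ': 'su',
}


def romanize(text: str, table: dict = KANA_TO_ROMAJI) -> str:
    """Replace every mapped character in text; unmapped characters are kept."""
    for jchar, echar in table.items():
        text = text.replace(jchar, echar)
    return text


def translate_file(src: str, dst: str) -> None:
    # Decode on read and encode on write with the same explicit codec,
    # so the platform default encoding (e.g. cp1252 on Windows) is never used
    with open(src, encoding='utf-8') as f:
        text = f.read()
    with open(dst, 'w', encoding='utf-8') as tf:
        tf.write(romanize(text))
```

Working on str avoids the encode/decode round trip entirely; the only places an encoding matters are the two open() calls.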

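Everything works if every Japanese character in the text file is one the script replaces, but if the file contains Japanese characters the script does not replace, I get this error: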

UnicodeEncodeError: 'charmap' codec can't encode character '\u306e' in position 6: character maps to <undefined>

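So it seems that, after encoding, Python can no longer decode the bytes of the Japanese characters.

Worse still, there are some other non-Unicode characters that give me the same error even after I add a replacement for them in the script, which suggests Python cannot even encode them. For now, though, my main question is why Python refuses to decode the bytes of Japanese characters even though Python itself was able to encode them.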
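A quick way to test this hypothesis is to isolate the two steps. The sketch below (sample string chosen for illustration) shows that the UTF-8 round trip itself succeeds, and that the 'charmap' error only appears when the resulting str is encoded with cp1252, the code page named in the traceback of the accepted answer.

```python
# Round trip through UTF-8: encoding and decoding both succeed
original = 'ぁっの'
data = original.encode('utf-8')   # str -> bytes
restored = data.decode('utf-8')   # bytes -> str, no error here
print(restored == original)       # True

# The failure only appears when the str is encoded with cp1252,
# which is what open(path, 'w') does implicitly on many Windows setups
try:
    restored.encode('cp1252')
except UnicodeEncodeError as exc:
    error_message = str(exc)
else:
    error_message = ''
print(error_message)
```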

score 1 · Accepted Answer

When you open the file for writing, you need to set the correct encoding, like this:

with open('Translated.txt','w', encoding='utf-8') as tf:
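To confirm what the fix actually writes, the sketch below (hypothetical sample text, temporary path) writes a string containing an unreplaced kana with encoding='utf-8' and reads the raw bytes back.

```python
import os
import tempfile

text = 'asuの'  # hypothetical sample: one kana left unreplaced

path = os.path.join(tempfile.mkdtemp(), 'Translated.txt')

# With an explicit encoding, 'の' (U+306E) is stored as the UTF-8 bytes E3 81 AE
with open(path, 'w', encoding='utf-8') as tf:
    tf.write(text)

with open(path, 'rb') as f:
    raw = f.read()

print(raw)  # b'asu\xe3\x81\xae'
```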

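Python defaults to an encoding that depends on the platform you run it on. On Windows this is usually a legacy code page such as cp1252 (the traceback below shows cp1252.py), not UTF-8. When you write a str to the file, Python encodes it with that code page, and cp1252 has no byte for Japanese characters such as 'ぁ' (U+3041), so the encoding step fails.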
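You can check which encoding open() will fall back to on your machine, and whether a given codec can represent a character at all. A minimal sketch; can_encode is a hypothetical helper, and the printed default encoding varies by platform and locale.

```python
import locale

# Encoding open() uses when no encoding= argument is given
# (often cp1252 on Western-locale Windows installs, often UTF-8 on Linux/macOS)
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded with the codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(can_encode('ぁ', 'cp1252'))  # False
print(can_encode('ぁ', 'utf-8'))   # True
```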

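It works when every character gets replaced because the romanized letters that remain are plain ASCII, which cp1252 can encode. The error is raised at the moment you write to the file, not when you decode; the printed traceback shows exactly where: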
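To see which leftover characters would trigger the error before writing, the text can be scanned character by character. A minimal sketch; unencodable_chars is a hypothetical helper, and the cp1252 default is an assumption matching the traceback below.

```python
def unencodable_chars(text: str, encoding: str = 'cp1252') -> list:
    """List the distinct characters of text that the codec cannot encode,
    in order of first appearance."""
    found = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in found:
                found.append(ch)
    return found


print(unencodable_chars('asu'))    # [] -> fully romanized, the write succeeds
print(unencodable_chars('asuの'))  # ['の'] -> this character triggers the error
```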

Traceback (most recent call last):
  File ".\sandbox.py", line 61, in <module>
    tf.write(txt.decode('utf-8'))
  File "[...]\Python\Python37\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u3041' in position 11: character maps to <undefined>

score 0 · Accepted Answer

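I found a fix, but I don't know why it works: I removed .decode('utf-8') from the last line, and that solved the whole problem, even the worst part I mentioned. I think using the as method automatically decodes it to bytes.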
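Whether dropping .decode('utf-8') works depends on the file mode: a file opened in text mode ('w') accepts only str, and passing bytes raises TypeError, while a file opened in binary mode ('wb') writes the already UTF-8-encoded bytes unchanged, with no codec involved. A minimal sketch, assuming the bytes come from the question's convert(); the sample text and temporary path are hypothetical.

```python
import os
import tempfile

data = 'asuの'.encode('utf-8')  # bytes, as produced by the question's convert()

path = os.path.join(tempfile.mkdtemp(), 'Translated.txt')

# Text mode accepts only str: writing bytes raises TypeError
try:
    with open(path, 'w', encoding='utf-8') as tf:
        tf.write(data)
except TypeError as exc:
    text_mode_error = str(exc)
else:
    text_mode_error = ''

# Binary mode writes the UTF-8 bytes as they are, so no charmap codec is used
with open(path, 'wb') as bf:
    bf.write(data)

with open(path, encoding='utf-8') as f:
    round_trip = f.read()

print(round_trip)  # asuの
```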
