I am trying to convert a large binary file of Arabic words, each with a 300-dimensional vector, into a pickled dictionary.
What I have written so far is:
import pickle
ArabicDict = {}
with open('cc.ar.300.bin', encoding='utf-8') as lex:
    for token in lex:
        for line in lex.readlines():
            data = line.split()
            ArabicDict[data[0]] = float(data[1])
pickle.dump(ArabicDict, open("ArabicDictionary.p", "wb"))
The error I get is:
Traceback (most recent call last):
  File "E:\Dataset", line 4, in <module>
    for token in lex:
  File "E:\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 0: invalid start byte
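For context, the traceback is reporting that the file is not UTF-8 text at all. If cc.ar.300.bin is the binary fastText model from the Common Crawl release (an assumption based on the filename), it cannot be read line by line as text; the plain-text counterpart is usually distributed as cc.ar.300.vec, which starts with a header line and then holds one token per line followed by its 300 float values. A minimal sketch of the conversion under those assumptions (the filename cc.ar.300.vec and the one-line header are assumptions, not taken from the code above):

import pickle

ArabicDict = {}
# cc.ar.300.vec is assumed to be the plain-text release that pairs with the .bin
with open('cc.ar.300.vec', encoding='utf-8') as lex:
    next(lex)  # skip the header line (vocabulary size and vector dimension)
    for line in lex:
        data = line.rstrip().split(' ')
        # store the full 300-dimensional vector, not just the first component
        ArabicDict[data[0]] = [float(x) for x in data[1:]]

with open('ArabicDictionary.p', 'wb') as out:
    pickle.dump(ArabicDict, out)

Alternatively, if adding a dependency is an option, gensim's gensim.models.fasttext.load_facebook_vectors('cc.ar.300.bin') can read the binary format directly.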