我很难获得文件的字符编码。有问题的代码在这里:
rawdata = open(file, "r").read()
encoding = chardet.detect(rawdata.encode())['encoding']
#return encoding
(代码由 Ashish Greycube 提供:https ://github.com/frappe/frappe/pull/8061
我已将我正在处理的 csv 文件的一部分复制为更易于管理的“测试”文件。当我在上面运行上面的代码时,它说它是'ascii'。这可能是问题的一部分。基本上,我发现我需要知道这个程序的编码类型。
错误报告如下:
Traceback (most recent call last):
File ".\program.py", line 26, in <module>
my_encoding = get_file_encoding(data)
File ".\program.py", line 20, in get_file_encoding
encoding = chardet.detect(rawdata.encode())['encoding']
File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chardet\__init__.py", line 38, in detect
detector.feed(byte_str)
File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chardet\universaldetector.py", line 211, in feed
if prober.feed(byte_str) == ProbingState.FOUND_IT:
File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chardet\charsetgroupprober.py", line 71, in feed
state = prober.feed(byte_str)
File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chardet\hebrewprober.py", line 227, in feed
byte_str = self.filter_high_byte_only(byte_str)
File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chardet\charsetprober.py", line 63, in filter_high_byte_only
buf = re.sub(b'([\x00-\x7F])+', b' ', buf)
File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\re.py", line 208, in sub
return _compile(pattern, flags).sub(repl, string, count)
MemoryError
PS C:\Users\stsho\dev\csv_sanitizer_1.2> python .\program.py
Please enter filename: ANQAR
Traceback (most recent call last):
File ".\program.py", line 26, in <module>
my_encoding = get_file_encoding(data)
File ".\program.py", line 19, in get_file_encoding
rawdata = open(file, "r").read()
FileNotFoundError: [Errno 2] No such file or directory: 'ANQAR.csv'
PS C:\Users\stsho\dev\csv_sanitizer_1.2> python .\program.py
Please enter filename: ANQAR
Traceback (most recent call last):
File ".\program.py", line 26, in <module>
my_encoding = get_file_encoding(data)
File ".\program.py", line 19, in get_file_encoding
rawdata = open(file, "r").read()
FileNotFoundError: [Errno 2] No such file or directory: 'ANQAR.csv'
PS C:\Users\stsho\dev\csv_sanitizer_1.2> python .\program.py
Please enter filename: ANQAR
Traceback (most recent call last):
File ".\program.py", line 26, in <module>
my_encoding = get_file_encoding(data)
File ".\program.py", line 20, in get_file_encoding
encoding = chardet.detect(rawdata.encode())['encoding']
File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chardet\__init__.py", line 38, in detect
detector.feed(byte_str)
File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chardet\universaldetector.py", line 211, in feed
if prober.feed(byte_str) == ProbingState.FOUND_IT:
File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chardet\charsetgroupprober.py", line 71, in feed
state = prober.feed(byte_str)
File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chardet\hebrewprober.py", line 227, in feed
byte_str = self.filter_high_byte_only(byte_str)
File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chardet\charsetprober.py", line 63, in filter_high_byte_only
buf = re.sub(b'([\x00-\x7F])+', b' ', buf)
File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\re.py", line 208, in sub
return _compile(pattern, flags).sub(repl, string, count)
MemoryError