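I'm trying to read an SRT file in Hebrew. The encoding should be cp1255, but the file cannot be read with it. I can read it as utf-8, but then the text does not follow the rules of Hebrew. After reading the file with the 'pysubs2' library in Python, I want to save it in cp1255. Is there a way to do this?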

1 Answer

Old question, but I thought I'd post in case anyone else is trying to do this. I did something similar, shown below.

import codecs
import chardet

subtitle_input_path = 'subtitles.srt'  # placeholder: path to your subtitle file

# Sniff the encoding from the first 10 lines of the file
with open(subtitle_input_path, 'rb') as f:
  rawdata = b''.join([f.readline() for _ in range(10)])

# Detected encoding (chardet reports names such as 'utf-8' or 'ascii', or None) and whitelist
encoding_method = chardet.detect(rawdata)['encoding']
encoding_method_whitelist = ['utf-8', 'ascii']

# If detection succeeded and the encoding is not whitelisted, convert the file to utf-8
if encoding_method and codecs.lookup(encoding_method).name not in encoding_method_whitelist:

  # Read the old file's content
  with open(subtitle_input_path, encoding=encoding_method) as subtitle_file:
    subtitle_text = subtitle_file.read()

  # Convert to utf-8 and overwrite the file in place
  with open(subtitle_input_path, 'w', encoding='utf8') as subtitle_file:
    subtitle_file.write(subtitle_text)
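
The snippet above converts to UTF-8, while the question asks for cp1255 output. A minimal standard-library sketch of that direction follows, assuming the input really is UTF-8 (possibly with a BOM). The function name `reencode_text_file` and its parameters are made up for illustration and are not part of pysubs2 or chardet.

```python
def reencode_text_file(src, dst, src_encoding="utf-8-sig", dst_encoding="cp1255"):
    """Decode src with src_encoding and write the same text to dst in dst_encoding.

    Hypothetical helper for illustration only.
    """
    # "utf-8-sig" drops a leading BOM if one is present.
    # newline="" keeps the original line endings (SRT files often use CRLF).
    with open(src, encoding=src_encoding, newline="") as f:
        text = f.read()

    # The default error mode is strict, so a character with no cp1255
    # representation raises UnicodeEncodeError instead of being dropped.
    with open(dst, "w", encoding=dst_encoding, newline="") as f:
        f.write(text)
    return text
```

If you go through pysubs2 rather than raw text, its `load` and `save` calls take an `encoding` argument as far as I know, so loading with `encoding="utf-8"` and saving with `encoding="cp1255"` should have the same effect.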
answered 2020-10-26T17:34:26.490