
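I'm trying to use a dictionary to replace Cyrillic words in a Unicode text file. I didn't expect replacing words to be difficult, but with Cyrillic text the encoding (UTF-16 versus UTF-8) is an added complication. I've tried many variations of the code, and none of them seem to work. I'd really appreciate any help!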

My dictionary lives in a file named "chars", which contains the following:

cyrillic_ordinals = {
u'первый' : u'one',
u'второй' : u'two',
u'третий' : u'three',
u'четвёртый' : u'four'  }

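I'm not sure why my code doesn't work. For context, the first part of the code is the replacement function (the part with the error), and the second half only handles the input and output files.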

import sys
import codecs
import os
import chars

def replaceordinals(text, cyrillic_ordinals):
    for i, j in cyrillic_ordinals.iteritems():
        text = text.replace(i, j)
        return text

def readAndWrite(input_file, output_file):
    try:
        w_f = codecs.open(output_file, encoding='utf-8', mode='w+')
    except IOError:
        print("Can't create or edit output file. Do you have rights to create file here?")
        print("For unix systems try to use \"sudo python\" instead of \"python\"")

    try:
        i_f = codecs.open(input_file, encoding='utf-8')
        for line in i_f:
            w_f.write(replaceordinals(line, chars.cyrillic_ordinals))
    except IOError:
        print("Can't read input file. Check your path to input file")
    except:
        try:
            i_f = codecs.open(input_file, encoding='utf-16')
            for line in i_f:
                w_f.write(replaceordinals(line, chars.cyrillic_ordinals))
        except IOError:
            print("Can't read input file. Check your path to input file")


def main(argv):
    #If user didn't provide path to input and/or output file - show an error, otherwise - try to run processing
    if len(argv) != 3:
        print("Missing file arguments.\nFormat: python " + argv[0] + " /home/user/Desktop/input_file.txt /home/user/Desktop/output_file.txt")
    else:
        readAndWrite(argv[1], argv[2])


if __name__ == "__main__":
    main(sys.argv)
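
The UTF-8-then-UTF-16 fallback in readAndWrite could also be written so that the whole file is decoded before anything is written, which avoids leaving a half-written output file when the first guess is wrong. This is only a sketch under assumptions: read_text and its encodings parameter are hypothetical names, not part of the code above, and it assumes the UTF-16 input starts with a byte order mark.

```python
import io


def read_text(path, encodings=('utf-8', 'utf-16')):
    """Return the decoded contents of path, trying each encoding in order."""
    for encoding in encodings:
        try:
            with io.open(path, encoding=encoding) as handle:
                return handle.read()
        except UnicodeError:
            # Wrong guess: nothing has been written yet, so try the next encoding.
            continue
    raise ValueError('could not decode %s with any of %r' % (path, encodings))
```

Catching UnicodeError instead of using a bare except keeps unrelated errors, such as a missing file, visible to the caller.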

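The output file is created, but its contents are unchanged: the Cyrillic text is not replaced with one, two, and so on. Does anyone know how to fix this?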
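
One likely cause, judging from the code as posted: the return statement in replaceordinals is indented inside the for loop, so the function returns after trying only the first dictionary entry. Below is a minimal sketch with the return moved out of the loop. The name replace_ordinals and the sample sentence are made up for illustration, the dictionary is inlined so the sketch is self-contained, and items() replaces iteritems() so it also runs on Python 3.

```python
# -*- coding: utf-8 -*-

# Same mapping as in the chars file, inlined here for a self-contained example.
cyrillic_ordinals = {
    u'первый': u'one',
    u'второй': u'two',
    u'третий': u'three',
    u'четвёртый': u'four',
}


def replace_ordinals(text, mapping):
    """Apply every source -> target replacement in mapping to text."""
    for source, target in mapping.items():
        text = text.replace(source, target)
    # Return only after all pairs have been applied, not inside the loop.
    return text


# Illustrative input, not taken from the original files.
print(replace_ordinals(u'первый и второй, затем четвёртый', cyrillic_ordinals))
# prints: one и two, затем four
```

With the original indentation, which entry gets tried depends on the dictionary's iteration order, so most lines would pass through unchanged.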
