
I haven't found a good description of how to handle this on Windows, so I'm asking here.

There are two letters in Turkish, ı (I) and i (İ), that Python handles incorrectly.

>>> [char for char in 'Mayıs']
['M', 'a', 'y', 'i', 's']

>>> 'ı'.upper().lower()
'i'

With the locale set correctly, this is how it should behave:

>>> [char for char in 'Mayıs']
['M', 'a', 'y', 'ı', 's']

>>> 'ı'.upper().lower()
'ı'

>>> 'i'.upper()
'İ'

>>> 'ı'.upper()
'I'

I have tried locale.setlocale(locale.LC_ALL, 'Turkish_Turkey.1254') and even 'ı'.encode('cp857'), but neither helped.

How can I get Python to handle these two letters correctly?
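For context: in Python 3, str.upper() and str.lower() apply the Unicode database's default, language-independent case mappings and do not consult the locale set with locale.setlocale(), which is why switching to a Turkish locale changes nothing. A minimal Python 3 check of those defaults (the DEFAULT_CASES name is only for illustration):

```python
# Python 3 str case mapping is language-independent: these results are the
# same no matter which locale locale.setlocale() has been given.
DEFAULT_CASES = {
    "'i'.upper()": "i".upper(),  # 'I'; Turkish expects 'İ' (U+0130)
    "'ı'.upper()": "ı".upper(),  # 'I'; also correct for Turkish
    "'I'.lower()": "I".lower(),  # 'i'; Turkish expects 'ı' (U+0131)
    "'İ'.lower()": "İ".lower(),  # 'i' followed by U+0307 COMBINING DOT ABOVE
}

for expr, result in DEFAULT_CASES.items():
    # Print each result together with its code points
    print(expr, "->", repr(result), [f"U+{ord(c):04X}" for c in result])
```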


3 Answers


You should use PyICU (the session below is Python 3; on Python 2, use unicode() instead of str()):

>>> from icu import UnicodeString, Locale
>>> tr = Locale("tr")
>>> s = UnicodeString("i")
>>> print(str(s.toUpper(tr)))
İ
>>> s = UnicodeString("I")
>>> print(str(s.toLower(tr)))
ı
answered 2013-10-31T10:24:22.563

You can define your own hard-coded functions to handle the Turkish characters.

# Hard-coded Turkish case mappings. Only the dotted/dotless i pairs
# differ from Python's default rules; ç, ş, ü and ğ are already handled
# by str.upper()/str.lower() and are listed only for completeness.
TR_UPPER_MAP = str.maketrans("iıçşüğ", "İIÇŞÜĞ")
TR_LOWER_MAP = str.maketrans("İIÇŞÜĞ", "iıçşüğ")


def tr_upper(text):
    # Apply the Turkish mappings first, then the default upper() for the rest
    return text.translate(TR_UPPER_MAP).upper()


def tr_lower(text):
    # Apply the Turkish mappings first, then the default lower() for the rest
    return text.translate(TR_LOWER_MAP).lower()

Regular upper():

>>> print("ulvido".upper())
ULVIDO

Our custom upper:

>>> print(tr_upper("ulvido"))
ULVİDO
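A self-contained check of the two functions against the behaviour the question asks for. It repeats the str.translate-based definitions so it runs on its own; the "Mayıs" round trip is an illustrative extra case, not one from the question.

```python
# Same Turkish mappings as tr_upper/tr_lower above
TR_UPPER_MAP = str.maketrans("iıçşüğ", "İIÇŞÜĞ")
TR_LOWER_MAP = str.maketrans("İIÇŞÜĞ", "iıçşüğ")


def tr_upper(text):
    return text.translate(TR_UPPER_MAP).upper()


def tr_lower(text):
    return text.translate(TR_LOWER_MAP).lower()


# (actual, expected) pairs based on the question's examples
checks = {
    "tr_upper('i')": (tr_upper("i"), "İ"),
    "tr_upper('ı')": (tr_upper("ı"), "I"),
    "tr_lower(tr_upper('ı'))": (tr_lower(tr_upper("ı")), "ı"),
    "tr_lower(tr_upper('Mayıs'))": (tr_lower(tr_upper("Mayıs")), "mayıs"),
}

for name, (got, expected) in checks.items():
    print(f"{name}: {got!r} {'OK' if got == expected else 'MISMATCH'}")
```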

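If you need these conversions often, you can put them in their own .py file: for example, save them as trtextstyle.py and import them into your project.

If trtextstyle.py is in the same directory as your script:

from trtextstyle import tr_upper, tr_lower

(The relative form, from .trtextstyle import tr_upper, tr_lower, only works when both files are part of a package.)

Hope this helps.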

answered 2017-05-22T11:23:25.877
# Turkish-specific case mappings for the first letter and the rest of a word
TR_UPPER_MAP = str.maketrans("iıçşüğ", "İIÇŞÜĞ")
TR_LOWER_MAP = str.maketrans("İIÇŞÜĞ", "iıçşüğ")


def tr_capitalize(param_word):
    """Capitalize each space-separated word using Turkish i/ı rules."""
    capitalized = []
    for word in param_word.split(sep=" "):
        # word[:1] rather than word[0], so the empty strings produced by
        # consecutive spaces don't raise IndexError
        first_letter = word[:1].translate(TR_UPPER_MAP).upper()
        last_part = word[1:].translate(TR_LOWER_MAP).lower()
        capitalized.append(first_letter + last_part)
    return " ".join(capitalized).strip()
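A variant sketch, assuming words may be separated by punctuation or other whitespace rather than single spaces: it calls re.sub with a callable replacement on every run of word characters, applying the same Turkish first-letter/rest mappings. The names tr_title and _tr_capitalize_match are hypothetical.

```python
import re

TR_UPPER_MAP = str.maketrans("iıçşüğ", "İIÇŞÜĞ")
TR_LOWER_MAP = str.maketrans("İIÇŞÜĞ", "iıçşüğ")


def _tr_capitalize_match(match):
    # Upper-case the first character and lower-case the rest,
    # applying the Turkish mappings before the default ones
    word = match.group(0)
    first = word[:1].translate(TR_UPPER_MAP).upper()
    rest = word[1:].translate(TR_LOWER_MAP).lower()
    return first + rest


def tr_title(text):
    # \w+ is Unicode-aware for str patterns, so Turkish letters count as word
    # characters; separators (spaces, tabs, hyphens) are left unchanged
    return re.sub(r"\w+", _tr_capitalize_match, text)


print(tr_title("istanbul ılık"))  # İstanbul Ilık
print(tr_title("izmir-ığdır"))    # İzmir-Iğdır
```

Unlike splitting on " ", this keeps the original separators exactly as they were.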
answered 2020-09-21T07:00:14.827