python - How to compare non-English (Chinese) characters in a Python program?

Question

In one of my Python programs (Python 2.7), I need to handle some Chinese characters:

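I have a file, A.txt, with two columns: "name" and "score". The "name" column can contain Chinese strings, and score is an integer between 1 and 10. A.txt is encoded in GBK, a Chinese character encoding.
I insert each line of A.txt into my MySQL table tb_name_score, which has three columns: ID, NAME, SCORE. Its NAME column uses latin1_swedish_ci.
Now I have another file, B.txt, which also has two columns, "name" and "score", and I need to update the SCORE column of tb_name_score according to B.txt. B.txt is also encoded in GBK.
So I iterate over B.txt, read a line, and compare its "name" value with the records in tb_name_score.NAME; if they are equal, I update tb_name_score.SCORE. However, even though the "name" value of the line in B.txt and the value in tb_name_score.NAME are the same Chinese string, "=" returns false, and I cannot update the table. Can anyone help? Thanks!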

score 0 · Accepted Answer

Hope this helps:

Python 2.7.3 (default, Apr 10 2013, 06:20:15) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a=u'后者'
>>> b='后者'
>>> type(a)
<type 'unicode'>
>>> type(b)
<type 'str'>
>>> a==b
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
>>> b
'\xe5\x90\x8e\xe8\x80\x85'
>>> a
u'\u540e\u8005'
>>> b.decode('utf8')
u'\u540e\u8005'
>>> a.encode('utf8')
'\xe5\x90\x8e\xe8\x80\x85'
>>>
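
The session above suggests the usual fix: decode byte strings to unicode before comparing them. Here is a minimal sketch of that idea for GBK data like the question's. The helper name to_text is hypothetical, and it assumes the database value is already unicode text, which depends on how the MySQL connection is configured.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def to_text(raw, encoding='gbk'):
    """Decode a byte string to text; return text unchanged."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


# Bytes as they would be read from a GBK-encoded file (stand-in value).
name_from_file = u'后者'.encode('gbk')
# Assumption: the value fetched from the database is already unicode text.
name_from_db = u'后者'

print(name_from_file == name_from_db)           # bytes vs. text: False
print(to_text(name_from_file) == name_from_db)  # both text: True
```

If the database driver returns byte strings instead, the same helper could be applied to that side as well, using whatever encoding the bytes were actually stored in.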

score 0 · Accepted Answer

# -*- coding: utf-8 -*-
import pandas as pd

df_raw = pd.read_excel('/Users/zh/workspace/CityRealEstate/CityDataset20180521-4.xlsx')

df_train = df_raw.iloc[:, 3:59]
print df_raw.loc[df_raw['Year'] != 2016]

# In Python 2, '深圳' is a byte string; decode it to unicode before comparing
city = '深圳'
print df_raw['City'].values
df_train = df_raw.loc[df_raw['City'] == city.decode('utf8')]

This worked for me.
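
For the question's B.txt, the same decoding can be done with only the standard library by letting io.open decode GBK while reading. This is a rough sketch, not a drop-in solution: the function name read_scores, the whitespace-separated column layout, and the sample contents are all assumptions, since the question does not show the file.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import io
import os
import tempfile


def read_scores(path, encoding='gbk'):
    """Return {name: score} from a two-column file, with names as text."""
    scores = {}
    with io.open(path, encoding=encoding) as handle:
        for line in handle:
            parts = line.split()
            # Skip blank lines, malformed lines, and a possible header row.
            if len(parts) != 2 or not parts[1].isdigit():
                continue
            name, score = parts
            scores[name] = int(score)
    return scores


# Build a small GBK-encoded stand-in for B.txt (hypothetical contents).
fd, sample_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as out:
    out.write(u'name score\n后者 7\n深圳 9\n'.encode('gbk'))

scores = read_scores(sample_path)
os.remove(sample_path)

print(scores[u'深圳'])  # keys are text, so unicode lookups match
```

Because every name is decoded once at the file boundary, later comparisons against unicode values need no further conversion.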
