python - Using Hebrew in Python

Question

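I'm having trouble printing Hebrew words. I'm using Counter (from the collections module) to count the words in a given Hebrew text. The counter does count the words and recognizes the language, since I'm using # -*- coding: utf-8 -*-

The problem is that when I print my counter, I get weird symbols. (I'm using Eclipse.) Here is the code and the output: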

# -*- coding: utf-8 -*-
import string
from collections import Counter

class classifier:
    def __init__(self, filename):
        self.myFile = open(filename)
        self.cnt = Counter()

    def generateList(self):
        # Punctuation characters to strip from each word
        exclude = set(string.punctuation)
        for lines in self.myFile:
            for word in lines.split():
                if word not in exclude:
                    nWord = ""
                    for letter in word:
                        # Keep only the non-punctuation characters
                        if letter not in exclude:
                            nWord += letter
                    self.cnt[nWord] += 1
        print self.cnt

Output:

Counter({'\xd7\x97\xd7\x94': 465, '\xd7\x96\xd7\x95': 432, '\xd7\xa1\xd7\x92\xd7\x95\xd7\xa8': 421, '\xd7\x94\xd7\x92\xd7\x91': 413})

Any ideas on how to print the words correctly?

score 1 · Accepted Answer

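The "weird symbols" you are getting are how Python represents these strings: each key is a byte string holding UTF-8 encoded text, and printing the Counter shows the escaped bytes rather than the characters.

You need to decode them, for example: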

>>> print '\xd7\x97\xd7\x94'.decode('UTF8')
חה
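
To apply the same idea to the whole counter instead of a single word, one option is to decode every key before displaying it. The following is only a minimal sketch, assuming the keys are UTF-8 encoded byte strings as in the output above; raw_counts, decode_counts and format_counts are hypothetical names introduced for illustration, not part of the original code.

```python
# -*- coding: utf-8 -*-
from collections import Counter

# Hypothetical sample data: two of the byte-string keys from the question.
raw_counts = Counter({
    b'\xd7\x97\xd7\x94': 465,
    b'\xd7\x96\xd7\x95': 432,
})


def decode_counts(counts, encoding='utf-8'):
    """Return a new Counter whose keys are decoded text instead of bytes."""
    decoded = Counter()
    for word, n in counts.items():
        # Assumption: every key is valid in the given encoding.
        decoded[word.decode(encoding)] += n
    return decoded


def format_counts(counts):
    """Build one 'word: count' line per entry, most frequent first."""
    lines = [u'%s: %d' % (word, n) for word, n in counts.most_common()]
    return u'\n'.join(lines)


decoded = decode_counts(raw_counts)
report = format_counts(decoded)
# Printing `report` shows the Hebrew words themselves, provided the
# console (for example the Eclipse console) is configured for UTF-8.
```

Formatting the entries one by one avoids printing the Counter itself, which always shows the escaped representation of its keys. Alternatively, opening the file with io.open(filename, encoding='utf-8') yields decoded text from the start, so the keys never need decoding afterwards.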
