I am using Polyglot to detect English text. I apply a detection function to a column of a pandas DataFrame, but I get an error. Here is my code:

import pandas as pd
from polyglot.detect import Detector

def is_english(txt):
    detector = Detector(txt)
    # Languages are sorted by confidence, so the first entry is the most likely one.
    top_lan = detector.languages[0]
    # Accept only English detected with at least 98% confidence.
    return top_lan.name == 'English' and top_lan.confidence >= 98

df = pd.read_csv('data.csv')
df = df[df.input_text.apply(is_english)]

The error is:

pycld2.error: input contains invalid UTF-8 around byte 1383 (of 22731)

How can I solve this? Thanks!
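
One thing that might be worth trying, as a sketch rather than a confirmed fix: this pycld2 error is often reported when the input contains control characters or other unprintable code points, so cleaning each string before detection may avoid it. The helper name `strip_unprintable` below is hypothetical, and the assumption that unprintable characters are what triggers the error around byte 1383 has not been checked against the actual data.

```python
def strip_unprintable(text):
    """Replace every character that str.isprintable() rejects with a space.

    Assumption: the "invalid UTF-8" error is caused by control characters,
    lone surrogates, or similar unprintable code points in the input.
    Newlines and tabs are also replaced, which should not matter for
    language detection.
    """
    return "".join(ch if ch.isprintable() else " " for ch in text)


if __name__ == "__main__":
    sample = "Hello\x00 world\x0b!"
    # repr() makes the replaced characters visible as plain spaces.
    print(repr(strip_unprintable(sample)))
```

The cleaned text could then be passed to the detector, for example by running `df.input_text.apply(strip_unprintable)` before the language check. Rows that are not strings (such as missing values read by `pd.read_csv`) would need to be dropped or converted first.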
