python - Python error: TypeError: expected string or bytes-like object

Question

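I'm currently working on a sentiment analysis project using nltk in Python. I can't get my script to pass the rows of text from my CSV through tokenization. However, if I pass in the text one entry at a time, it works fine. When I try to pass in the whole CSV, I get a persistent error: 'TypeError: expected string or bytes-like object'. Here is the printed data frame and the Python code I'm using. Any help resolving this would be great.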

                              abstract
0    Allergic diseases are often triggered by envir...
1    omal lymphopoietin (TSLP) has important roles ...
2    of atrial premature beats, and a TSLP was high...
3     deposition may play an important role in the ...
4    ted by TsPLP was higher than that mediated by ...
5    nal Stat5 transcription factor in that TSLP st...

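import nltk
import pandas as pd

data = pd.read_csv('text.csv', sep=';', encoding='utf-8')
x = data.loc[:, 'abstract']   # the 'abstract' column, one abstract per row
print(x.head())
tokens = nltk.word_tokenize(x)  # this line raises the TypeError
print(tokens)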
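A minimal stdlib-only sketch that reproduces the symptom without nltk or the CSV. `simple_tokenize` is a hypothetical regex stand-in for `nltk.word_tokenize` (not nltk's actual algorithm), and a plain list stands in for the pandas column. Like the real function, it only accepts a single string; handing it a whole column makes Python's `re` module raise the same "expected string or bytes-like object" message.

```python
import re

# Hypothetical stand-in for nltk.word_tokenize: it expects ONE string and
# splits it into runs of word characters and single punctuation marks.
def simple_tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

# One entry at a time works, as described in the question.
print(simple_tokenize("TSLP has important roles."))
# ['TSLP', 'has', 'important', 'roles', '.']

# A whole column (a plain list standing in for the pandas Series) is not a
# string, so the regex engine rejects it with a TypeError.
column = ["TSLP has important roles.", "Allergic diseases are common."]
try:
    simple_tokenize(column)
except TypeError as err:
    print("TypeError:", err)
```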

The full stack trace of the error is attached. Edit: added the print statement.

Edit: added the output.

score 1 · Accepted Answer


tokens = [nltk.word_tokenize(line) for line in x]
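Iterating over a pandas Series yields its values one by one, so the list comprehension calls the tokenizer once per abstract, each time with a single string. A minimal stdlib-only sketch of that idea, assuming a hypothetical regex `simple_tokenize` in place of `nltk.word_tokenize` and a plain list in place of the Series `x`:

```python
import re

# Hypothetical stand-in for nltk.word_tokenize (accepts a single string only).
def simple_tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

# A plain list standing in for x = data.loc[:, 'abstract'].
x = ["TSLP has important roles.", "Allergic diseases are common."]

# Each iteration hands the tokenizer one string, which is the type it expects.
tokens = [simple_tokenize(line) for line in x]
print(tokens)
# [['TSLP', 'has', 'important', 'roles', '.'], ['Allergic', 'diseases', 'are', 'common', '.']]
```

The result is a list of token lists, one per row, so each abstract's tokens stay separate for per-document sentiment scoring.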

answered 2020-03-17T13:31:57.097

score 1 · Accepted Answer

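The nltk documentation provides a usage example of nltk.word_tokenize, and you may notice that the "sentence" passed to it is a string.

In your case, x is a Series of strings (a column of the DataFrame), and you need to turn it into a single string before passing it to nltk.word_tokenize.

One way to solve this is to build your nltk "sentence" from x: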

x = data.loc[:, 'abstract']
sentence = ' '.join(x)  # concatenate all abstracts into one space-separated string
tokens = nltk.word_tokenize(sentence)
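Joining the column first means the tokenizer runs once and returns one flat list. A minimal stdlib-only sketch comparing that with the per-row approach, assuming a hypothetical regex `simple_tokenize` in place of `nltk.word_tokenize` and a plain list in place of the Series:

```python
import re

# Hypothetical stand-in for nltk.word_tokenize (accepts a single string only).
def simple_tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

x = ["TSLP has important roles.", "Allergic diseases are common."]

# Join-then-tokenize: one string in, one flat token list out.
sentence = " ".join(x)
flat_tokens = simple_tokenize(sentence)

# Per-row tokenization, for comparison.
per_row = [simple_tokenize(line) for line in x]

print(flat_tokens)
# ['TSLP', 'has', 'important', 'roles', '.', 'Allergic', 'diseases', 'are', 'common', '.']

# With this whitespace-splitting stand-in, the flat list equals the per-row
# lists concatenated, but which abstract each token came from is lost.
print(flat_tokens == [tok for row in per_row for tok in row])  # True
```

The joined form suits corpus-wide statistics such as overall word counts; for sentiment per abstract, keeping one token list per row is usually the more useful shape.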

Edit: Based on further comments, try this (keep in mind that the result will be a Series of token lists, so access it accordingly):

tokens = x.apply(lambda sentence: nltk.word_tokenize(sentence))
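Even with per-row tokenization, the same TypeError can come back if a cell is empty: `pandas.read_csv` stores missing values as NaN, which is a float, not a string. The question's sample shows no missing abstracts, so this is an assumption. A minimal stdlib-only sketch of a guard, with a hypothetical `simple_tokenize` in place of `nltk.word_tokenize` and a plain list in place of the Series:

```python
import re

# Hypothetical stand-in for nltk.word_tokenize (accepts a single string only).
def simple_tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

# Assumption: the middle abstract is missing, so pandas would hold NaN (a float).
x = ["TSLP has important roles.", float("nan"), "Allergic diseases are common."]

def tokenize_or_empty(value):
    # Tokenize strings only; NaN, None or numbers become an empty token list.
    return simple_tokenize(value) if isinstance(value, str) else []

tokens = [tokenize_or_empty(v) for v in x]
print(tokens)
# [['TSLP', 'has', 'important', 'roles', '.'], [], ['Allergic', 'diseases', 'are', 'common', '.']]
```

On the real Series, `x.dropna()` (drop missing rows) or `x.fillna('')` (replace them with empty strings) before calling `apply` are pandas-side ways to get the same effect.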
