python - Python FastText: How to create a corpus from a dataframe column

Asked 2017-06-27T13:07:00.457

Viewed 1754 times

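I need to create a corpus for my email classifier. I am currently using fasttext 0.8.3, but it requires a text file as input, whereas I need to pass a dataframe as input.

It raises an error when I use the following code: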

```

import fasttext

x_val = df['Message']  
y_val = df['Categories']  
model = fasttext.skipgram(x_val, y_val)  
print model.words

TypeError:
<ipython-input-105-58241a9688b5> 
 <module>() 
----> 1 model = fasttext.skipgram(x_val, y_val) 
      2 print model.words # list of words in dictionary 
      fasttext/fasttext.pyx in fasttext.fasttext.skipgram (fasttext/fasttext.cpp:6451)() 
      fasttext/fasttext.pyx in fasttext.fasttext.train_wrapper (fasttext/fasttext.cpp:5223)() 
     /root/anaconda2/lib/python2.7/genericpath.pyc in isfile(path) 
           35 """Test whether a path is a regular file""" 
           36 try: 
      ---> 37 st = os.stat(path) 
           38 except os.error: 
           39 return False 
     TypeError: coercing to Unicode: need string or buffer, Series found

```

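In the code above, df['Message'] and df['Categories'] are the dataframe columns containing the emails and their categories, respectively.
The dataframe contains 30123 emails.
I have read the fasttext documentation, but I did not find it helpful for this case.

Fasttext tutorial reference

Thanks for your help.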
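The traceback shows the first argument reaching `os.stat(path)`, so the call expects a file path string and not a pandas Series. One possible workaround is to dump the two columns to a plain-text file first and pass that path to fasttext. The sketch below assumes Python 3 and the usual fastText convention of one example per line with a `__label__` prefix on the category. The helper name `write_fasttext_corpus` and the file names in the comments are hypothetical, and the fasttext calls are shown only as comments because their exact signatures depend on the installed version.

```python
import io
import re


def write_fasttext_corpus(messages, labels, path, label_prefix="__label__"):
    """Write one example per line in the form '<prefix><label> <text>'.

    `messages` and `labels` can be any iterables of equal length,
    for example two pandas Series such as df['Message'] and df['Categories'].
    Returns the number of lines written.
    """
    count = 0
    with io.open(path, "w", encoding="utf-8") as handle:
        for message, label in zip(messages, labels):
            # Collapse newlines and repeated whitespace so each email stays on one line.
            text = re.sub(r"\s+", " ", str(message)).strip()
            # A label is a single token, so replace inner whitespace with underscores.
            tag = re.sub(r"\s+", "_", str(label).strip())
            handle.write("{}{} {}\n".format(label_prefix, tag, text))
            count += 1
    return count


# Hypothetical usage with the dataframe from the question (not executed here):
#   write_fasttext_corpus(df['Message'], df['Categories'], 'emails.train.txt')
#
# Assumption: in the 0.8.x Python package, training functions take an input
# file path and an output model name, e.g.
#   classifier = fasttext.supervised('emails.train.txt', 'email_model',
#                                    label_prefix='__label__')
```

Note that `skipgram` trains unsupervised word vectors and does not take labels, so for a classifier a supervised training call on the labelled file is probably what is wanted. If only word vectors are needed, the same approach applies with a file that contains the message text alone.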
