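Given many movies and their associated tags (the tags are keywords), how can I compute a TF or TF-IDF vector for each movie? Is there a library in GraphLab or Python that does this automatically? Here is my input: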
print HH_tag_5K
+---------+-----------------+
| movieId | tag |
+---------+-----------------+
| 2324 | bittersweet |
| 2324 | holocaust |
| 2324 | World War II |
| 357 | Garath |
| 260 | Science Fiction |
| 55267 | large family |
| 55267 | realistic |
| 55267 | romantic |
| 55267 | Steve Carell |
| 55267 | the music |
+---------+-----------------+
[194527 rows x 2 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
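Here is a minimal pure-Python sketch of the computation. Each movie is treated as a "document" and each whole tag as a "term". tf(t, d) is the tag's count in the movie divided by the movie's total tag count. idf(t) = ln(N / df(t)), where N is the number of movies and df(t) is the number of movies that carry the tag. This is one common variant, assumed here for illustration. Other weightings (raw counts, smoothed idf, L2 normalization) give different numbers. The `rows` list simply copies the head shown above. For the full SFrame, something like `zip(HH_tag_5K['movieId'], HH_tag_5K['tag'])` should yield the same kind of pairs, assuming the SArray columns iterate like lists. The function name `tfidf_per_movie` is hypothetical.

```python
import math
from collections import Counter, defaultdict

# Sample (movieId, tag) rows copied from the head of HH_tag_5K shown above.
rows = [
    (2324, "bittersweet"),
    (2324, "holocaust"),
    (2324, "World War II"),
    (357, "Garath"),
    (260, "Science Fiction"),
    (55267, "large family"),
    (55267, "realistic"),
    (55267, "romantic"),
    (55267, "Steve Carell"),
    (55267, "the music"),
]


def tfidf_per_movie(rows):
    """Hypothetical helper: return {movieId: {tag: tf-idf weight}}.

    Assumes tf = count / total tags in the movie and idf = ln(N / df).
    Each whole tag (lowercased) is one term, so "World War II" is not split.
    """
    # Group tag counts by movie: one Counter per movie ("document").
    docs = defaultdict(Counter)
    for movie_id, tag in rows:
        docs[movie_id][tag.lower()] += 1

    n_docs = len(docs)

    # Document frequency: number of movies in which each tag appears.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    result = {}
    for movie_id, counts in docs.items():
        total = sum(counts.values())
        result[movie_id] = {
            tag: (c / total) * math.log(n_docs / df[tag])
            for tag, c in counts.items()
        }
    return result


weights = tfidf_per_movie(rows)
print(weights[2324])
```

On this small sample every tag occurs in only one movie, so all idf values are equal. A tag that appears in every movie would get weight 0.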
Actually, I think sklearn.feature_extraction.text.TfidfVectorizer is the answer to this, but I haven't figured out how to apply it to my problem. Thanks!