cluster-analysis - Clustering with Weka

Asked 2015-07-02T08:15:47.317

Viewed 146 times

I have saved 100 results of a Google query (title and description). The data has this format:

Title                Description
Spain - Wikipedia    Spain is a democracy organised in the form of a parliamentary government under a constitutional monarchy. It is a developed country with the world's fourteenth

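You get the idea. I successfully loaded this CSV file into Weka. First I applied the NominalToString filter (because the attributes were loaded as Nominal). Then I applied StringToWordVector with the following options: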

IDFTransform - True
TFTransform - True
normalize - True
outputWordCounts - True
tokenizer - AlphabeticTokenizer
wordsToKeep - 100
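To make these settings concrete, here is a minimal plain-Python sketch of what the TF and IDF options compute, as far as I understand the filter's documentation: TFTransform replaces a count f with log(1 + f), and IDFTransform multiplies the value by log(number of documents / number of documents containing the word). This is not Weka code. The function names, the lowercasing, and the toy documents in the usage note are assumptions made for illustration, and document-length normalization and wordsToKeep are left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Keep runs of letters only, roughly like an alphabetic tokenizer.
    # Lowercasing is an assumption of this sketch, not a setting listed above.
    return re.findall(r"[A-Za-z]+", text.lower())


def word_vectors(docs, tf_transform=True, idf_transform=True):
    """Turn documents into word-weight vectors (hypothetical helper)."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Number of documents in which each word occurs at least once.
    doc_freq = {w: sum(1 for c in counts if w in c) for w in vocab}

    vectors = []
    for c in counts:
        row = []
        for w in vocab:
            value = float(c[w])  # raw word count (outputWordCounts)
            if tf_transform:
                value = math.log(1.0 + value)
            if idf_transform:
                value *= math.log(n_docs / doc_freq[w])
            row.append(value)
        vectors.append(row)
    return vocab, vectors
```

Calling `word_vectors(["Spain travel travel", "Spain football"])` gives the word "spain" a weight of zero in both documents, because it occurs in every document. With search results for a single query, the query terms themselves would be weighted down in the same way.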

More or less. I then get a list of words. Sometimes I use NGramTokenizer instead, with a minimum of 3 words per n-gram.
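For clarity, this is what I mean by word n-grams with a minimum size of 3. The sketch below is plain Python, not Weka's NGramTokenizer. The function name `word_ngrams` and its parameters are made up for illustration, and splitting on whitespace is an assumption.

```python
def word_ngrams(text, min_n=3, max_n=3):
    """Return all word n-grams of text with min_n <= n <= max_n."""
    words = text.split()  # whitespace splitting is an assumption
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n words across the text.
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams
```

With `min_n=3`, a description shorter than three words produces no tokens at all, and most three-word phrases occur in only one snippet, so the resulting vectors can be very sparse.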

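After that I go to the Cluster tab and choose K-means. The result is not very good, because it puts 90% of the results in a single cluster. Or maybe that is correct...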
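To show what I mean by checking the cluster sizes, here is a minimal plain-Python sketch of k-means (Lloyd's algorithm) that reports how many points end up in each cluster. It is not Weka's implementation. The function names, the seed handling, and the squared Euclidean distance are assumptions of this sketch.

```python
import random
from collections import Counter


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points into k groups; returns (assignment, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def cluster_sizes(assignment):
    """Count how many points were assigned to each cluster index."""
    return Counter(assignment)
```

Running this with several seeds and comparing `cluster_sizes` would show whether one dominant cluster is stable across runs or depends on the random start.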

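What happens when I choose "Use training set" here, given that I don't have any training data yet? Which options should I use? I want to form clusters by category (tourism, sports, economy, ...). Can Weka do this the way Carrot2 does? Or can it at least form meaningful clusters?

Thanks.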

0 answers