
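I am new to Weka. I need to identify a set of emotions in blog documents using the clustering methods in Weka. For emotion detection I use several feature values, each represented as an attribute. For example, my dataset looks like this: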

@relation emotion

% pos: total number of part-of-speech occurrences (noun, verb, adjective, adverb) in the document / total number of words in the document
@attribute pos real
% Positive_Words: count of positive words in the document / total number of words in the document
@attribute Positive_Words real
% Negative_Words: count of negative words in the document / total number of words in the document
@attribute Negative_Words real
% Emotion_Words: count of emotion words in the document / total number of words in the document
@attribute Emotion_Words real
% First_Sent_Weight: weight given to the first sentence of each blog / total number of sentences in the document
@attribute First_Sent_Weight real

@data
0.4, 0.24, 0.43, 0.32, 0.65
0.32, 0.5, 0.74, 0.8, 0.43
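As a rough illustration of how the three lexicon-based ratios above (Positive_Words, Negative_Words, Emotion_Words) might be computed for one document, here is a minimal Java sketch. The class name `EmotionFeatures`, the tiny word lists and the naive tokenizer are hypothetical placeholders, not part of the original setup. `pos` and `First_Sent_Weight` are left out because they depend on a POS tagger and on a sentence-weighting scheme that the post does not specify.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Sketch: lexicon-based ratio features for one blog document.
 * ASSUMPTION: the lexicons and tokenizer below are hypothetical placeholders;
 * a real pipeline would use proper sentiment/emotion lexicons.
 */
class EmotionFeatures {

    // Hypothetical toy lexicons, for illustration only.
    static final Set<String> POSITIVE = Set.of("happy", "good", "love");
    static final Set<String> NEGATIVE = Set.of("sad", "bad", "hate");
    static final Set<String> EMOTION = Set.of("happy", "sad", "angry", "love", "hate");

    /** Lower-cases the text and splits on every run of non-letters (naive tokenizer). */
    static String[] tokenize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z]+", " ").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    /** Number of tokens found in the lexicon divided by the total number of tokens. */
    static double ratio(String[] tokens, Set<String> lexicon) {
        if (tokens.length == 0) {
            return 0.0; // avoid division by zero for empty documents
        }
        int hits = 0;
        for (String t : tokens) {
            if (lexicon.contains(t)) {
                hits++;
            }
        }
        return (double) hits / tokens.length;
    }

    /** Returns {Positive_Words, Negative_Words, Emotion_Words} for one document. */
    static double[] features(String text) {
        String[] tokens = tokenize(text);
        return new double[] {
            ratio(tokens, POSITIVE),
            ratio(tokens, NEGATIVE),
            ratio(tokens, EMOTION)
        };
    }

    public static void main(String[] args) {
        double[] f = features("I love this, but I hate the rain.");
        System.out.println(f[0] + ", " + f[1] + ", " + f[2]);
    }
}
```

For the sample sentence there are 8 tokens, one positive ("love"), one negative ("hate") and two emotion words, giving 0.125, 0.125 and 0.25. Together with the two remaining attributes, each such array would form one row under `@data`.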

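I have 5000 instances: I computed the feature set for each of my 5000 blog documents, giving one instance per document. I passed these instances to the K-means clustering algorithm in Weka, which produced 6 clusters. My question is how to tell which cluster corresponds to which emotion. Any ideas are welcome. Thanks in advance.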
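One common way to attach names to clusters is to hand-label a small sample of documents with their emotion and give each cluster the most frequent label among its labelled members (Weka's Explorer offers a related "Classes to clusters evaluation" mode when a class attribute is present). The Java sketch below shows only that majority-vote step. It assumes you already have, for each labelled document, its cluster index (for example exported from the Weka run) and a manually assigned emotion; the class name `ClusterLabeler` and the example labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch: name each cluster after the majority emotion of its hand-labelled members.
 * ASSUMPTION: clusterIds and labels are hypothetical inputs, one entry per labelled document.
 */
class ClusterLabeler {

    /**
     * @param clusterIds cluster index (0..k-1) of each labelled document
     * @param labels     manually assigned emotion of the same document
     * @param k          number of clusters (6 in the question)
     * @return the majority label per cluster, or null for clusters with no labelled members
     */
    static String[] majorityLabels(int[] clusterIds, String[] labels, int k) {
        if (clusterIds.length != labels.length) {
            throw new IllegalArgumentException("clusterIds and labels must have the same length");
        }
        // One label -> count map per cluster; TreeMap makes ties resolve to the
        // alphabetically first label, so the result is deterministic.
        List<Map<String, Integer>> counts = new ArrayList<>();
        for (int c = 0; c < k; c++) {
            counts.add(new TreeMap<>());
        }
        for (int i = 0; i < clusterIds.length; i++) {
            counts.get(clusterIds[i]).merge(labels[i], 1, Integer::sum);
        }
        String[] result = new String[k];
        for (int c = 0; c < k; c++) {
            String best = null;
            int bestCount = 0;
            for (Map.Entry<String, Integer> e : counts.get(c).entrySet()) {
                if (e.getValue() > bestCount) { // strict '>' keeps the first label on ties
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            result[c] = best;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] clusterIds = {0, 0, 1, 1, 1, 2};
        String[] labels = {"joy", "joy", "anger", "sadness", "anger", "fear"};
        String[] names = majorityLabels(clusterIds, labels, 4);
        for (int c = 0; c < names.length; c++) {
            System.out.println("cluster " + c + " -> " + names[c]);
        }
    }
}
```

K-means makes no guarantee that clusters line up one-to-one with emotions, so two clusters may end up with the same majority label and some clusters may be mixed; looking at how dominant the winning label is in each cluster helps judge whether a name is meaningful.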
