I would like to get a single probability distribution for a collection of documents, because I need to compute the KL divergence. Is this possible?

In this example (http://mallet.cs.umass.edu/topics-devel.php), the method getTopicProbabilities() gives me the probability distribution of each instance. But what if I want a single distribution for a whole collection of documents?

Could this be the topic distribution of the documents?

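  TopicInferencer inferencer = model.getInferencer();
  // Topic distribution of one instance (the first test document): 10 iterations, thinning 1, burn-in 5
  double[] testProbabilities = inferencer.getSampledDistribution(testing.get(0), 10, 1, 5);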

1 Answer

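I think we could average the per-topic probabilities over a set of documents. But this only makes sense when the documents are similar. Perhaps you could cluster the documents by some similarity threshold and then average over the documents in each cluster.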
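
The averaging idea could be sketched roughly as follows. This is only an illustration in plain Java, not part of MALLET: the class name CollectionTopicDistribution and both helper methods are hypothetical, and it assumes you have already collected one double[] per document (for example from getTopicProbabilities() or getSampledDistribution() as in the question). The small epsilon used to avoid log(0) is also an assumption, not something the library prescribes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; none of these names come from MALLET.
public class CollectionTopicDistribution {

    // Averages per-document topic distributions into a single distribution.
    // Every document gets equal weight (an assumption; weighting by length is another option).
    public static double[] averageDistributions(List<double[]> docDistributions) {
        if (docDistributions.isEmpty()) {
            throw new IllegalArgumentException("need at least one document");
        }
        int numTopics = docDistributions.get(0).length;
        double[] mean = new double[numTopics];
        for (double[] dist : docDistributions) {
            for (int k = 0; k < numTopics; k++) {
                mean[k] += dist[k];
            }
        }
        for (int k = 0; k < numTopics; k++) {
            mean[k] /= docDistributions.size();
        }
        return mean;
    }

    // KL(p || q) in nats. Terms with p[k] == 0 contribute nothing;
    // q[k] is floored at a tiny epsilon so the logarithm stays finite.
    public static double klDivergence(double[] p, double[] q) {
        final double eps = 1e-12;
        double sum = 0.0;
        for (int k = 0; k < p.length; k++) {
            if (p[k] > 0) {
                sum += p[k] * Math.log(p[k] / Math.max(q[k], eps));
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up per-document distributions over two topics, for illustration only.
        List<double[]> collectionA = Arrays.asList(new double[]{0.2, 0.8}, new double[]{0.6, 0.4});
        List<double[]> collectionB = Arrays.asList(new double[]{0.9, 0.1}, new double[]{0.7, 0.3});
        double[] a = averageDistributions(collectionA);
        double[] b = averageDistributions(collectionB);
        System.out.println("A = " + Arrays.toString(a));
        System.out.println("B = " + Arrays.toString(b));
        System.out.println("KL(A || B) = " + klDivergence(a, b));
    }
}
```

Because each input sums to 1, the average also sums to 1, so it can be passed directly to a KL computation. Keep in mind that KL divergence is not symmetric, so KL(A || B) and KL(B || A) generally differ.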

Answered 2014-06-17T15:12:49.470