java - Multi-Label Document Classification

Question

I have a database in which I store data with the following three fields: id, text, {labels}. Note that each text is assigned more than one label / tag / class. I want to build a model (Weka / RapidMiner / Mahout) that can recommend / classify a set of labels / tags / classes for a given text.

I have heard about SVM and the Naive Bayes classifier, but I am not sure whether they support multi-label classification. Anything that points me in the right direction is more than welcome!

score 4 · Accepted Answer

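The basic multi-label classification method is one-vs.-the-rest (OvR), also called binary relevance (BR). The basic idea is that you take an off-the-shelf binary classifier, such as Naive Bayes or an SVM, and create K instances of it to solve K independent classification problems, one per label. In Python-like pseudo-code: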

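learners = {}
for k in classes:
    learner = SVM(settings)  # for example; any binary classifier works
    # Positive iff k is among the labels of x (a sample may carry several)
    labels = [k in labels_of(x) for x in samples]
    learner.learn(samples, labels)
    learners[k] = learner  # keep one trained classifier per label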

Then, at prediction time, you just run each binary classifier on the sample and collect the labels for which it predicts positive.
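
As a rough, runnable illustration of the training and prediction steps above, here is a minimal sketch using only the Python standard library. The `Perceptron` class is a toy stand-in for the SVM or Naive Bayes learner, and the function names, the whitespace tokenization, and the four-document data set are all assumptions made for the example, not part of any of the toolkits mentioned in the question.

```python
class Perceptron:
    """Toy binary classifier over bag-of-words tokens (stand-in for SVM / Naive Bayes)."""

    def __init__(self, epochs=10):
        self.epochs = epochs
        self.weights = {}
        self.bias = 0.0

    def _score(self, tokens):
        return self.bias + sum(self.weights.get(t, 0.0) for t in tokens)

    def learn(self, samples, labels):
        # Classic perceptron rule: adjust weights only on misclassified samples.
        for _ in range(self.epochs):
            for tokens, is_positive in zip(samples, labels):
                if (self._score(tokens) > 0) != is_positive:
                    step = 1.0 if is_positive else -1.0
                    for t in tokens:
                        self.weights[t] = self.weights.get(t, 0.0) + step
                    self.bias += step

    def predict(self, tokens):
        return self._score(tokens) > 0


def train_binary_relevance(texts, label_sets):
    """Train one independent binary classifier per label (binary relevance)."""
    samples = [text.lower().split() for text in texts]
    classes = sorted(set().union(*label_sets))
    learners = {}
    for k in classes:
        learner = Perceptron()
        # Positive iff label k is among the labels of the sample.
        labels = [k in label_set for label_set in label_sets]
        learner.learn(samples, labels)
        learners[k] = learner
    return learners


def predict_labels(learners, text):
    """Run every binary classifier and collect the labels predicted positive."""
    tokens = text.lower().split()
    return {k for k, learner in learners.items() if learner.predict(tokens)}


# Hypothetical toy data: each text carries more than one label.
texts = [
    "java weka classifier",
    "python sklearn classifier",
    "java mahout cluster",
    "python numpy cluster",
]
label_sets = [
    {"java", "classification"},
    {"python", "classification"},
    {"java", "clustering"},
    {"python", "clustering"},
]

learners = train_binary_relevance(texts, label_sets)
print(predict_labels(learners, "python cluster"))
```

On this toy data the unseen text "python cluster" receives the labels "python" and "clustering". With a real corpus you would swap the perceptron for a proper learner and a proper feature extractor; the loop over labels stays the same.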

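(Obviously, both training and prediction can be done in parallel, since the problems are assumed to be independent. See Wikipedia for links to two Java packages that do multi-label classification.)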
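
To make the parallelism remark concrete, here is a small sketch that submits one training job per label to a thread pool. The "model" is deliberately trivial (the set of tokens seen only in positive samples) so the example stays short; `train_one`, `train_parallel`, `predict_parallel`, and the data are hypothetical names and values chosen for illustration. Note that threads in CPython will not speed up CPU-bound pure-Python training, so in practice you would likely use a process pool, or a thread pool on the JVM.

```python
from concurrent.futures import ThreadPoolExecutor


def train_one(k, samples, label_sets):
    """Toy binary model for label k: tokens that occur only in positive samples."""
    positive, negative = set(), set()
    for tokens, label_set in zip(samples, label_sets):
        (positive if k in label_set else negative).update(tokens)
    return k, positive - negative


def train_parallel(samples, label_sets, max_workers=4):
    """Train the per-label models concurrently; each job is independent of the others."""
    classes = sorted(set().union(*label_sets))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(train_one, k, samples, label_sets) for k in classes]
        return dict(f.result() for f in futures)


def predict_parallel(models, tokens, max_workers=4):
    """Evaluate every per-label model concurrently and collect the positive labels."""
    token_set = set(tokens)

    def fires(item):
        k, keywords = item
        return k if keywords & token_set else None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return {k for k in pool.map(fires, models.items()) if k is not None}


# Hypothetical toy data, already tokenized.
samples = [
    ["java", "weka", "classifier"],
    ["python", "sklearn", "classifier"],
    ["java", "mahout", "cluster"],
    ["python", "numpy", "cluster"],
]
label_sets = [
    {"java", "classification"},
    {"python", "classification"},
    {"java", "clustering"},
    {"python", "clustering"},
]

models = train_parallel(samples, label_sets)
print(predict_parallel(models, ["python", "cluster"]))
```

No locking is needed because the K jobs share only read-only inputs, which is exactly what the independence assumption buys you.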

score 1 · Accepted Answer

SVM is a binary classifier by nature, but there are many ways to apply it in a multi-label setting, basically by combining multiple binary instances of SVM.

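Some examples are in the multi-class section of the SVM Wikipedia article. I am not sure whether you are interested in the details, but they are included in Weka and RapidMiner. For instance, the SMO classifier is one of the variants for applying SVM to multi-label problems.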

Naive Bayes can be applied directly in a multi-label setting.
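
The answer does not spell out how Naive Bayes would be applied. One possible reading, sketched below under that assumption, is to fit a two-class multinomial Naive Bayes model per label ("has the label" vs. "does not") and keep every label whose posterior favours the positive class. The function names, the add-one smoothing, the choice to ignore unseen tokens, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter


def train_nb(samples, label_sets):
    """Collect token counts per label for 'has label' and 'does not have label'."""
    vocab = {t for tokens in samples for t in tokens}
    model = {"vocab": vocab, "n": len(samples), "labels": {}}
    for k in sorted(set().union(*label_sets)):
        pos, neg, n_pos = Counter(), Counter(), 0
        for tokens, label_set in zip(samples, label_sets):
            if k in label_set:
                pos.update(tokens)
                n_pos += 1
            else:
                neg.update(tokens)
        model["labels"][k] = (pos, neg, n_pos)
    return model


def log_likelihood(counts, tokens, vocab):
    """Sum of log P(token | class) with Laplace (add-one) smoothing; unseen tokens are skipped."""
    total = sum(counts.values()) + len(vocab)
    return sum(math.log((counts[t] + 1) / total) for t in tokens if t in vocab)


def predict_nb(model, tokens):
    """Return every label whose positive class has the higher log posterior."""
    vocab, n = model["vocab"], model["n"]
    predicted = set()
    for k, (pos, neg, n_pos) in model["labels"].items():
        log_pos = math.log((n_pos + 1) / (n + 2)) + log_likelihood(pos, tokens, vocab)
        log_neg = math.log((n - n_pos + 1) / (n + 2)) + log_likelihood(neg, tokens, vocab)
        if log_pos > log_neg:
            predicted.add(k)
    return predicted


# Hypothetical toy data, already tokenized.
samples = [
    ["java", "weka", "classifier"],
    ["python", "sklearn", "classifier"],
    ["java", "mahout", "cluster"],
    ["python", "numpy", "cluster"],
]
label_sets = [
    {"java", "classification"},
    {"python", "classification"},
    {"java", "clustering"},
    {"python", "clustering"},
]

model = train_nb(samples, label_sets)
print(predict_nb(model, ["java", "weka", "classifier"]))
```

Because each label is decided on its own, a text can end up with zero, one, or several labels, which is what distinguishes this from ordinary multi-class Naive Bayes.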

score 0 · Accepted Answer

I can suggest some tools that extend Weka to do multi-label classification:

MEKA: A Multi-label Extension to WEKA
Mulan: A Java Library for Multi-Label Learning

There is also an SVM library extension, SVMLib. If you are comfortable with Python packages, scikit-learn also provides one for multi-label classification.

Also, this recent ICML 2013 paper, "Efficient Multi-label Classification with Many Labels", should help if you want to implement one yourself.
