algorithm - Naive Bayes and the zero-frequency problem

Question

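I think I have implemented most of it correctly. One part confuses me:

The zero-frequency problem: when an attribute value does not occur with every class value, add 1 to the count for every attribute value-class combination (the Laplace estimator).

Here is some of my client code: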

// Classify
string text = "Claim your free Macbook now!";
double posteriorProbSpam = classifier.Classify(text, "spam");
Console.WriteLine("-------------------------");
double posteriorProbHam = classifier.Classify(text, "ham");

Now say the word "free" appears somewhere in the training data:

// Training
classifier.Train("ham", "Attention: Collect your Macbook from store.");
// *Lot more here*
classifier.Train("spam", "Free macbook offer expiring.");

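However, the word appears in my training data for the "spam" category but not for "ham". So when I compute the posterior probability, what should I do when I come across the word "free"?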

score 6 · Accepted Answer

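Think about what it would mean to add one only when a count is zero, and you will see it makes no sense: seeing "free" once in ham would then lower the estimated likelihood of seeing "free" in spam. So add one to every count, always.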
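
As a rough illustration, here is the add-one (Laplace) estimate written out. It assumes a multinomial word-count model, which the question does not confirm. The symbols are notation introduced here, not taken from the asker's code: $\mathrm{count}(w, c)$ is the number of times word $w$ occurs in the training text of class $c$, $N_c$ is the total number of word occurrences in class $c$, and $|V|$ is the number of distinct words across all training data.

$$
P(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{N_c + |V|}
$$

For a word never seen in a class, such as "free" in ham, this gives

$$
P(\text{free} \mid \text{ham}) = \frac{0 + 1}{N_{\text{ham}} + |V|} > 0,
$$

so the product of word likelihoods for ham stays small but nonzero. Because the same $+1$ and $+|V|$ are applied to every word in every class, the estimates for each class still sum to 1 over the vocabulary, and the counts for one class never affect how another class is smoothed.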
