Here's the Bayesian computation and one example/test:

    def estimateProbability(priorProbs, buyCount, noBuyCount):
        # first, estimate the prob that the actual buy/nobuy counts would be
        # observed given each of the priors (times a constant that's the same
        # in each case and not worth the effort of computing;-)
        condProbs = [p**buyCount * (1.0-p)**noBuyCount for p in priorProbs]
        # the normalization factor for the above-mentioned neglected constant
        # can most easily be computed just once
        normalize = 1.0 / sum(condProbs)
        # so here's the probability for each of the priors (starting from a
        # uniform metaprior)
        priorMeta = [normalize * cp for cp in condProbs]
        # so the result is the sum of prior probs weighted by prior metaprobs
        return sum(pm * pp for pm, pp in zip(priorMeta, priorProbs))

    def example(numProspects=4):
        # the a priori prob of buying was either 0.3 or 0.7, how does it change
        # depending on how 4 prospects bought or didn't?
        for bought in range(0, numProspects+1):
            result = estimateProbability([0.3, 0.7], bought, numProspects-bought)
            print('b=%d, p=%.2f' % (bought, result))

    example()

The output is:
b=0, p=0.31
b=1, p=0.36
b=2, p=0.50
b=3, p=0.64
b=4, p=0.69
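
As a rough check, the b=1 row (one buy, three no-buys) can be worked out by hand along these lines. The symbols L and w are only labels introduced here for the unnormalized likelihood and the normalized weight of each prior; they are not names from the code.

```latex
\begin{align*}
L(0.3) &= 0.3^{1}\cdot 0.7^{3} = 0.1029, \qquad L(0.7) = 0.7^{1}\cdot 0.3^{3} = 0.0189 \\
w(0.3) &= \frac{0.1029}{0.1029 + 0.0189} \approx 0.845, \qquad w(0.7) = \frac{0.0189}{0.1029 + 0.0189} \approx 0.155 \\
\hat{p} &= 0.3\,w(0.3) + 0.7\,w(0.7) \approx 0.253 + 0.109 \approx 0.36
\end{align*}
```
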
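This agrees with my by-hand computation for this simple case. Note that the probability of buying will, by definition, always lie between the lowest and the highest of the prior probabilities. If that's not what you want, you can introduce a little fudge by adding two "pseudo-products": one that nobody will ever buy (p=0.0) and one that everybody will always buy (p=1.0). This gives more weight to the actual observations, scarce as they may be, and less to the statistics about past products. If we do that here, we get: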
b=0, p=0.06
b=1, p=0.36
b=2, p=0.50
b=3, p=0.64
b=4, p=0.94
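
For anyone wanting to reproduce the table just above, here is a minimal sketch. It assumes the pseudo-products are added simply by putting 0.0 and 1.0 into the priorProbs list, and it repeats estimateProbability so the snippet runs on its own. The names with_pseudo_products and example_with_fudge are made up for this illustration.

```python
def estimateProbability(priorProbs, buyCount, noBuyCount):
    # same computation as in the answer, repeated so this runs standalone
    condProbs = [p**buyCount * (1.0-p)**noBuyCount for p in priorProbs]
    normalize = 1.0 / sum(condProbs)
    priorMeta = [normalize * cp for cp in condProbs]
    return sum(pm * pp for pm, pp in zip(priorMeta, priorProbs))

def with_pseudo_products(priorProbs):
    # hypothetical helper: add one product nobody buys (0.0) and one
    # everybody buys (1.0) around the real priors
    return [0.0] + list(priorProbs) + [1.0]

def example_with_fudge(numProspects=4):
    priors = with_pseudo_products([0.3, 0.7])
    rows = []
    for bought in range(0, numProspects+1):
        result = estimateProbability(priors, bought, numProspects-bought)
        rows.append('b=%d, p=%.2f' % (bought, result))
    return rows

for row in example_with_fudge():
    print(row)
```

Only the b=0 and b=4 rows move relative to the first table: a single buy rules out p=0.0 and a single no-buy rules out p=1.0, so any mixed outcome zeroes out both pseudo-products.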
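Intermediate levels of fudging (to account for the unlikely but not impossible chance that this new product may be worse than any one ever sold before, or better than any of them) are easy to envision: give lower weight to the artificial 0.0 and 1.0 probabilities by adding a vector priorWeights to estimateProbability's arguments.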
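
One way the priorWeights idea might look is sketched below. This is an assumption about how the extension could be written, not code from the answer: the names estimateProbabilityWeighted, exampleWeighted, and fudgeWeight are hypothetical, and the weights are treated as an unnormalized metaprior that multiplies each prior's likelihood.

```python
def estimateProbabilityWeighted(priorProbs, priorWeights, buyCount, noBuyCount):
    # priorWeights is the (hypothetical) metaprior: how much each prior is
    # trusted before seeing any data; it does not need to sum to 1
    condProbs = [w * p**buyCount * (1.0-p)**noBuyCount
                 for p, w in zip(priorProbs, priorWeights)]
    # normalizing also takes care of the scale of the weights
    normalize = 1.0 / sum(condProbs)
    priorMeta = [normalize * cp for cp in condProbs]
    return sum(pm * pp for pm, pp in zip(priorMeta, priorProbs))

def exampleWeighted(fudgeWeight=0.1, numProspects=4):
    # real priors get weight 1.0, the two pseudo-products get fudgeWeight
    priorProbs = [0.0, 0.3, 0.7, 1.0]
    priorWeights = [fudgeWeight, 1.0, 1.0, fudgeWeight]
    for bought in range(0, numProspects+1):
        result = estimateProbabilityWeighted(
            priorProbs, priorWeights, bought, numProspects-bought)
        print('b=%d, p=%.2f' % (bought, result))

exampleWeighted()
```

In this sketch a fudgeWeight of 1.0 gives the same numbers as the unweighted pseudo-product version, a fudgeWeight of 0.0 gives the same numbers as using only the 0.3 and 0.7 priors, and values in between land between the two.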
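This kind of thing is a substantial part of what I do all day, now that I work on developing applications in Business Intelligence, but I just can't get enough of it...!-)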