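For a classification task, I want to fit gamma distributions to two sets of data: the populations of within-class and between-class distances. The goal is to determine the theoretical false accept and false reject rates.
The fit that SciPy returns puzzles me. Below is a plot of the data, where circles mark the within-class distances and x-es mark the between-class distances; the solid line is the gamma fitted to the within-class distances, and the dotted line is the gamma fitted to the between-class distances.
I expected the gamma curves to peak at around 10 and around 30, not both at 0. Does anyone see what is going wrong here?
This is my code: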
import scipy.stats as ss
import matplotlib.pyplot as plt
pos = [7.4237931034482765, 70.522068965517235, 9.1634482758620681, 22.594137931034485, 7.3003448275862075, 6.3841379310344841, 10.693448275862071, 7.5237931034482761, 7.4079310344827594, 7.2696551724137928, 8.5551724137931036, 17.647241379310344, 7.8475862068965521, 14.397586206896554, 32.278965517241382]
neg = [32.951724137931038, 234.65724137931034, 25.530000000000001, 33.236551724137932, 258.49965517241378, 33.881724137931037, 18.853448275862071, 33.703103448275861, 33.655172413793103, 33.536551724137929, 37.950344827586207, 34.32586206896552, 42.997241379310346, 100.71379310344828, 32.875172413793102, 30.59344827586207, 19.857241379310345, 35.232758620689658, 30.822758620689655, 34.92896551724138, 29.619310344827586, 29.236551724137932, 32.668620689655171, 30.943448275862071, 30.80344827586207, 88.638965517241374, 25.518620689655172, 38.350689655172417, 27.378275862068971, 37.138620689655177, 215.63379310344828, 344.93896551724134, 225.93413793103446, 103.66758620689654, 81.92896551724138, 59.159999999999997, 463.89379310344827, 63.86827586206897, 50.453103448275861, 236.4603448275862, 273.53137931034485, 236.26103448275862, 216.26758620689654, 170.3003448275862, 340.60034482758618]
# Fit a gamma to each population with loc fixed at 0; fit returns (shape, loc, scale)
alpha1, loc1, beta1=ss.gamma.fit(pos, floc=0)
alpha2, loc2, beta2=ss.gamma.fit(neg, floc=0)
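# Mark the raw distances: circles for within-class, x-es for between-class
plt.plot(pos,[0.06]*len(pos),'ko')
plt.plot(neg,[0.04]*len(neg),'kx')
# Evaluate both fitted densities on 0..199, passing the fitted loc explicitly
x = range(200)
plt.plot(x,ss.gamma.pdf(x, alpha1, loc=loc1, scale=beta1), '-k')
plt.plot(x,ss.gamma.pdf(x, alpha2, loc=loc2, scale=beta2), ':k')
plt.xlim((0,200))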
I got the floc=0 trick from here: Why does the Gamma distribution in SciPy have three parameters? But it does not always force loc1 and loc2 to 0 :/
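To narrow this down, here is a minimal sketch of the checks that seem relevant, assuming a SciPy version in which `gamma.fit` honors `floc`. The helper names `fit_gamma_fixed_loc`, `gamma_mode`, and `error_rates`, the short sample, and the threshold of 20 are made up for illustration and are not part of my code above. The sketch prints the SciPy version, the fitted parameters, and the mode of the fitted density, which is where the curve should peak. It also shows how I intend to read the theoretical error rates off two fitted distributions.

```python
import scipy
import scipy.stats as ss


def fit_gamma_fixed_loc(data):
    """Fit a gamma distribution with loc pinned at 0; return (shape, loc, scale)."""
    shape, loc, scale = ss.gamma.fit(data, floc=0)
    return shape, loc, scale


def gamma_mode(shape, scale, loc=0.0):
    """Mode of a gamma pdf: loc + (shape - 1) * scale if shape >= 1, else loc."""
    return loc + max(shape - 1.0, 0.0) * scale


def error_rates(threshold, within_params, between_params):
    """Theoretical rates when a distance below `threshold` is accepted as same-class."""
    a1, l1, s1 = within_params
    a2, l2, s2 = between_params
    # False reject: a within-class distance falls above the threshold
    frr = ss.gamma.sf(threshold, a1, loc=l1, scale=s1)
    # False accept: a between-class distance falls below the threshold
    far = ss.gamma.cdf(threshold, a2, loc=l2, scale=s2)
    return far, frr


# Short illustrative samples (rounded, made-up values, not the full data above)
within = [7.4, 9.2, 22.6, 7.3, 6.4, 10.7, 7.5, 8.6, 17.6, 14.4]
between = [33.0, 25.5, 33.2, 18.9, 38.0, 43.0, 30.6, 19.9, 35.2, 29.6]

within_params = fit_gamma_fixed_loc(within)
between_params = fit_gamma_fixed_loc(between)

print("SciPy version:", scipy.__version__)
print("within (shape, loc, scale):", within_params)
print("within mode:", gamma_mode(within_params[0], within_params[2], within_params[1]))
print("between mode:", gamma_mode(between_params[0], between_params[2], between_params[1]))

# Hypothetical decision threshold of 20
far, frr = error_rates(20.0, within_params, between_params)
print("FAR:", far, "FRR:", frr)
```

If the printed loc is not 0 even though `floc=0` was passed, the fit itself is ignoring the constraint; if loc is 0 and the mode is positive but the plotted curve still peaks at 0, the problem would be in how the pdf is evaluated.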