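From the SciPy documentation:
scipy.stats.multivariate_normal
scipy.stats.multivariate_normal = <scipy.stats._multivariate.multivariate_normal_gen object at 0x2b23194d1c90>
A multivariate normal random variable.
The mean keyword specifies the mean. The cov keyword specifies the covariance matrix.
Parameters: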
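x : array_like
    Quantiles, with the last axis of x denoting the components.
mean : array_like, optional
    Mean of the distribution (default zero)
cov : array_like, optional
    Covariance matrix of the distribution (default one)
allow_singular : bool, optional
    Whether to allow a singular covariance matrix. (Default: False)
random_state : None or int or np.random.RandomState instance, optional
    If int or RandomState, use it for drawing the random variates. If None (or np.random), the global np.random state is used. Default is None.
Alternatively, the object may be called (as a function) to fix the mean and covariance parameters, returning a "frozen" multivariate normal random variable:
rv = multivariate_normal(mean=None, cov=1, allow_singular=False)
    Frozen object with the same methods but holding the given mean and covariance fixed.
Notes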
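Setting the parameter mean to None is equivalent to having mean be the zero vector. The parameter cov can be a scalar, in which case the covariance matrix is the identity times that value, a vector of diagonal entries for the covariance matrix, or a two-dimensional array_like.
The covariance matrix cov must be a (symmetric) positive semi-definite matrix. The determinant and inverse of cov are computed as the pseudo-determinant and pseudo-inverse, respectively, so that cov does not need to have full rank.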
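To make the three accepted forms of cov and the "frozen" object concrete, here is a minimal sketch. The points in x and the covariance 2·I are made-up illustration values, not taken from the dataset in this question.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Two 2-D points (made-up values, for illustration only)
x = np.array([[0.5, -1.0], [2.0, 0.3]])
mean = np.zeros(2)

# Three ways of specifying the same covariance matrix 2 * I
lp_scalar = multivariate_normal.logpdf(x, mean=mean, cov=2.0)            # scalar
lp_diag = multivariate_normal.logpdf(x, mean=mean, cov=[2.0, 2.0])       # diagonal entries
lp_full = multivariate_normal.logpdf(x, mean=mean, cov=2.0 * np.eye(2))  # full 2-D matrix

# "Frozen" object: mean and covariance are fixed once, then reused
rv = multivariate_normal(mean=mean, cov=2.0 * np.eye(2), allow_singular=False)
lp_frozen = rv.logpdf(x)

print(np.allclose(lp_scalar, lp_diag),
      np.allclose(lp_diag, lp_full),
      np.allclose(lp_full, lp_frozen))
```

All four calls should agree, one log-density per row of x.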
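My implementation:
import numpy as np
from scipy.stats import multivariate_normal
# Mean vector of the four features
mean_matrix = np.array([mu1['CS Score (USNews)'], mu2['Research Overhead %'], mu3['Admin Base Pay$'], mu4['Tuition(out-state)$']])
print("Mean Matrix : ", mean_matrix)
# allow_singular is passed as the string 'False' (not the boolean False), as in the run below
logLikelihood = multivariate_normal.logpdf(data_frame_to_use, mean=mean_matrix, cov=covarianceMat, allow_singular='False')
print("Log matrix", logLikelihood)
print("Sum of log", sum(logLikelihood))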
This gives the output:
Mean Matrix :
[ 3.21428571e+00 5.33857143e+01 4.69178816e+05 2.97119592e+04]
Log matrix
[-25.89859216 -25.39255136 -24.90457203 -24.62044334 -25.97797326
-24.53094475 -24.86379124 -28.10541986 -25.17504371 -24.36097654
-27.56393633 -26.45706387 -24.73181091 -24.73103739 -25.35676354
-25.92874579 -27.37586004 -29.54768142 -24.49143024 -25.53990703
-25.57939464 -26.84501673 -25.33293111 -24.3236322 -24.62756871
-25.67609413 -26.81881766 -25.163922 -24.99671211 -24.94361195
-24.93544698 -24.72654802 -24.99845459 -27.3604362 -25.56750359
-26.8531682 -25.91679777 -27.4626466 -24.59908201 -27.17373079
-24.91116583 -26.78552165 -27.94191254 -25.32212942 -25.73247674
-26.51429465 -25.14545746 -24.43274555 -26.08543542]
Sum of log -1262.32720006
But when I apply the formula by hand:
# Calculating the PDF for the multivariate case by implementing the formula
import math as mt  # assumed: mt is the alias of the math module used for mt.log below
import numpy as np
import xlrd
filepath = "./DataSet/university_data.xlsx"
workbook = xlrd.open_workbook(filepath)
sheet = workbook.sheet_by_index(0)
print("\n")
print("PDF values for each row :")
total = 0  # running sum of the per-row log densities
for row in range(1, 50):
    # Take columns 2..5 of each row as the input vector
    sum_array = sheet.row_values(row, 2, 6)
    # Formula implementation
    l = np.subtract(sum_array, mean_matrix)  # x - mean
    m = np.matrix.transpose(l)  # a no-op if l is 1-D
    n = np.linalg.inv(covarianceMat)
    ex = np.exp(-0.5 * np.dot(np.dot(m, n), l))
    # Normalising constant 1 / ((2*pi)^(k/2) * sqrt(det)), with k = 4 and pi approximated by 3.14
    f = 1 / (pow(2 * 3.14, 2) * pow(np.linalg.det(covarianceMat), 0.5))
    pdf = f * ex
    # print("PDF for row ", row, " \t:\t\t ", pdf)
    lpdf = mt.log(pdf)
    print("LogPDF for row ", row, " \t:\t\t ", lpdf)
    total = total + lpdf
print("\n")
print("Loglikelihood(formula implementation) : ", '%.3f' % float(total))
This gives the output:
> PDF values for each row :
>
> LogPDF for row 1 : -30.19371482325026
>
> LogPDF for row 2 : -27.067463935229377
>
> LogPDF for row 3 : -26.980621478218485
>
> LogPDF for row 4 : -26.08324487529128
>
> LogPDF for row 5 : -27.26413815992288
>
> LogPDF for row 6 : -26.70724095561742
>
> LogPDF for row 7 : -25.92834293415632
>
> LogPDF for row 8 : -28.712707460803784
>
> LogPDF for row 9 : -25.909899171974057
>
> LogPDF for row 10 : -25.92659040666956
>
> LogPDF for row 11 : -28.211108799821343
>
> LogPDF for row 12 : -26.83353978964118
>
> LogPDF for row 13 : -25.328814199145395
>
> LogPDF for row 14 : -25.111128380106702
>
> LogPDF for row 15 : -25.749393390363533
>
> LogPDF for row 16 : -26.693092607802864
>
> LogPDF for row 17 : -28.170941906181092
>
> LogPDF for row 18 : -29.927055551899382
>
> LogPDF for row 19 : -24.86790828974201
>
> LogPDF for row 20 : -26.137292962654886
>
> LogPDF for row 21 : -25.96539688885015
>
> LogPDF for row 22 : -27.35065704563797
>
> LogPDF for row 23 : -25.92368769313743
>
> LogPDF for row 24 : -24.70248175116109
>
> LogPDF for row 25 : -25.06362483723621
>
> LogPDF for row 26 : -26.268885215917194
>
> LogPDF for row 27 : -27.864039083501908
>
> LogPDF for row 28 : -25.57129689424411
>
> LogPDF for row 29 : -25.389635711130655
>
> LogPDF for row 30 : -25.328718626588117
>
> LogPDF for row 31 : -25.908499721053612
>
> LogPDF for row 32 : -25.274591177021158
>
> LogPDF for row 33 : -25.864730872696875
>
> LogPDF for row 34 : -28.250699466070667
>
> LogPDF for row 35 : -26.427070541801417
>
> LogPDF for row 36 : -28.480271709879336
>
> LogPDF for row 37 : -26.304600263886595
>
> LogPDF for row 38 : -29.079517786952714
>
> LogPDF for row 39 : -25.167192328059414
>
> LogPDF for row 40 : -27.552414021501523
>
> LogPDF for row 41 : -25.576316408257583
>
> LogPDF for row 42 : -27.164966750624057
>
> LogPDF for row 43 : -28.738620620446113
>
> LogPDF for row 44 : -25.85303976738355
>
> LogPDF for row 45 : -26.28781267577098
>
> LogPDF for row 46 : -27.08876652783593
>
> LogPDF for row 47 : -25.81071248417473
>
> LogPDF for row 48 : -25.77382834320652
>
> LogPDF for row 49 : -26.892236096254386
>
>
> Loglikelihood(formula implementation) : -1304.729
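As a cross-check that does not depend on the spreadsheet, the following sketch evaluates the log-density directly in log space, log p(x) = -1/2 [k log(2π) + log|Σ| + (x - μ)ᵀ Σ⁻¹ (x - μ)], and compares it with multivariate_normal.logpdf. The helper name mvn_logpdf and the synthetic data are assumptions made for illustration; they are not the university dataset. On a well-conditioned, positive-definite covariance the two should agree. Two things may be worth checking in the original run, though neither is confirmed here: allow_singular='False' is a non-empty string, which Python treats as truthy, and the features differ in scale by several orders of magnitude (compare 4.69e+05 with 3.21 in the mean vector), which can affect how a pseudo-inverse and pseudo-determinant behave.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_logpdf(x, mean, cov):
    """Log-density of N(mean, cov) at each row of x (hypothetical helper).

    Assumes cov is symmetric positive definite (full rank).
    """
    x = np.atleast_2d(x)
    k = x.shape[1]
    diff = x - mean
    # log|cov| computed without forming the determinant itself
    _, logdet = np.linalg.slogdet(cov)
    # Mahalanobis term per row, using solve() instead of an explicit inverse
    maha = np.einsum('ij,ij->i', diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + maha)

# Synthetic 4-feature data (made-up, only to exercise the formula)
rng = np.random.default_rng(0)
true_mean = np.array([3.0, 50.0, 10.0, 20.0])
A = rng.normal(size=(4, 4))
true_cov = A @ A.T + 4 * np.eye(4)  # positive definite by construction
data = rng.multivariate_normal(true_mean, true_cov, size=49)

mean_hat = data.mean(axis=0)
cov_hat = np.cov(data, rowvar=False)

manual = mvn_logpdf(data, mean_hat, cov_hat)
reference = multivariate_normal.logpdf(data, mean=mean_hat, cov=cov_hat,
                                       allow_singular=False)

print(np.allclose(manual, reference))
print(manual.sum(), reference.sum())
```

If the same comparison on the real data disagrees only when allow_singular is truthy, that would point at the pseudo-inverse path rather than at the formula.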
Input data: the dataset used above
Python notebook: my implementation