
From the SciPy documentation:

scipy.stats.multivariate_normal

scipy.stats.multivariate_normal = <scipy.stats._multivariate.multivariate_normal_gen object at 0x2b23194d1c90>

A multivariate normal random variable.

The mean keyword specifies the mean. The cov keyword specifies the covariance matrix.

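Parameters:

x: array_like. Quantiles, with the last axis of x denoting the components.

mean: array_like, optional. Mean of the distribution (default: zero).

cov: array_like, optional. Covariance matrix of the distribution (default: one).

allow_singular: bool, optional. Whether to allow a singular covariance matrix (default: False).

random_state: None, int, or np.random.RandomState instance, optional. If an int or RandomState, it is used for drawing the random variates. If None (or np.random), the global np.random state is used. Default is None.

Alternatively, the object may be called (as a function) to fix the mean and covariance parameters, returning a "frozen" multivariate normal random variable:

rv = multivariate_normal(mean=None, cov=1, allow_singular=False)

This is a frozen object with the same methods, but with the given mean and covariance held fixed.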
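
To make the difference between the direct call and the "frozen" form concrete, here is a minimal sketch. The mean, covariance, and sample points are made-up illustrative values, not taken from the dataset in this question.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative values only; they are not from the university dataset.
mean = np.array([0.0, 1.0])
cov = np.array([[2.0, 0.3],
                [0.3, 1.0]])
x = np.array([[0.5, 0.5],
              [1.0, 2.0]])  # two points; the last axis holds the components

# Direct call: mean and cov are passed on every evaluation.
direct = multivariate_normal.logpdf(x, mean=mean, cov=cov)

# Frozen object: mean and cov are fixed once and then reused.
rv = multivariate_normal(mean=mean, cov=cov, allow_singular=False)
frozen = rv.logpdf(x)

print(direct)
print(frozen)

# Passing an int as random_state makes the random draws reproducible.
samples = multivariate_normal.rvs(mean=mean, cov=cov, size=3, random_state=0)
print(samples.shape)
```

Both calls should return the same two log-density values; the frozen form is mainly a convenience when the same parameters are evaluated repeatedly.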

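Notes

Setting the parameter mean to None is equivalent to having mean be the zero vector. The parameter cov can be a scalar, in which case the covariance matrix is the identity times that value; a vector of diagonal entries for the covariance matrix; or a two-dimensional array_like. The covariance matrix cov must be a (symmetric) positive semi-definite matrix. The determinant and inverse of cov are computed as the pseudo-determinant and pseudo-inverse, respectively, so cov does not need to have full rank.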
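
The three accepted forms of cov described in the Notes can be checked with a short sketch. The point x and the variances are arbitrary illustrative values; the mean is given explicitly where cov is a scalar or a vector so that the dimension is unambiguous.

```python
import numpy as np
from scipy.stats import multivariate_normal

x = np.array([0.2, -0.4, 1.0])  # one illustrative 3-component point
zero_mean = np.zeros(3)

# mean=None should behave like the zero vector (a full 2-D cov fixes the dimension).
lp_none = multivariate_normal.logpdf(x, mean=None, cov=np.eye(3))
lp_zero = multivariate_normal.logpdf(x, mean=zero_mean, cov=np.eye(3))

# Scalar cov: the identity matrix times that value.
lp_scalar = multivariate_normal.logpdf(x, mean=zero_mean, cov=2.0)
lp_scaled_eye = multivariate_normal.logpdf(x, mean=zero_mean, cov=2.0 * np.eye(3))

# 1-D cov: interpreted as the diagonal entries of the covariance matrix.
variances = np.array([1.0, 2.0, 3.0])
lp_vector = multivariate_normal.logpdf(x, mean=zero_mean, cov=variances)
lp_diag = multivariate_normal.logpdf(x, mean=zero_mean, cov=np.diag(variances))

print(lp_none, lp_zero)
print(lp_scalar, lp_scaled_eye)
print(lp_vector, lp_diag)
```

Each printed pair should match, which is a quick way to confirm how a given cov argument is being interpreted.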

My implementation

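import numpy as np
from scipy.stats import multivariate_normal

# mu1..mu4, data_frame_to_use and covarianceMat are defined elsewhere in the notebook (not shown here).
mean_matrix = np.array([mu1['CS Score (USNews)'], mu2['Research Overhead %'], mu3['Admin Base Pay$'], mu4['Tuition(out-state)$']])
print("Mean Matrix : ", mean_matrix)

# Note: allow_singular is passed as the string 'False', which Python treats as truthy.
logLikelihood = multivariate_normal.logpdf(data_frame_to_use, mean=mean_matrix, cov=covarianceMat, allow_singular='False')
print("Log matrix", logLikelihood)
print("Sum of log", sum(logLikelihood))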

This gives the output:

 Mean Matrix :  
 [  3.21428571e+00   5.33857143e+01   4.69178816e+05   2.97119592e+04]

 Log matrix 
 [-25.89859216 -25.39255136 -24.90457203 -24.62044334 -25.97797326
  -24.53094475 -24.86379124 -28.10541986 -25.17504371 -24.36097654
  -27.56393633 -26.45706387 -24.73181091 -24.73103739 -25.35676354
  -25.92874579 -27.37586004 -29.54768142 -24.49143024 -25.53990703
  -25.57939464 -26.84501673 -25.33293111 -24.3236322  -24.62756871
  -25.67609413 -26.81881766 -25.163922   -24.99671211 -24.94361195
  -24.93544698 -24.72654802 -24.99845459 -27.3604362  -25.56750359
  -26.8531682  -25.91679777 -27.4626466  -24.59908201 -27.17373079
  -24.91116583 -26.78552165 -27.94191254 -25.32212942 -25.73247674
  -26.51429465 -25.14545746 -24.43274555 -26.08543542]

 Sum of log -1262.32720006
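
For reference, the value expected from logpdf for a non-singular covariance is the standard multivariate normal log-density. The symbols $x$, $\mu$, $\Sigma$ and $k$ are notation introduced for this note, with $k = 4$ for the four columns used in mean_matrix:

```latex
\log p(x) = -\frac{1}{2}\left[\, k \log(2\pi) + \log\det\Sigma + (x-\mu)^{\mathsf T}\,\Sigma^{-1}\,(x-\mu) \,\right]
```

Per the Notes quoted above, SciPy substitutes the pseudo-determinant and pseudo-inverse for $\det\Sigma$ and $\Sigma^{-1}$, so the two should only be expected to agree when cov is treated as numerically full rank.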

But when I apply the formula by hand:

# Calculating the PDF for the multivariate case by implementing the formula directly
import math as mt
import numpy as np
import xlrd

# mean_matrix and covarianceMat come from the earlier cells of the notebook.
filepath = "./DataSet/university_data.xlsx"
workbook = xlrd.open_workbook(filepath)
sheet = workbook.sheet_by_index(0)

print("\n")
print("PDF values for each row :")
total = 0  # running sum of the per-row log-densities
for row in range(1, 50):
    # Take the four feature columns of each data row as the input vector
    sum_array = sheet.row_values(row, 2, 6)
    # Formula implementation
    l = np.subtract(sum_array, mean_matrix)
    m = np.matrix.transpose(l)
    n = np.linalg.inv(covarianceMat)
    ex = np.exp(-0.5 * np.dot(np.dot(m, n), l))
    # (2*pi)^(k/2) with k = 4 components; 3.14 is used as an approximation of pi
    f = 1 / (pow(2 * 3.14, 2) * pow(np.linalg.det(covarianceMat), 0.5))
    pdf = f * ex
    # print("PDF for row ", row, " \t:\t\t ", pdf)
    lpdf = mt.log(pdf)
    print("LogPDF for row ", row, " \t:\t\t ", lpdf)
    total = total + lpdf

print("\n")
print("Loglikelihood(formula implementation) : ", '%.3f' % float(total))

This gives the output:

>      PDF values for each row :
> 
> LogPDF for row  1     :         -30.19371482325026
> 
> LogPDF for row  2     :         -27.067463935229377
> 
> LogPDF for row  3     :         -26.980621478218485
> 
> LogPDF for row  4     :         -26.08324487529128
> 
> LogPDF for row  5     :         -27.26413815992288
> 
> LogPDF for row  6     :         -26.70724095561742
> 
> LogPDF for row  7     :         -25.92834293415632
> 
> LogPDF for row  8     :         -28.712707460803784
> 
> LogPDF for row  9     :         -25.909899171974057
> 
> LogPDF for row  10    :         -25.92659040666956
> 
> LogPDF for row  11    :         -28.211108799821343
> 
> LogPDF for row  12    :         -26.83353978964118
> 
> LogPDF for row  13    :         -25.328814199145395
> 
> LogPDF for row  14    :         -25.111128380106702
> 
> LogPDF for row  15    :         -25.749393390363533
> 
> LogPDF for row  16    :         -26.693092607802864
> 
> LogPDF for row  17    :         -28.170941906181092
> 
> LogPDF for row  18    :         -29.927055551899382
> 
> LogPDF for row  19    :         -24.86790828974201
> 
> LogPDF for row  20    :         -26.137292962654886
> 
> LogPDF for row  21    :         -25.96539688885015
> 
> LogPDF for row  22    :         -27.35065704563797
> 
> LogPDF for row  23    :         -25.92368769313743
> 
> LogPDF for row  24    :         -24.70248175116109
> 
> LogPDF for row  25    :         -25.06362483723621
> 
> LogPDF for row  26    :         -26.268885215917194
> 
> LogPDF for row  27    :         -27.864039083501908
> 
> LogPDF for row  28    :         -25.57129689424411
> 
> LogPDF for row  29    :         -25.389635711130655
> 
> LogPDF for row  30    :         -25.328718626588117
> 
> LogPDF for row  31    :         -25.908499721053612
> 
> LogPDF for row  32    :         -25.274591177021158
> 
> LogPDF for row  33    :         -25.864730872696875
> 
> LogPDF for row  34    :         -28.250699466070667
> 
> LogPDF for row  35    :         -26.427070541801417
> 
> LogPDF for row  36    :         -28.480271709879336
> 
> LogPDF for row  37    :         -26.304600263886595
> 
> LogPDF for row  38    :         -29.079517786952714
> 
> LogPDF for row  39    :         -25.167192328059414
> 
> LogPDF for row  40    :         -27.552414021501523
> 
> LogPDF for row  41    :         -25.576316408257583
> 
> LogPDF for row  42    :         -27.164966750624057
> 
> LogPDF for row  43    :         -28.738620620446113
> 
> LogPDF for row  44    :         -25.85303976738355
> 
> LogPDF for row  45    :         -26.28781267577098
> 
> LogPDF for row  46    :         -27.08876652783593
> 
> LogPDF for row  47    :         -25.81071248417473
> 
> LogPDF for row  48    :         -25.77382834320652
> 
> LogPDF for row  49    :         -26.892236096254386
> 
> 
> Loglikelihood(formula implementation) :  -1304.729
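
Since the spreadsheet is not reproduced here, one way to narrow down the gap between the two results is to repeat the comparison on synthetic data. The sketch below is only that: `manual_logpdf` and `numerical_rank` are hypothetical helpers, all numbers are made up, and the cutoff in `numerical_rank` (1e6 times machine epsilon times the largest eigenvalue) reflects my understanding of how SciPy decides which eigenvalues count as zero, which is an assumption about internals rather than documented API. Note also that the call above passes allow_singular='False', a non-empty string, which Python treats as truthy. Whether any of this explains the numbers above is not established here; it is just one thing worth checking.

```python
import numpy as np
from scipy.stats import multivariate_normal


def manual_logpdf(x, mean, cov):
    """Full-rank Gaussian log-density computed in log space (hypothetical helper)."""
    x = np.atleast_2d(x)
    k = mean.shape[0]
    diff = x - mean
    # slogdet avoids overflow/underflow when det(cov) is huge or tiny.
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("cov must be positive definite for this formula")
    # Solve cov @ z = diff.T instead of forming the explicit inverse.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + maha)


def numerical_rank(cov, factor=1e6):
    """Count eigenvalues above factor * eps * largest eigenvalue (assumed cutoff)."""
    eig = np.linalg.eigvalsh(cov)
    cutoff = factor * np.finfo(float).eps * np.max(np.abs(eig))
    return int(np.sum(eig > cutoff))


rng = np.random.default_rng(0)
mean = np.zeros(4)
x = rng.normal(size=(5, 4))

# Case 1: variances on comparable scales; both computations should agree.
cov_ok = np.diag([1.0, 4.0, 9.0, 16.0])
lp_manual = manual_logpdf(x, mean, cov_ok)
lp_scipy = multivariate_normal.logpdf(x, mean=mean, cov=cov_ok)
print("max abs difference:", np.max(np.abs(lp_manual - lp_scipy)))

# Case 2: variances many orders of magnitude apart, loosely imitating a score
# column next to a dollar-amount column (made-up numbers).
cov_wide = np.diag([1.0, 1e2, 1e12, 1e8])
print("numerical ranks:", numerical_rank(cov_ok), numerical_rank(cov_wide))
lp_manual_wide = manual_logpdf(x, mean, cov_wide)
lp_scipy_wide = multivariate_normal.logpdf(
    x, mean=mean, cov=cov_wide, allow_singular=True
)
print("per-row difference:", lp_manual_wide - lp_scipy_wide)
```

If the per-row difference in the second case is nonzero and varies from row to row, that would be consistent with some components being dropped by the pseudo-inverse; printing `numerical_rank(covarianceMat)` on the real data would show whether the same situation could apply here.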

Input data: the dataset used above

Python Notebook: my implementation
