python - Multivariate normal density in Python?

Question

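Is there any Python package that can efficiently compute the PDF (probability density function) of a multivariate normal distribution?

It doesn't seem to be included in Numpy/Scipy, and surprisingly a Google search didn't turn up anything useful.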

score 84 · Accepted Answer

The multivariate normal is now available in SciPy 0.14.0.dev-16fc0af:

from scipy.stats import multivariate_normal
var = multivariate_normal(mean=[0,0], cov=[[1,0],[0,1]])
var.pdf([1,0])
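The frozen distribution also accepts several points at once (one point per row) and exposes logpdf, which is usually the safer quantity for likelihood work. A short sketch reusing the identity covariance above; the extra points are arbitrary example values:

```python
import numpy as np
from scipy.stats import multivariate_normal

# frozen 2-D standard normal, as in the snippet above
var = multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]])

# one point per row; pdf returns one density value per row
points = np.array([[1, 0], [0, 0], [2, -1]])
densities = var.pdf(points)

# log-density of the same points
log_densities = var.logpdf(points)

print(densities)
print(np.allclose(np.exp(log_densities), densities))  # True
```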

score 31 · Accepted Answer

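I just made one for my own purposes, so I thought I'd share it. It is built with the "power" of numpy, based on the formula for the non-degenerate case from http://en.wikipedia.org/wiki/Multivariate_normal_distribution, and it also validates its input.

Here is the code along with a sample run: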

import math
import numpy as np

# covariance matrix
sigma = np.array([[2.3, 0, 0, 0],
                  [0, 1.5, 0, 0],
                  [0, 0, 1.7, 0],
                  [0, 0, 0, 2]])
# mean vector
mu = np.array([2, 3, 8, 10])

# input
x = np.array([2.1, 3.5, 8, 9.5])

def norm_pdf_multivariate(x, mu, sigma):
    size = len(x)
    if size != len(mu) or sigma.shape != (size, size):
        raise ValueError("The dimensions of the input don't match")
    det = np.linalg.det(sigma)
    if det == 0:
        raise ValueError("The covariance matrix can't be singular")

    # normalising constant 1 / ((2*pi)^(k/2) * |Sigma|^(1/2))
    norm_const = 1.0 / (math.pow(2 * math.pi, size / 2.0) * math.sqrt(det))
    x_mu = x - mu
    # quadratic form (x - mu)^T Sigma^-1 (x - mu), solved instead of inverting Sigma
    quad = x_mu @ np.linalg.solve(sigma, x_mu)
    return norm_const * math.exp(-0.5 * quad)

print(norm_pdf_multivariate(x, mu, sigma))
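A variant of the same formula that avoids computing the determinant and inverse separately is to factor Sigma once with a Cholesky decomposition and reuse the factor for both. This is a swapped-in technique, not the answerer's code; the function name norm_pdf_cholesky is hypothetical, and it assumes Sigma is symmetric positive definite:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def norm_pdf_cholesky(x, mu, sigma):
    """Multivariate normal density via a Cholesky factorisation (sketch)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    k = x.shape[0]
    # raises LinAlgError if sigma is not positive definite
    c, lower = cho_factor(sigma)
    diff = x - mu
    # (x - mu)^T Sigma^-1 (x - mu) without forming the inverse
    quad = diff @ cho_solve((c, lower), diff)
    # log|Sigma| = 2 * sum(log(diag(factor)))
    log_det = 2.0 * np.sum(np.log(np.diag(c)))
    log_pdf = -0.5 * (k * np.log(2 * np.pi) + log_det + quad)
    return np.exp(log_pdf)

# same example values as the answer above
sigma = np.diag([2.3, 1.5, 1.7, 2.0])
mu = np.array([2, 3, 8, 10])
x = np.array([2.1, 3.5, 8, 9.5])
density = norm_pdf_cholesky(x, mu, sigma)
print(density)
```

Computing in log space and exponentiating only at the end also keeps the intermediate values away from overflow and underflow.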

score 17 · Accepted Answer

In case it's still needed, my implementation would be:

import numpy as np

def pdf_multivariate_gauss(x, mu, cov):
    '''
    Calculate the multivariate normal density (pdf)

    Keyword arguments:
        x = numpy array of a "d x 1" sample vector
        mu = numpy array of a "d x 1" mean vector
        cov = numpy array of a "d x d" covariance matrix
    '''
    assert(mu.shape[0] > mu.shape[1]), 'mu must be a column vector'
    assert(x.shape[0] > x.shape[1]), 'x must be a column vector'
    assert(cov.shape[0] == cov.shape[1]), 'covariance matrix must be square'
    assert(mu.shape[0] == cov.shape[0]), 'cov and mu must have the same dimensions'
    assert(mu.shape[0] == x.shape[0]), 'mu and x must have the same dimensions'
    part1 = 1 / ( ((2* np.pi)**(len(mu)/2)) * (np.linalg.det(cov)**(1/2)) )
    part2 = (-1/2) * ((x-mu).T.dot(np.linalg.inv(cov))).dot((x-mu))
    # part2 is a 1 x 1 array; .item() extracts the scalar
    return float(part1 * np.exp(part2).item())

def test_gauss_pdf():
    x = np.array([[0],[0]])
    mu  = np.array([[0],[0]])
    cov = np.eye(2) 

    print(pdf_multivariate_gauss(x, mu, cov))

    # prints 0.15915494309189535

if __name__ == '__main__':
    test_gauss_pdf()
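If you need the density for many samples at once, the quadratic form can be evaluated row-wise with np.einsum instead of looping. A minimal sketch, assuming the samples are stacked as rows of an n x d array and mu is a flat length-d vector (a different layout from the column vectors above); pdf_multivariate_gauss_rows is a hypothetical name:

```python
import numpy as np

def pdf_multivariate_gauss_rows(X, mu, cov):
    """Density of each row of X (shape n x d) under N(mu, cov); mu has shape (d,)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    mu = np.asarray(mu, dtype=float)
    d = mu.shape[0]
    diff = X - mu                          # (n, d), mu broadcast over rows
    # solve cov @ Z = diff^T for all rows at once instead of inverting cov
    Z = np.linalg.solve(cov, diff.T).T     # (n, d)
    quad = np.einsum('ij,ij->i', diff, Z)  # row-wise (x - mu)^T cov^-1 (x - mu)
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return norm_const * np.exp(-0.5 * quad)

X = np.array([[0.0, 0.0], [1.0, 0.0]])
densities = pdf_multivariate_gauss_rows(X, np.zeros(2), np.eye(2))
print(densities)
```

The first value matches the 0.15915494309189535 printed by test_gauss_pdf above, since both evaluate a 2-D standard normal at the origin.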

The code is on GitHub, in case I make changes to it in the future.

score 8 · Accepted Answer

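In the common case of a diagonal covariance matrix, the multivariate PDF can be obtained by simply multiplying the univariate PDF values returned by a scipy.stats.norm instance. If you need the general case, you will probably have to code it yourself (which shouldn't be hard).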
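A sketch of the diagonal case, with arbitrary example values for the mean, variances and point. One detail that is easy to get wrong: norm expects the standard deviation as scale, so the square roots of the diagonal variances are passed:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

mean = np.array([2.0, 3.0, 8.0])
variances = np.array([2.3, 1.5, 1.7])   # diagonal of the covariance matrix
x = np.array([2.1, 3.5, 8.0])

# product of independent univariate densities; scale is a standard deviation
pdf_product = np.prod(norm.pdf(x, loc=mean, scale=np.sqrt(variances)))

# same value from the multivariate distribution with a diagonal covariance
pdf_full = multivariate_normal(mean=mean, cov=np.diag(variances)).pdf(x)

print(np.isclose(pdf_product, pdf_full))  # True
```

This only holds because a diagonal covariance makes the components independent; with off-diagonal terms the product is no longer the joint density.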

score 5 · Accepted Answer

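You can compute it easily with numpy. I implemented the following for a machine learning course and wanted to share it, in the hope that it helps someone.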

import numpy as np
X = np.array([[13.04681517, 14.74115241],
              [13.40852019, 13.7632696 ],
              [14.19591481, 15.85318113],
              [14.91470077, 16.17425987]])

def est_gaus_par(X):
    # per-feature mean and standard deviation
    mu = np.mean(X, axis=0)
    sig = np.std(X, axis=0)
    return mu, sig

mu, sigma = est_gaus_par(X)

def est_mult_gaus(X, mu, sigma):
    m = len(mu)
    # diagonal covariance matrix: its entries are variances, i.e. sigma**2
    sigma2 = np.diag(sigma**2)
    X = X - mu
    # one density value per row of X
    p = 1/((2*np.pi)**(m/2)*np.linalg.det(sigma2)**(0.5))*np.exp(-0.5*np.sum(X.dot(np.linalg.pinv(sigma2))*X, axis=1))

    return p

p = est_mult_gaus(X, mu, sigma)

score 3 · Accepted Answer

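I know of several Python packages that use it internally, with varying generality and for different purposes, but I don't know whether any of them are meant for end users.

For example, statsmodels has the following hidden function and class, but they are not used by statsmodels itself: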

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/miscmodels/try_mlecov.py#L36

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/distributions/mv_normal.py#L777

Essentially, if you need fast evaluation, rewrite it for your use case.

score 3 · Accepted Answer

I use the following code to compute logpdf values, which is preferable for higher dimensions. It also works with scipy.sparse matrices.

import numpy as np
import math
import scipy.sparse as sp
import scipy.sparse.linalg as spln

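def lognormpdf(x, mu, S):
    """Calculate the Gaussian log probability density of x, when x ~ N(mu, S)."""
    nx = S.shape[0]  # len() is not defined for scipy.sparse matrices
    err = x - mu
    if sp.issparse(S):
        # one sparse LU factorisation gives both the solve and the log-determinant:
        # L has a unit diagonal, so for positive-definite S, log|S| = sum(log|U_ii|)
        lu = spln.splu(sp.csc_matrix(S))
        logdet = np.sum(np.log(np.abs(lu.U.diagonal())))
        numerator = lu.solve(err).dot(err)
    else:
        logdet = np.linalg.slogdet(S)[1]
        numerator = np.linalg.solve(S, err).dot(err)

    norm_coeff = nx * math.log(2 * math.pi) + logdet
    return -0.5 * (norm_coeff + numerator)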

The code is from pyParticleEst. If you want the pdf value rather than the logpdf, just apply math.exp() to the return value.
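Why the log form matters in higher dimensions: in the sketch below (an assumed 1000-dimensional standard normal evaluated at a vector of ones, chosen only for illustration) the density itself underflows to 0.0 in double precision, while the log-density stays finite:

```python
import numpy as np
from scipy.stats import multivariate_normal

d = 1000
x = np.ones(d)
rv = multivariate_normal(mean=np.zeros(d), cov=np.eye(d))

print(rv.pdf(x))     # 0.0, exp(...) underflows in float64
print(rv.logpdf(x))  # finite, about -1418.94
```

So applying math.exp() is only safe when the log-density is not too negative (roughly above -745 for float64); for likelihood comparisons it is usually better to stay in log space.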

score 2 · Accepted Answer

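The density can be computed very simply with numpy functions and the formula on this page: http://en.wikipedia.org/wiki/Multivariate_normal_distribution. You may also want to use the log-likelihood (log probability) instead, which is less prone to underflow in high dimensions and is simpler to compute. Both only require computing the determinant and the inverse of a matrix.

The CDF, on the other hand, is a completely different animal...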
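For reference, the density from that page for a k-dimensional x with mean μ and non-singular covariance Σ, and its logarithm, which replaces the product and exponential with a sum:

```latex
f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^k \, |\boldsymbol\Sigma|}}
  \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol\mu)^{\mathsf T}\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\right)

\log f(\mathbf{x}) = -\tfrac{1}{2}\left[k\log(2\pi) + \log|\boldsymbol\Sigma| + (\mathbf{x}-\boldsymbol\mu)^{\mathsf T}\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\right]
```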
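Unlike the PDF, the CDF has no closed form in general and must be integrated numerically. Newer SciPy versions (1.0 and later, as far as I know) expose this as multivariate_normal.cdf. A small sketch, using an assumed bivariate example where the exact orthant probability is known, P(X1 <= 0, X2 <= 0) = 1/4 + arcsin(ρ)/(2π), which is 1/3 for ρ = 0.5:

```python
import numpy as np
from scipy.stats import multivariate_normal

rho = 0.5
rv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

# numerically estimated P(X1 <= 0, X2 <= 0)
p = rv.cdf([0.0, 0.0])

# closed form for this special case, for comparison
exact = 0.25 + np.arcsin(rho) / (2 * np.pi)
print(p, exact)  # both approximately 1/3
```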

score 0 · Accepted Answer

The code below helped me work out how likely a given vector is under a multivariate normal distribution.

import numpy as np
from scipy.stats import multivariate_normal

Data containing all the vectors:

d= np.array([[1,2,1],[2,1,3],[4,5,4],[2,2,1]])

The mean of the data as a vector, with the same length as the input vectors (here 3):

mean = np.sum(d, axis=0)/len(d)

OR
mean=np.average(d , axis=0)
mean.shape

Compute the covariance of the vectors, of shape [input vector length x input vector length], here 3x3:

cov = 0
for e in d:
  cov += np.dot((e-mean).reshape(len(e), 1), (e-mean).reshape(1, len(e)))
cov /= len(d)
cov.shape
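The loop above is the maximum-likelihood covariance (it divides by N). NumPy can compute the same matrix directly; note that np.cov divides by N - 1 by default, so bias=True is needed to reproduce the loop, and rowvar=False tells it that each row is one observation:

```python
import numpy as np

d = np.array([[1, 2, 1], [2, 1, 3], [4, 5, 4], [2, 2, 1]])

# rows are observations; bias=True divides by N like the loop above
cov_np = np.cov(d, rowvar=False, bias=True)

# the explicit outer-product loop, for comparison
mean = np.mean(d, axis=0)
cov_loop = sum(np.outer(e - mean, e - mean) for e in d) / len(d)

print(np.allclose(cov_np, cov_loop))  # True
```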

Build the multivariate Gaussian distribution from the mean and covariance:

dist = multivariate_normal(mean,cov)

Evaluate the probability density function:

print(dist.pdf([1,2,3]))

3.050863384798471e-05

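The value above is the probability density at [1,2,3], i.e. the likelihood of that vector under the fitted distribution.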

score 0 · Accepted Answer

Here I show in detail how exactly to use multivariate_normal() from the scipy package:

# Import packages
import numpy as np
from scipy.stats import multivariate_normal

# Prepare your data
x = np.linspace(-10, 10, 500)
y = np.linspace(-10, 10, 500)
X, Y = np.meshgrid(x,y)

# Get the multivariate normal distribution
mu_x = np.mean(x)
sigma_x = np.std(x)
mu_y = np.mean(y)
sigma_y = np.std(y)
# the covariance matrix holds variances, i.e. squared standard deviations
rv = multivariate_normal([mu_x, mu_y], [[sigma_x**2, 0], [0, sigma_y**2]])

# Get the probability density
pos = np.empty(X.shape + (2,))
pos[:, :, 0] = X
pos[:, :, 1] = Y
pd = rv.pdf(pos)
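A quick sanity check for a gridded density is that it integrates to about 1. The sketch below assumes a 2-D standard normal on a [-6, 6] grid (wide enough that almost no mass falls outside) and builds pos with np.dstack, which gives the same (500, 500, 2) layout as filling pos[:, :, 0] and pos[:, :, 1] by hand:

```python
import numpy as np
from scipy.stats import multivariate_normal

x = np.linspace(-6, 6, 500)
y = np.linspace(-6, 6, 500)
X, Y = np.meshgrid(x, y)
pos = np.dstack((X, Y))          # shape (500, 500, 2)

rv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.0], [0.0, 1.0]])
density = rv.pdf(pos)            # shape (500, 500)

dx = x[1] - x[0]
dy = y[1] - y[0]
total = density.sum() * dx * dy  # Riemann-sum approximation of the integral
print(total)                     # close to 1
```

On a grid that cuts off a noticeable share of the mass (for example, a covariance that is wide relative to the grid range), the total will come out clearly below 1.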
