8

如何生成由从二维高斯分布N = 100中提取的二维样本组成的数据集,均值x = (x1,x2)T ∈ R2

µ = (1,1)T

和协方差矩阵

Σ = (0.3 0.2 
     0.2 0.2)

我听说你可以使用Matlab函数randn,但不知道如何在 Python 中实现它?

4

3 回答 3

11

只是为了详细说明@EamonNerbonne 的答案:以下使用 协方差矩阵的Cholesky 分解从不相关的正态分布随机变量生成相关变量。

import numpy as np
import matplotlib.pyplot as plt
linalg = np.linalg

N = 1000
mean = [1,1]
cov = [[0.3, 0.2],[0.2, 0.2]]
data = np.random.multivariate_normal(mean, cov, N)
L = linalg.cholesky(cov)
# print(L.shape)
# (2, 2)
uncorrelated = np.random.standard_normal((2,N))
data2 = np.dot(L,uncorrelated) + np.array(mean).reshape(2,1)
# print(data2.shape)
# (2, 1000)
plt.scatter(data2[0,:], data2[1,:], c='green')    
plt.scatter(data[:,0], data[:,1], c='yellow')
plt.show()

在此处输入图像描述

黄点是由 生成的np.random.multivariate_normal。绿点是通过将正态分布点乘以 Cholesky 分解矩阵生成的L

于 2013-02-17T13:14:43.133 回答
5

您正在寻找numpy.random.multivariate_normal

代码

>>> import numpy
>>> print numpy.random.multivariate_normal([1,1], [[0.3, 0.2],[0.2, 0.2]], 100)
[[ 0.02999043  0.09590078]
 [ 1.35743021  1.08199363]
 [ 1.15721179  0.87750625]
 [ 0.96879114  0.94503228]
 [ 1.23989167  1.13473083]
 [ 1.55917608  0.81530847]
 [ 0.89985651  0.7071519 ]
 [ 0.37494324  0.739433  ]
 [ 1.45121732  1.17168444]
 [ 0.69680785  1.2727178 ]
 [ 0.35600769  0.46569276]
 [ 2.14187488  1.8758589 ]
 [ 1.59276393  1.54971412]
 [ 1.71227009  1.63429704]
 [ 1.05013136  1.1669758 ]
 [ 1.34344004  1.37369725]
 [ 1.82975724  1.49866636]
 [ 0.80553877  1.26753018]
 [ 1.74331784  1.27211784]
 [ 1.23044292  1.18110192]
 [ 1.07675493  1.05940509]
 [ 0.15495771  0.64536509]
 [ 0.77409745  1.0174171 ]
 [ 1.20062726  1.3870498 ]
 [ 0.39619719  0.77919884]
 [ 0.87209168  1.00248145]
 [ 1.32273339  1.54428262]
 [ 2.11848535  1.44338789]
 [ 1.45226461  1.42061198]
 [ 0.33775737  0.24968543]
 [ 1.06982557  0.64674411]
 [ 0.92113229  1.0583153 ]
 [ 0.54987592  0.73198037]
 [ 1.06559727  0.77891362]
 [ 0.84371805  0.72957046]
 [ 1.83614557  1.40582746]
 [ 0.53146009  0.72294094]
 [ 0.98927818  0.73732053]
 [ 1.03984002  0.89426628]
 [ 0.38142362  0.32471126]
 [ 1.44464929  1.15407227]
 [-0.22601279  0.21045592]
 [-0.01995875  0.45051782]
 [ 0.58779449  0.44486237]
 [ 1.31335981  0.92875936]
 [ 0.42200098  0.6942829 ]
 [ 0.10714426  0.11083002]
 [ 1.44997839  1.19052704]
 [ 0.78630506  0.45877582]
 [ 1.63432202  1.95066539]
 [ 0.56680926  0.92203111]
 [ 0.08841491  0.62890576]
 [ 1.4703602   1.4924649 ]
 [ 1.01118864  1.44749407]
 [ 1.19936276  1.02534702]
 [ 0.67893239  0.8482461 ]
 [ 0.71537211  0.53279103]
 [ 1.08031573  1.00779064]
 [ 0.66412568  0.57121041]
 [ 0.96098528  0.72318386]
 [ 0.7690299   0.76058713]
 [ 0.77466896  0.77559282]
 [ 0.47906664  0.58602633]
 [ 0.52481326  0.78486453]
 [-0.40240438  0.17374116]
 [ 0.75730444  0.22365892]
 [ 0.67811008  1.17730408]
 [ 1.62245699  1.71775386]
 [ 1.12317847  1.04252136]
 [-0.06461117  0.23557416]
 [ 0.46299482  0.51585414]
 [ 0.88125676  1.23284201]
 [ 0.57920534  0.63765861]
 [ 0.88239858  1.32092112]
 [ 0.63500551  0.94788141]
 [ 1.76588148  1.63856465]
 [ 0.65026599  0.6899672 ]
 [ 0.06854287  0.29712499]
 [ 0.61575737  0.87526625]
 [ 0.30057552  0.54475194]
 [ 0.66578769  0.21034844]
 [ 0.94670438  0.7699764 ]
 [ 0.39870371  0.91681577]
 [ 1.37531351  1.62337899]
 [ 1.92350877  1.34382017]
 [ 0.56631877  0.77456137]
 [ 1.18702642  0.63700271]
 [ 0.74002244  1.04535471]
 [ 0.3272063   0.75097037]
 [ 1.57583435  1.55809705]
 [ 0.44325124  0.39620769]
 [ 0.59762516  0.58304621]
 [ 0.72253698  0.68302097]
 [ 0.93459597  1.01101948]
 [ 0.50139577  0.52500942]
 [ 0.84696441  0.68679341]
 [ 0.63483432  0.22205385]
 [ 1.43642478  1.34724612]
 [ 1.58663111  1.49941374]
 [ 0.73832806  0.95690866]]
>>> 
于 2013-02-17T11:00:14.023 回答
3

尽管 numpy 有方便的实用函数,但您始终可以“重新缩放”多个独立的正态分布变量以匹配给定的协方差矩阵。因此,如果您可以生成一个列向量x(或在矩阵中分组的许多向量),其中每个元素都是正态分布的,并且您按 matrix 进行缩放M,结果将具有 covariance M M^T。相反,如果您将协方差分解C为形式M M^T,那么即使没有 numpy 提供的实用函数(只需将您的一堆正态分布向量乘以M),生成这样的分布也非常简单。

这可能不是您要直接寻找的答案,但请牢记这一点很有用,例如:

  • 如果您发现自己缩放随机生成的结果,则可以将缩放与初始协方差结合起来
  • 如果您需要将代码移植到不直接支持这种实用方法的库,那么您自己很容易实现。
于 2013-02-17T12:00:12.363 回答