如何生成由从二维高斯分布N = 100
中提取的二维样本组成的数据集,均值x = (x1,x2)T ∈ R2
µ = (1,1)T
和协方差矩阵
Σ = (0.3 0.2
0.2 0.2)
我听说你可以使用Matlab函数randn
,但不知道如何在 Python 中实现它?
如何生成由从二维高斯分布N = 100
中提取的二维样本组成的数据集,均值x = (x1,x2)T ∈ R2
µ = (1,1)T
和协方差矩阵
Σ = (0.3 0.2
0.2 0.2)
我听说你可以使用Matlab函数randn
,但不知道如何在 Python 中实现它?
只是为了详细说明@EamonNerbonne 的答案:以下使用 协方差矩阵的Cholesky 分解从不相关的正态分布随机变量生成相关变量。
import numpy as np
import matplotlib.pyplot as plt
linalg = np.linalg
N = 1000
mean = [1,1]
cov = [[0.3, 0.2],[0.2, 0.2]]
data = np.random.multivariate_normal(mean, cov, N)
L = linalg.cholesky(cov)
# print(L.shape)
# (2, 2)
uncorrelated = np.random.standard_normal((2,N))
data2 = np.dot(L,uncorrelated) + np.array(mean).reshape(2,1)
# print(data2.shape)
# (2, 1000)
plt.scatter(data2[0,:], data2[1,:], c='green')
plt.scatter(data[:,0], data[:,1], c='yellow')
plt.show()
黄点是由 生成的np.random.multivariate_normal
。绿点是通过将正态分布点乘以 Cholesky 分解矩阵生成的L
。
您正在寻找numpy.random.multivariate_normal
代码
>>> import numpy
>>> print numpy.random.multivariate_normal([1,1], [[0.3, 0.2],[0.2, 0.2]], 100)
[[ 0.02999043 0.09590078]
[ 1.35743021 1.08199363]
[ 1.15721179 0.87750625]
[ 0.96879114 0.94503228]
[ 1.23989167 1.13473083]
[ 1.55917608 0.81530847]
[ 0.89985651 0.7071519 ]
[ 0.37494324 0.739433 ]
[ 1.45121732 1.17168444]
[ 0.69680785 1.2727178 ]
[ 0.35600769 0.46569276]
[ 2.14187488 1.8758589 ]
[ 1.59276393 1.54971412]
[ 1.71227009 1.63429704]
[ 1.05013136 1.1669758 ]
[ 1.34344004 1.37369725]
[ 1.82975724 1.49866636]
[ 0.80553877 1.26753018]
[ 1.74331784 1.27211784]
[ 1.23044292 1.18110192]
[ 1.07675493 1.05940509]
[ 0.15495771 0.64536509]
[ 0.77409745 1.0174171 ]
[ 1.20062726 1.3870498 ]
[ 0.39619719 0.77919884]
[ 0.87209168 1.00248145]
[ 1.32273339 1.54428262]
[ 2.11848535 1.44338789]
[ 1.45226461 1.42061198]
[ 0.33775737 0.24968543]
[ 1.06982557 0.64674411]
[ 0.92113229 1.0583153 ]
[ 0.54987592 0.73198037]
[ 1.06559727 0.77891362]
[ 0.84371805 0.72957046]
[ 1.83614557 1.40582746]
[ 0.53146009 0.72294094]
[ 0.98927818 0.73732053]
[ 1.03984002 0.89426628]
[ 0.38142362 0.32471126]
[ 1.44464929 1.15407227]
[-0.22601279 0.21045592]
[-0.01995875 0.45051782]
[ 0.58779449 0.44486237]
[ 1.31335981 0.92875936]
[ 0.42200098 0.6942829 ]
[ 0.10714426 0.11083002]
[ 1.44997839 1.19052704]
[ 0.78630506 0.45877582]
[ 1.63432202 1.95066539]
[ 0.56680926 0.92203111]
[ 0.08841491 0.62890576]
[ 1.4703602 1.4924649 ]
[ 1.01118864 1.44749407]
[ 1.19936276 1.02534702]
[ 0.67893239 0.8482461 ]
[ 0.71537211 0.53279103]
[ 1.08031573 1.00779064]
[ 0.66412568 0.57121041]
[ 0.96098528 0.72318386]
[ 0.7690299 0.76058713]
[ 0.77466896 0.77559282]
[ 0.47906664 0.58602633]
[ 0.52481326 0.78486453]
[-0.40240438 0.17374116]
[ 0.75730444 0.22365892]
[ 0.67811008 1.17730408]
[ 1.62245699 1.71775386]
[ 1.12317847 1.04252136]
[-0.06461117 0.23557416]
[ 0.46299482 0.51585414]
[ 0.88125676 1.23284201]
[ 0.57920534 0.63765861]
[ 0.88239858 1.32092112]
[ 0.63500551 0.94788141]
[ 1.76588148 1.63856465]
[ 0.65026599 0.6899672 ]
[ 0.06854287 0.29712499]
[ 0.61575737 0.87526625]
[ 0.30057552 0.54475194]
[ 0.66578769 0.21034844]
[ 0.94670438 0.7699764 ]
[ 0.39870371 0.91681577]
[ 1.37531351 1.62337899]
[ 1.92350877 1.34382017]
[ 0.56631877 0.77456137]
[ 1.18702642 0.63700271]
[ 0.74002244 1.04535471]
[ 0.3272063 0.75097037]
[ 1.57583435 1.55809705]
[ 0.44325124 0.39620769]
[ 0.59762516 0.58304621]
[ 0.72253698 0.68302097]
[ 0.93459597 1.01101948]
[ 0.50139577 0.52500942]
[ 0.84696441 0.68679341]
[ 0.63483432 0.22205385]
[ 1.43642478 1.34724612]
[ 1.58663111 1.49941374]
[ 0.73832806 0.95690866]]
>>>
尽管 numpy 有方便的实用函数,但您始终可以“重新缩放”多个独立的正态分布变量以匹配给定的协方差矩阵。因此,如果您可以生成一个列向量x
(或在矩阵中分组的许多向量),其中每个元素都是正态分布的,并且您按 matrix 进行缩放M
,结果将具有 covariance M M^T
。相反,如果您将协方差分解C
为形式M M^T
,那么即使没有 numpy 提供的实用函数(只需将您的一堆正态分布向量乘以M
),生成这样的分布也非常简单。
这可能不是您要直接寻找的答案,但请牢记这一点很有用,例如: