python - Integrate 2D kernel density estimate

Question

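I have an x,y distribution of points for which I obtain the KDE through scipy.stats.gaussian_kde. This is my code and how the output looks (the x,y data can be obtained from here):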

import numpy as np
from scipy import stats

# Obtain data from file.
data = np.loadtxt('data.dat', unpack=True)
m1, m2 = data[0], data[1]
xmin, xmax = min(m1), max(m1)
ymin, ymax = min(m2), max(m2)

# Perform a kernel density estimate (KDE) on the data
x, y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
positions = np.vstack([x.ravel(), y.ravel()])
values = np.vstack([m1, m2])
kernel = stats.gaussian_kde(values)
f = np.reshape(kernel(positions).T, x.shape)

# Define the number that will determine the integration limits
x1, y1 = 2.5, 1.5

# Perform integration?

# Plot the results:
import matplotlib.pyplot as plt
# Set limits
plt.xlim(xmin,xmax)
plt.ylim(ymin,ymax)
# KDE density plot
plt.imshow(np.rot90(f), cmap=plt.cm.gist_earth_r, extent=[xmin, xmax, ymin, ymax])
# Draw contour lines
cset = plt.contour(x,y,f)
plt.clabel(cset, inline=1, fontsize=10)
plt.colorbar()
# Plot point
plt.scatter(x1, y1, c='r', s=35)
plt.show()

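The red point with coordinates (x1, y1) has (like every point in the 2D plot) an associated value between 0 and 0.42, given by f (the kernel, or KDE). Let's say that f(x1, y1) = 0.08.

I need to integrate f with integration limits in x and y given by those regions where f evaluates to less than f(x1, y1), i.e. f(x, y) < 0.08.

From what I've seen, Python can integrate functions and one-dimensional arrays numerically, but I haven't seen anything that would let me perform a numerical integration on a 2D array (the f kernel). Furthermore, I'm not sure how I would even identify the regions given by that particular condition (i.e. f(x, y) less than a given value).

Can this be done at all?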

score 7 · Accepted Answer

Here is a way to do it using Monte Carlo integration. It is a little slow, and the result is random. The error is inversely proportional to the square root of the sample size, while the running time is directly proportional to it (sample size here means the Monte Carlo sample, 10000 in my example below, not the size of your data set). Here is some simple code using your kernel object.

#Compute the point below which to integrate
iso = kernel((x1,y1))

#Sample from your KDE distribution
sample = kernel.resample(size=10000)

#Filter the sample
insample = kernel(sample) < iso

#The integral you want is equivalent to the probability of drawing a point 
#that gets through the filter
integral = insample.sum() / float(insample.shape[0])
print(integral)

I get approximately 0.2 as the answer for your data set.
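
The Monte Carlo estimate above works because the integral of f over the region where f is below a threshold equals the probability that a point drawn from f lands in that region. A minimal, self-contained sketch of the same idea follows. It also reports a binomial standard error, which is one way to see the 1/sqrt(n) behaviour. The data here is a synthetic stand-in (an assumption, since data.dat is not available), and the helper name mc_integral_below is hypothetical.

```python
import numpy as np
from scipy import stats

def mc_integral_below(kernel, point, n=10000, seed=0):
    """Estimate the KDE mass lying where the density is below its value at `point`."""
    # Density at the reference point (the threshold)
    iso = kernel(np.asarray(point, dtype=float).reshape(-1, 1))[0]
    # Draw n points from the KDE itself
    sample = kernel.resample(size=n, seed=seed)
    # Fraction of draws whose density is below the threshold
    p = np.mean(kernel(sample) < iso)
    # Binomial standard error of that fraction, shrinking like 1/sqrt(n)
    stderr = np.sqrt(p * (1.0 - p) / n)
    return p, stderr

# Synthetic stand-in data (assumption): 500 points from a 2D standard normal
rng = np.random.default_rng(0)
values = rng.normal(size=(2, 500))
kernel = stats.gaussian_kde(values)

p, stderr = mc_integral_below(kernel, (2.5, 1.5))
print(p, stderr)
```

Quadrupling n should roughly halve stderr, at roughly four times the running time.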

score 3 · Accepted Answer


Currently, this is available:

kernel.integrate_box([-np.inf,-np.inf], [2.5,1.5])
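
As a hedged illustration of what this call computes: integrate_box integrates the KDE over the rectangle from the lower bounds to the upper bounds, which is not the same region as f(x, y) < f(x1, y1) from the question. The sketch below uses synthetic standard-normal data (an assumption, chosen so that a closed-form value exists for comparison) rather than the data from the question.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (assumption): 2000 points from a 2D standard normal
rng = np.random.default_rng(0)
values = rng.normal(size=(2, 2000))
kernel = stats.gaussian_kde(values)

x1, y1 = 2.5, 1.5
# Mass of the KDE over the rectangle (-inf, x1] x (-inf, y1]
box = kernel.integrate_box([-np.inf, -np.inf], [x1, y1])

# For independent standard normals the exact value is Phi(x1) * Phi(y1)
exact = stats.norm.cdf(x1) * stats.norm.cdf(y1)
print(box, exact)
```

The two numbers should be close but not identical, since the KDE smooths a finite sample.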

Answered 2019-11-01T13:25:42.917

score 1 · Accepted Answer

A direct way is to use integrate:

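import matplotlib.pyplot as plt
from sklearn.neighbors import KernelDensity
from scipy import integrate
import numpy as np

# Draw a synthetic 2D Gaussian sample
mean = [0, 0]
cov = [[5, 0], [0, 10]]
x, y = np.random.multivariate_normal(mean, cov, 5000).T
plt.plot(x, y, 'o')
plt.show()

# Fit the KDE on an array of shape (n_samples, 2)
sample = np.column_stack([x, y])
kde = KernelDensity().fit(sample)
def f_kde(x, y):
    # score_samples returns the log-density, so exponentiate it
    return np.exp(kde.score_samples([[x, y]]))[0]

# Upper integration limits (the values used in the question)
x1, y1 = 2.5, 1.5
integrate.nquad(f_kde, [[-np.inf, x1], [-np.inf, y1]])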

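The problem is that this will be very slow if you do it at large scale. For example, if you want to plot the x,y line at x (0,100), it will take a long time to calculate.

Note: I used the kde from sklearn, but I believe you can swap in another implementation as well.

Using the kernel as defined in the original question: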

import numpy as np
from scipy import stats
from scipy import integrate

def integ_func(kde, x1, y1):

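    def f_kde(x, y):
        # gaussian_kde returns a length-1 array; nquad needs a scalar
        return kde((x, y))[0]

    # nquad returns a (value, error estimate) tuple
    integ = integrate.nquad(f_kde, [[-np.inf, x1], [-np.inf, y1]])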

    return integ

# Obtain data from file.
data = np.loadtxt('data.dat', unpack=True)
# Perform a kernel density estimate (KDE) on the data
kernel = stats.gaussian_kde(data)

# Define the number that will determine the integration limits
x1, y1 = 2.5, 1.5
print(integ_func(kernel, x1, y1))
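
Since nquad is slow, a cheaper approximation is to evaluate the KDE once on a regular grid and sum the cells that satisfy the condition from the question, f(x, y) < f(x1, y1). This is only a sketch under stated assumptions: the data is a synthetic stand-in for data.dat, the grid padding and resolution are arbitrary choices, and the result is a Riemann-sum approximation whose accuracy depends on the grid.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (assumption): 500 points from a 2D standard normal
rng = np.random.default_rng(0)
values = rng.normal(size=(2, 500))
kernel = stats.gaussian_kde(values)

# Grid padded beyond the data so that it covers nearly all of the KDE mass
pad = 3.0 * values.std(axis=1)
lo, hi = values.min(axis=1) - pad, values.max(axis=1) + pad
xs = np.linspace(lo[0], hi[0], 200)
ys = np.linspace(lo[1], hi[1], 200)
gx, gy = np.meshgrid(xs, ys, indexing="ij")
f = kernel(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

# Area of one grid cell
cell = (xs[1] - xs[0]) * (ys[1] - ys[0])

# Threshold: the density at the reference point
x1, y1 = 2.5, 1.5
iso = kernel([[x1], [y1]])[0]

# Boolean mask selecting the region where the density is below the threshold
mask = f < iso
grid_integral = f[mask].sum() * cell

# Sanity check: the density should integrate to about 1 over the whole grid
total = f.sum() * cell
print(grid_integral, total)
```

If total is noticeably below 1, the grid is too small or too coarse and the masked sum should not be trusted.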
