Let's assume we have drawn n=10000 samples from the standard normal distribution.

Now I want to estimate its entropy, using a histogram to estimate the probabilities.

1) Calculate probabilities (for example, using MATLAB)

[p,x] = hist(samples,binnumbers);
area = (x(2)-x(1))*sum(p);
p = p/area;

(binnumbers is chosen according to some rule)

2) Estimate entropy

H = -sum(p.*log2(p))

which gives 58.6488

Now when I use the direct formula for the entropy of normally distributed data, I get

H = 0.5*log2(2*pi*exp(1)) = 2.0471

What am I doing wrong when using the histogram + entropy formula? Thank you very much for any help!

1 Answer

You are missing the dp term in the sum:

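dp = (x(2)-x(1));                 % bin width
area = sum(p)*dp;                 % equals 1 if p was already normalized as in the question
H = -sum( (p*dp) .* log2(p) );    % element-wise product; p*dp is the probability mass per bin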

This should get you close enough...
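
To spell out why the bin width belongs in the sum, here is a sketch of the reasoning. The symbols f_i (the normalized bin height, `p(i)` after `p = p/area`) and Δ (the bin width, `dp`) are notation introduced here for illustration. The closed-form value is a differential entropy, that is, an integral over the density, and the histogram approximates that integral by a Riemann sum:

```latex
h(X) = -\int f(x)\,\log_2 f(x)\,dx
\;\approx\; -\sum_i f_i \,\log_2(f_i)\,\Delta
```

Leaving out Δ makes the sum larger by a factor of 1/Δ, which is how a value like 58.6488 can appear in place of roughly 2.0471.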

PS,
Be careful when taking log2(p), because you may sometimes have empty bins. You might find nansum useful.
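
Putting the pieces together, a minimal end-to-end sketch could look like the following. The values of `n` and `binnumbers` and the names `mask`, `H_est` and `H_true` are illustrative assumptions, not taken from the question. Empty bins are skipped with a logical mask rather than `nansum`, so the sketch does not depend on a toolbox.

```matlab
% Sketch: histogram-based estimate of the differential entropy (in bits)
% of standard normal samples. n and binnumbers are illustrative choices.
n = 10000;
binnumbers = 50;
samples = randn(n, 1);

[p, x] = hist(samples, binnumbers);   % bin counts and bin centers
dp = x(2) - x(1);                     % bin width (bins are equally spaced)
p = p / (sum(p) * dp);                % normalize counts to a density

mask = p > 0;                         % skip empty bins, where 0*log2(0) gives NaN
H_est = -sum(p(mask) .* log2(p(mask))) * dp;

H_true = 0.5 * log2(2*pi*exp(1));     % closed-form value, about 2.0471
fprintf('estimate: %.4f, closed form: %.4f\n', H_est, H_true);
```

The estimate varies from run to run and depends on the number of bins, so expect it to land near the closed-form value rather than match it exactly.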

Answered 2013-05-13T19:32:33.580