I have discrete empirical data that forms a histogram with gaps, i.e. certain values were never observed. In reality, however, those values may well occur. Here is a scatter plot of the data.

[Figure: scatter plot of the data]

So my question is: should I interpolate between x-axis values to fill in bins for the histogram? If so, what would you suggest as best practice?

Regards,

1 Answer

Don't do it.

With that many sample points, the probability (p-value) of getting empty bins from a smooth underlying distribution is quite low. There is probably some underlying reason they are empty, which you may want to investigate. I can think of two possibilities:

  1. Your data actually is discrete (perhaps someone rounded to 1 significant figure during data collection, or quantization error in an ADC was significant) and a later unit conversion caused irregular gaps. Even a conversion from .12 and .13 to 12 and 13, as shown, could cause this if .12 is actually stored as something like .1199999999 inside the computer and the result is then truncated. But this would tend to double up counts in a neighboring bin, and the gaps would tend to be regularly spaced, so I doubt this is the cause. (For example, if 128 trials of a Bernoulli coin-flip experiment were done for each data point, and someone recorded the percentage of heads in each series to the nearest 1%, you could multiply by 1.28/% to try to recover the actual number of heads, but there would be 28 empty bins, since only 101 of the 129 possible counts can be reached.)

  2. Your distribution has real lobes. Because the frequency is significantly reduced following each empty bin, I favor this explanation.
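The coin-flip example in item 1 can be checked with a short sketch. It is only an illustration: the function name `recovered_counts` and the use of Python's built-in `round` are my own choices, not anything from the original data pipeline.

```python
# Sketch of the coin-flip example: 128 flips per data point, the share of
# heads recorded to the nearest 1%, then converted back to a head count.
N_FLIPS = 128


def recovered_counts(n_flips=N_FLIPS):
    """Return the set of head counts reachable after the round trip."""
    reachable = set()
    for heads in range(n_flips + 1):                 # true counts 0..128
        percent = round(100 * heads / n_flips)       # recorded to nearest 1%
        reachable.add(round(percent * n_flips / 100))  # "multiply by 1.28"
    return reachable


reachable = recovered_counts()
empty_bins = sorted(set(range(N_FLIPS + 1)) - reachable)

print(len(reachable))   # distinct counts that can still occur
print(len(empty_bins))  # counts that can never occur after the round trip
print(empty_bins)       # where the gaps fall
```

Printing `empty_bins` shows where the gaps fall, which you can compare against the spacing of the gaps in your own histogram.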

But these are just starting suggestions for your own investigation.
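As a rough check on the claim that empty bins are unlikely under a smooth distribution, here is a minimal sketch. It assumes independent draws, and the sample size and bin probability are hypothetical placeholders (the question does not state them), so substitute your own numbers.

```python
def prob_bin_empty(n_samples, bin_prob):
    """P(no observation lands in a bin) for independent draws: (1 - p)^n."""
    return (1.0 - bin_prob) ** n_samples


def expected_empty_bins(n_samples, bin_probs):
    """Expected number of empty bins, given each bin's probability."""
    return sum(prob_bin_empty(n_samples, p) for p in bin_probs)


# Hypothetical values, NOT taken from the question:
n = 1000   # number of observations
p = 0.01   # probability mass a smooth distribution puts on one bin

print(prob_bin_empty(n, p))               # chance that this one bin is empty
print(expected_empty_bins(n, [p] * 100))  # e.g. 100 equally likely bins
```

If the result is tiny for your actual sample size and bin probabilities, several empty bins in the middle of the range are hard to attribute to chance.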

Answered 2013-02-21T18:36:13.483