gnuplot - Histogram using gnuplot?

Question

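I know how to create a histogram in gnuplot if my .dat file already has properly binned data (just use "with boxes"). Is there a way to take a list of numbers and have gnuplot produce a histogram based on ranges and bin sizes that the user provides?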

score 236 · Accepted Answer

Yes, and it is quick and simple, though very well hidden:

binwidth=5
bin(x,width)=width*floor(x/width)

plot 'datafile' using (bin($1,binwidth)):(1.0) smooth freq with boxes

See "help smooth freq" to understand why the above produces a histogram.

To deal with ranges, just set the xrange variable.
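To make the mechanism concrete, here is a minimal Python sketch (not gnuplot code) of what the plot command above computes, assuming "smooth freq" simply sums the y values of all points that share the same x. The names bin_label and smooth_freq and the sample data are made up for this illustration.

```python
import math
from collections import Counter

def bin_label(x, width):
    # Same arithmetic as bin(x,width)=width*floor(x/width).
    return width * math.floor(x / width)

def smooth_freq(values, width):
    # Each point contributes y = 1.0 to the x value of its bin;
    # points with the same x are merged by summing their y values.
    totals = Counter()
    for x in values:
        totals[bin_label(x, width)] += 1.0
    return sorted(totals.items())

data = [1.0, 2.5, 4.9, 5.0, 7.2, 12.0]  # hypothetical sample values
print(smooth_freq(data, 5))  # prints [(0, 3.0), (5, 2.0), (10, 1.0)]
```

Each pair is (bin label, count), which is what "with boxes" then draws.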

score 96 · Accepted Answer

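I have a couple of corrections/additions to Born2Smile's very useful answer:

Empty bins cause the box of the adjacent bin to extend incorrectly into their space; avoid this with set boxwidth binwidth.
In Born2Smile's version, bins are rendered as centered on their lower bound. Strictly speaking, they should extend from the lower bound to the upper bound. This can be corrected by modifying the bin function: bin(x,width)=width*floor(x/width) + width/2.0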

score 83 · Accepted Answer

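Be very careful: all of the answers on this page implicitly take the decision of where the binning starts (the left-hand edge of the left-most bin, if you like) out of the user's hands. If the user combines any of these binning functions with his or her own decision about where binning starts (as is done on the blog linked from another answer on this page), the functions above are all incorrect. With an arbitrary starting point "Min" for the binning, the correct function is: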

bin(x) = width*(floor((x-Min)/width)+0.5) + Min

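You can see why this is correct by working through it step by step (it helps to draw a few bins and a point somewhere in one of them). Subtract Min from your data point to see how far into the binning range it is. Then divide by the bin width, so that you are effectively working in units of "bins". Then floor the result to get to the left-hand edge of that bin, add 0.5 to get to the middle of the bin, multiply by the width so that you are back on the absolute scale rather than in units of bins, and finally add back the Min offset you subtracted at the start.

Consider this function in action: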

Min = 0.25 # where binning starts
Max = 2.25 # where binning ends
n = 2 # the number of bins
width = (Max-Min)/n # binwidth; evaluates to 1.0
bin(x) = width*(floor((x-Min)/width)+0.5) + Min

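For example, the value 1.1 truly falls in the left bin:

this function correctly maps it to the center of the left bin (0.75);
Born2Smile's answer, bin(x)=width*floor(x/width), incorrectly maps it to 1;
mas90's answer, bin(x)=width*floor(x/width) + width/2.0, incorrectly maps it to 1.5.

Born2Smile's answer is only correct if the bin boundaries occur at (n+0.5)*binwidth (where n runs over the integers). mas90's answer is only correct if the bin boundaries occur at n*binwidth.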
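The three mappings can be checked numerically. The following Python sketch mirrors the gnuplot expressions with the same Min, Max and n as above; the function names bin_offset, bin_lower and bin_center are labels chosen for this illustration only.

```python
import math

Min, Max, n = 0.25, 2.25, 2
width = (Max - Min) / n  # evaluates to 1.0

def bin_offset(x):
    # Offset-aware version: width*(floor((x-Min)/width)+0.5) + Min
    return width * (math.floor((x - Min) / width) + 0.5) + Min

def bin_lower(x):
    # Born2Smile's version: width*floor(x/width)
    return width * math.floor(x / width)

def bin_center(x):
    # mas90's version: width*floor(x/width) + width/2.0
    return width * math.floor(x / width) + width / 2.0

x = 1.1
print(bin_offset(x), bin_lower(x), bin_center(x))  # prints 0.75 1.0 1.5
```

Only the first value lies inside the bin [0.25, 1.25) that actually contains 1.1.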

score 52 · Accepted Answer

Do you want to plot a graph like this one? (image not available) If so, have a look at my blog article: http://gnuplot-surprising.blogspot.com/2011/09/statistic-analysis-and-histogram.html

The key lines from the code:

n=100 #number of intervals
max=3. #max value
min=-3. #min value
width=(max-min)/n #interval width
#function used to map a value to the intervals
hist(x,width)=width*floor(x/width)+width/2.0
set boxwidth width*0.9
set style fill solid 0.5 # fill style

#count and plot
plot "data.dat" u (hist($1,width)):(1.0) smooth freq w boxes lc rgb"green" notitle

score 10 · Accepted Answer

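As usual, Gnuplot is a fantastic tool for plotting good-looking graphs, and it can be made to perform all sorts of calculations. However, it is intended to plot data rather than to serve as a calculator, and it is often easier to use an external program (e.g. Octave) for the more "complicated" calculations, save the results in a file, and then use Gnuplot to produce the graph. For the problem above, check out the "hist" function in Octave, called as [freq,bins]=hist(data), and then plot the result in Gnuplot using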

set style histogram rowstacked gap 0
set style fill solid 0.5 border lt -1
plot "./data.dat" smooth freq with boxes

score 7 · Accepted Answer

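I found this discussion extremely useful, but I ran into some rounding problems.

More precisely, with a binwidth of 0.05, I noticed that the techniques presented above put data points reading 0.1 and 0.15 into the same bin. This (obviously unwanted) behaviour is most likely due to the floor function.

Here is my small contribution to try to work around it.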

bin(x,width,n)=x<=n*width? width*(n-1) + 0.5*width:bin(x,width,n+1)
binwidth = 0.05
set boxwidth binwidth
plot "data.dat" u (bin($1,binwidth,1)):(1.0) smooth freq with boxes

This recursive method works for x >= 0; it could be generalised with more conditional statements to cover other cases.
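The rounding problem is easy to reproduce outside gnuplot, since both use double-precision arithmetic. The Python sketch below compares the floor-based bin with an iterative equivalent of the recursive function above; floor_bin and step_bin are names invented for this check.

```python
import math

def floor_bin(x, width):
    # Same arithmetic as bin(x,width)=width*floor(x/width).
    return width * math.floor(x / width)

def step_bin(x, width):
    # Iterative equivalent of the recursive bin(x,width,n) started at n=1:
    # find the first n with x <= n*width, then return that bin's center.
    n = 1
    while x > n * width:
        n += 1
    return width * (n - 1) + 0.5 * width

w = 0.05
print(0.15 / w)                                  # slightly below 3 in double precision
print(floor_bin(0.1, w) == floor_bin(0.15, w))   # prints True: both land in the same bin
print(step_bin(0.1, w) == step_bin(0.15, w))     # prints False: the bins differ
```

Note that the comparison x <= n*width assigns a value lying exactly on a boundary to the lower bin, so the intervals are closed on the right rather than on the left.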

score 6 · Accepted Answer

We do not need a recursive method, which may be slow. My solution is to use a user-defined function rint instead of the built-in functions int or floor.

rint(x)=(x-int(x)>0.9999)?int(x)+1:int(x)

This function gives rint(0.0003/0.0001)=3, whereas int(0.0003/0.0001)=floor(0.0003/0.0001)=2.

Why? See the question "Perl int function and padding zeros".
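As a quick check of the idea, here is the same rint written in Python and applied to the quotient 0.15/0.05, which is slightly below 3 in double precision. This is only a sketch of the arithmetic; Python's int(), like gnuplot's, truncates toward zero.

```python
def rint(x):
    # Mirrors rint(x)=(x-int(x)>0.9999)?int(x)+1:int(x).
    i = int(x)  # truncates toward zero
    return i + 1 if x - i > 0.9999 else i

q = 0.15 / 0.05
print(int(q), rint(q))  # prints 2 3
```

The 0.9999 threshold is a trade-off: a genuine quotient such as 2.99995 is also rounded up to 3, and for negative x truncation toward zero does not behave like floor.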

score 5 · Accepted Answer

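I have a small modification to Born2Smile's solution.

I know this may not seem to make much sense, but you may need it just in case. If your data are integers and you need a non-integer bin size (perhaps for comparison with another data set, or to plot a density on a finer grid), you need to add a random number between 0 and 1 inside floor. Otherwise there will be spikes caused by rounding. floor(x/width+0.5) will not do, because it creates a pattern that is not true to the original data.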

binwidth=0.3
bin(x,width)=width*floor(x/width+rand(0))

score 3 · Accepted Answer

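Regarding the binning functions, the results of the functions offered so far were not what I expected. Namely, if my binwidth is 0.001, these functions center the bins on the 0.0005 points, whereas I find it more intuitive to have the bins centered on multiples of 0.001.

In other words, I would like to have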

Bin 0.001 contain data from 0.0005 to 0.0014
Bin 0.002 contain data from 0.0015 to 0.0024
...

The binning function I came up with is

my_bin(x,width)     = width*(floor(x/width+0.5))

Here is a script that compares some of the bin functions offered above with this one:

rint(x) = (x-int(x)>0.9999)?int(x)+1:int(x)
bin(x,width)        = width*rint(x/width) + width/2.0
binc(x,width)       = width*(int(x/width)+0.5)
mitar_bin(x,width)  = width*floor(x/width) + width/2.0
my_bin(x,width)     = width*(floor(x/width+0.5))

binwidth = 0.001

data_list = "-0.1386 -0.1383 -0.1375 -0.0015 -0.0005 0.0005 0.0015 0.1375 0.1383 0.1386"

my_line = sprintf("%7s  %7s  %7s  %7s  %7s","data","bin()","binc()","mitar()","my_bin()")
print my_line
do for [i in data_list] {
    iN = i + 0
    my_line = sprintf("%+.4f  %+.4f  %+.4f  %+.4f  %+.4f",iN,bin(iN,binwidth),binc(iN,binwidth),mitar_bin(iN,binwidth),my_bin(iN,binwidth))
    print my_line
}

Here is the output:

   data    bin()   binc()  mitar()  my_bin()
-0.1386  -0.1375  -0.1375  -0.1385  -0.1390
-0.1383  -0.1375  -0.1375  -0.1385  -0.1380
-0.1375  -0.1365  -0.1365  -0.1375  -0.1380
-0.0015  -0.0005  -0.0005  -0.0015  -0.0010
-0.0005  +0.0005  +0.0005  -0.0005  +0.0000
+0.0005  +0.0005  +0.0005  +0.0005  +0.0010
+0.0015  +0.0015  +0.0015  +0.0015  +0.0020
+0.1375  +0.1375  +0.1375  +0.1375  +0.1380
+0.1383  +0.1385  +0.1385  +0.1385  +0.1380
+0.1386  +0.1385  +0.1385  +0.1385  +0.1390
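For readers who want to try the difference outside gnuplot, the following Python sketch mirrors my_bin and the floor-plus-half-width variant (called center_bin here, a name chosen for this example). The inputs are deliberately kept away from bin edges, where floating-point rounding can tip a value either way.

```python
import math

def my_bin(x, width):
    # Bins centered on multiples of width: width*floor(x/width+0.5)
    return width * math.floor(x / width + 0.5)

def center_bin(x, width):
    # Bins centered halfway between multiples: width*floor(x/width) + width/2.0
    return width * math.floor(x / width) + width / 2.0

# Compare both labelings for a few values away from any bin edge.
for x in (0.0007, 0.0012, 0.0017):
    print(x, my_bin(x, 0.001), center_bin(x, 0.001))
```

With my_bin, 0.0007 and 0.0012 share the bin labeled 0.001, while center_bin separates them into the bins labeled 0.0005 and 0.0015.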

score 0 · Accepted Answer

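Different numbers of bins on the same data set can reveal different features of the data.

Unfortunately, there is no universal best method for determining the number of bins.

One of the more robust methods, among many alternatives, is the Freedman–Diaconis rule, which automatically determines the bin width (and hence the number of bins) from the statistics of the given data set.

Accordingly, the following can be used to apply the Freedman–Diaconis rule in a gnuplot script:

Assume you have a file samplesFile containing a single column of samples: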

# samples
0.12345
1.23232
...

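The following (based on ChrisW's answer) can be embedded in an existing gnuplot script. The "$samplesFile" variable and the escaped \$1 suggest that the snippet is meant to be generated from a shell script: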

...
## preceding gnuplot commands
...

#
samples="$samplesFile"
stats samples nooutput
N = floor(STATS_records)
samplesMin = STATS_min
samplesMax = STATS_max
# Freedman–Diaconis formula for bin-width size estimation
    lowQuartile = STATS_lo_quartile
    upQuartile = STATS_up_quartile
    IQR = upQuartile - lowQuartile
    width = 2*IQR/(N**(1.0/3.0))
    bin(x) = width*(floor((x-samplesMin)/width)+0.5) + samplesMin

plot \
    samples u (bin(\$1)):(1.0/(N*width)) t "Output" w l lw 1 smooth freq
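The same computation can be sketched in Python to see what the script produces. This is an illustration under stated assumptions, not a reimplementation of gnuplot's "stats": the function name fd_histogram and the sample data are made up, and statistics.quantiles (Python 3.8+) may use a different quartile convention than STATS_lo_quartile and STATS_up_quartile.

```python
import math
import statistics
from collections import Counter

def fd_histogram(samples):
    n = len(samples)
    lo = min(samples)
    # Quartiles via the standard library; the convention may differ from gnuplot's.
    q1, _, q3 = statistics.quantiles(samples, n=4)
    # Freedman-Diaconis bin width: 2*IQR / N^(1/3)
    width = 2 * (q3 - q1) / n ** (1.0 / 3.0)
    # Bin index relative to the smallest sample, as in bin(x) above.
    counts = Counter(math.floor((x - lo) / width) for x in samples)
    # (bin center, density) pairs; density = count / (N * width)
    hist = [(width * (k + 0.5) + lo, c / (n * width)) for k, c in sorted(counts.items())]
    return width, hist

samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # made-up data
width, hist = fd_histogram(samples)
print(width, hist)
```

Dividing each count by N*width makes the bar areas sum to 1, so the result can be compared with a probability density. If the interquartile range is zero the width is zero and the division fails, so such data need separate handling.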
