gnuplot - Plotting a "perfect" Zipf distribution from data in gnuplot

Question

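My goal is to take a simple .dat file and plot from it both the actual data and the theoretical points of a perfect Zipf distribution, i.e. a distribution in which each item's value is proportional to 1/(rank): the top value divided by the rank.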

For example, here is my data for the most-followed Instagram accounts:

# List of most followed users on instagram
# By rank and millions of followers
# From Wikipedia
# https://en.wikipedia.org/wiki/List_of_most_followed_users_on_Instagram
# rank, millions of followers

1 222
2 120
3 105
4 101
5 101
6 100
7 99 
8 93 
9 86 
10 85
11 80
12 79
13 76
14 73
15 71
16 69
17 67
18 65
19 63
20 63

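From another thread I learned that I can add a new column containing the ideal Zipf value for each rank (in this case 222, 111, 74, 55.5, etc.) and then plot it as a second curve by appending ,'' using 1:3 to the plot command. But that requires computing the values by hand and appending them to the original file, which is exactly the step I am trying to avoid. Is this possible? And how would I extend it to other data distributions/calculations?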
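Written out, the ideal curve described above divides the rank-1 value by the rank r, which reproduces the numbers quoted in the question:

```latex
f_{\mathrm{Zipf}}(r) = \frac{f(1)}{r}, \qquad f(1) = 222
\frac{222}{1} = 222, \quad \frac{222}{2} = 111, \quad \frac{222}{3} = 74, \quad \frac{222}{4} = 55.5
```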

score 0 · Accepted Answer

Use stats to compute the maximum of the second column:

stats 'file.dat' u 2 nooutput
max = STATS_max

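Then compute the ideal Zipf value for each rank directly inside the using expression as (max/$1), i.e. the maximum divided by the rank in column 1, so no extra column is needed in the file: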

plot 'file.dat' u 1:2 pt 7 t 'data',\
     '' u 1:(max/$1) w l t 'ideal Zipf'
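For the second part of the question (other distributions or calculations), one option is to define the model as a gnuplot function rather than a using expression, and let fit estimate its parameters. The sketch below is an illustration, not the answer author's method: it assumes gnuplot 5.x (for the inline datablock $DATA), and the names ymax, zipf, gzipf, C and s are hypothetical choices. Besides the ideal curve (exponent fixed at 1) it fits a generalized Zipf law C / r**s with a free exponent s. The 20 rows from the question are inlined so the script runs on its own, and the dumb (text) terminal is used only so it also runs without a display.

```gnuplot
# Sketch (gnuplot 5.x): data vs. ideal Zipf (s = 1) vs. fitted C / r**s.
# Text terminal so the script runs without a display; switch to qt, wxt
# or pngcairo for a real plot.
set terminal dumb

# Data from the question, inlined as a datablock (gnuplot 5.0+).
$DATA << EOD
1 222
2 120
3 105
4 101
5 101
6 100
7 99
8 93
9 86
10 85
11 80
12 79
13 76
14 73
15 71
16 69
17 67
18 65
19 63
20 63
EOD

# Largest follower count (the rank-1 value, since the data are sorted).
stats $DATA using 2 nooutput
ymax = STATS_max

# Ideal Zipf: the value at rank r is the rank-1 value divided by r.
zipf(r) = ymax / r

# Generalized Zipf with free scale C and exponent s, fitted to the data.
# Start from the ideal curve (C = ymax, s = 1).
C = ymax
s = 1.0
gzipf(r) = C / r**s
set fit quiet
set fit nolog
fit gzipf(x) $DATA using 1:2 via C, s

# On log-log axes a power law C / r**s is a straight line of slope -s.
set logscale xy
plot $DATA using 1:2 with points pt 7 title 'data', \
     zipf(x) with lines title 'ideal Zipf (s = 1)', \
     gzipf(x) with lines title sprintf('fit: s = %.2f', s)
```

Replacing $DATA with 'file.dat' should work the same way for the original file. Other models follow the same pattern: write the formula as a function of x (or inline it in a using expression with $1, $2, ...), then either plot it directly or fit its free parameters with fit ... via.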
