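I am trying to compute and plot the out-degree and in-degree distributions of the Wikipedia vote network (part of the SNAP collection of network datasets). It is a directed graph, represented as an edge list.
To read and store the graph data: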
%Read the data file.
G = importdata('Wiki-Vote.txt', ' ', 4);
%G is a structure that contains:
% - data: a <num_of_edges,2> matrix filled with node (wiki users) ids
% - textdata: a cell matrix that contains the header strings (first 4
% lines).
% - colheaders: a cell matrix that contains the last descriptive string
% (fourth line).
%All the useful information is contained in the data matrix.
%Split the directed edge list into 'from' and 'to' node lists.
Nfrom = G.data(:,1); %Will be used to compute out-degree
Nto = G.data(:,2); %Will be used to compute in-degree
Inspired by this question, I compute the out-degree as follows:
%Remove duplicate entries from Nfrom and Nto lists.
Nfrom = unique(Nfrom); %Will be used to compute the outdegree distribution.
Nto = unique(Nto); %Will be used to compute the indegree distribution.
%Out-degree: count the number of occurrences of each element (node-user id)
%of Nfrom in G.data(:,1).
outdegNsG = histc(G.data(:,1), Nfrom);
%odG(k) is the number of nodes whose out-degree is k.
odG = hist(outdegNsG, 1:numel(Nfrom));
figure;
plot(odG)
title('linear-linear scale plot: outdegree distribution');
figure;
loglog(odG)
title('log-log scale plot: outdegree distribution');
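I do the same to compute the in-degree. However, the linear-scale plot I get is far from satisfactory, which makes me wonder whether my approach is incorrect.
Linear scale:
Log-log scale:
Zooming into the linear-scale plot of the distribution shows that it approximates a power law: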
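For reference, this is roughly what "the same" means for the in-degree. It is only a minimal sketch on a made-up toy edge list (the matrix E and the names NtoU, indegNs and idG are illustrative, not taken from the dataset), mirroring the out-degree code above:

```matlab
% Toy directed edge list standing in for G.data (hypothetical values).
% Each row is one edge: [from_node to_node].
E = [1 2; 1 3; 2 3; 4 3; 4 1];

% Distinct target nodes. Nodes that never appear as a target
% (in-degree 0, such as node 4 here) are left out, as in the out-degree code.
NtoU = unique(E(:,2));

% In-degree of each distinct target node: how many times its id
% appears in the 'to' column.
indegNs = histc(E(:,2), NtoU);

% In-degree distribution: idG(k) is the number of nodes with in-degree k.
idG = hist(indegNs, 1:max(indegNs));
```

In this toy graph nodes 1 and 2 each receive one edge and node 3 receives three, so two nodes have in-degree 1 and one node has in-degree 3.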
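My question is whether my method of computing the degree distribution is correct, since I have nothing to check it against. Specifically, I would like to know whether using a smaller number of bins in histc would give a clearer plot without losing any valuable information.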
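To make "fewer bins" concrete, here is the kind of coarser binning I have in mind. It is only a sketch on made-up degree values: the vector deg, the power-of-two edges and the width normalisation are my own assumptions, not something I have validated on the Wiki-Vote data.

```matlab
% Hypothetical per-node degrees (illustrative values only).
deg = [1 1 1 2 2 3 5 8 13 40];

% Logarithmically spaced bin edges: 1, 2, 4, ..., 64.
edges = 2.^(0:6);

% counts(k) is the number of degrees d with edges(k) <= d < edges(k+1);
% the last entry counts only values exactly equal to edges(end).
counts = histc(deg, edges);

% Divide by bin width so that wide bins are comparable to narrow ones.
widths = diff(edges);
density = counts(1:end-1) ./ widths;
```

The binned result could then be drawn with loglog(edges(1:end-1), density, 'o'). Whether this hides anything important in the tail is exactly what I am unsure about.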