python - What is the fastest way to iterate over a large networkx DiGraph instance?

Question

I am writing a class that inherits from DiGraph.py in the open-source networkx package for Python.

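In some methods of my class, I need to find the nodes with a certain degree (out-degree or in-degree, since the graph is directed) and return them.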

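This class will be used in a data mining / natural language processing project, on very large networks. What I need is a fast implementation of the methods described above (returning a list of the nodes with a given out-degree or a given degree).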

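Some things are already defined in the superclass: 1. The method network.out_degree(), which returns a dictionary keyed by node, with out-degrees as values: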

{'school': 4, 'middle school': 0, 'university': 0, 'commercial': 0, 'private': 5, 'institution': 2, 'high school': 0, 'college': 0, 'elementary school': 0, 'central': 0, 'company': 0, 'public': 3, 'bank': 2}

2. A method:

network.out_degree_iter()

<generator object out_degree_iter at 0x02EEB328>

I don't know how to use this method; I would be grateful if someone could explain how it works.
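A generator is consumed by iterating over it, and out_degree_iter() yields one (node, out_degree) tuple per node. The sketch below does not use networkx: it assumes a graph stored as a plain dict mapping each node to a dict of its successors (the layout networkx uses internally for G.succ), and a hypothetical out_degree_pairs() generator stands in for out_degree_iter(). Note that out_degree_iter() exists in networkx 1.x; networkx 2.0 removed the *_iter methods, and there G.out_degree() itself yields (node, degree) pairs.

```python
# Hypothetical successor map: node -> {successor: edge_data}
succ = {
    "school": {"private": {}, "public": {}},
    "private": {"bank": {}},
    "public": {},
    "bank": {},
}

def out_degree_pairs(succ):
    """Hypothetical stand-in for out_degree_iter(): yields (node, out_degree) lazily."""
    for node, nbrs in succ.items():
        yield node, len(nbrs)

gen = out_degree_pairs(succ)
print(next(gen))  # pulls only the first pair: ('school', 2)

# A for loop with tuple unpacking consumes the remaining pairs
for node, deg in gen:
    print(node, deg)

# The generator is now exhausted; a second pass needs a fresh generator
print(list(gen))  # []

degree_2 = [n for n, d in out_degree_pairs(succ) if d == 2]
print(degree_2)   # ['school']
```

Unlike building a full dict first, the generator produces each pair only when the loop asks for it, so no intermediate node-to-degree dictionary is held in memory.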

3. I have an attribute network.nodes, which is a list of all the nodes in my network.

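Question: to return, say, the nodes with out-degree 2, I could iterate over all nodes with a list comprehension over network.nodes; or I could iterate over my dictionary and return a list of the nodes whose value is 2; or I could use out_degree_iter(), although I don't know how to use it or how it differs from iterating over dictionary items in a for loop (for k,v in dict.iteritems()). For a network with a very large number of nodes and edges, which of these would be fastest, and why?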

Thanks

score 2 · Accepted Answer

An iterator is better suited to large graphs because you don't build a copy of the dictionary. How about something like this:

list_of_2 = []
for g in G.out_degree_iter():
    if g[1]==2:
        list_of_2.append(g[0])

或者，

list_of_2 = map(lambda x:x[0],filter(lambda x:(x[1]==2),G.out_degree_iter()))
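The one-liner above targets Python 2, where map() and filter() return lists. In Python 3 both return lazy iterators, so the expression needs a list() around it if an actual list is wanted. The sketch below uses a hypothetical list of (node, degree) pairs, taken from the question's example dictionary, in place of G.out_degree_iter(), and checks that the map/filter form agrees with a list comprehension.

```python
# Hypothetical (node, out_degree) pairs standing in for G.out_degree_iter()
pairs = [("school", 4), ("private", 5), ("institution", 2), ("bank", 2), ("college", 0)]

# Python 3: map/filter are lazy, so wrap the result in list()
via_map_filter = list(map(lambda x: x[0], filter(lambda x: x[1] == 2, pairs)))

# Equivalent list comprehension with tuple unpacking
via_comprehension = [n for n, d in pairs if d == 2]

print(via_map_filter)  # ['institution', 'bank']
```

Both forms make a single pass over the pairs; the comprehension avoids two lambda calls per item and is generally considered more idiomatic.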

score 2 · Accepted Answer

The simplest approach is to use the out_degree_iter() method with the list comprehension you proposed. Here is how:

import networkx as nx
G=nx.DiGraph(nx.gnp_random_graph(1000,0.001))
t1=[n for n,k in G.out_degree_iter() if k==2]

The fastest approach requires accessing the internal data structures:

t2=[n for n,nbrs in G.succ.items() if len(nbrs)==2]

For in-degree, use in_degree_iter() and G.pred.items() instead.
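In networkx, G.succ and G.pred map each node to a dict of its successors or predecessors, so len(nbrs) is the out-degree or in-degree with no extra bookkeeping. The stdlib-only sketch below mimics that layout with plain dicts built from a hypothetical edge list; the names succ, pred, and nodes_with_degree are illustrative, not networkx API.

```python
# Hypothetical edge list; each (u, v) is a directed edge u -> v
edges = [("school", "private"), ("school", "public"), ("private", "bank"), ("public", "bank")]

# Mimic networkx's adjacency layout: node -> {neighbor: edge_attributes}
succ = {}
pred = {}
for u, v in edges:
    # Ensure both endpoints exist in both maps, even if their degree is 0
    for n in (u, v):
        succ.setdefault(n, {})
        pred.setdefault(n, {})
    succ[u][v] = {}  # empty edge-attribute dict
    pred[v][u] = {}

def nodes_with_degree(adj, k):
    """Hypothetical helper: nodes whose neighbor dict in adj has exactly k entries."""
    return [n for n, nbrs in adj.items() if len(nbrs) == k]

print(nodes_with_degree(succ, 2))  # out-degree 2: ['school']
print(nodes_with_degree(pred, 2))  # in-degree 2: ['bank']
```

Passing the successor map gives out-degrees and passing the predecessor map gives in-degrees, which mirrors the choice between G.succ and G.pred in the answer.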

Here are some timings:

In [41]: %timeit t1=[n for n,k in G.out_degree_iter() if k==2]
1000 loops, best of 3: 368 us per loop

In [42]: %timeit s2=[n for n,nbrs in G.succ.items() if len(nbrs)==2]
1000 loops, best of 3: 198 us per loop
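Timings like these depend on graph size, density, networkx version, and hardware, so they are worth re-measuring on your own data. The sketch below uses only the standard library: it builds a hypothetical random directed graph as a dict of successor dicts (not a networkx graph) and times filtering the adjacency dict directly against going through an extra generator layer that stands in for out_degree_iter(). That extra layer, which builds one tuple per node, is a plausible reason the G.succ version is faster, but the sketch prints its own measurements rather than assuming any.

```python
import random
import timeit

def random_succ(n, p, seed=0):
    """Hypothetical G(n, p)-style directed graph as node -> {successor: {}}."""
    rng = random.Random(seed)
    succ = {i: {} for i in range(n)}
    for u in range(n):
        for v in range(n):
            if u != v and rng.random() < p:
                succ[u][v] = {}
    return succ

def out_degree_pairs(succ):
    # Stand-in for out_degree_iter(): one generator step and one tuple per node
    for node, nbrs in succ.items():
        yield node, len(nbrs)

def via_generator(succ, k=2):
    return [n for n, d in out_degree_pairs(succ) if d == k]

def via_adjacency(succ, k=2):
    return [n for n, nbrs in succ.items() if len(nbrs) == k]

succ = random_succ(500, 0.004)
assert via_generator(succ) == via_adjacency(succ)  # same answer either way

runs = 200
for fn in (via_generator, via_adjacency):
    total = timeit.timeit(lambda: fn(succ), number=runs)
    print(f"{fn.__name__}: {total / runs * 1e6:.1f} us per call")
```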
