16

有谁知道 Networkx 中三个不同的 pagerank 函数在准确性上的差异?

我有一个包含 1000 个节点和 139732 个边的图,“普通”pagerank函数似乎根本不起作用——除了两个节点之外,所有节点都有相同的 PG,所以我假设这个函数不能正常工作以及大图?

pagerank_numpypagerank_scipy的价值观似乎也比' 的价值观更加分散。这个函数的文档说“这对于小图来说是最快和最准确的”。“小”图是什么意思?

另外,为什么 pagerank_numpy 不允许使用max_itertol参数?

4

1 回答 1

30

Each of the three functions uses a different approach to solving the same problem:

networkx.pagerank() is a pure-Python implementation of the power-method to compute the largest eigenvalue/eigenvector or the Google matrix. It has two parameters that control the accuracy - tol and max_iter.

networkx.pagerank_scipy() is a SciPy sparse-matrix implementation of the power-method. It has the same two accuracy parameters.

networkx.pagerank_numpy() is a NumPy (full) matrix implementation that calls the numpy.linalg.eig() function to compute the largest eigenvalue and eigenvector. That function is an interface to the LAPACK dgeev function which is uses a matrix decomposition (direct) method with no tunable parameters.

All three should produce the same answer (within numerical roundoff) for well-behaved graphs if the tol parameter is small enough and the max_iter parameter is large enough. Which one is faster depends on the size of your graph and how well the power method works on your graph.

In [12]: import networkx as nx

In [13]: G=nx.gnp_random_graph(1000,0.01,directed=True)

In [14]: %timeit nx.pagerank(G,tol=1e-10)
10 loops, best of 3: 157 ms per loop

In [15]: %timeit nx.pagerank_scipy(G,tol=1e-10)
100 loops, best of 3: 14 ms per loop

In [16]: %timeit nx.pagerank(G)
10 loops, best of 3: 137 ms per loop
于 2012-10-24T00:06:46.997 回答