python - Reading a Pajek partition file (.clu format) with NetworkX

Question

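I am trying to read a Pajek partition file (in other words, a .clu file) with the NetworkX Python library, but I can't figure out how to do it. I can read a Pajek network (.net format) using the read_pajek method, but I have not found a way to read .clu files.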

Thanks a lot!

score 0 · Accepted Answer

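A .clu file follows this format:

First line: *Vertices NUMBER_OF_VERTICES
Second line: partition of vertex 0
Third line: partition of vertex 1

And so on, until all NUMBER_OF_VERTICES vertices have been assigned to a partition.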
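To make the layout concrete, here is a small sketch that reads the per-vertex partition labels from a made-up .clu text for four vertices. The sample contents and the helper name read_labels are assumptions for illustration only; they are not part of Pajek or NetworkX.

```python
# Hypothetical sample: four vertices, vertex 2 sits alone in partition 2.
SAMPLE_CLU = """*Vertices 4
1
1
2
1
"""


def read_labels(text):
    """Return the partition label of each vertex, in file order."""
    # Drop blank lines so a trailing newline does not count as a label.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header, count = lines[0].split()
    if header.lower() != "*vertices":
        raise ValueError("expected a '*Vertices N' header on the first line")
    labels = [int(value) for value in lines[1:1 + int(count)]]
    if len(labels) != int(count):
        raise ValueError("fewer partition lines than announced in the header")
    return labels


labels = read_labels(SAMPLE_CLU)
print(labels)  # [1, 1, 2, 1]
```

The position in the returned list is the vertex index (0-based, as in the description above) and the value is its partition.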

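According to the NetworkX documentation on community detection algorithms (https://networkx.github.io/documentation/stable/reference/algorithms/community.html), the preferred format in NetworkX is an iterable (i.e. a list or tuple) that groups the vertex numbers of each partition, for example: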

[[0, 1, 2, 3, 4], [5], [6, 7, 8, 9, 10]]

This means that the first partition consists of vertices 0, 1, 2, 3 and 4.

So reading a .clu file comes down to converting the file into that structure.
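As a rough sketch of that conversion, assuming the per-vertex labels have already been read into a list, the grouping step could look like this. The function name group_by_partition and the sample labels are hypothetical and chosen to reproduce the example structure shown earlier.

```python
from collections import defaultdict


def group_by_partition(labels):
    """Turn one label per vertex into a list of vertex groups."""
    groups = defaultdict(list)
    for vertex, label in enumerate(labels):
        groups[label].append(vertex)
    # Sort by label so the output order does not depend on the file order.
    return [groups[label] for label in sorted(groups)]


# Assumed labels: vertices 0-4 in partition 1, vertex 5 in 2, vertices 6-10 in 3.
labels = [1, 1, 1, 1, 1, 2, 3, 3, 3, 3, 3]
print(group_by_partition(labels))
# [[0, 1, 2, 3, 4], [5], [6, 7, 8, 9, 10]]
```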

I took the read_pajek function from https://networkx.github.io/documentation/networkx-1.10/_modules/networkx/readwrite/pajek.html#read_pajek and turned it into a working read_pajek_clu function. The parsing part is shown below (it relies on defaultdict from collections).

from collections import defaultdict


def parse_pajek_clu(lines):
    """Parse Pajek format partition from string or iterable.

    Parameters
    ----------
    lines : string or iterable
       Data in Pajek partition format.

    Returns
    -------
    communities : list of lists
       The vertex indices (0-based, in file order) of each community.

    See Also
    --------
    read_pajek_clu()
    """
    if isinstance(lines, str):
        lines = lines.split('\n')
    lines = iter([line.rstrip('\n') for line in lines])

    communities = defaultdict(list)  # partition label -> vertex indices
    for l in lines:
        if not l.lower().startswith("*vertices"):
            break
        l, nnodes = l.split()
        for vertice in range(int(nnodes)):
            # one partition label per line, in vertex order
            community = int(next(lines))
            communities[community].append(vertice)

    return list(communities.values())

You can see a working example in this repository:

https://github.com/joaquincabezas/networkx_pajek_util

Also, once you have the partition, this idea from Paul Brodersen is a good starting point for drawing it:

How to draw communities with networkx
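Independently of the linked approach, a simple way to start is to color nodes by community. The sketch below only builds the color indices; the function name node_to_community and the sample partition are assumptions for illustration.

```python
def node_to_community(communities):
    """Map each node to the index of the community that contains it."""
    mapping = {}
    for index, members in enumerate(communities):
        for node in members:
            mapping[node] = index
    return mapping


communities = [[0, 1, 2, 3, 4], [5], [6, 7, 8, 9, 10]]
membership = node_to_community(communities)

# One integer per node, in sorted node order; nodes in the same
# community share the same value.
node_order = sorted(membership)
node_colors = [membership[node] for node in node_order]
print(node_colors)  # [0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 2]
```

These values could then be passed to a NetworkX drawing call, for example as node_color together with nodelist=node_order, assuming the graph uses the same integer node labels as the partition.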

I hope this helps!
