
An attempt to answer the question "Graph isomorphism for jar files" naturally raised the question of how to represent a jar file as a graph using Python.

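Problem: Given a jar file, read the files it contains and represent the contents as (a) a data structure and (b) a graph, both suitable for further study and manipulation, for example evaluating isomorphism against another jar file. In the graph, the directory tree should supply the root and branch nodes, with files as leaf nodes.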

To standardize answers, I am using the file verletphysics.jar, downloaded from the OpenProcessing sketch at http://www.openprocessing.org/sketch/46757.

1 Answer

Solution

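Since jar files are essentially zip archives, the zipfile module from the Python standard library is used to read the contents and to prepare text and graph representations of how the jar's contents relate to one another.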
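
As a minimal sketch of the reading step alone: the snippet below builds a tiny in-memory archive (the helper `make_demo_jar` and its entries are made up for illustration, not taken from verletphysics.jar) and splits the names returned by `ZipFile.namelist()` into directories and files. It assumes, as in the listing that follows, that directory entries are stored with a trailing `/`; not every jar stores explicit directory entries.

```python
import io
import zipfile


def list_jar_entries(jar_file):
    """Return (directories, files) as two lists of entry names."""
    with zipfile.ZipFile(jar_file, "r") as jar:
        names = jar.namelist()
    # Directory entries in a zip archive end with a slash
    directories = [n for n in names if n.endswith("/")]
    files = [n for n in names if not n.endswith("/")]
    return directories, files


def make_demo_jar():
    """Hypothetical stand-in for a real jar, built in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        jar.writestr("META-INF/", "")
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        jar.writestr("toxi/", "")
        jar.writestr("toxi/physics/", "")
        jar.writestr("toxi/physics/VerletParticle.class", b"")
    buffer.seek(0)
    return buffer


directories, files = list_jar_entries(make_demo_jar())
print(directories)
print(files)
```

Passing a file name such as 'verletphysics.jar' instead of the in-memory buffer should work the same way, since `zipfile.ZipFile` accepts either a path or a file-like object.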

Text representation

For the verletphysics.jar file mentioned in the question, the code below produces the following listing of contents:

META-INF/
META-INF/MANIFEST.MF
toxi/
toxi/physics/
toxi/physics/behaviors/
toxi/physics/constraints/
toxi/physics2d/
toxi/physics2d/behaviors/
toxi/physics2d/constraints/
toxi/physics/ParticlePath.class
toxi/physics/ParticleString.class
toxi/physics/PullBackString.class
toxi/physics/VerletConstrainedSpring.class
toxi/physics/VerletMinDistanceSpring.class
toxi/physics/VerletParticle.class
toxi/physics/VerletPhysics.class
toxi/physics/VerletSpring.class
toxi/physics/behaviors/AttractionBehavior.class
toxi/physics/behaviors/ConstantForceBehavior.class
toxi/physics/behaviors/GravityBehavior.class
toxi/physics/behaviors/ParticleBehavior.class
toxi/physics/constraints/AxisConstraint.class
toxi/physics/constraints/BoxConstraint.class
toxi/physics/constraints/CylinderConstraint.class
toxi/physics/constraints/MaxConstraint.class
toxi/physics/constraints/MinConstraint.class
toxi/physics/constraints/ParticleConstraint.class
toxi/physics/constraints/PlaneConstraint.class
toxi/physics/constraints/SoftBoxConstraint.class
toxi/physics/constraints/SphereConstraint.class
toxi/physics2d/ParticlePath2D.class
toxi/physics2d/ParticleString2D.class
toxi/physics2d/PullBackString2D.class
toxi/physics2d/VerletConstrainedSpring2D.class
toxi/physics2d/VerletMinDistanceSpring2D.class
toxi/physics2d/VerletParticle2D.class
toxi/physics2d/VerletPhysics2D.class
toxi/physics2d/VerletSpring2D.class
toxi/physics2d/behaviors/AttractionBehavior.class
toxi/physics2d/behaviors/ConstantForceBehavior.class
toxi/physics2d/behaviors/GravityBehavior.class
toxi/physics2d/behaviors/ParticleBehavior2D.class
toxi/physics2d/constraints/AngularConstraint.class
toxi/physics2d/constraints/AxisConstraint.class
toxi/physics2d/constraints/CircularConstraint.class
toxi/physics2d/constraints/MaxConstraint.class
toxi/physics2d/constraints/MinConstraint.class
toxi/physics2d/constraints/ParticleConstraint2D.class
toxi/physics2d/constraints/RectConstraint.class
verletphysics.mf

Key

Each component of the path names above is extracted and assigned a unique id by the code, as follows:

 Index  File
     0  behaviors
     1  BoxConstraint.class
     2  MaxConstraint.class
     3  VerletParticle.class
     4  ParticleConstraint2D.class
     5  ConstantForceBehavior.class
     6  META-INF
     7  VerletMinDistanceSpring2D.class
     8  AxisConstraint.class
     9  AttractionBehavior.class
    10  physics2d
    11  VerletPhysics.class
    12  PullBackString.class
    13  VerletSpring.class
    14  VerletConstrainedSpring.class
    15  ParticleString2D.class
    16  verletphysics.mf
    17  ParticleBehavior2D.class
    18  ParticleString.class
    19  RectConstraint.class
    20  CylinderConstraint.class
    21  toxi
    22  VerletMinDistanceSpring.class
    23  VerletSpring2D.class
    24  VerletParticle2D.class
    25  ParticlePath2D.class
    26  CircularConstraint.class
    27  ParticlePath.class
    28  MinConstraint.class
    29  MANIFEST.MF
    30  ParticleConstraint.class
    31  GravityBehavior.class
    32  VerletPhysics2D.class
    33  SoftBoxConstraint.class
    34  ParticleBehavior.class
    35  VerletConstrainedSpring2D.class
    36  PlaneConstraint.class
    37  PullBackString2D.class
    38  SphereConstraint.class
    39  physics
    40  AngularConstraint.class
    41  constraints
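
Note that the key above is built from base names, so entries that share a name (for example `behaviors` under both `physics` and `physics2d`) end up with the same id. If a strict tree is wanted, one possible variation, offered here only as a sketch and not as part of the original answer, is to key nodes by their full path and to keep a nested dictionary as the data structure. The function names `build_tree` and `index_full_paths` and the sample `names` list are illustrative assumptions.

```python
def build_tree(names):
    """Nest path names into dicts: directories map to dicts, files to None."""
    tree = {}
    for name in names:
        is_dir = name.endswith("/")
        parts = name.rstrip("/").split("/")
        node = tree
        # Walk (or create) the parent directories of this entry
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        if is_dir:
            node.setdefault(parts[-1], {})
        else:
            node[parts[-1]] = None
    return tree


def index_full_paths(names):
    """Assign an id to every distinct full path, including implied parents."""
    paths = set()
    for name in names:
        parts = name.rstrip("/").split("/")
        for depth in range(1, len(parts) + 1):
            paths.add("/".join(parts[:depth]))
    return {path: index for index, path in enumerate(sorted(paths))}


# Illustrative subset of the listing above
names = [
    "toxi/",
    "toxi/physics/",
    "toxi/physics/behaviors/",
    "toxi/physics2d/",
    "toxi/physics2d/behaviors/",
    "toxi/physics/behaviors/GravityBehavior.class",
    "toxi/physics2d/behaviors/GravityBehavior.class",
]

tree = build_tree(names)
ids = index_full_paths(names)
base_names = {path.split("/")[-1] for path in ids}
print(len(ids), len(base_names))
```

With full paths the two `behaviors` directories and the two `GravityBehavior.class` files stay distinct, whereas the base-name key collapses each pair into one node.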

Graph

The path names are translated into edges, which are assembled into a network using NetworkX and plotted using matplotlib.

[Figure: network graph of the jar file contents]

Code

import zipfile
import networkx as nx
import matplotlib.pyplot as plt

# Download the code from
# http://www.openprocessing.org/sketch/46757
# Unzip and find the jar file: verletphysics.jar
# This example uses that file for demo

def get_edges(fName):
    """Return (edges, nodes) for the jar file at fName.

    edges is a set of (parent_index, child_index) pairs;
    nodes is a list of (index, label) pairs sorted by index.
    """
    edges = []
    nodes = []

    with zipfile.ZipFile(fName, "r") as jar:
        for name in jar.namelist():
            print(name)  # prints the list of files in the jar
            if name.endswith('/'):
                name = name[:-1]
            parts = name.split('/')
            nodes.extend(parts)
            # Link each path component to the next one in the same path.
            # Pairing over the accumulated `nodes` list instead would add
            # spurious edges between unrelated entries.
            edges += zip(parts[:-1], parts[1:])

    # Assign a unique integer id to every distinct label
    nodes = set(nodes)
    nodes = dict(zip(nodes, range(len(nodes))))
    edges = [(nodes[edge[0]], nodes[edge[1]]) for edge in edges]
    nodes = [(index, label) for label, index in nodes.items()]
    nodes = sorted(nodes, key=lambda node: node[0])
    return set(edges), nodes

if __name__ == '__main__':
    fName = 'verletphysics.jar'
    edges, nodes = get_edges(fName)

    # print list of nodes
    # serving as a key to the graph
    print('%10s  %s' % ('Index', 'File'))
    for node in nodes:
        print('%10s  %s' % (node[0], node[1]))

    # Plot the network graph
    G = nx.Graph()
    # Add every node first so top-level files without edges still appear
    G.add_nodes_from(index for index, label in nodes)
    G.add_edges_from(edges)
    nx.draw_networkx(G, pos=nx.spring_layout(G))
    plt.axis('off')
    plt.show()
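
Since the question mentions evaluating isomorphism against another jar file, here is a hedged sketch of how that comparison might look with NetworkX. It is a variation, not the code above: nodes are keyed by full path under an artificial root node (the name `jar_tree`, the `"<jar>"` root label, and the sample path lists are all assumptions made for illustration), and `nx.is_isomorphic` compares structure only, ignoring file and directory names.

```python
import networkx as nx


def jar_tree(names, root="<jar>"):
    """Build a directed tree: root -> directories -> files, keyed by full path."""
    G = nx.DiGraph()
    G.add_node(root)
    for name in names:
        parts = name.rstrip("/").split("/")
        parent = root
        # Add one edge per level, so implied parent directories are included
        for depth in range(1, len(parts) + 1):
            child = "/".join(parts[:depth])
            G.add_edge(parent, child)
            parent = child
    return G


# Hypothetical listings standing in for jar.namelist() of three jars
first = jar_tree(["a/", "a/X.class", "a/Y.class", "b/Z.class"])
second = jar_tree(["p/Q.class", "r/", "r/S.class", "r/T.class"])
third = jar_tree(["a/X.class", "a/Y.class", "a/Z.class"])

same_shape = nx.is_isomorphic(first, second)
different_shape = nx.is_isomorphic(first, third)
print(same_shape, different_shape)
```

The first two listings have the same shape (one directory with two files, one with a single file) despite different names, while the third has a single directory with three files. General graph isomorphism checks can become slow on large inputs, so this is best treated as a starting point.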
Answered 2012-06-11T18:29:12.437