python - How to parse a DOT file in Python

Question

I have a transducer saved as a DOT file. I can see a graphical representation of it using gvedit, but what if I want to convert the DOT file into an executable transducer, so that I can test it and see which strings it accepts and which it doesn't?

In most of the tools I have seen (OpenFst, Graphviz, and their Python extensions), DOT files are used only to create a graphical representation. What if I want to parse the file into an interactive program where I can test strings against the transducer?

Are there any libraries out there that would do the task, or should I just write it from scratch?

As I said, the DOT file describes a transducer I have designed that simulates English morphology. It is a huge file, but to give you an idea of what it looks like, here is a sample. Say I want a transducer that models the behavior of English nouns with regard to plurality, and my lexicon consists of only three words (book, boy, girl). The transducer would then look something like this:

[Image: the transducer graph rendered from the DOT file below]

which is directly constructed from this DOT file:

digraph A {
rankdir = LR;
node [shape=circle,style=filled] 0
node [shape=circle,style=filled] 1
node [shape=circle,style=filled] 2
node [shape=circle,style=filled] 3
node [shape=circle,style=filled] 4
node [shape=circle,style=filled] 5
node [shape=circle,style=filled] 6
node [shape=circle,style=filled] 7
node [shape=circle,style=filled] 8
node [shape=circle,style=filled] 9
node [shape=doublecircle,style=filled] 10
0 -> 4 [label="g "];
0 -> 1 [label="b "];
1 -> 2 [label="o "];
2 -> 7 [label="y "];
2 -> 3 [label="o "];
3 -> 7 [label="k "];
4 -> 5 [label="i "];
5 -> 6 [label="r "];
6 -> 7 [label="l "];
7 -> 9 [label="<+N:s> "];
7 -> 8 [label="<+N:0> "];
8 -> 10 [label="<+Sg:0> "];
9 -> 10 [label="<+Pl:0> "];
}

Testing this transducer means that if you feed it book+Pl, it should spit back books, and vice versa. I'd like to know how to turn the DOT file into a format that allows this kind of analysis and testing.

score 3 · Accepted Answer

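You can first load the file using https://code.google.com/p/pydot/. From there, it should be relatively simple to write code that traverses the in-memory graph according to an input string.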
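A minimal sketch of that traversal in plain Python, assuming the edges have already been pulled out of the DOT file (for example with pydot's get_edges(), then get_source(), get_destination() and get_label() on each edge) and cleaned into (source, destination, label) triples. The label convention is an assumption read off the sample: a bare letter maps to itself, <upper:lower> maps upper to lower, and 0 stands for the empty string. EDGES, FINAL, parse_label and transduce are hypothetical names for illustration.

```python
from collections import defaultdict

# Hypothetical: (source, destination, label) triples copied from the sample
# DOT file, with the surrounding quotes and trailing space already stripped.
EDGES = [
    ("0", "4", "g"), ("0", "1", "b"), ("1", "2", "o"), ("2", "7", "y"),
    ("2", "3", "o"), ("3", "7", "k"), ("4", "5", "i"), ("5", "6", "r"),
    ("6", "7", "l"), ("7", "9", "<+N:s>"), ("7", "8", "<+N:0>"),
    ("8", "10", "<+Sg:0>"), ("9", "10", "<+Pl:0>"),
]
START = "0"
FINAL = {"10"}  # the doublecircle node


def parse_label(label):
    """Split a label into (upper, lower); assumes '0' means the empty string."""
    if label.startswith("<") and label.endswith(">") and ":" in label:
        upper, lower = label[1:-1].split(":", 1)
    else:
        upper = lower = label  # a bare letter maps to itself
    return ("" if upper == "0" else upper), ("" if lower == "0" else lower)


# state -> list of (upper, lower, destination)
ARCS = defaultdict(list)
for src, dst, label in EDGES:
    ARCS[src].append((*parse_label(label), dst))


def transduce(text, generate=True):
    """Return every output the transducer produces for text.

    generate=True reads the upper side (book+N+Pl -> books);
    generate=False reads the lower side (books -> book+N+Pl).
    Assumes there is no cycle of arcs whose input side is empty.
    """
    results = set()
    stack = [(START, 0, "")]  # (state, position in text, output so far)
    while stack:
        state, pos, out = stack.pop()
        if pos == len(text) and state in FINAL:
            results.add(out)
        for upper, lower, dst in ARCS[state]:
            inp, outp = (upper, lower) if generate else (lower, upper)
            # An empty input side always matches, i.e. an epsilon move.
            if text.startswith(inp, pos):
                stack.append((dst, pos + len(inp), out + outp))
    return sorted(results)


print(transduce("book+N+Pl"))              # ['books']
print(transduce("books", generate=False))  # ['book+N+Pl']
print(transduce("girl+N+Sg"))              # ['girl']
```

With this sample the upper side of every path also carries +N, so the question's book+Pl corresponds to book+N+Pl here. The search is a plain depth-first walk over all matching arcs, which handles the nondeterminism at state 7 (the two +N arcs) without first determinizing the machine.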

score 3 · Accepted Answer

Install the graphviz library (the Python package, plus the Graphviz executables it calls). Then try the following:

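import graphviz
src = graphviz.Source.from_file('graph4.dot')  # renders inline in Jupyter
src.view()  # elsewhere: render to a file and open it in the default viewer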

score 2 · Accepted Answer

Another way to load a DOT file, along with a simple way to find the cycles in it:

import pygraphviz as pgv
import networkx as nx

gv = pgv.AGraph('my.dot', strict=False, directed=True)
G = nx.DiGraph(gv)

cycles = nx.simple_cycles(G)
for cycle in cycles:
    print(cycle)

score 1 · Accepted Answer

Use this to load a .dot file in Python:

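import tempfile
import pydot
from PIL import Image

# apath is the path to your .dot file; pydot >= 1.2 returns a list of graphs
graph = pydot.graph_from_dot_file(apath)[0]

# SHOW as an image
fout = tempfile.NamedTemporaryFile(suffix=".png")
graph.write(fout.name, format="png")
Image.open(fout.name).show()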

score 0 · Accepted Answer

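I haven't tried it with the example above, but NetworkX has a read_dot function that could be a good way to approach this: it converts the file into a graph object with useful capabilities, which you can then analyze and test.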

score 0 · Accepted Answer

Guillaume's answer is enough to render the graph in Spyder (3.3.2), which may solve some people's problem.

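If you actually need to manipulate the graph as the OP does, it gets a bit more complicated. Part of the problem is that Graphviz is a graph-rendering library, whereas you are trying to analyze the graph. What you're trying to do is similar to reverse-engineering a Word or LaTeX document from a PDF file.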

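If you can assume the well-behaved structure of the OP's example, regular expressions will work. A maxim I like is that if you solve a problem with regular expressions, now you have two problems. Still, in cases like this, it may simply be the most practical approach.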

Here are the expressions that capture the information:

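Your node information: r"node.*?=(\w+).*?\s(\d+)". The capture groups are the node shape and the node label.
Your edge information: r"(\d+).*?(\d+).*?\"(.+?)\s". The capture groups are the source, the sink, and the edge label.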

To try them out easily, see https://regex101.com/r/3UKKwV/1/ and https://regex101.com/r/Hgctkp/2/.
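Applied to the sample file from the question, the node and edge expressions from this answer yield tables like the following. The two regexes are the answer's own; DOT, nodes, edges and final_states are hypothetical names, and the sketch assumes one statement per line, exactly as in the sample.

```python
import re

# The sample DOT file from the question, inlined to keep the sketch self-contained.
DOT = '''digraph A {
rankdir = LR;
node [shape=circle,style=filled] 0
node [shape=circle,style=filled] 1
node [shape=circle,style=filled] 2
node [shape=circle,style=filled] 3
node [shape=circle,style=filled] 4
node [shape=circle,style=filled] 5
node [shape=circle,style=filled] 6
node [shape=circle,style=filled] 7
node [shape=circle,style=filled] 8
node [shape=circle,style=filled] 9
node [shape=doublecircle,style=filled] 10
0 -> 4 [label="g "];
0 -> 1 [label="b "];
1 -> 2 [label="o "];
2 -> 7 [label="y "];
2 -> 3 [label="o "];
3 -> 7 [label="k "];
4 -> 5 [label="i "];
5 -> 6 [label="r "];
6 -> 7 [label="l "];
7 -> 9 [label="<+N:s> "];
7 -> 8 [label="<+N:0> "];
8 -> 10 [label="<+Sg:0> "];
9 -> 10 [label="<+Pl:0> "];
}'''

NODE_RE = re.compile(r"node.*?=(\w+).*?\s(\d+)")
EDGE_RE = re.compile(r"(\d+).*?(\d+).*?\"(.+?)\s")

# Each match is (shape, node id), e.g. ("doublecircle", "10")
nodes = NODE_RE.findall(DOT)
# Each match is (source, sink, label), e.g. ("7", "9", "<+N:s>")
edges = EDGE_RE.findall(DOT)

# In the sample, the accepting state is the one drawn as a double circle.
final_states = {node for shape, node in nodes if shape == "doublecircle"}

print(len(nodes), len(edges))  # 11 13
print(final_states)            # {'10'}
print(edges[0])                # ('0', '4', 'g')
```

Both expressions rely on the sample's layout (one statement per line, a space before the closing quote of each label), so a DOT file written in a different style would need a real parser such as pydot instead.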
