plotly - Reading a pyTextRank file

Question

I have a piece of text that I would like to render as a graph using pytextrank. The code (copied from the source) is

    import spacy
    nlp = spacy.load("en_core_web_sm")
    import pytextrank
    import graphviz
    tr = pytextrank.TextRank()
    nlp.add_pipe(tr.PipelineComponent, name='textrank', last=True)
    
    line = "the ballistic nuclear threat can be thwarted by building a nuclear shield"
    doc = nlp(line)
    tr.write_dot(path="graph.dot")

It writes something to the file "graph.dot". This looks like a JSON file whose first field is "digraph {}". At this point I am lost. How do I create a nice graph of the text (or any graph at all)?

Thanks,

Andreas

Using Ubuntu 20.04.1 LTS, Python 3.8, pytextrank 2.0.3

score 1 · Accepted Answer

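The new online documentation for PyTextRank has been updated; in particular, see the "Getting Started" page at https://derwen.ai/docs/ptr/start/ for example code. Similar code is also shown in the sample.py script in the GitHub repo.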

BTW, the latest release is 3.0.1, which tracks the new spaCy 3.x updates.

Here is a simple usage:

    import spacy
    import pytextrank

    # example text
    text = "the ballistic nuclear threat can be thwarted by building a nuclear shield"

    # load a spaCy model, depending on language, scale, etc.
    nlp = spacy.load("en_core_web_sm")

    # add PyTextRank to the spaCy pipeline
    nlp.add_pipe("textrank", last=True)
    doc = nlp(text)

    # examine the top-ranked phrases in the document
    for p in doc._.phrases:
        print("{:.4f} {:5d}  {}".format(p.rank, p.count, p.text))
        print(p.chunks)

The output will be:

    0.1712     1  a nuclear shield
    [a nuclear shield]
    0.1652     1  the ballistic nuclear threat
    [the ballistic nuclear threat]

If you want to visualize the lemma graph in Graphviz, or in other libraries that read the DOT file format, you can add:

    tr = doc._.textrank
    tr.write_dot(path="graph.dot")

This writes the output to the "graph.dot" file. See the Graphviz documentation for examples of how to read and render it.
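One way to turn that file into an image is to run the Graphviz `dot` command-line tool on it. The following is only a minimal sketch, assuming Graphviz is installed and `dot` is on the PATH. The helper names `dot_command` and `render_dot` are made up for this illustration and are not part of PyTextRank.

```python
import shutil
import subprocess
from pathlib import Path


def dot_command(dot_path, fmt="png"):
    """Build the Graphviz command line that renders a DOT file to an image."""
    src = Path(dot_path)
    out = src.with_suffix("." + fmt)  # graph.dot -> graph.png
    return ["dot", "-T" + fmt, str(src), "-o", str(out)]


def render_dot(dot_path, fmt="png"):
    """Run `dot` on the file; return the output path, or None if `dot` is missing."""
    cmd = dot_command(dot_path, fmt)
    if shutil.which("dot") is None:
        # Graphviz is not installed (assumption: `dot` must be on the PATH)
        return None
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

Calling `render_dot("graph.dot")` after the snippet above should leave a `graph.png` next to the DOT file, and passing `fmt="svg"` produces a scalable image instead.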

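FWIW, we are currently working on an integration with the kglab library, which will open up a much broader range of graph manipulation and visualization features, since it integrates with many other libraries and file formats.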
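In the meantime, if you would rather feed the graph into a plotting library yourself, one option is to pull the edge list out of the DOT text. The sketch below is a deliberately simple reader, not a full DOT parser: it assumes one `a -> b` edge per line, and the sample text is a hypothetical stand-in rather than actual PyTextRank output. For anything more complex, a real DOT parser such as pydot is the safer choice.

```python
import re

# A node id is either a double-quoted string or a bare word (simplifying assumption)
NODE = r'("(?:[^"\\]|\\.)*"|[\w.]+)'
EDGE_RE = re.compile(r'^\s*' + NODE + r'\s*->\s*' + NODE)


def read_edges(dot_text):
    """Return (source, target) pairs for every simple `a -> b` line in DOT text."""
    edges = []
    for line in dot_text.splitlines():
        m = EDGE_RE.match(line)
        if m:
            # Drop the surrounding quotes; escaped quotes inside ids are not handled
            edges.append((m.group(1).strip('"'), m.group(2).strip('"')))
    return edges


# Hypothetical sample, only to show the shape of the input
sample = '''digraph {
    "nuclear" [label="nuclear"]
    "shield" [label="shield"]
    "nuclear" -> "shield"
    threat -> nuclear [constraint=false]
}'''

print(read_edges(sample))
```

The resulting list of pairs can then be handed to whichever graph or plotting library you prefer.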

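Also, if you have any suggestions or requests for how to visualize results from PyTextRank, it is really helpful to create an issue at https://github.com/DerwenAI/pytextrank/issues so that our developer community can help more.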

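Apologies if I have not interpreted "render text as a graph" correctly. Another way to think about it would be to use the displaCy dependency visualizer, which shows a graph of the grammatical dependencies among the tokens in a sentence. An example is given in the spaCy tutorial.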
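If the dependency view is closer to what you had in mind, a rough sketch could look like this. It assumes spaCy 3.x with the `en_core_web_sm` model installed; the function names and the output file name `dependencies.svg` are arbitrary choices for this example.

```python
from pathlib import Path


def dependency_svg(text, model="en_core_web_sm"):
    """Return displaCy's SVG markup for the dependency parse of `text`."""
    # Imported lazily so this module still loads where spaCy is not installed
    import spacy
    from spacy import displacy

    nlp = spacy.load(model)
    doc = nlp(text)
    # jupyter=False makes displaCy return the markup instead of displaying it
    return displacy.render(doc, style="dep", jupyter=False)


def save_dependency_svg(text, path="dependencies.svg"):
    """Write the dependency graph for `text` to an SVG file and return its path."""
    Path(path).write_text(dependency_svg(text), encoding="utf-8")
    return path
```

Calling `save_dependency_svg(text)` with the example sentence should produce an SVG file that opens in any browser.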
