python - Parsing a large NTriples file in Python

Question

I am trying to parse a fairly large NTriples file using the code from "Parse large RDF in Python".

I installed raptor and the redland-bindings for Python.

import RDF
parser=RDF.Parser(name="ntriples") #as name for parser you can use ntriples, turtle, rdfxml, ...
model=RDF.Model()
stream=parser.parse_into_model(model,"file:./mybigfile.nt")
for triple in model:
    print triple.subject, triple.predicate, triple.object

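But the program hangs. Since nothing is printed right away, I suspect it is trying to load the whole file into memory, or something similar.

Does anyone know how to solve this?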

score 2 · Accepted Answer

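It is slow because you are reading into an in-memory store that has no indexes (the default for RDF.Model()), so it gets slower and slower as more triples are added. The N-Triples parsing itself does stream from the file; it never pulls the whole file into memory.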
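To make the indexing point concrete, here is a toy sketch in plain Python rather than Redland itself. ListStore and HashStore are hypothetical names invented for illustration, and the assumption that an unindexed store has to scan its existing statements on every add is mine, not something taken from Redland's source. It only shows why the work per insert can grow with the size of the store in one case and stay roughly flat in the other.

```python
# Toy illustration only: hypothetical classes, not Redland's implementation.


class ListStore:
    """Unindexed store: every add scans all existing triples for a duplicate."""

    def __init__(self):
        self.triples = []
        self.comparisons = 0  # counts equality checks so the cost is visible

    def add(self, triple):
        for existing in self.triples:
            self.comparisons += 1
            if existing == triple:
                return False
        self.triples.append(triple)
        return True


class HashStore:
    """Hash-indexed store: the duplicate check is one hash lookup on average."""

    def __init__(self):
        self.triples = set()

    def add(self, triple):
        if triple in self.triples:
            return False
        self.triples.add(triple)
        return True


def make_triples(n):
    """Build n distinct (subject, predicate, object) tuples."""
    return [("s%d" % i, "p", "o%d" % i) for i in range(n)]


def fill(store, triples):
    """Add every triple to the store and return the store."""
    for triple in triples:
        store.add(triple)
    return store
```

In this toy model, filling a ListStore with 100 distinct triples costs 4,950 comparisons and 200 triples cost 19,900, so doubling the input roughly quadruples the work. The HashStore ends up with the same contents without that scan.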

For an overview of the storage models, see the Redland storage modules documentation. Here you probably want storage type "hashes" with hash-type "memory".

s = RDF.HashStorage("abc", options="hash-type='memory'")
model = RDF.Model(s)

(Untested)
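If the goal is only to print each triple, the streaming point suggests that no store is needed at all. The sketch below is a stand-in written with the standard library, not Redland's parser: iter_ntriples and print_file are hypothetical helper names, and the regular expression is a deliberate simplification that handles plain "subject predicate object ." lines and silently skips anything else (it does not implement the full N-Triples grammar, escapes, or trailing comments). I believe Redland's RDF.Parser also offers a parse_as_stream method that may be the more direct route, but I have not tested it here.

```python
import re

# Simplified pattern (an assumption, not the full N-Triples grammar):
# subject is an IRI or blank node, predicate is an IRI, and the object is
# whatever remains before the terminating " ." on the line.
_LINE = re.compile(r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$')


def iter_ntriples(lines):
    """Yield (subject, predicate, object) strings lazily, one line at a time."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or full-line comment
        match = _LINE.match(stripped)
        if match is None:
            continue  # line this simplified pattern does not understand
        yield match.groups()


def print_file(path):
    """Print every triple in the file without building a model in memory."""
    with open(path, encoding="utf-8") as handle:
        # A file object yields one line at a time, so memory use stays small.
        for subject, predicate, obj in iter_ntriples(handle):
            print(subject, predicate, obj)
```

Calling print_file("mybigfile.nt") should start printing immediately, because each line is handled as soon as it is read and nothing is accumulated.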
