java - extracting individual bibo:Articles from RDF document

Question

I have an RDF/XML document with this format:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:ags="http://purl.org/agmes/1.1/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dct="http://purl.org/dc/terms/">
  <bibo:Article rdf:about="http://xxxxx/NO8500391">
    <dct:identifier>NO8500391</dct:identifier>
    ...
  </bibo:Article>
  <bibo:Article rdf:about="http://xxxxx/NO8500523">
    ...
  </bibo:Article>
  <bibo:Article rdf:about="http://xxxxx/NO8500496">
  ...
  </bibo:Article>
</rdf:RDF>

As you can see, a single RDF/XML file contains many bibo:Articles, possibly thousands. I want to extract each article and convert it to RDF/JSON using Apache Jena (I know how to write a model), so that I have a separate document for each article and can later import them all into an index such as CouchDB or Elasticsearch to perform searches.

How can I extract each article from the Jena model? The dirty way I was considering is to process the file as XML and extract each bibo:Article element.

score 1 · Accepted Answer

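First, can I ask for clarification on your question? I assume you are asking about splitting each bibo:Article item out into its own document.

As an aside, this is different from splitting out each first-level node, because RDF/XML is not a canonical serialization: the same RDF graph may be serialized as many different RDF/XML documents, and there is no guarantee that the articles will always appear as first-level nodes. For example, an article could just as well be written as an rdf:Description element with an rdf:type property, or nested inside another resource's description.

Now, to answer your question, there are two main ways to achieve your aim.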

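Using SPARQL queries

First issue a SELECT query to retrieve all the article instances. Then, for each result, issue a DESCRIBE query on the article URI; this gives you a new Jena Model containing only the information about that URI.

Note that, if you want, you can change exactly how DESCRIBE queries are answered by creating a custom DescribeHandler, but this is probably overkill.

You can then serialize the result of each DESCRIBE query to a new document.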
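The SELECT-then-DESCRIBE approach could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes a Jena 3.x or later release (the `org.apache.jena` packages), the class name `SparqlArticleSplitter` and its `split` method are made up for illustration, and it assumes every article is identified by a URI (blank-node articles are skipped).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;

// Hypothetical helper class; the name is not part of Jena.
final class SparqlArticleSplitter {

    private static final String FIND_ARTICLES =
        "PREFIX bibo: <http://purl.org/ontology/bibo/> "
        + "SELECT DISTINCT ?article WHERE { ?article a bibo:Article }";

    /** Returns one model per article URI, keyed by that URI. */
    static Map<String, Model> split(Model source) {
        // Collect the URIs first so the SELECT has finished before any DESCRIBE runs.
        List<String> uris = new ArrayList<>();
        try (QueryExecution select = QueryExecutionFactory.create(FIND_ARTICLES, source)) {
            ResultSet results = select.execSelect();
            while (results.hasNext()) {
                RDFNode node = results.next().get("article");
                if (node.isURIResource()) { // assumption: blank-node articles are ignored
                    uris.add(node.asResource().getURI());
                }
            }
        }

        Map<String, Model> articles = new LinkedHashMap<>();
        for (String uri : uris) {
            // Assumption: the URIs are well formed, so plain string concatenation is safe here.
            String describeQuery = "DESCRIBE <" + uri + ">";
            try (QueryExecution describe = QueryExecutionFactory.create(describeQuery, source)) {
                articles.put(uri, describe.execDescribe());
            }
        }
        return articles;
    }

    public static void main(String[] args) {
        // args[0] is the path or URL of the RDF/XML file.
        Model source = RDFDataMgr.loadModel(args[0]);
        split(source).forEach((uri, article) -> {
            // A real run would write one file per article instead of printing.
            System.out.println("Article: " + uri);
            RDFDataMgr.write(System.out, article, Lang.RDFJSON);
        });
    }
}
```

Collecting the URIs before running the DESCRIBE queries keeps only one query execution open at a time. What each DESCRIBE returns depends on the configured describe handler, so it is worth checking the output for one article before processing thousands.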

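Using the Model API

Use the listStatements() method (the overload that takes search criteria) to first find the articles. Then, as in the first solution, issue further calls for each discovered article URI to find the statements about it. These can be added to a new Model and serialized as desired.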
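A rough sketch of the Model API variant is shown below, under the same assumptions as before (Jena 3.x or later, articles identified by URIs). The class name `ModelApiArticleSplitter` and its `split` method are hypothetical. Note that this version copies only the statements whose subject is the article itself; anything hanging off nested blank nodes would need an extra traversal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.vocabulary.RDF;

// Hypothetical helper class; the name is not part of Jena.
final class ModelApiArticleSplitter {

    private static final String BIBO_ARTICLE = "http://purl.org/ontology/bibo/Article";

    /** Returns one model per article URI, keyed by that URI. */
    static Map<String, Model> split(Model source) {
        Resource articleType = source.createResource(BIBO_ARTICLE);
        Map<String, Model> articles = new LinkedHashMap<>();

        // Find every "?s rdf:type bibo:Article" statement; toList() ends the iteration up front.
        for (Statement typed : source.listStatements(null, RDF.type, articleType).toList()) {
            Resource article = typed.getSubject();
            if (!article.isURIResource()) {
                continue; // assumption: blank-node articles are ignored
            }
            Model single = ModelFactory.createDefaultModel();
            single.setNsPrefixes(source); // keep the bibo:, dct:, ... prefixes
            // Copy only statements with the article as subject (no blank-node closure).
            single.add(source.listStatements(article, null, (RDFNode) null));
            articles.put(article.getURI(), single);
        }
        return articles;
    }

    public static void main(String[] args) {
        // args[0] is the path or URL of the RDF/XML file.
        Model source = RDFDataMgr.loadModel(args[0]);
        split(source).forEach((uri, article) -> {
            // A real run would write one file per article instead of printing.
            System.out.println("Article: " + uri);
            RDFDataMgr.write(System.out, article, Lang.RDFJSON);
        });
    }
}
```

Compared with the SPARQL version, this gives direct control over which statements end up in each per-article model, at the cost of handling nested resources by hand.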
