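I have built a small RDF model: it contains only a few triples describing some items on the human genome.

I only want to keep the items that overlap certain genomic segments (say, "genes"), which are stored in a separate relational database. That gene database is too large to insert into my initial RDF model.

Is there a way to extend ARQ so that, during the query, it injects some new statements into my model (RDF statements describing only the genes that overlap the items)?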

Input:

uri:object1  my:hasChromosome "chr1" .
uri:object1  my:hasStartPosition "1235689887" .
uri:object1  my:hasEndPosition "2897979879" .
uri:object1  dc:title "my variation" .

Output:

uri:object1  my:hasChromosome "chr1" .
uri:object1  my:hasStartPosition "1235689887" .
uri:object1  my:hasEndPosition "2897979879" .
uri:object1  dc:title "my variation" .
uri:gene1  dc:title "GeneName" .

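I have read http://jena.sourceforge.net/ARQ/arq-query-eval.html but I am lost: which extension mechanism should I choose? A property function? Is there a more complete example on the web?

Thanks,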

2 Answers

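The details here are a bit thin. Start simple and use a custom function. That lets you do the external lookup from inside a FILTER, or retrieve a value with BIND.

For updates, you may want to consider SPARQL Update.

Finally, you say:

"I only want to keep the items that overlap certain genomic segments (say, "genes"), which are stored in a separate relational database."

So perhaps something like this: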

PREFIX my: <...>
PREFIX f:  <java:com.example.DBFunctions.>

DELETE { ?missing ?p ?o } # Purge the non-overlapping objects
WHERE {
    ?missing my:hasChromosome ?chr ; 
             my:hasStartPosition ?start ;
             my:hasEndPosition ?end .
    FILTER (!f:overlaps(?chr, ?start, ?end)) # true if not overlapping
}
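As a rough illustration of what the lookup behind `f:overlaps` could compute, here is a minimal Java sketch of the overlap test alone. It assumes inclusive coordinates and uses an in-memory list as a stand-in for the relational gene table; the class `GeneIndex`, its `Gene` type, and the sample values are hypothetical. Wiring it into ARQ (a class that the `java:` function URI resolves to) is not shown. Because the positions in the question are stored as plain string literals, the sketch also includes an overload that parses them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the relational gene table. */
final class GeneIndex {

    /** One gene row: chromosome plus inclusive start and end coordinates. */
    static final class Gene {
        final String chromosome;
        final long start;
        final long end;
        final String name;

        Gene(String chromosome, long start, long end, String name) {
            this.chromosome = chromosome;
            this.start = start;
            this.end = end;
            this.name = name;
        }
    }

    private final List<Gene> genes;

    GeneIndex(List<Gene> genes) {
        this.genes = new ArrayList<>(genes); // defensive copy
    }

    /** True if some gene on the same chromosome shares at least one position with [start, end]. */
    boolean overlaps(String chromosome, long start, long end) {
        for (Gene g : genes) {
            if (g.chromosome.equals(chromosome) && start <= g.end && end >= g.start) {
                return true;
            }
        }
        return false;
    }

    /** Convenience overload for positions given as lexical forms such as "1235689887". */
    boolean overlaps(String chromosome, String start, String end) {
        return overlaps(chromosome, Long.parseLong(start), Long.parseLong(end));
    }
}
```

A linear scan is only reasonable for a small table; against the real database the same condition would normally be pushed into an indexed SQL query.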

OK, I'm guessing here, but I hope this helps.

answered 2012-09-19T19:23:52.663

You have two datastores: a small dataset in a Jena in-memory Model, and a large set of gene-related data in a relational database. You want to write a SPARQL query as if the large dataset were local, without actually importing it. (The actual data transformation you want is a bit vague.)

In SPARQL 1.1 you can do this with the SERVICE keyword, which federates a query across SPARQL endpoints. To use your relational database of gene data as a SPARQL endpoint, you need either a SPARQL-to-SQL translator such as D2RQ, or to convert the data to RDF and load it into a general-purpose, SPARQL-capable triple store.

Once the gene data is available at a SPARQL endpoint, you can write something like:

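PREFIX my: <...>

INSERT { ?missing a my:Gene } # mark a region as a gene
WHERE {
    ?missing my:hasChromosome ?chr ;
             my:hasStartPosition ?start ;
             my:hasEndPosition ?end .
    SERVICE <http://localhost:????/gene_data/sparql> {
        ?gene a my:Gene ;
              my:hasChromosome ?chr ;      # same chromosome; assumes the endpoint uses this predicate
              my:hasStartPosition ?gStart ;
              my:hasEndPosition ?gEnd .
    }
    # Detect overlap. The FILTER sits outside SERVICE because ?start and ?end
    # are not bound inside the remote pattern. If positions are plain string
    # literals, cast them first, e.g. xsd:integer(?start).
    FILTER ( !(?start > ?gEnd || ?end < ?gStart) )
}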

The other option is to do the filtering with a custom function, as @user205512 shows, where the function's Java code uses JDBC to connect to the relational database.
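A minimal sketch of the JDBC side of such a function might look like the following. The table name `gene` and the columns `chromosome`, `gene_start` and `gene_end` are assumptions made for illustration, as is the class name `GeneLookup`; the real schema will differ. Obtaining the `Connection` and exposing the method to ARQ as a custom function are left out.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class GeneLookup {

    // Hypothetical table and column names; adjust to the real schema.
    static final String OVERLAP_SQL =
        "SELECT 1 FROM gene WHERE chromosome = ? AND gene_start <= ? AND gene_end >= ?";

    /** True if at least one gene row overlaps the inclusive interval [start, end]. */
    static boolean overlaps(Connection conn, String chromosome, long start, long end)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(OVERLAP_SQL)) {
            ps.setString(1, chromosome);
            ps.setLong(2, end);    // gene_start <= item end
            ps.setLong(3, start);  // gene_end   >= item start
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();  // any row means an overlap exists
            }
        }
    }
}
```

The function runs once per candidate item, so an index covering the chromosome and position columns is worth having.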

answered 2012-09-24T14:31:50.723