I'm using Sesame to query RDF via SPARQL. I work with large files (2 GB, 10 GB) and then run several queries against them. While processing files this large I get the error java.lang.OutOfMemoryError: Java heap space. I run my application with -Xmx3g, but that doesn't seem to be enough for these files. Should I perhaps close the repository after each query?
Here is my code:
void runQuery() {
    try {
        con = repo.getConnection();
        try {
            TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
            TupleQueryResult result = tupleQuery.evaluate();
            try {
                while (result.hasNext()) {
                    result.next();
                }
            } finally {
                result.close(); // close the result even if iteration throws
            }
        } finally {
            con.close(); // release the connection on every path
        }
    } catch (Exception e) {
        ...
    }
}
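As far as I understand, closing the result and the connection in finally blocks guarantees cleanup even when a query throws partway through. A self-contained sketch of that pattern (Resource and consume are illustrative stand-ins, not part of the Sesame API):

```java
// Sketch of the close-in-finally pattern. "Resource" stands in for
// Sesame's TupleQueryResult / RepositoryConnection; names are illustrative.
public class CleanupSketch {

    static class Resource {
        boolean closed = false;
        void close() { closed = true; }
    }

    // Processes the resource, closing it on both the normal and the
    // exceptional path, so a failed query cannot leak it.
    static void consume(Resource r, boolean fail) {
        try {
            if (fail) {
                throw new RuntimeException("simulated query failure");
            }
            // ... iterate over results here ...
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) {
        Resource ok = new Resource();
        consume(ok, false);

        Resource bad = new Resource();
        try {
            consume(bad, true);
        } catch (RuntimeException expected) {
            // the resource is still closed despite the failure
        }
        System.out.println(ok.closed && bad.closed); // prints "true"
    }
}
```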
void runTests() {
    File dataDir = new File("RepoDir/");
    repo = new SailRepository(new NativeStore(dataDir));
    repo.initialize();
    ...
    for (int j = 0; j < NUMBER_OF_QUERIES; ++j) {
        queryString = queries.get(j);
        runQuery();
    }
    ...
    repo.shutDown();
}
Also, for files this large, is it feasible to use a MemoryStore instead of a NativeStore?
An example of a query that triggers the error:
SELECT DISTINCT ?name1 ?name2
WHERE {
    ?article1 rdf:type bench:Article .
    ?article2 rdf:type bench:Article .
    ?article1 dc:creator ?author1 .
    ?author1 foaf:name ?name1 .
    ?article2 dc:creator ?author2 .
    ?author2 foaf:name ?name2 .
    ?article1 swrc:journal ?journal .
    ?article2 swrc:journal ?journal
    FILTER (?name1 < ?name2)
}