I wrote the following code to extract URIs from web pages served as linked data with the content type application/rdf+xml.
public static void test(String url) {
    try {
        Model read = ModelFactory.createDefaultModel().read(url);
        System.out.println("to go");
        StmtIterator si;
        si = read.listStatements();
        System.out.println("to go");
        while (si.hasNext()) {
            Statement s = si.nextStatement();
            Resource r = s.getSubject();
            Property p = s.getPredicate();
            RDFNode o = s.getObject();
            System.out.println(r.getURI());
            System.out.println(p.getURI());
            System.out.println(o.asResource().getURI());
        }
    }
    catch (JenaException | NoSuchElementException c) {}
}
However, for this input:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ex="http://example.org/stuff/1.0/">
  <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar"
                   dc:title="RDF/XML Syntax Specification (Revised)">
    <ex:editor>
      <rdf:Description ex:fullName="Dave Beckett">
        <ex:homePage rdf:resource="http://purl.org/net/dajobe/" />
      </rdf:Description>
    </ex:editor>
  </rdf:Description>
</rdf:RDF>
the output is:
Subject URI is http://www.w3.org/TR/rdf-syntax-grammar
Predicate URI is http://example.org/stuff/1.0/editor
Object URI is null
Subject URI is http://www.w3.org/TR/rdf-syntax-grammar
Predicate URI is http://purl.org/dc/elements/1.1/title
Website is read
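I need the output to include every URI present on the page, because I am building a web crawler for RDF pages. All of the following links should appear in the output: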
http://www.w3.org/TR/rdf-syntax-grammar
http://example.org/stuff/1.0/editor
http://purl.org/net/dajobe/
http://example.org/stuff/1.0/fullName
http://www.w3.org/TR/rdf-syntax-grammar
http://purl.org/dc/elements/1.1/title
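One possible way to collect every URI is to test each node with isURIResource() before reading its URI. Calling asResource() on a literal object (such as the dc:title value) throws a JenaException, and a blank node (such as the ex:editor value) has a null URI. The sketch below assumes Apache Jena 3 or later (org.apache.jena packages; older releases use com.hp.hpl.jena). The names UriCollector, collectUris, collectUrisFromRdfXml, and addIfUri are illustrative and not part of Jena.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.rdf.model.StmtIterator;

import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper class, assuming Jena 3+ is on the classpath.
public class UriCollector {

    // Collects the URI of every subject, predicate, and object that has one.
    // Literals and blank nodes are skipped; duplicates are dropped by the set.
    public static Set<String> collectUris(Model model) {
        Set<String> uris = new LinkedHashSet<>();
        StmtIterator it = model.listStatements();
        while (it.hasNext()) {
            Statement s = it.nextStatement();
            addIfUri(uris, s.getSubject());
            addIfUri(uris, s.getPredicate());
            addIfUri(uris, s.getObject());
        }
        return uris;
    }

    // Adds the node's URI only when the node is a named (URI) resource.
    private static void addIfUri(Set<String> uris, RDFNode node) {
        if (node.isURIResource()) {
            uris.add(node.asResource().getURI());
        }
    }

    // Convenience overload for RDF/XML text that is already in memory.
    public static Set<String> collectUrisFromRdfXml(String rdfXml) {
        Model model = ModelFactory.createDefaultModel();
        model.read(new StringReader(rdfXml), null, "RDF/XML");
        return collectUris(model);
    }

    public static void main(String[] args) {
        // args[0] is the URL of an RDF document, as in test(String url) above.
        Model model = ModelFactory.createDefaultModel().read(args[0]);
        collectUris(model).forEach(System.out::println);
    }
}
```

For the sample input, the returned set should include http://example.org/stuff/1.0/fullName and http://purl.org/net/dajobe/, which the original loop never reaches. Each URI appears only once, and the iteration order of the statements is not guaranteed.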