xml - Parser to extract URIs from RDF/XML web pages for my web crawler in Java

Question

I am building a web crawler for Linked Data. I distinguish HTML pages from RDF/XML pages with the following code:

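import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Returns 1 for RDF/XML, 0 for HTML or any other (or missing) content type.
public static int checktype(URL url) throws IOException
{
    String contentType = ((HttpURLConnection) url.openConnection()).getContentType();
    System.out.println("Website is read");
    System.out.println(contentType);
    // The header can be absent (null) or carry parameters such as
    // "text/html; charset=UTF-8", so compare only the media type itself.
    if (contentType == null) { return 0; }
    String mediaType = contentType.split(";")[0].trim();
    return "application/rdf+xml".equalsIgnoreCase(mediaType) ? 1 : 0;
}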

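Now I want to parse a web page that contains RDF/XML data and extract all URIs from it. I was able to find HTML parsers, but none for Linked Data. Any further help would be appreciated.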

score 2 · Accepted Answer

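You are probably best off using an existing library such as Apache Any23, which already comes with code for automatically distinguishing between formats, plus parsers for all of them.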

score 1 · Accepted Answer


See the Jena library. It contains an RDF/XML parser.

Answered 2012-09-21T10:45:36.533
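For a rough idea of what "extracting URIs" from RDF/XML involves, here is a minimal sketch that uses only the JDK's StAX parser and collects the values of `rdf:about` and `rdf:resource` attributes. It is not Jena's or Any23's API and not a full RDF/XML parser: it assumes the document is well-formed XML, does not resolve relative URIs against `xml:base`, and ignores `rdf:ID`, `rdf:datatype`, and property URIs derived from element names. The class name `RdfUriExtractor` and method `extractUris` are hypothetical names chosen for illustration. A real RDF library remains the safer choice for a crawler.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: a sketch, not a complete RDF/XML parser.
class RdfUriExtractor {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Collects rdf:about and rdf:resource values in document order, without duplicates.
    static Set<String> extractUris(InputStream in) {
        Set<String> uris = new LinkedHashSet<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        // A crawler reads untrusted input, so do not fetch external entities.
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    String ns = reader.getAttributeNamespace(i);
                    String local = reader.getAttributeLocalName(i);
                    boolean isUriAttr = "about".equals(local) || "resource".equals(local);
                    if (RDF_NS.equals(ns) && isUriAttr) {
                        uris.add(reader.getAttributeValue(i));
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return uris;
    }

    public static void main(String[] args) {
        String xml =
            "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\" xmlns:foaf=\"http://xmlns.com/foaf/0.1/\">"
          + "<rdf:Description rdf:about=\"http://example.org/alice\">"
          + "<foaf:knows rdf:resource=\"http://example.org/bob\"/>"
          + "</rdf:Description></rdf:RDF>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (String uri : extractUris(in)) {
            System.out.println(uri);
        }
    }
}
```

In a crawler, the stream passed to `extractUris` would typically come from `url.openStream()` once `checktype` has reported RDF/XML, and the returned set would feed the crawl frontier.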
