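I'm new to Java and would like to know how to parse data from a web page (for example, the result titles of a particular Google search). I tried implementing the following AsyncTask just to see whether I could retrieve the data: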
    private class DownloadTask extends AsyncTask<String, Void, Void> {
        @Override
        protected Void doInBackground(String... urls) {
            // HtmlCleaner instance (created here but not used below)
            HtmlCleaner parser = new HtmlCleaner();
            URL url;
            try {
                url = new URL("https://www.google.com/search?q=java&rct=j&gws_rd=cr&ei=VsWJVcycOMaisAH4wISABA");
                URLConnection conn = url.openConnection();

                // Build a DOM from the response with the standard XML parser
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                DocumentBuilder builder = factory.newDocumentBuilder();
                Document doc = builder.parse(conn.getInputStream()); // the exception is thrown here

                // Select the link inside the first result's heading
                XPathFactory xfactory = XPathFactory.newInstance();
                XPath xpath = xfactory.newXPath();
                XPathExpression expr = xpath.compile("//*[@id=\"rso\"]/div[2]/li[1]/div/h3/a");
                Object result = expr.evaluate(doc, XPathConstants.NODESET);
                NodeList nodes = (NodeList) result;
                for (int i = 0; i < nodes.getLength(); i++) {
                    // getNodeValue() returns null for element nodes; getTextContent() returns the link text
                    System.out.println(nodes.item(i).getTextContent());
                    Log.d(TAG, nodes.item(i).getTextContent());
                }
            } catch (SAXException e) {
                Log.d(TAG, "\necc1");
                e.printStackTrace();
            } catch (UnsupportedEncodingException e) {
                Log.d(TAG, "\necc2");
                e.printStackTrace();
            } catch (IOException e) {
                Log.d(TAG, "\necc3");
                e.printStackTrace();
            } catch (ParserConfigurationException e) {
                Log.d(TAG, "\necc4");
                e.printStackTrace();
            } catch (XPathExpressionException e) {
                Log.d(TAG, "\necc5");
                e.printStackTrace();
            }
            return null;
        }
    }
But I keep getting this SAXParseException:
org.xml.sax.SAXParseException: Unexpected <! (position:START_DOCUMENT null@1:1 in java.io.InputStreamReader@425a7e80)
It points to the line: Document doc = builder.parse(conn.getInputStream());
I suspect something is wrong with how the document is created, but I really don't know how to handle it.
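A minimal sketch of the likely cause, assuming the response is ordinary HTML rather than well-formed XML: DocumentBuilder is a strict XML parser, so markup that opens with an HTML doctype and contains unclosed tags should be rejected with a SAXException. The class name HtmlAsXmlDemo, the helper parsesAsXml, and both sample strings are hypothetical and only illustrate the behaviour off-device; the exact error message depends on the parser implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original code.
class HtmlAsXmlDemo {

    // Returns true if the strict XML parser accepts the markup,
    // false if it rejects it with a SAXException.
    static boolean parsesAsXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(
                    markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Same exception family as the SAXParseException in the question
            System.out.println("Rejected: " + e.getMessage());
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not expected for an in-memory string; surfaced as unchecked
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample input: HTML-style doctype plus unclosed <meta> and <br>
        String html = "<!doctype html><html><head><meta charset=\"utf-8\"></head>"
                + "<body><br>Java</body></html>";
        // Assumed sample input: a small well-formed XML document
        String xml = "<results><title>Java</title></results>";

        System.out.println("HTML accepted: " + parsesAsXml(html));
        System.out.println("XML accepted:  " + parsesAsXml(xml));
    }
}
```

If this matches what happens on the device, the XPath step is never reached, because the exception is raised while the DOM is still being built.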