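Before asking this question I tried several different approaches and searched Google for directions or answers. I have also checked StackOverflow and can't seem to find a solution.
Basically, I want to create a tool that returns data based on a URL and an XPath, for example: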
URL: http://www.google.co.uk/search?q=wicked+games
XPath: id('rso')/li/div/h3/a
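This should return the matching results.
I can parse XML from other URLs, for example when I fetch an actual XML file such as http://renualsoft.com/jordon/person.xml, but I am not sure how to do this for Google.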
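To make the goal concrete, here is a minimal sketch of the kind of helper I mean. It runs against an inline XML string rather than a URL so that it is self-contained. The class name XPathQuery, the method evaluate, and the sample XML are hypothetical and only for illustration; it assumes the input is well-formed XML, which a real HTML page may not be.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates an XPath expression against well-formed XML text.
final class XPathQuery {

    static List<String> evaluate(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Parse from an in-memory string; builder.parse(String) would treat the argument as a URI.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

        // getNodeValue() yields the value for attribute and text nodes (null for elements).
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document, loosely shaped like a list of links.
        String xml = "<results><item><a href=\"http://example.com/1\">One</a></item>"
                   + "<item><a href=\"http://example.com/2\">Two</a></item></results>";
        for (String href : evaluate(xml, "/results/item/a/@href")) {
            System.out.println(href);
        }
    }
}
```

What I am after is the same thing with the first argument being a URL such as the Google one above.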
I tried this:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
// Fetch and parse the page directly from the URL.
Document doc = builder.parse("http://www.google.co.uk/search?q=wicked+games");

// Compile the XPath and print the value of each matching href attribute.
XPathFactory xFactory = XPathFactory.newInstance();
XPath xpath = xFactory.newXPath();
XPathExpression expr = xpath.compile("id('rso')/li/div/h3/a/@href");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}
But I get this exception:
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.google.co.uk/search?q=wicked+games
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1625)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:633)
at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(XMLVersionDetector.java:189)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:799)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:237)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:300)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:177)
at NewEmptyJUnitTest.query(NewEmptyJUnitTest.java:35)
at NewEmptyJUnitTest.main(NewEmptyJUnitTest.java:77)
Java Result: 1
Any help or guidance would be greatly appreciated. I have tried looking for answers elsewhere but, as I said, I couldn't find anything useful.