您可以根据需要使用JTidy。
代码示例:
public void downloadSinglePage(String pageLink, String targetDir) throws XPathExpressionException, IOException {
URL url = new URL(pageLink);
BufferedInputStream page = new BufferedInputStream(url.openStream());
Tidy tidy = new Tidy();
tidy.setQuiet(true);
tidy.setShowWarnings(false);
Document response = tidy.parseDOM(page, null);
XPathFactory factory = XPathFactory.newInstance();
XPath xPath=factory.newXPath();
NodeList nodes = (NodeList)xPath.evaluate(IMAGE_PATTERN, response, XPathConstants.NODESET);
String imageURL = (String) nodes.item(0).getNodeValue();
saveImageNIO(imageURL, targetDir);
}
在哪里
IMAGE_PATTERN = "///a/img/@src";
但模式取决于图像如何嵌入到页面 HTML 代码中。
使用NIO保存图像的方法:
public void saveImageNIO(String imageURL, String targetDir, String imageName) throws IOException {
URL url = new URL(imageURL);
ReadableByteChannel rbc = Channels.newChannel(url.openStream());
FileOutputStream fos = new FileOutputStream(targetDir + "/" + imageName + ".jpg");
fos.getChannel().transferFrom(rbc, 0, 1 << 24);
}