java - Using JSoup to save the contents of the URL http://www.aw20.co.uk/images/logo.png to a file

Question

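I'm trying to use JSoup to get the contents of the URL http://www.aw20.co.uk/images/logo.png, which is the image logo.png, and save it to a file. So far I have used JSoup to connect to http://www.aw20.co.uk and get the Document. I then found the absolute URL of the image I'm looking for, but I don't know how to get the actual image. Could someone point me in the right direction? Also, is there any way I could use Jsoup.connect("http://www.aw20.co.uk/images/logo.png").get(); to get the image?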

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;


public class JGet2 {

public static void main(String[] args) {

    try {
        Document doc = Jsoup.connect("http://www.aw20.co.uk").get();

        Elements img = doc.getElementsByTag("img");

        for (Element element : img) {
            String src = element.absUrl("src");

            System.out.println("Image Found!");
            System.out.println("src attribute is: " + src);
            if (src.contains("logo.png") == true) {
                System.out.println("Success");     
            }
            getImages(src);
        }
    } 

    catch (IOException e) {
        e.printStackTrace();
    }
}

private static void getImages(String src) throws IOException {

    int indexName = src.lastIndexOf("/");

    // Strip a trailing slash so that the last path segment is the file name
    if (indexName >= 0 && indexName == src.length() - 1) {
        src = src.substring(0, indexName);
    }

    indexName = src.lastIndexOf("/");
    // Everything after the last slash, e.g. "logo.png"
    String name = src.substring(indexName + 1);

    System.out.println(name);
}
}

score 10 · Accepted Answer

If you don't want to parse the response as HTML, you can use Jsoup to fetch any URL and get the data as bytes. For example:

byte[] bytes = Jsoup.connect(imgUrl).ignoreContentType(true).execute().bodyAsBytes();

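ignoreContentType(true) is set because otherwise Jsoup throws an exception saying the content is not HTML-parseable. That is fine in this case, because we use bodyAsBytes() to get the response body rather than parsing it.

See the Jsoup Connection API for more details.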
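
Once the bytes are in hand, writing them to disk needs only the standard library. The following is a minimal sketch, not part of the original answer: the class name SaveBytes, the method save, and the target path are hypothetical, and the hard-coded byte array stands in for whatever bodyAsBytes() returns so the example runs without Jsoup or a network connection.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SaveBytes {

    // Writes the bytes to target, creating missing parent directories first.
    public static Path save(byte[] bytes, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Files.write creates the file, or truncates and overwrites an existing one
        return Files.write(target, bytes);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder data (assumption). In real use this would be the array returned by
        // Jsoup.connect(imgUrl).ignoreContentType(true).execute().bodyAsBytes()
        byte[] bytes = {(byte) 0x89, 0x50, 0x4E, 0x47};

        // Hypothetical target location under the system temp directory
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), "jsoup-demo", "logo.png");
        Path saved = save(bytes, target);
        System.out.println("Wrote " + Files.size(saved) + " bytes to " + saved);
    }
}
```

Because the bytes are written exactly as received, the saved file is a byte-for-byte copy of the response body.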

score 5 · Accepted Answer

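Jsoup isn't designed for downloading the content of a URL.

Since you are able to use a third-party library, you can try Apache Commons IO to download the content of a given URL to a file: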

FileUtils.copyURLToFile(URL source, File destination);

It's only one line.

score 1 · Accepted Answer

This method does not work well. Be careful when using it.

byte[] bytes = Jsoup.connect(imgUrl).ignoreContentType(true).execute().bodyAsBytes();

score 1 · Accepted Answer

You can use these methods, or parts of them, to solve your problem. Note: IMAGE_HOME is an absolute path, e.g. /home/yourname/foldername

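import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import javax.imageio.ImageIO;
import org.jsoup.Jsoup;

// Downloads the image and stores it under IMAGE_HOME/relativePath/fileName.
// Returns the target path, or null if the download failed. The path is
// returned even if writing the file failed afterwards.
public static String storeImageIntoFS(String imageUrl, String fileName, String relativePath) {
    String imagePath = null;
    try {
        byte[] bytes = Jsoup.connect(imageUrl).ignoreContentType(true).execute().bodyAsBytes();
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        String rootTargetDirectory = IMAGE_HOME + "/" + relativePath;
        imagePath = rootTargetDirectory + "/" + fileName;
        saveByteBufferImage(buffer, rootTargetDirectory, fileName);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return imagePath;
}

// Decodes the image bytes and re-encodes them with ImageIO, so the saved file
// is not a byte-for-byte copy of the download.
public static void saveByteBufferImage(ByteBuffer imageDataBytes, String rootTargetDirectory, String savedFileName) {
    String uploadInputFile = rootTargetDirectory + "/" + savedFileName;

    File rootTargetDir = new File(rootTargetDirectory);
    if (!rootTargetDir.exists()) {
        boolean created = rootTargetDir.mkdirs();
        if (!created) {
            System.out.println("Error while creating directory for location- " + rootTargetDirectory);
        }
    }
    // The file extension doubles as the ImageIO format name, e.g. "png"
    String[] fileNameParts = savedFileName.split("\\.");
    String format = fileNameParts[fileNameParts.length - 1];

    File file = new File(uploadInputFile);
    BufferedImage bufferedImage;

    InputStream in = new ByteArrayInputStream(imageDataBytes.array());
    try {
        bufferedImage = ImageIO.read(in);
        // ImageIO.read returns null when no registered reader can decode the data
        if (bufferedImage == null) {
            System.out.println("Unsupported or corrupt image data for " + savedFileName);
            return;
        }
        ImageIO.write(bufferedImage, format, file);
    } catch (IOException e) {
        e.printStackTrace();
    }

}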

score 0 · Accepted Answer

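Also, is there any way I could use Jsoup.connect("http://www.aw20.co.uk/images/logo.png").get(); to get the image?

No, JSoup will only get text and the like; it cannot be used to download files or binary data. That said, simply take the file name and path you obtained through JSoup and download the file with standard Java I/O.

I use NIO for the download, i.e.: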

     String imgPath = // ... url path to image
     String imgFilePath = // ... file path String

     // try-with-resources closes the channel and the stream automatically
     try (ReadableByteChannel rbc = Channels.newChannel(new URL(imgPath).openStream());
          FileOutputStream fos = new FileOutputStream(imgFilePath)) {
        // setState(EXTRACTING + imgFilePath);
        // Long.MAX_VALUE as the count means: copy until the end of the stream
        fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);

     } catch (MalformedURLException e) {
        e.printStackTrace();
     } catch (FileNotFoundException e) {
        e.printStackTrace();
     } catch (IOException e) {
        e.printStackTrace();
     }
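
On Java 7 or later, the same download can probably be written more compactly with Files.copy. This is a sketch under assumptions rather than the answerer's code: the class name UrlDownload and the method download are hypothetical, and a local file: URL stands in for the image URL so that the example runs offline.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class UrlDownload {

    // Copies everything the URL serves into target and returns the number of bytes copied.
    public static long download(URL source, Path target) throws IOException {
        try (InputStream in = source.openStream()) {
            // REPLACE_EXISTING overwrites a file left over from an earlier run
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // A local file: URL stands in for the image URL (assumption, keeps the sketch offline)
        Path original = Files.createTempFile("source", ".png");
        Files.write(original, new byte[] {(byte) 0x89, 0x50, 0x4E, 0x47});

        Path copy = Files.createTempFile("copy", ".png");
        long copied = download(original.toUri().toURL(), copy);
        System.out.println("Copied " + copied + " bytes to " + copy);
    }
}
```

In real use, the URL argument would be built from the absolute src value that JSoup returned, for example new URL(src).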
