java - Retrieving the content of a web page

Question

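I want to fetch a web page and save its content as a string. Is there a library that can do this? I want to use the string in some programs I am building. This is for websites that do not necessarily provide RSS feeds.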

score 3 · Accepted Answer

I think this is what you need:

    // Needs java.net.URL, java.net.URLConnection, java.io.InputStream
    // and org.apache.commons.io.IOUtils (Apache Commons IO).
    URL url = new URL("http://www.google.com/");
    URLConnection con = url.openConnection();
    InputStream in = con.getInputStream();
    // Do not use con.getContentEncoding() for the charset: it reports the
    // Content-Encoding header (e.g. gzip). The charset is part of
    // con.getContentType(), which returns something like
    // "text/html; charset=UTF-8" and must be parsed to extract the actual encoding.
    String encoding = "UTF-8"; // fallback used until the charset is parsed
    String body = IOUtils.toString(in, encoding);
    System.out.println(body);
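
The comment in the snippet above says the charset has to be parsed out of the `Content-Type` value. Below is a minimal sketch of one way to do that with only the JDK, assuming Java 9+ for `InputStream.readAllBytes()`. The class `PageText` and its methods `charsetOf` and `readBody` are hypothetical names made up for this illustration. They are not part of the original answer or of any library, and the parsing is deliberately simple.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class PageText {

    // Extracts the charset from a Content-Type value such as
    // "text/html; charset=UTF-8". Falls back to UTF-8 when the value is
    // null, carries no charset parameter, or names an unknown charset.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Case-insensitive check for the "charset=" prefix.
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Unknown or malformed name: fall through to the default.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Reads the whole stream, closes it, and decodes the bytes with the given charset.
    static String readBody(InputStream in, Charset charset) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), charset);
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for con.getInputStream() and con.getContentType().
        byte[] bytes = "<p>caf\u00e9</p>".getBytes(StandardCharsets.ISO_8859_1);
        Charset cs = charsetOf("text/html; charset=ISO-8859-1");
        System.out.println(readBody(new ByteArrayInputStream(bytes), cs));
    }
}
```

With a real connection, the call would look like `PageText.readBody(con.getInputStream(), PageText.charsetOf(con.getContentType()))`, which also removes the need for `IOUtils`.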

score 1 · Accepted Answer

May I recommend JSoup?

    // Jsoup.connect() needs an absolute http:// or https:// URL.
    Document doc = Jsoup.connect("http://www.google.com/").get();

answered 2013-07-03T16:03:37.797

score 0 · Accepted Answer

You can use Apache HttpComponents:

    HttpGet httpget = new HttpGet("http://www.google.gr");
    // try-with-resources closes both the client and the response,
    // so no explicit response.close() is needed.
    try (CloseableHttpClient httpclient = HttpClients.createDefault();
         CloseableHttpResponse response = httpclient.execute(httpget)) {
        HttpEntity entity = response.getEntity();
        if (entity != null) {
            System.out.println(EntityUtils.toString(entity));
        }
    } catch (IOException ex) {
        Logger.getLogger(HttpClient.class.getName()).log(Level.SEVERE, null, ex);
    }
