I am trying to download web content from Google over HTTPS, as shown in the link below.

Download link

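In the code below, I first disable certificate validation and trust all certificates (for testing purposes only), then download the page as if it were regular HTTP, but for some reason it does not work: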

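import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public static void downloadWeb() {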
        // Create a new trust manager that trusts all certificates (testing only)
        TrustManager[] trustAllCerts = new TrustManager[] { new X509TrustManager() {
            public java.security.cert.X509Certificate[] getAcceptedIssuers() {
                // Return an empty array rather than null
                return new java.security.cert.X509Certificate[0];
            }

            public void checkClientTrusted(
                    java.security.cert.X509Certificate[] certs, String authType) {
            }

            public void checkServerTrusted(
                    java.security.cert.X509Certificate[] certs, String authType) {
            }
        } };

        // Activate the new trust manager
        try {
            SSLContext sc = SSLContext.getInstance("TLS");
            sc.init(null, trustAllCerts, new java.security.SecureRandom());
            HttpsURLConnection
                    .setDefaultSSLSocketFactory(sc.getSocketFactory());
        } catch (Exception e) {
            // Report the failure instead of swallowing it silently
            e.printStackTrace();
        }

        // Begin download as regular HTTP
        try {
            String wordAddress = "https://www.google.com/webhp?hl=en&tab=ww#hl=en&tbs=dfn:1&sa=X&ei=obxCUKm7Ic3GqAGvoYGIBQ&ved=0CDAQBSgA&q=pronunciation&spell=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.&fp=c5bfe0fbd78a3271&biw=1024&bih=759";
            URLConnection yc = new URL(wordAddress).openConnection();
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    yc.getInputStream()))) {
                String inputLine;
                while ((inputLine = in.readLine()) != null) {
                    // Print the downloaded line, not the URL
                    System.out.println(inputLine);
                }
            }
        } catch (IOException e) {
            // Report the failure instead of ignoring it
            e.printStackTrace();
        }

    }

1 Answer

You need to spoof the HTTP headers so that Google thinks the download is coming from a web browser. Here is sample code using HttpClient:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class App1 {

    public static void main(String[] args) throws IOException {
        HttpClient httpclient = new DefaultHttpClient();
        // Placeholder: replace with the actual Google URL
        HttpGet httpget = new HttpGet("http://_google_url_");
        // Send a browser-like User-Agent so the request looks like it comes from Firefox
        httpget.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20100101 Firefox/8.0");
        HttpResponse response = httpclient.execute(httpget);
        File file = new File("google.html");
        // Write the response body to google.html; the stream is closed automatically
        try (FileOutputStream fout = new FileOutputStream(file)) {
            response.getEntity().writeTo(fout);
        }
    }
}
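
If you would rather not add the Apache HttpClient dependency, the same header trick can probably be done with the JDK's own HttpURLConnection. This is only a minimal sketch under stated assumptions: the class name BrowserLikeDownload, the helper names openWithUserAgent and download, the https://www.google.com/ fallback address, and the UTF-8 decoding are all hypothetical choices for illustration and do not come from the code above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this sketch
class BrowserLikeDownload {

    // Same User-Agent string as in the HttpClient example
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20100101 Firefox/8.0";

    // Prepares the request and sets the header.
    // No network traffic happens until the response is actually read.
    static HttpURLConnection openWithUserAgent(String address, String userAgent)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    // Reads the whole response body into a String (assumes UTF-8 content).
    static String download(String address) throws IOException {
        HttpURLConnection conn = openWithUserAgent(address, USER_AGENT);
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed fallback address; pass the page you actually want as the first argument
        String address = args.length > 0 ? args[0] : "https://www.google.com/";
        System.out.println(download(address));
    }
}
```

Keeping the header setup in its own method means you can check what will be sent before any request goes out, for example with openWithUserAgent(...).getRequestProperty("User-Agent").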

Warning: I take no responsibility if you use this code and violate Google's Terms of Service.

Answered 2012-09-02T01:09:38.057