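I am writing a Java relay proxy service that sits between the browser and the Internet as a middlebox. Its only purpose is to observe the web requests coming from the browser and the responses going back to it, and then parse those responses offline.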

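My Java proxy listens on a specific socket for connections from the browser. When a new connection arrives, it reads the browser's request headers, identifies the host to connect to, opens a connection to that host, and passes the browser's request along. The code that parses the browser request and relays the server response is the streamHTTPData() method given below. In the code, debugOut is the standard System.out.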
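
To make the "identify the host to connect to" step concrete, here is a minimal sketch of turning a Host header line into a host name and port. The class name HostHeader and its parse() method are hypothetical and not part of my proxy; the sketch assumes port 80 when the header carries no port, and it matches the field name only at the start of the line.

```java
import java.util.Locale;

/** Hypothetical helper: splits a Host header line into host name and port. */
final class HostHeader {
    final String host;
    final int port;

    HostHeader(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Returns null when the line is not a Host header. */
    static HostHeader parse(String headerLine) {
        // Match the field name at the start of the line only, ignoring case,
        // so that e.g. "X-Forwarded-Host: ..." is not mistaken for Host.
        if (!headerLine.toLowerCase(Locale.ROOT).startsWith("host:")) {
            return null;
        }
        String value = headerLine.substring(5).trim();
        int colon = value.lastIndexOf(':');
        // An IPv6 literal such as [::1]:8080 closes the address with ']',
        // so a port is present only if the last ':' comes after any ']'.
        boolean hasPort = colon > value.lastIndexOf(']');
        if (!hasPort) {
            return new HostHeader(value, 80); // assumed default: plain HTTP
        }
        int port = Integer.parseInt(value.substring(colon + 1));
        return new HostHeader(value.substring(0, colon), port);
    }

    public static void main(String[] args) {
        HostHeader h = parse("Host: andhrawatch.com");
        System.out.println(h.host + " " + h.port);
    }
}
```

The returned host and port could then be handed to new Socket(host, port) to open the upstream connection.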

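The code works for most websites, but with some sites I ran into a strange problem: I cannot view the home page. I noticed this when I was following random links from a Google search and landed on a forum. Using the HTTPFOX extension for Firefox, I saw that the request sent by the browser to the Java program and the one sent from there to the web server are exactly the same. However, I get an HTTP 200 response without the Java middlebox and an HTTP 404 with it. I am not sure what the problem is. Can anyone point me in the right direction? The HTTP requests and responses captured by HTTPFOX are shown below.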

private int streamHTTPData(InputStream in, OutputStream out,StringBuffer host, StringBuffer url, boolean waitForDisconnect) {
    // get the HTTP data from an InputStream, and send it to
    // the designated OutputStream
    StringBuffer header = new StringBuffer("");
    String data = "";
    int responseCode = 200;
    int contentLength = 0;
    int pos = -1;
    int byteCount = 0;

    try {
        // get the first line of the header, so we know the response code
        data = readLine(in);
        if (data != null) {
            header.append(data + "\r\n");
            pos = data.indexOf(" ");
            if ((data.toLowerCase().startsWith("http")) && (pos >= 0)
                    && (data.indexOf(" ", pos + 1) >= 0)) {
                String rcString = data.substring(pos + 1,
                        data.indexOf(" ", pos + 1));
                try {
                    responseCode = Integer.parseInt(rcString);
                } catch (Exception e) {
                    if (debugLevel > 0)
                        debugOut.println("Error parsing response code "
                                + rcString);
                }
            } else {
                if ((pos >= 0) && (data.indexOf(" ", pos + 1) >= 0)) {
                    String suffix = data.substring(pos + 1,
                            data.indexOf(" ", pos + 1));
                    url.setLength(0);
                    url.append(suffix.trim());
                }
            }
        }

        // get the rest of the header info
        while ((data = readLine(in)) != null) {
            // the header ends at the first blank line
            if (data.length() == 0)
                break;
            header.append(data + "\r\n");

            // check for the Host header (field name must start the line,
            // otherwise headers such as X-Forwarded-Host would match too)
            String lowerData = data.toLowerCase();
            if (lowerData.startsWith("host:")) {
                host.setLength(0);
                host.append(data.substring(5).trim());
            }

            // check for the Content-Length header
            if (lowerData.startsWith("content-length:"))
                contentLength = Integer.parseInt(data.substring(15)
                        .trim());
        }

        // add a blank line to terminate the header info
        header.append("\r\n");

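        // convert the header to a byte array, and write it to our stream;
        // use the byte array's length, which can differ from the character
        // count if a header contains non-ASCII characters
        byte[] headerBytes = header.toString().getBytes();
        out.write(headerBytes, 0, headerBytes.length);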
        System.out.println(header.toString());
        // if the header indicated that this was not a 200 response,
        // just return what we've got if there is no Content-Length,
        // because we may not be getting anything else
        if ((responseCode != 200) && (contentLength == 0)) {
            out.flush();
            return header.length();
        }

        // get the body, if any; we try to use the Content-Length header to
        // determine how much data we're supposed to be getting, because
        // sometimes the client/server won't disconnect after sending us
        // information...
        if (contentLength > 0)
            waitForDisconnect = false;

        if ((contentLength > 0) || (waitForDisconnect)) {
            try {
                byte[] buf = new byte[4096];
                int bytesIn = 0;
                while (((byteCount < contentLength) || (waitForDisconnect))
                        && ((bytesIn = in.read(buf)) >= 0)) {
                    out.write(buf, 0, bytesIn);
                    out.flush();
                    byteCount += bytesIn;
                }
            } catch (Exception e) {
                String errMsg = "Error getting HTTP body: " + e;
                if (debugLevel > 0)
                    debugOut.println(errMsg);
            }
        }
    } catch (Exception e) {
        if (debugLevel > 0)
            debugOut.println("Error getting HTTP data: " + e);
    }

    // flush the OutputStream and return
    try {
        out.flush();
    } catch (Exception e) {
    }
    return (header.length() + byteCount);
}
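
One thing I could check here: in HTTP/1.1 a client that talks to a proxy normally sends the request target in absolute form (for example GET http://andhrawatch.com/ HTTP/1.1), whereas an origin server expects origin form (GET / HTTP/1.1). streamHTTPData() forwards the first line unchanged. Below is a minimal sketch of rewriting the request line before forwarding it. RequestLine, toOriginForm() and shouldForward() are hypothetical names that do not exist in my proxy, and whether this difference explains the 404 is an assumption I have not confirmed.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper: normalizes a proxy-style request before forwarding. */
final class RequestLine {

    /** Rewrites "GET http://host/path HTTP/1.1" to "GET /path HTTP/1.1". */
    static String toOriginForm(String requestLine) {
        String[] parts = requestLine.split(" ");
        if (parts.length != 3) {
            return requestLine; // not a well-formed request line; leave it
        }
        String target = parts[1];
        if (!target.toLowerCase(Locale.ROOT).startsWith("http://")) {
            return requestLine; // already origin-form, or e.g. CONNECT
        }
        URI uri;
        try {
            uri = URI.create(target);
        } catch (IllegalArgumentException e) {
            return requestLine; // target java.net.URI cannot parse; leave it
        }
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // "http://host" with no path means the root
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        return parts[0] + " " + path + " " + parts[2];
    }

    /** Proxy-Connection is addressed to the proxy, so it is not forwarded. */
    static boolean shouldForward(String headerLine) {
        return !headerLine.toLowerCase(Locale.ROOT)
                .startsWith("proxy-connection:");
    }

    public static void main(String[] args) {
        // prints "GET / HTTP/1.1"
        System.out.println(toOriginForm("GET http://andhrawatch.com/ HTTP/1.1"));
    }
}
```

Dumping the exact bytes the proxy writes to the server socket would show which form is actually on the wire, since HTTPFOX displays a parsed view of the request rather than the raw line.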

HTTP request (with and without the middlebox):

(Request-Line)  GET / HTTP/1.1
Host    andhrawatch.com
User-Agent  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0.1
Accept  text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language en-us,en;q=0.5
Accept-Encoding gzip, deflate
Proxy-Connection    keep-alive

HTTP response without the Java middlebox:

(Status-Line)   HTTP/1.1 200 OK
Date    Fri, 27 Jul 2012 03:51:38 GMT
Server  Apache/2.2.14 (Unix) mod_ssl/2.2.14 OpenSSL/0.9.8e-fips-rhel5   mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
X-Powered-By    PHP/5.3.1
P3P CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Expires Mon, 1 Jan 2001 00:00:00 GMT
Cache-Control   post-check=0, pre-check=0
Pragma  no-cache
Set-Cookie  0f486952816b6d6bf53a4c34b724b278=c68edaebc6dedb2b291832dfbfb784fc; path=/
Last-Modified   Fri, 27 Jul 2012 03:51:38 GMT
Keep-Alive  timeout=5, max=100
Connection  Keep-Alive
Transfer-Encoding   chunked
Content-Type    text/html; charset=utf-8   

HTTP response with the Java middlebox:

(Status-Line)   HTTP/1.1 404 Component not found
Date    Fri, 27 Jul 2012 03:54:39 GMT
Server  Apache/2.2.14 (Unix) mod_ssl/2.2.14 OpenSSL/0.9.8e-fips-rhel5            mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
X-Powered-By    PHP/5.3.1
P3P CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Expires Mon, 1 Jan 2001 00:00:00 GMT
Cache-Control   post-check=0, pre-check=0
Pragma  no-cache
Set-Cookie  0f486952816b6d6bf53a4c34b724b278=33806d89181aa6d488ccba1b9163e511; path=/
Last-Modified   Fri, 27 Jul 2012 03:54:39 GMT
Transfer-Encoding   chunked
Content-Type    text/html; charset=utf-8
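
Both responses above use Transfer-Encoding: chunked and carry no Content-Length, so streamHTTPData() can only find the end of the body by waiting for the server to disconnect. As a side note, here is a minimal sketch of relaying a chunked body up to its terminating zero-size chunk instead. ChunkedRelay, readAsciiLine() and relay() are hypothetical names, not part of my proxy; the sketch assumes chunk sizes fit in an int and does not interpret chunk extensions.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: copies one chunked HTTP body from in to out. */
final class ChunkedRelay {

    /** Reads one line terminated by LF or CRLF, without the line ending. */
    static String readAsciiLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') return sb.toString();
            if (c != '\r') sb.append((char) c);
        }
        throw new EOFException("stream ended inside a line");
    }

    /** Writes a line followed by CRLF and returns the bytes written. */
    private static int writeLine(OutputStream out, String line)
            throws IOException {
        byte[] b = (line + "\r\n").getBytes(StandardCharsets.ISO_8859_1);
        out.write(b);
        return b.length;
    }

    /** Relays chunks and trailers, stops after the body; returns byte count. */
    static int relay(InputStream in, OutputStream out) throws IOException {
        int total = 0;
        byte[] buf = new byte[4096];
        while (true) {
            String sizeLine = readAsciiLine(in);
            total += writeLine(out, sizeLine);
            // The size is hexadecimal; anything after ';' is an extension.
            int semi = sizeLine.indexOf(';');
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) break; // last chunk
            int remaining = size + 2; // chunk data plus its trailing CRLF
            while (remaining > 0) {
                int n = in.read(buf, 0, Math.min(buf.length, remaining));
                if (n < 0) throw new EOFException("stream ended inside a chunk");
                out.write(buf, 0, n);
                remaining -= n;
                total += n;
            }
        }
        // Trailer section: zero or more header lines, then a blank line.
        String line;
        do {
            line = readAsciiLine(in);
            total += writeLine(out, line);
        } while (!line.isEmpty());
        out.flush();
        return total;
    }
}
```

With this, the relay loop would stop reading as soon as the body is complete, even when the server keeps the connection open.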