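I am writing a Java relay proxy that sits between the browser and the Internet as a middlebox. Its only purpose is to observe the web requests the browser sends and the responses returned to the browser, so that the responses can be parsed offline.
My Java proxy listens on a specific socket for connections from the browser. When a new connection arrives, it reads the browser's request headers, identifies the host to connect to, opens a connection to that host, and forwards the browser's request. The code that parses the browser request and relays the server response is the streamHTTPData() method given below. In the code, debugOut is the standard System.out.
The code works for most websites, but some sites show a strange problem: I cannot view the home page. I noticed this when I followed a random link from a Google search and landed on a forum. Using the HTTPFOX extension for Firefox, I saw that the request the browser sends to the Java program and the request sent from there to the web server are exactly the same. Yet I get an HTTP 200 response when the Java middlebox is not in use and an HTTP 404 when it is. I am not sure what the problem is. Can anyone point me in the right direction? The HTTP request and responses captured by HTTPFOX are shown below.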
private int streamHTTPData(InputStream in, OutputStream out, StringBuffer host, StringBuffer url, boolean waitForDisconnect) {
    // get the HTTP data from an InputStream, and send it to
    // the designated OutputStream
    StringBuffer header = new StringBuffer("");
    String data = "";
    int responseCode = 200;
    int contentLength = 0;
    int pos = -1;
    int byteCount = 0;
    try {
        // get the first line of the header, so we know the response code
        data = readLine(in);
        if (data != null) {
            header.append(data + "\r\n");
            pos = data.indexOf(" ");
            if ((data.toLowerCase().startsWith("http")) && (pos >= 0)
                    && (data.indexOf(" ", pos + 1) >= 0)) {
                String rcString = data.substring(pos + 1,
                        data.indexOf(" ", pos + 1));
                try {
                    responseCode = Integer.parseInt(rcString);
                } catch (Exception e) {
                    if (debugLevel > 0)
                        debugOut.println("Error parsing response code "
                                + rcString);
                }
            } else {
                if ((pos >= 0) && (data.indexOf(" ", pos + 1) >= 0)) {
                    String suffix = data.substring(pos + 1,
                            data.indexOf(" ", pos + 1));
                    url.setLength(0);
                    url.append(suffix.trim());
                }
            }
        }
        // get the rest of the header info
        while ((data = readLine(in)) != null) {
            // the header ends at the first blank line
            if (data.length() == 0)
                break;
            header.append(data + "\r\n");
            // check for the Host header
            pos = data.toLowerCase().indexOf("host:");
            if (pos >= 0) {
                host.setLength(0);
                host.append(data.substring(pos + 5).trim());
            }
            // check for the Content-Length header
            pos = data.toLowerCase().indexOf("content-length:");
            if (pos >= 0)
                contentLength = Integer.parseInt(data.substring(pos + 15)
                        .trim());
        }
        // add a blank line to terminate the header info
        header.append("\r\n");
        // convert the header to a byte array, and write it to our stream
        out.write(header.toString().getBytes(), 0, header.length());
        System.out.println(header.toString());
        // if the header indicated that this was not a 200 response,
        // just return what we've got if there is no Content-Length,
        // because we may not be getting anything else
        if ((responseCode != 200) && (contentLength == 0)) {
            out.flush();
            return header.length();
        }
        // get the body, if any; we try to use the Content-Length header to
        // determine how much data we're supposed to be getting, because
        // sometimes the client/server won't disconnect after sending us
        // information...
        if (contentLength > 0)
            waitForDisconnect = false;
        if ((contentLength > 0) || (waitForDisconnect)) {
            try {
                byte[] buf = new byte[4096];
                int bytesIn = 0;
                while (((byteCount < contentLength) || (waitForDisconnect))
                        && ((bytesIn = in.read(buf)) >= 0)) {
                    out.write(buf, 0, bytesIn);
                    out.flush();
                    byteCount += bytesIn;
                }
            } catch (Exception e) {
                String errMsg = "Error getting HTTP body: " + e;
                if (debugLevel > 0)
                    debugOut.println(errMsg);
            }
        }
    } catch (Exception e) {
        if (debugLevel > 0)
            debugOut.println("Error getting HTTP data: " + e);
    }
    // flush the OutputStream and return
    try {
        out.flush();
    } catch (Exception e) {
    }
    return (header.length() + byteCount);
}
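The method above calls a readLine(in) helper that is not shown in the post. The following is only a sketch of what such a helper might look like, not the poster's actual code. It assumes lines end in LF or CRLF and decodes header bytes as ISO-8859-1. The class name HeaderLineReader is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; only the readLine name comes from the post.
final class HeaderLineReader {

    /**
     * Reads one line byte by byte and returns it without the trailing
     * CR/LF. Returns null when the stream is already at end of input.
     */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean readAny = false;
        int b;
        while ((b = in.read()) != -1) {
            readAny = true;
            if (b == '\n') {
                break;              // LF terminates the line
            }
            line.write(b);
        }
        if (!readAny) {
            return null;            // nothing left in the stream
        }
        String s = new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
        // Drop the CR of a CRLF terminator, if present.
        return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "HTTP/1.1 200 OK\r\nHost: example.org\r\n\r\nbody"
                .getBytes(StandardCharsets.ISO_8859_1);
        InputStream in = new ByteArrayInputStream(raw);
        String line;
        // Print header lines up to the blank line that ends the header.
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            System.out.println(line);
        }
    }
}
```

Reading one byte at a time, rather than wrapping the stream in a BufferedReader, leaves the body bytes that follow the blank line unread, so the body loop in streamHTTPData() can still pick them up from the same InputStream.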
HTTP request (with and without the middlebox):
(Request-Line) GET / HTTP/1.1
Host andhrawatch.com
User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0.1
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language en-us,en;q=0.5
Accept-Encoding gzip, deflate
Proxy-Connection keep-alive
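One thing that may be worth checking is the request line that actually goes out on the wire. A browser configured to use an HTTP proxy normally sends the request target in absolute form (for example GET http://andhrawatch.com/ HTTP/1.1), and the Proxy-Connection header in the capture suggests the browser is in that mode. The sketch below assumes the proxy forwards the header unchanged. Whether that is what triggers the 404 here has not been confirmed. It rewrites an absolute-form target to the path-only form that origin servers usually see. The class and method names are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; not part of the code in the post.
final class RequestLineRewriter {

    /**
     * Turns "GET http://host/path?q HTTP/1.1" into "GET /path?q HTTP/1.1".
     * Lines that are not in absolute form are returned unchanged.
     */
    static String toOriginForm(String requestLine) {
        String[] parts = requestLine.split(" ");
        if (parts.length != 3) {
            return requestLine;     // not "METHOD target VERSION"
        }
        String target = parts[1];
        // Only plain-HTTP absolute URIs are rewritten in this sketch.
        if (!target.regionMatches(true, 0, "http://", 0, 7)) {
            return requestLine;
        }
        try {
            URI uri = new URI(target);
            String path = uri.getRawPath();
            if (path == null || path.isEmpty()) {
                path = "/";         // "http://host" means the root path
            }
            if (uri.getRawQuery() != null) {
                path += "?" + uri.getRawQuery();
            }
            return parts[0] + " " + path + " " + parts[2];
        } catch (URISyntaxException e) {
            return requestLine;     // leave unparsable targets alone
        }
    }

    public static void main(String[] args) {
        System.out.println(toOriginForm("GET http://andhrawatch.com/ HTTP/1.1"));
    }
}
```

Dumping the raw bytes the proxy writes to the server socket, rather than relying on what a browser extension displays, is one way to see which form is really being sent.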
HTTP response without the Java middlebox:
(Status-Line) HTTP/1.1 200 OK
Date Fri, 27 Jul 2012 03:51:38 GMT
Server Apache/2.2.14 (Unix) mod_ssl/2.2.14 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
X-Powered-By PHP/5.3.1
P3P CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Expires Mon, 1 Jan 2001 00:00:00 GMT
Cache-Control post-check=0, pre-check=0
Pragma no-cache
Set-Cookie 0f486952816b6d6bf53a4c34b724b278=c68edaebc6dedb2b291832dfbfb784fc; path=/
Last-Modified Fri, 27 Jul 2012 03:51:38 GMT
Keep-Alive timeout=5, max=100
Connection Keep-Alive
Transfer-Encoding chunked
Content-Type text/html; charset=utf-8
HTTP response with the Java middlebox:
(Status-Line) HTTP/1.1 404 Component not found
Date Fri, 27 Jul 2012 03:54:39 GMT
Server Apache/2.2.14 (Unix) mod_ssl/2.2.14 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
X-Powered-By PHP/5.3.1
P3P CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Expires Mon, 1 Jan 2001 00:00:00 GMT
Cache-Control post-check=0, pre-check=0
Pragma no-cache
Set-Cookie 0f486952816b6d6bf53a4c34b724b278=33806d89181aa6d488ccba1b9163e511; path=/
Last-Modified Fri, 27 Jul 2012 03:54:39 GMT
Transfer-Encoding chunked
Content-Type text/html; charset=utf-8
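Both captured responses carry Transfer-Encoding: chunked and no Content-Length, so the length of the body is given by the chunk sizes rather than by a header value. The following is a sketch only, not part of the posted code. It shows how a relay could copy one chunked body through unchanged and stop at the terminating zero-length chunk instead of waiting for the connection to close. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the code in the post.
final class ChunkedRelay {

    /** Copies one chunked body from in to out as-is; returns the payload size. */
    static long relayChunkedBody(InputStream in, OutputStream out) throws IOException {
        long payload = 0;
        byte[] buf = new byte[4096];
        while (true) {
            String sizeLine = copyLine(in, out);
            int semi = sizeLine.indexOf(';');       // ignore any chunk extension
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int remaining = Integer.parseInt(hex, 16);
            if (remaining == 0) {
                break;                              // zero-length chunk ends the body
            }
            payload += remaining;
            while (remaining > 0) {
                int n = in.read(buf, 0, Math.min(buf.length, remaining));
                if (n < 0) {
                    throw new EOFException("stream ended inside a chunk");
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
            copyLine(in, out);                      // CRLF after the chunk data
        }
        // Optional trailer lines, then the blank line that closes the body.
        while (!copyLine(in, out).isEmpty()) {
            // keep copying trailer lines
        }
        out.flush();
        return payload;
    }

    /** Reads through the next LF, echoing each byte to out; returns the line without CR/LF. */
    private static String copyLine(InputStream in, OutputStream out) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
            if (b == '\n') {
                String s = new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
                return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
            }
            line.write(b);
        }
        throw new EOFException("stream ended inside a line");
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "5\r\nhello\r\n0\r\n\r\n".getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(relayChunkedBody(new ByteArrayInputStream(raw), out));
    }
}
```

Because the chunk framing is passed through untouched, the browser still receives a valid chunked response and the headers do not need to be rewritten.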