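I am working on a link checker. Usually I can perform a HEAD request, but some sites seem to have disabled this verb, so when HEAD fails I also need to perform a GET request (to double-check that the link is really dead).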
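The HEAD-first, GET-on-failure flow described above can be sketched as a small wrapper. This is a minimal sketch under stated assumptions: `HeadThenGet`, `Check` and `ShouldRetryWithGet` are hypothetical names, the `probe` delegate stands in for a call to the `Validate` method below with `useHeadMethod` set accordingly, and the rule "no status at all, or any 4xx/5xx answer to HEAD, is worth re-checking with GET" is only one possible heuristic.

```csharp
using System;
using System.Net;

public static class HeadThenGet
{
    // Hypothetical heuristic: a missing status (timeout, reset, DNS failure)
    // or any 4xx/5xx answer to HEAD is treated as inconclusive, because some
    // servers reject HEAD (e.g. 405, 501) while serving the URL fine over GET.
    public static bool ShouldRetryWithGet(HttpStatusCode? headStatus)
    {
        return headStatus == null || (int)headStatus.Value >= 400;
    }

    // probe(true) is assumed to issue a HEAD, probe(false) a GET; both
    // return the observed status code, or null if there was no response.
    public static HttpStatusCode? Check(Func<bool, HttpStatusCode?> probe)
    {
        HttpStatusCode? status = probe(true);   // cheap HEAD first
        if (ShouldRetryWithGet(status))
        {
            status = probe(false);              // double-check with GET
        }
        return status;
    }
}
```

Keeping the decision in `ShouldRetryWithGet` makes it easy to tighten later, for example to retry only on 405 and 501 if GET traffic turns out to dominate.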
I use the following code as my link tester:
using System;
using System.Net;

public class ValidateResult
{
    public HttpStatusCode? StatusCode { get; set; }
    public Uri RedirectResult { get; set; }
    public WebExceptionStatus? WebExceptionStatus { get; set; }
}

// Validate lives in the link-checker class; UserAgentString is assumed to be
// a string field or property defined elsewhere in that class.
public ValidateResult Validate(Uri uri, bool useHeadMethod = true,
    bool enableKeepAlive = false, int timeoutSeconds = 30)
{
    ValidateResult result = new ValidateResult();

    HttpWebRequest request = WebRequest.Create(uri) as HttpWebRequest;
    request.Method = useHeadMethod ? "HEAD" : "GET";

    // Always compress: if you get back a 404 from a HEAD, it can be quite big.
    request.AutomaticDecompression = DecompressionMethods.GZip;
    request.AllowAutoRedirect = false;
    request.UserAgent = UserAgentString;
    request.Timeout = timeoutSeconds * 1000;
    request.KeepAlive = enableKeepAlive;

    HttpWebResponse response = null;
    try
    {
        response = request.GetResponse() as HttpWebResponse;
        result.StatusCode = response.StatusCode;
        if (response.StatusCode == HttpStatusCode.Redirect ||
            response.StatusCode == HttpStatusCode.MovedPermanently ||
            response.StatusCode == HttpStatusCode.SeeOther)
        {
            try
            {
                // Resolve a possibly relative Location header against the requested URI.
                Uri targetUri = new Uri(uri, response.Headers["Location"]);
                var scheme = targetUri.Scheme.ToLower();
                if (scheme == "http" || scheme == "https")
                {
                    result.RedirectResult = targetUri;
                }
                else
                {
                    // this little gem was born out of http://tinyurl.com/18r
                    // redirecting to about:blank
                    result.StatusCode = HttpStatusCode.SwitchingProtocols;
                    result.WebExceptionStatus = null;
                }
            }
            catch (UriFormatException)
            {
                // another gem... people sometimes redirect to http://nonsense:port/yay
                result.StatusCode = HttpStatusCode.SwitchingProtocols;
                result.WebExceptionStatus = WebExceptionStatus.NameResolutionFailure;
            }
        }
    }
    catch (WebException ex)
    {
        // 4xx/5xx answers surface as a WebException that still carries the response.
        result.WebExceptionStatus = ex.Status;
        response = ex.Response as HttpWebResponse;
        if (response != null)
        {
            result.StatusCode = response.StatusCode;
        }
    }
    finally
    {
        if (response != null)
        {
            response.Close();
        }
    }
    return result;
}
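The redirect branch in `Validate` can also be pulled out into a pure function, which makes it testable without touching the network. This is a sketch with a hypothetical `RedirectTarget.Resolve` helper: it keeps the same `new Uri(requestUri, location)` resolution and the same http/https check as the code above, returns null instead of setting a status so the caller decides how to report it, and additionally treats a missing `Location` header as "no usable target".

```csharp
using System;

public static class RedirectTarget
{
    // Resolves a Location header value against the URI that was requested.
    // Returns the target only when it is an http or https URI; returns null
    // when the header is missing, points at another scheme (e.g. about:blank),
    // or cannot be parsed (e.g. http://nonsense:port/yay).
    public static Uri Resolve(Uri requestUri, string location)
    {
        if (string.IsNullOrEmpty(location))
        {
            return null;
        }

        Uri target;
        try
        {
            // Same resolution as the original code: relative values are
            // interpreted relative to requestUri, absolute ones are kept.
            target = new Uri(requestUri, location);
        }
        catch (UriFormatException)
        {
            return null;
        }

        string scheme = target.Scheme.ToLowerInvariant();
        return (scheme == "http" || scheme == "https") ? target : null;
    }
}
```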
This all works fine and dandy, except that when I perform a GET request the entire payload gets downloaded (I watched this happen in Wireshark).
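Is there any way to configure the underlying ServicePoint, or the HttpWebRequest itself, so that it does not buffer or eagerly load the response body at all?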
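If moving from `HttpWebRequest` to `HttpClient` is an option, one way to stop a GET at the headers is `HttpCompletionOption.ResponseHeadersRead`, which makes `SendAsync` complete once the headers have arrived instead of after the body has been buffered. This is a hedged sketch, not a verified fix: it assumes .NET Core 2.1 or later for `SocketsHttpHandler`, and it relies on my understanding that setting `MaxResponseDrainSize` to 0 makes the handler drop the connection rather than read the unread body so the connection can be reused. Bytes already in flight or sitting in the OS receive buffer still arrive, so the effect is worth confirming in Wireshark. `HeadersOnlyGet` and `StubHandler` are hypothetical names; the stub only exists so the method can be exercised offline.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class HeadersOnlyGet
{
    // Client for link checking. MaxResponseDrainSize = 0 asks SocketsHttpHandler
    // not to read leftover body bytes when an unread response is disposed.
    public static HttpClient CreateClient(TimeSpan timeout)
    {
        var handler = new SocketsHttpHandler
        {
            AllowAutoRedirect = false,   // mirrors request.AllowAutoRedirect = false
            MaxResponseDrainSize = 0
        };
        return new HttpClient(handler) { Timeout = timeout };
    }

    // Sends a GET and returns once the status line and headers are in;
    // the body stream is never opened before the response is disposed.
    public static async Task<HttpStatusCode> GetStatusAsync(HttpClient client, Uri uri)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, uri))
        using (HttpResponseMessage response = await client.SendAsync(
                   request, HttpCompletionOption.ResponseHeadersRead))
        {
            return response.StatusCode;
        }
    }
}

// Offline stand-in for a server that answers with a large error page.
public sealed class StubHandler : HttpMessageHandler
{
    private readonly HttpStatusCode _status;

    public StubHandler(HttpStatusCode status)
    {
        _status = status;
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(_status)
        {
            RequestMessage = request,
            Content = new StringContent(new string('x', 40000)) // a "40k 404"
        };
        return Task.FromResult(response);
    }
}
```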
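(If I were coding this by hand, I would set the TCP receive window very low, grab only enough packets to get the headers, and stop acknowledging TCP packets as soon as I had enough information.)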
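The hand-coded idea can be approximated at the stream level without touching the receive window: send the request over a raw connection, read only up to the blank line that ends the headers, then close. This is a minimal sketch with hypothetical names (`RawHeaderReader`, `ReadHeaderBlock`, `ParseStatusCode`); it assumes the caller has already opened the connection (for example via `TcpClient.GetStream()`, wrapped in an `SslStream` for https) and written the GET request. The kernel may still have accepted some body bytes before the socket is closed, so this limits the extra traffic rather than eliminating it.

```csharp
using System;
using System.IO;
using System.Text;

public static class RawHeaderReader
{
    // Reads one byte at a time until the CRLF CRLF that ends the HTTP
    // response headers, so nothing past the headers is consumed from the
    // stream. Returns null if maxBytes is reached or the stream ends first.
    public static string ReadHeaderBlock(Stream stream, int maxBytes = 16 * 1024)
    {
        byte[] terminator = { (byte)'\r', (byte)'\n', (byte)'\r', (byte)'\n' };
        var buffer = new MemoryStream();
        int matched = 0; // length of the terminator prefix seen so far

        while (buffer.Length < maxBytes)
        {
            int b = stream.ReadByte();
            if (b < 0)
            {
                return null; // connection closed before the headers ended
            }
            buffer.WriteByte((byte)b);

            // Advance on the next expected byte; a stray '\r' restarts the match.
            matched = (b == terminator[matched]) ? matched + 1 : (b == '\r' ? 1 : 0);
            if (matched == terminator.Length)
            {
                return Encoding.ASCII.GetString(buffer.ToArray());
            }
        }
        return null;
    }

    // Extracts the numeric status from a status line such as "HTTP/1.1 404 Not Found".
    public static int? ParseStatusCode(string headerBlock)
    {
        if (headerBlock == null)
        {
            return null;
        }
        int lineEnd = headerBlock.IndexOf("\r\n", StringComparison.Ordinal);
        string statusLine = lineEnd >= 0 ? headerBlock.Substring(0, lineEnd) : headerBlock;
        string[] parts = statusLine.Split(' ');
        int code;
        if (parts.Length >= 2 &&
            parts[0].StartsWith("HTTP/", StringComparison.Ordinal) &&
            int.TryParse(parts[1], out code))
        {
            return code;
        }
        return null;
    }
}
```

Reading byte by byte is slow on an unbuffered socket, but the header block is small and the point is to avoid pulling body bytes into the application.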
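For those wondering what this means: I don't want to download a 40k error page every time I get a 404. Doing that a few hundred thousand times is expensive on the network.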
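To put a rough number on it, taking 200,000 checks as an illustrative stand-in for "a few hundred thousand" and counting only the error-page bodies:

```latex
200\,000 \times 40\,\text{KB} = 8\,000\,000\,\text{KB} \approx 8\,\text{GB}
```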