c# - Link checker; how to avoid false positives

Question

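I'm working on a link checker (broken-link finder) and I'm getting a lot of false positives. On closer inspection, many of the requests that fail with a WebException and an error status are for URLs that can actually be downloaded, and in some other cases the status code is 404 even though I can open the page in a browser.

Here is the code. It's pretty ugly, and I'd like something more, I'd say, practical. The big if filters out the status codes I don't want added to the broken links, because all of them turned out to be valid links (I tested them all). What I need to fix is the structure (if possible) and how to stop getting false 404s.

Thanks!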

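try
{
   HttpWebRequest request = ( HttpWebRequest ) WebRequest.Create ( uri );
   request.Method = "HEAD"; // HTTP method names are case-sensitive; ask for headers only
   request.MaximumResponseHeadersLength = 32; // max header size in KB, e.g. for slow connections
   request.AllowAutoRedirect = true;
   using ( HttpWebResponse response = ( HttpWebResponse ) request.GetResponse() )
   {
      // Disposing the response releases the connection, so no Abort() call is needed.
   }
   /* WebClient wc = new WebClient();
     wc.DownloadString( uri ); */

   // No exception was thrown, so the link is counted as valid.
   _validlinks.Add ( strUri );
}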
catch ( WebException wex )
{
   // DNS failures and protocol violations are deliberately counted as valid here.
   // Checking the status enum avoids matching on the localized exception message.
   if ( wex.Status == WebExceptionStatus.NameResolutionFailure ||
        wex.Status == WebExceptionStatus.ServerProtocolViolation )
   {
      _validlinks.Add ( strUri );
   }
   else
   {
      // Response is null when no HTTP response arrived (timeout, connection failure),
      // so it must be checked before reading StatusCode.
      HttpWebResponse errorResponse = wex.Response as HttpWebResponse;
      bool treatAsValid =
         wex.Status != WebExceptionStatus.Timeout &&
         errorResponse != null &&
         (  errorResponse.StatusCode == HttpStatusCode.OK ||
            errorResponse.StatusCode == HttpStatusCode.BadRequest ||
            errorResponse.StatusCode == HttpStatusCode.Accepted ||
            errorResponse.StatusCode == HttpStatusCode.InternalServerError ||
            errorResponse.StatusCode == HttpStatusCode.Forbidden ||
            errorResponse.StatusCode == HttpStatusCode.Found ); // Found and Redirect are both 302

      if ( treatAsValid ) _validlinks.Add ( strUri );
      else _brokenlinks.Add ( new Href ( new Uri ( strUri , UriKind.RelativeOrAbsolute ) , UrlType.External ) );
   }
}

score 1 · Accepted Answer


You should add a User-Agent header, because many websites require one.
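
As a rough illustration of that suggestion, the request could be built along these lines. This is only a sketch: the LinkProbe class, its method names, the User-Agent string and the 10-second timeout are all hypothetical choices, not part of the original code, and it assumes the URI is http or https so that the cast to HttpWebRequest succeeds.

```csharp
using System;
using System.Net;

static class LinkProbe
{
    // Hypothetical User-Agent value; any realistic browser-like string may do.
    const string BrowserUserAgent = "Mozilla/5.0 (compatible; LinkChecker/1.0)";

    // Builds a HEAD request that identifies itself with a User-Agent header.
    // Nothing is sent over the network until GetResponse() is called.
    public static HttpWebRequest CreateRequest(Uri uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "HEAD";
        request.UserAgent = BrowserUserAgent;
        request.AllowAutoRedirect = true;
        request.Timeout = 10000; // milliseconds; an assumed value
        return request;
    }

    // Returns the HTTP status code the server sent, or null when no HTTP
    // response arrived at all (DNS failure, timeout, refused connection).
    public static HttpStatusCode? Probe(Uri uri)
    {
        try
        {
            using (var response = (HttpWebResponse)CreateRequest(uri).GetResponse())
            {
                return response.StatusCode;
            }
        }
        catch (WebException wex)
        {
            // For 4xx/5xx answers the response object is attached to the exception.
            using (var errorResponse = wex.Response as HttpWebResponse)
            {
                return errorResponse != null
                    ? errorResponse.StatusCode
                    : (HttpStatusCode?)null;
            }
        }
    }
}
```

A caller could then decide what counts as broken from the returned value, for example treating null and 404 as broken, instead of nesting the checks inside the catch block.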

answered 2010-06-10T14:59:51.983
