c# - WebClient is not downloading the correct file from the supplied URL

Question

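I want to download a .torrent file for a Linux distro, but for some reason the file my application downloads differs from the one I download manually. The one my app downloads is 31KB and is not a valid .torrent file, while the correct one (downloaded manually) is 41KB and is valid.

The URL of the file I want to download is http://torcache.net/torrent/C348CBCA08288AE07A97DD641C5D09EE25299FAC.torrent

Why does this happen, and how can I download the same file (the valid one, 41KB)?

Thanks.

The C# code in the method that downloads the file above: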

        string sLinkTorCache = @"http://torcache.net/torrent/C348CBCA08288AE07A97DD641C5D09EE25299FAC.torrent";
        using (System.Net.WebClient wc = new System.Net.WebClient())
        {
            var path = @"D:\Baixar automaticamente"; // HACK: read this from settings in the final version
            var data = Helper.Retry(() => wc.DownloadData(sLinkTorCache), TimeSpan.FromSeconds(3), 5);
            string fileName = null;

            // Try to extract the filename from the Content-Disposition header
            if (!string.IsNullOrEmpty(wc.ResponseHeaders["Content-Disposition"]))
            {
                fileName = wc.ResponseHeaders["Content-Disposition"].Substring(wc.ResponseHeaders["Content-Disposition"].IndexOf("filename=") + 10).Replace("\"", "");
            }

            var torrentPath = Path.Combine(path, fileName ?? "Arch Linux Distro");

            if (File.Exists(torrentPath))
            {
                File.Delete(torrentPath);
            }

            Helper.Retry(() => wc.DownloadFile(new Uri(sLinkTorCache), torrentPath), TimeSpan.FromSeconds(3), 5);
        }

Helper.Retry (tries to run the method again when an HTTP exception occurs):

    public static void Retry(Action action, TimeSpan retryInterval, int retryCount = 3)
    {
        Retry<object>(() =>
        {
            action();
            return null;
        }, retryInterval, retryCount);
    }

    public static T Retry<T>(Func<T> action, TimeSpan retryInterval, int retryCount = 3)
    {
        var exceptions = new List<Exception>();

        for (int retry = 0; retry < retryCount; retry++)
        {
            try
            {
                if (retry > 0)
                    System.Threading.Thread.Sleep(retryInterval); // TODO: add the using for the thread
                return action();
            }
            catch (Exception ex)
            {
                exceptions.Add(ex);
            }
        }

        throw new AggregateException(exceptions);
    }

score 1 · Accepted Answer

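I initially thought the site was responding with junk when it believed the request came from a bot (that is, it was checking some of the headers). After a look with Fiddler, it appears that the data returned is exactly the same for both the web browser and the code. That means we are not properly decompressing (inflating) the response. It is very common for web servers to compress the data (using something like gzip), and WebClient does not decompress it automatically.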
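One way to confirm this diagnosis is to look at the first bytes of what WebClient returned: a gzip stream always starts with the two bytes 0x1F 0x8B. The following is a minimal sketch of that check, plus a manual decompression step. The class and method names (`GzipCheck`, `LooksGzipped`, `DecompressIfGzipped`) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not from the original post.
static class GzipCheck
{
    // A gzip stream always begins with the magic bytes 0x1F 0x8B.
    public static bool LooksGzipped(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Returns the decompressed payload if the data is gzip,
    // otherwise returns the data unchanged.
    public static byte[] DecompressIfGzipped(byte[] data)
    {
        if (!LooksGzipped(data))
            return data;

        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

If `LooksGzipped` returns true for the bytes from `wc.DownloadData(...)`, the smaller file is most likely just the compressed form of the larger one. This only covers gzip; a response compressed with deflate would need a different check.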

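Using the answer from "Automatically decompress gzip response via WebClient.DownloadData", I managed to get it working properly.

Also note that you are downloading the file twice (once with DownloadData and once with DownloadFile). You don't need to do that.

Working code: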

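// Taken from the question linked above
class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests support automatic decompression;
        // other request types (e.g. file://) are returned untouched.
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        }
        return request;
    }
}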

And to use it:

string sLinkTorCache = @"http://torcache.net/torrent/C348CBCA08288AE07A97DD641C5D09EE25299FAC.torrent";
using (var wc = new MyWebClient())
{
  var path = @"C:\Junk";
  var data = Helper.Retry(() => wc.DownloadData(sLinkTorCache), TimeSpan.FromSeconds(3), 5);
  string fileName = null; // must be null (not "") so the ?? fallback below applies

  var torrentPath = Path.Combine(path, fileName ?? "Arch Linux Distro.torrent");

  if (File.Exists(torrentPath))
      File.Delete(torrentPath);

  File.WriteAllBytes(torrentPath, data);
}
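The snippet above does not read the file name from the response, so the fallback name is what ends up being used. If the name from the Content-Disposition header is wanted, as in the question, a rough sketch of the parsing could look like this. `HeaderHelper` and `FileNameFromContentDisposition` are hypothetical names, and the parsing is deliberately simplistic: it assumes a plain `filename=` parameter and does not handle `filename*=` or a quoted name that itself contains a semicolon.

```csharp
using System;

// Hypothetical helper, not from the original post.
static class HeaderHelper
{
    // Extracts the filename parameter from a Content-Disposition header value,
    // whether or not the value is wrapped in double quotes.
    // Returns null when the header is missing or has no filename parameter.
    public static string FileNameFromContentDisposition(string header)
    {
        const string key = "filename=";
        if (string.IsNullOrEmpty(header))
            return null;

        int start = header.IndexOf(key, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
            return null;

        // Skip exactly the length of the key, then cut at the next parameter.
        string value = header.Substring(start + key.Length);
        int end = value.IndexOf(';');
        if (end >= 0)
            value = value.Substring(0, end);

        value = value.Trim().Trim('"');
        return value.Length > 0 ? value : null;
    }
}
```

In the using block above, this could be called right after `DownloadData`, for example `fileName = HeaderHelper.FileNameFromContentDisposition(wc.ResponseHeaders["Content-Disposition"]);`, leaving the `??` fallback to cover the case where it returns null.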
