
Here is the code:

for (int x = 0; x < imagesSatelliteUrls.Count; x++)
{
    if (!imagesSatelliteUrls[x].StartsWith("http://"))
    {
        imagesSatelliteUrls[x] = stringForSatelliteMapUrls + imagesSatelliteUrls[x];
    }

    using (WebClient client = new WebClient())
    {
        if (!imagesSatelliteUrls[x].Contains("href"))
        {
            client.DownloadFile(imagesSatelliteUrls[x],
                                UrlsDir + "SatelliteImage" + counter.ToString("D6"));
        }
    }

    counter++;
}

It downloads the files one at a time. The List imagesSatelliteUrls contains 260 file links, ordered by group.

For example:

index[0] "Group 1"
index[1] some link ....
index[2] some link ....
.
.
.
index[34] "Group 2"
index[35] some link ....
index[36] some link ....
.
.
.
.
index[71] "Group 3"

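And so on; there are 7 groups. I want it to start by downloading the first file of each group, which means 7 files downloading in parallel: the first file of Group 1, 2, 3, 4, 5, 6 and 7. Then, whenever a file in any group finishes, it should start downloading the next file from that same group.

So at any moment I would see 7 files downloading, each one from a different group. When a file in some group finishes downloading, that group should move on to its next file and start downloading it.

How can I do this? The client.DownloadFile call I'm using now only downloads the files one by one.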
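The scheme described above (one worker per group, each working through its own group in order) first needs the flat list split into groups. A minimal sketch, assuming that header entries always start with "Group " (as in the example listing) and that every other entry belongs to the most recent header; UrlGrouping and SplitIntoGroups are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

static class UrlGrouping
{
    // Hypothetical helper: splits the flat list into one list of URLs per group.
    // Assumes header entries start with "Group " and every other entry
    // belongs to the most recent header.
    public static List<List<string>> SplitIntoGroups(IList<string> items)
    {
        var groups = new List<List<string>>();
        List<string> current = null;

        foreach (var item in items)
        {
            if (item.StartsWith("Group ", StringComparison.Ordinal))
            {
                // A header starts a new group.
                current = new List<string>();
                groups.Add(current);
            }
            else if (current != null)
            {
                // A link belonging to the current group.
                current.Add(item);
            }
            // Entries that appear before the first header are ignored.
        }

        return groups;
    }
}
```

The headers themselves are dropped, so they are never passed to DownloadFile as if they were URLs.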

I tried downloading in parallel:

Here is the code:

Parallel.For(0, imagesSatelliteUrls.Count, /*new ParallelOptions { MaxDegreeOfParallelism = 20 },*/ x =>
            {
                if (!imagesSatelliteUrls[x].StartsWith("http://"))
                {
                    imagesSatelliteUrls[x] = stringForSatelliteMapUrls + imagesSatelliteUrls[x];
                }

                using (WebClient client = new WebClient())
                {
                    if (!imagesSatelliteUrls[x].Contains("href"))
                    {
                        client.DownloadFile(imagesSatelliteUrls[x],
                                            UrlsDir + "SatelliteImage" + counter.ToString("D6"));
                    }
                }

                counter++;
            }); // end of Parallel.For

The exception is:

System.Net.WebException was unhandled by user code
  HResult=-2146233079
  Message=An exception occurred during a WebClient request.
  Source=System
  StackTrace:
       at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       at System.Net.WebClient.DownloadFile(String address, String fileName)
       at WeatherMaps.ExtractImages.<>c__DisplayClass2.<.ctor>b__0(Int32 x) in d:\C-Sharp\WeatherMaps\WeatherMaps\WeatherMaps\ExtractImages.cs:line 145
       at System.Threading.Tasks.Parallel.<>c__DisplayClassf`1.<ForWorker>b__c()
  InnerException: System.IO.IOException
       HResult=-2147024864
       Message=The process cannot access the file 'd:\localpath\Urls\SatelliteImage000000' because it is being used by another process.
       Source=mscorlib
       StackTrace:
            at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
            at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
            at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access)
            at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       InnerException: 

I also tried this code:

for (int i = 0; i < 7; i++)
            {
                Task.Factory.StartNew(() =>
                {
                    // Here you can easily implement your checking algo as you see fit
                    while (counter < imagesSatelliteUrls.Count)
                    {
                        if (!imagesSatelliteUrls[count].StartsWith("http://"))
                        {
                            imagesSatelliteUrls[count] = stringForSatelliteMapUrls + imagesSatelliteUrls[count];
                        }
                        using (WebClient client = new WebClient())
                        {
                            if (!imagesSatelliteUrls[count].Contains("href"))
                            {

                                client.DownloadFile(imagesSatelliteUrls[count], UrlsDir + "SatelliteImage" + counter.ToString("D6"));
                            }
                        }

                        lock (this)
                        {
                            count++;
                            counter++;
                        }
                    }
                });
            }


It throws the same exception:

System.Net.WebException was unhandled by user code
  HResult=-2146233079
  Message=An exception occurred during a WebClient request.
  Source=System
  StackTrace:
       at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       at System.Net.WebClient.DownloadFile(String address, String fileName)
       at WeatherMaps.ExtractImages.<>c__DisplayClass4.<.ctor>b__2() in d:\C-Sharp\WeatherMaps\WeatherMaps\WeatherMaps\ExtractImages.cs:line 122
       at System.Threading.Tasks.Task.InnerInvoke()
       at System.Threading.Tasks.Task.Execute()
  InnerException: System.IO.IOException
       HResult=-2147024864
       Message=The process cannot access the file 'd:\localpath\Urls\SatelliteImage000000' because it is being used by another process.
       Source=mscorlib
       StackTrace:
            at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
            at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
            at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access)
            at System.Net.WebClient.DownloadFile(Uri address, String fileName)
       InnerException: 

2 Answers


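Use Parallel.For, but build each file name from the loop index x instead of the shared counter. With counter, several iterations can read the same value before it is incremented, so multiple threads try to write the same file (for example SatelliteImage000000) at once, which is what causes the "being used by another process" exception: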

//for (int x = 0; x < imagesSatelliteUrls.Count; x++)
Parallel.For(0, imagesSatelliteUrls.Count, /*new ParallelOptions { MaxDegreeOfParallelism = 20 },*/ x =>
{
    if (!imagesSatelliteUrls[x].StartsWith("http://"))
    {
        imagesSatelliteUrls[x] = stringForSatelliteMapUrls + imagesSatelliteUrls[x];
    }

    using (WebClient client = new WebClient())
    {
        if (!imagesSatelliteUrls[x].Contains("href"))
        {
            client.DownloadFile(imagesSatelliteUrls[x],
                                UrlsDir + "SatelliteImage" + x.ToString("D6"));
        }
    }

    Interlocked.Increment(ref counter); // thread-safe increment; needs using System.Threading;
}); // end of Parallel.For
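Parallel.For partitions the index range without knowing about the groups, so it does not by itself give "one download per group at a time". A sketch of that per-group scheme, assuming the list has already been split into one list of URLs per group; GroupedDownloader, DownloadByGroup, targetDir and the download delegate are hypothetical names, and the delegate is where the WebClient.DownloadFile call from the question would go:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static class GroupedDownloader
{
    // One worker per group. Each worker downloads its group's files one after
    // another, so at most groups.Count downloads run at once, and as soon as a
    // file finishes, that worker starts the next file of the same group.
    // "download" stands in for something like:
    //   (url, path) => { using (var c = new WebClient()) c.DownloadFile(url, path); }
    public static void DownloadByGroup(
        IList<List<string>> groups,
        string targetDir,
        Action<string, string> download)
    {
        var workers = groups.Select((urls, groupIndex) => Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < urls.Count; i++)
            {
                // Name built from group number and position: unique per file,
                // no shared counter, so two workers never pick the same path.
                string path = Path.Combine(targetDir,
                    $"SatelliteImage_G{groupIndex + 1}_{i:D6}");
                download(urls[i], path);
            }
        },
        // Each worker blocks on I/O, so ask for a dedicated thread.
        TaskCreationOptions.LongRunning)).ToArray();

        // Waits for all groups; an exception in any worker (which also stops
        // that group's remaining downloads) surfaces here as an AggregateException.
        Task.WaitAll(workers);
    }
}
```

One thing to check on .NET Framework: ServicePointManager.DefaultConnectionLimit defaults to 2 for non-ASP.NET applications, so if all 7 groups point at the same host you may need to raise it before the 7 downloads actually run concurrently.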
answered 2013-10-28T20:26:05.983

I created a standalone example showing how you would do this if you add a reference to System.Net.Http.dll and use the HttpClient class:

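// Requires a reference to System.Net.Http.dll and:
// using System; using System.IO; using System.Linq;
// using System.Net.Http; using System.Threading.Tasks;

// Create a mock list of data
string someImageUrl = "..."; // some test url of an image file
string urlsDirectory = @"C:\Temp"; // some working directory

var urls = new string[7 * 20];

// Every 7th entry is a group header ("Group 1", "Group 2", ...),
// followed by 6 image URLs belonging to that group.
for (int i = 0; i < urls.Length; i += 7)
{
    urls[i] = String.Format("Group {0}", (i / 7) + 1);

    for (int j = 1; j < 7; j++)
    {
        urls[i + j] = someImageUrl;
    }
}


// Download 6 files at a time: all files of one group in parallel,
// then wait for the whole group before moving on to the next one.
var client = new HttpClient();

for (int i = 0; i < urls.Length; i += 7)
{
    // One directory per group, named after the header entry
    var directoryPath = Directory.CreateDirectory(Path.Combine(urlsDirectory, urls[i])).FullName;

    var tasks = urls.Skip(i + 1).Take(6).Select(url => client.GetAsync(url)).ToArray();

    Task.WaitAll(tasks);

    for (int j = 0; j < tasks.Length; j++)
    {
        using (var response = tasks[j].Result)
        {
            // Throw on 404/500 instead of saving an error page as an image
            response.EnsureSuccessStatusCode();

            // FileMode.Create truncates an existing file; OpenOrCreate would leave stale bytes
            using (var fs = new FileStream(Path.Combine(directoryPath, String.Format("Image {0}.jpg", j + 1)), FileMode.Create))
            using (var responseStream = response.Content.ReadAsStreamAsync().Result)
            {
                responseStream.CopyTo(fs);
            }
        }
    }
}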

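One important thing to note is that I think you lose some of WebClient's automatic file-name negotiation. For what it's worth, you can see in my example that I just name the images "Image 1.jpg", "Image 2.jpg", and so on.

Technically, when requesting a file over HTTP, you could request an image with a URL like this: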

http://somehost.com/getImage?id=5

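In that case it's hard to even say what the file name should be. The standard HTTP way to handle this is for the server to add a header called Content-Disposition, which tells the HTTP client what the file name should be.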
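With HttpClient that header is exposed as response.Content.Headers.ContentDisposition. A sketch of preferring it and falling back to your own name when it is missing; ResponseFileName and GetFileName are hypothetical names, and the fallback is whatever you would otherwise use (such as "Image 1.jpg"):

```csharp
using System;
using System.IO;
using System.Net.Http;

static class ResponseFileName
{
    // Prefer the server-supplied name from Content-Disposition and fall back
    // to a caller-provided default when the header or its file name is missing.
    public static string GetFileName(HttpResponseMessage response, string fallback)
    {
        var disposition = response.Content?.Headers.ContentDisposition;

        // FileNameStar holds the RFC 5987 form (filename*=...), FileName the plain one.
        string name = disposition?.FileNameStar ?? disposition?.FileName;
        if (string.IsNullOrWhiteSpace(name))
            return fallback;

        // The plain form is often returned still quoted: "photo.jpg"
        name = name.Trim('"');

        // Do not trust a path from the server: keep only the last segment.
        return Path.GetFileName(name);
    }
}
```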

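But not every web server will give you a Content-Disposition header, so you would need to fall back to turning the URL above into a Windows-compatible file name. You could look for a simple function that strips all NTFS-incompatible characters from the URL. Keep in mind, though, that in that case you won't get an extension (jpg, gif, etc.). The server may send a Content-Type header telling you the MIME type, such as "image/jpeg", but it's up to you to decide which extension to give the file.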
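A sketch of both fallbacks, assuming a hand-picked list of characters that Windows rejects in file names and a small MIME-type-to-extension table you would extend as needed. It does not cover every Windows rule (reserved names such as CON or NUL, trailing dots or spaces, length limits). SafeFileName, FromUrl and ExtensionFor are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SafeFileName
{
    // Characters Windows does not allow in file names (control chars handled below).
    static readonly char[] Invalid = "<>:\"/\\|?*".ToCharArray();

    // A small, deliberately incomplete MIME type -> extension map (assumption).
    static readonly Dictionary<string, string> Extensions =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["image/jpeg"] = ".jpg",
            ["image/png"] = ".png",
            ["image/gif"] = ".gif",
        };

    // Replaces every invalid or control character in the URL with '_'.
    public static string FromUrl(string url)
    {
        var chars = url.Select(c => c < 32 || Invalid.Contains(c) ? '_' : c).ToArray();
        return new string(chars);
    }

    // Returns the extension for a known media type, or "" when unknown.
    public static string ExtensionFor(string mediaType) =>
        mediaType != null && Extensions.TryGetValue(mediaType, out var ext) ? ext : "";
}
```

For example, SafeFileName.FromUrl("http://somehost.com/getImage?id=5") returns "http___somehost.com_getImage_id=5", and the extension could come from SafeFileName.ExtensionFor(response.Content.Headers.ContentType?.MediaType).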

answered 2013-10-28T20:24:51.417