
I've been digging around the web for a while now and haven't found a code sample that helps me solve my problem. I've looked at example code, but I still don't "get" it...

I have read:

http://msdn.microsoft.com/en-us/library/aa480507.aspx
http://msdn.microsoft.com/en-us/library/dd781401.aspx

but I can't seem to get it to work.

I'm using HtmlAgilityPack.

Today I make at most 20 web requests.

When a request completes, the result is added to a dictionary, and a method then searches it for the information. If the information is found, the code exits; if not, it makes another web request, up to the limit of 20. I need to be able to make these calls asynchronously and stop all of them once everything has been found.

It goes something like this:

public void FetchAndParseAllPages()
    {
        PageFetcher fetcher = new PageFetcher();
        for (int i = 0; i < _maxSearchDepth; i += _searchIncrement)
        {
            string keywordNsearch = _keyword + i;
            ParseHtmldocuments(fetcher.GetWebpage(keywordNsearch));
            // Check whether the information was found; if so,
            // add it to the database and exit.

            if (GetPostion() != 201)
            {   //ADD DATA TO DATABASE
                InsertRankingData(DocParser.GetSearchResults(), _theSearchedKeyword);
                return;
            }
        }
    }

This is in the class that fetches the pages:

    public HtmlDocument GetWebpage(string urlToParse)
    {

        System.Net.ServicePointManager.Expect100Continue = false;
        HtmlWeb htmlweb = new HtmlWeb();
        htmlweb.PreRequest = new HtmlAgilityPack.HtmlWeb.PreRequestHandler(OnPreRequest);
        // Note: pass the urlToParse variable, not the literal string "urlToParse"
        HtmlDocument htmldoc = htmlweb.Load(urlToParse, "38.69.197.71", 45623, "PROXYUSER", "PROXYPASSWORD");

        return htmldoc;       
    }

    public bool OnPreRequest(HttpWebRequest request)
    {
       // request.UserAgent = RandomUseragent();
        request.KeepAlive = false;
        request.Timeout = 100000;
        request.ReadWriteTimeout = 1000000; 
        request.ProtocolVersion = HttpVersion.Version10;
        return true; // ok, go on
    }

How can I make this asynchronous and use threads to make it really fast? Or should I even be using threads at all if I'm doing it asynchronously?


1 Answer


OK, I solved it! At least I think I did! Execution time dropped to about seven seconds; without async it took me about 30 seconds.

Here is my code for future reference. EDIT: I used a console project to test the code, and I'm also using HtmlAgilityPack. This is how I did it; any tips on how to optimize it further would be cool.

    // Requires: using System.Runtime.Remoting.Messaging; (for AsyncResult)
    public delegate HtmlDocument FetchPageDelegate(string url);

    static void Main(string[] args)
    {
        System.Net.ServicePointManager.DefaultConnectionLimit = 10;
        FetchPageDelegate del = new FetchPageDelegate(FetchPage);
        List<HtmlDocument> htmllist = new List<HtmlDocument>();
        List<IAsyncResult> results = new List<IAsyncResult>();
        List<WaitHandle> waitHandles = new List<WaitHandle>();

        DateTime start = DateTime.Now;
        for(int i = 0; i < 200; i += 10)
        {
            // Replace with your own URLs, e.g. read from a list
            string url = @"URLSTOPARSE YOU CHANGE IT HERE READ FROM LIST OR ANYTHING";
            IAsyncResult result = del.BeginInvoke(url, null, null);
            results.Add(result);
            waitHandles.Add(result.AsyncWaitHandle);
        }

        WaitHandle.WaitAll(waitHandles.ToArray());

        foreach (IAsyncResult async in results)
        {   
            FetchPageDelegate delle = (async as AsyncResult).AsyncDelegate as FetchPageDelegate;
            htmllist.Add(delle.EndInvoke(async));
        }
        Console.WriteLine(DateTime.Now - start); // print elapsed time
        Console.ReadLine();

    }

    static HtmlDocument FetchPage(string url)
    {
        HtmlWeb htmlweb = new HtmlWeb();
        HtmlDocument htmldoc = htmlweb.Load(url);
        return htmldoc; 
    }
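
For comparison, on .NET 4.5 or later the same fan-out/join pattern can be written with `async`/`await` and `HttpClient` instead of delegate `BeginInvoke`/`EndInvoke`. This is a minimal sketch, not the code I actually ran; the `example.com` URLs are placeholders, and error handling is omitted:

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

class Program
{
    static async Task Main()
    {
        // Hypothetical URL list; substitute your own search URLs.
        var urls = Enumerable.Range(0, 20)
            .Select(i => "http://example.com/search?start=" + i * 10)
            .ToArray();

        using var client = new HttpClient();

        // Start all downloads concurrently, parsing each response as it arrives.
        var tasks = urls.Select(async url =>
        {
            string html = await client.GetStringAsync(url);
            var doc = new HtmlDocument();
            doc.LoadHtml(html);
            return doc;
        });

        // Join: wait for every page, like WaitHandle.WaitAll above.
        HtmlDocument[] docs = await Task.WhenAll(tasks);
        Console.WriteLine(docs.Length);
    }
}
```

`Task.WhenAll` plays the role of `WaitHandle.WaitAll` plus the `EndInvoke` loop, and exceptions from any request propagate out of the single `await` instead of being lost on a worker thread.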
answered 2012-11-05T13:25:33.237