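I created a program that downloads the links from a web page into an .htm file. What I want to do now is test every link in that .htm file and output any that are broken. Unfortunately, not all of the downloaded links start with "http://", so I tried to work around that with an if statement. How can I read all the links into an array and then loop through that array using asynchronous web requests and responses?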

private async void button4_Click(object sender, EventArgs e)
    {
        string text =  System.IO.File.ReadAllText(@"C:\\Users\\Conal_Curran\\OneDrive\\C#\\MyProjects\\Web Crawler\\URLTester\\OP.htm");

        List<string> stringlist = new List<string>();
        stringlist.Add(text);


        if (!text.StartsWith("http://"))
        {

            foreach (string line in stringlist)
            {
                var request = WebRequest.Create(text);
                var response = (HttpWebResponse)await Task.Factory
                    .FromAsync<WebResponse>(request.BeginGetResponse, request.EndGetResponse, null);

                Debug.Assert(response.StatusCode == HttpStatusCode.OK);

                if (response == null)
                {
                    BrokenLinks.Text = text;
                }
                else
                {
                    BrokenLinks.Text = "All URLS Are OK";
                }
            }
        }
    }

Parsing the HTML file with a regular expression:

string text = System.IO.File.ReadAllText(@"C:\\Users\\Conal_Curran\\OneDrive\\C#\\MyProjects\\Web Crawler\\URLTester\\OP.htm");

        string regex = "href=\"(.*)\"";
        Match match = Regex.Match(text, regex);
        if (match.Success)
        {
            string link = match.Groups[1].Value;
            Console.WriteLine(link);

            MessageBox.Show("Going over URLS now Please stand by.");
            var request = WebRequest.Create(link);
            var response = (HttpWebResponse)await Task.Factory
                .FromAsync<WebResponse>(request.BeginGetResponse, request.EndGetResponse, null);

            Debug.Assert(response.StatusCode == HttpStatusCode.OK);

            if (response == null)
            {
                BrokenLinks.Text = text;
                label2.ForeColor = System.Drawing.Color.Red;
            }
            else
            {
                BrokenLinks.Text = "All URLS Are OK";
                label2.ForeColor = System.Drawing.Color.Green;
            }


        }
1 Answer

I think this code should put you on the right track. Obviously, it only works if the file you are reading is a plain text file with one link per line.

var lines = File.ReadLines(fileName); // reads the file lazily, one line at a time
    foreach (var line in lines) {
        if (line.StartsWith("http://")) {
            // execute your request, since this looks like a valid link
        } else {
            // in this case the URL doesn't start with http://; if you want to check it anyway, just prepend http:// to the string, otherwise do nothing
        }
    }
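
If the file is HTML rather than one link per line, the links have to be pulled out of the href attributes first. Below is a minimal sketch of that step, assuming a regex is good enough for your pages (an HTML parser would be more robust). The names `LinkExtractor`, `ExtractLinks` and the `baseUri` parameter are made up for illustration and are not from the question. Note that the pattern uses `[^"]*` instead of `(.*)`, because the greedy version can swallow everything up to the last quote on the line, and that `Regex.Matches` returns every match while `Regex.Match` only returns the first.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkExtractor
{
    // Capture everything between the quotes of an href attribute.
    // [^"]* stops at the first closing quote, so several links on
    // one line are matched separately.
    private static readonly Regex HrefPattern =
        new Regex("href\\s*=\\s*\"([^\"]*)\"", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns every http/https link found in html.
    // Relative links are resolved against baseUri (assumed to be the
    // address of the page the file was downloaded from).
    public static List<string> ExtractLinks(string html, Uri baseUri)
    {
        var links = new List<string>();
        foreach (Match match in HrefPattern.Matches(html))
        {
            string raw = match.Groups[1].Value.Trim();
            Uri resolved;
            // Skip anything that is not a web link (mailto:, javascript:, ...).
            if (Uri.TryCreate(baseUri, raw, out resolved) &&
                (resolved.Scheme == Uri.UriSchemeHttp ||
                 resolved.Scheme == Uri.UriSchemeHttps))
            {
                links.Add(resolved.AbsoluteUri);
            }
        }
        return links;
    }
}
```

Resolving against a base address is one way to deal with links that don't start with "http://"; blindly prepending "http://" only works when the link is a bare host name.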

If you want to check whether a link is valid, see this answer. I hope this helps.
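
For the checking part itself, here is a rough sketch of one way to do it. It swaps `WebRequest` for `HttpClient`, which is my substitution and not what the question uses; the names `LinkChecker`, `IsReachableAsync` and `FindBrokenLinksAsync` and the 10 second timeout are likewise assumptions. One thing worth knowing about the code in the question: `EndGetResponse` throws a `WebException` for 4xx/5xx responses instead of returning null, so the `response == null` branch is never reached and a broken link surfaces as an exception.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

static class LinkChecker
{
    // One shared client for all requests; the timeout is an arbitrary choice.
    private static readonly HttpClient Client =
        new HttpClient { Timeout = TimeSpan.FromSeconds(10) };

    // Hypothetical helper: true when the server answers with a 2xx status.
    public static async Task<bool> IsReachableAsync(string url)
    {
        try
        {
            // HEAD avoids downloading the body; some servers reject it,
            // in which case HttpMethod.Get is the safer choice.
            using (var request = new HttpRequestMessage(HttpMethod.Head, url))
            using (var response = await Client.SendAsync(request))
            {
                return response.IsSuccessStatusCode;
            }
        }
        catch (Exception ex) when (ex is HttpRequestException ||
                                   ex is TaskCanceledException ||
                                   ex is InvalidOperationException ||
                                   ex is UriFormatException)
        {
            // DNS failure, refused connection, timeout or malformed URL:
            // all of these are treated as a broken link here.
            return false;
        }
    }

    // Checks the links one after another and collects the ones that failed.
    public static async Task<List<string>> FindBrokenLinksAsync(IEnumerable<string> urls)
    {
        var broken = new List<string>();
        foreach (string url in urls)
        {
            if (!await IsReachableAsync(url))
            {
                broken.Add(url);
            }
        }
        return broken;
    }
}
```

In the button handler you could then do something like `var broken = await LinkChecker.FindBrokenLinksAsync(links);` and set `BrokenLinks.Text` to `string.Join(Environment.NewLine, broken)`, or to "All URLs are OK" when the list is empty.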

answered 2016-01-07T15:12:53.040