c# - Reading from and posting to web pages with C#

Question

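I have a work project that requires me to enter information into a web page, read the next page I am redirected to, and then take further action. A simplified real-world example would be going to google.com, entering "coding tricks" as the search criteria, and then reading the results page.

Small code samples like the one at http://www.csharp-station.com/HowTo/HttpWebFetch.aspx show how to read a web page, but not how to interact with it by submitting information to a form and continuing on to the next page.

For the record, I am not building a malicious and/or spam-related product.

So how do I read web pages that can only be reached after first going through a few steps of normal browsing?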

score 5 · Accepted Answer

You can create an HTTP request programmatically and retrieve the response:

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

class PostExample
{
    static void Main()
    {
        string uri = "http://www.google.com/search";
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";

        // Encode the data to POST (the values must already be URL-encoded):
        string postData = "q=searchterm&hl=en";
        byte[] encodedData = Encoding.ASCII.GetBytes(postData);
        request.ContentLength = encodedData.Length;

        // Write the body, then close the request stream before asking for the response.
        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(encodedData, 0, encodedData.Length);
        }

        // Send the request and get the response.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // Do something with the response stream. As an example, we
            // stream the response to the console via a 256-character buffer.
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                char[] buffer = new char[256];
                int count = reader.Read(buffer, 0, buffer.Length);
                while (count > 0)
                {
                    Console.Write(new string(buffer, 0, count));
                    count = reader.Read(buffer, 0, buffer.Length);
                }
            } // reader is disposed here
        } // response is disposed here
    }
}
```

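Of course, this code will return an error, because Google uses GET rather than POST for search queries.

If you are dealing with one specific web page, this approach will work, because the URL and the POST data are essentially hard-coded. If you need something more dynamic, you would have to: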

1. Capture the page
2. Extract the form
3. Build the POST string from the form fields
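Steps 2 and 3 can be sketched with two small helpers. This is a minimal, regex-based sketch that assumes simple markup with double-quoted attributes and only looks at `<input>` tags (not `<select>` or `<textarea>`); `FormScraper`, `ExtractFields` and `BuildPostData` are hypothetical names, and a real page may need a proper HTML parser such as HtmlAgilityPack instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: a regex-based sketch, not a real HTML parser.
// Assumes each <input> carries name="..." and optionally value="...",
// both in double quotes.
public static class FormScraper
{
    static readonly Regex InputTag = new Regex(@"<input\b[^>]*>", RegexOptions.IgnoreCase);
    static readonly Regex NameAttr = new Regex(@"\bname\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);
    static readonly Regex ValueAttr = new Regex(@"\bvalue\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);

    // Step 2: collect name/value pairs from the page's <input> tags,
    // decoding HTML entities such as &amp; in the values.
    public static List<KeyValuePair<string, string>> ExtractFields(string html)
    {
        var fields = new List<KeyValuePair<string, string>>();
        foreach (Match tag in InputTag.Matches(html))
        {
            Match name = NameAttr.Match(tag.Value);
            if (!name.Success) continue; // inputs without a name are never submitted
            Match value = ValueAttr.Match(tag.Value);
            fields.Add(new KeyValuePair<string, string>(
                name.Groups[1].Value,
                value.Success ? WebUtility.HtmlDecode(value.Groups[1].Value) : ""));
        }
        return fields;
    }

    // Step 3: build an application/x-www-form-urlencoded body, replacing
    // the values of any fields the caller wants to fill in.
    public static string BuildPostData(
        IEnumerable<KeyValuePair<string, string>> fields,
        IDictionary<string, string> overrides)
    {
        return string.Join("&", fields.Select(f =>
        {
            string v = overrides.TryGetValue(f.Key, out var o) ? o : f.Value;
            return Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(v);
        }));
    }
}
```

Hidden fields (session tokens, view state and the like) are carried over unchanged, which is usually what the server expects; only the fields you pass in `overrides` are filled in before POSTing the string with the request code above.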

FWIW, I think something like Perl or Python might be better suited to this kind of task.

Edit: x-www-form-urlencoded

score 3 · Accepted Answer

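You could try Selenium. Record your actions in Firefox with Selenium IDE, save the script in C# format, and then play it back with the Selenium RC C# wrapper. As others have mentioned, you can also use System.Net.HttpWebRequest or System.Net.WebClient. If this is a desktop application, see also System.Windows.Forms.WebBrowser.

Addendum: similar to the Java-based Selenium IDE and Selenium RC, WatiN Test Recorder and WatiN are .NET-based.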

score 2 · Accepted Answer

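What you need to do is repeatedly retrieve and analyze the HTML source of each page in the chain. For each page, work out what its form submission looks like, then send a request that matches it to get the next page in the chain.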
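For each hop, the request has to go wherever the form would have sent it. A minimal sketch, assuming the page's first `<form>` is the one you want and that its attributes are double-quoted; `FormTarget.Find` is a hypothetical helper that returns the absolute target URL and the HTTP method:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for one hop in the chain: find where the page's
// first <form> submits to, and with which method. Regex-based sketch.
public static class FormTarget
{
    static readonly Regex FormTag = new Regex(@"<form\b[^>]*>", RegexOptions.IgnoreCase);
    static readonly Regex ActionAttr = new Regex(@"\baction\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);
    static readonly Regex MethodAttr = new Regex(@"\bmethod\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);

    // Returns (absolute target URL, upper-case HTTP method),
    // or null when the page contains no <form>.
    public static Tuple<Uri, string> Find(Uri pageUri, string html)
    {
        Match form = FormTag.Match(html);
        if (!form.Success) return null;

        Match action = ActionAttr.Match(form.Value);
        Match method = MethodAttr.Match(form.Value);

        // A missing or empty action submits back to the current page;
        // relative actions are resolved against the page's own URL.
        string rawAction = action.Success ? WebUtility.HtmlDecode(action.Groups[1].Value) : "";
        Uri target = rawAction.Length == 0 ? pageUri : new Uri(pageUri, rawAction);

        // HTML forms default to GET when no method attribute is given.
        string verb = method.Success ? method.Groups[1].Value.ToUpperInvariant() : "GET";
        return Tuple.Create(target, verb);
    }
}
```

Resolving with the `Uri(Uri, string)` constructor mirrors what a browser does, so an action like `step2.aspx` on `http://example.com/app/step1.aspx` becomes `http://example.com/app/step2.aspx`.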

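What I did was build a custom class that wraps System.Net.HttpWebRequest/HttpWebResponse, so retrieving a page is as simple as with System.Net.WebClient. However, my custom class also reuses the same cookie container across requests, which makes it a bit easier to send POST data, set a custom user agent, and so on.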
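A minimal sketch of such a wrapper, assuming the classic `HttpWebRequest` API the answer refers to; `SessionClient`, `CreateRequest`, `Get` and `Post` are illustrative names, not the author's actual class. On .NET 6 and later `WebRequest.Create` is marked obsolete (warning SYSLIB0014), and `HttpClient` with an `HttpClientHandler` whose `CookieContainer` is set is the usual replacement.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical wrapper: every request shares one CookieContainer, so
// cookies set by one response are sent with the next request in the chain.
public class SessionClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();
    public string UserAgent { get; set; } = "Mozilla/5.0 (compatible; SessionClient)";

    // Builds a request that carries the shared cookies and user agent.
    public HttpWebRequest CreateRequest(string url, string method)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = method;
        request.CookieContainer = Cookies;
        request.UserAgent = UserAgent;
        return request;
    }

    public string Get(string url)
    {
        return ReadResponse(CreateRequest(url, "GET"));
    }

    // postData must already be application/x-www-form-urlencoded.
    public string Post(string url, string postData)
    {
        HttpWebRequest request = CreateRequest(url, "POST");
        request.ContentType = "application/x-www-form-urlencoded";
        byte[] body = Encoding.UTF8.GetBytes(postData);
        request.ContentLength = body.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        return ReadResponse(request);
    }

    // Sends the request and returns the whole response body as a string.
    static string ReadResponse(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling `Post` on a login form and then `Get` on a page behind it sends whatever cookies the first response set, which is what keeps a multi-step session alive across separate requests.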

score 0 · Accepted Answer

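Depending on how the site works, you may be able to manipulate the URL to do what you want. For example, to search for the word "beatles", you could open a request to http://www.google.com/search?q=beatles and read the results.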
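A small sketch of building such a URL, assuming each name and value should be percent-encoded with `Uri.EscapeDataString` so that spaces and symbols in search terms survive; `QueryUrl.Build` is a hypothetical helper, and it assumes `baseUrl` has no query string of its own:

```csharp
using System;
using System.Linq;

// Hypothetical helper: append URL-encoded query-string parameters
// to a base URL that does not already contain a '?'.
public static class QueryUrl
{
    public static string Build(string baseUrl, params (string Key, string Value)[] parameters)
    {
        string query = string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
        return query.Length == 0 ? baseUrl : baseUrl + "?" + query;
    }
}
```

The resulting string can then be fetched with a plain GET, for example `new WebClient().DownloadString(url)`.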

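Alternatively, if the site does not use query-string values (in the URL) to drive its pages, you will need to make a WebRequest that POSTs the required values to the site. Search Google for examples of using WebRequest and WebResponse.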
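As an alternative to the `WebRequest`/`WebResponse` pair, newer .NET versions include `HttpClient`, where `FormUrlEncodedContent` builds and encodes the `application/x-www-form-urlencoded` body from name/value pairs. A minimal sketch; `FormPoster`, `BuildForm` and `PostAsync` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical sketch: POST form values with HttpClient instead of WebRequest.
public static class FormPoster
{
    // One shared HttpClient instance is the commonly recommended usage.
    static readonly HttpClient Client = new HttpClient();

    // FormUrlEncodedContent URL-encodes the pairs and sets the
    // Content-Type header to application/x-www-form-urlencoded.
    public static FormUrlEncodedContent BuildForm(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return new FormUrlEncodedContent(fields);
    }

    // Posts the fields and returns the body of the page the server sends back.
    public static async Task<string> PostAsync(string url, IEnumerable<KeyValuePair<string, string>> fields)
    {
        using (FormUrlEncodedContent content = BuildForm(fields))
        using (HttpResponseMessage response = await Client.PostAsync(url, content))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

The string returned by `PostAsync` is the HTML of the next page, ready to be inspected for the next form in the chain.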
