
I'm trying to parse this page: http://agent.bronni.ru/Result.aspx?id=c7a6a33a-174e-426d-b127-828ee612c36e&account=27178&page=1&pageSize=50&mr=true

But I can't get the results table, because, as I can see in Fiddler, the table is filled in by a lazy-loading call that returns JSON.

My code is:

    HtmlWeb hw = new HtmlWeb();
    HtmlDocument doc = hw.Load("http://agent.bronni.ru/Result.aspx?id=c7a6a33a-174e-426d-b127-828ee612c36e&account=27178&page=1&pageSize=50&mr=true");

    // Get all tables in the document
    HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");

    // Iterate all rows in the first table
    HtmlNodeCollection rows = tables[0].SelectNodes(".//tr");

    var data = rows.Skip(1).ToList().Take(10).ToList().Select(x => new TableRow()
    {
        Price = x.SelectNodes(".//td").ToList()[4].InnerText,
        Operator = x.SelectNodes(".//td").ToList()[15].InnerText,
        DepartureDate = x.SelectNodes(".//td").ToList()[6].InnerText,
        DestinationRegion = x.SelectNodes(".//td").ToList()[7].InnerText
    }).ToList();

Update: code for the second site:

        WebClient wc = new WebClient();
        wc.Headers.Add("Referer", "http://sletat.ru/");//MUST BE THIS HEADER
        string result = wc.DownloadString("http://module.sletat.ru/Main.svc/GetTours?cityFromId=832&countryId=35&cities=&meals=&stars=&hotels=&s_adults=1&s_kids=0&s_kids_ages=&s_nightsMin=6&s_nightsMax=16&s_priceMin=0&s_priceMax=&currencyAlias=RUB&s_departFrom=25%2F06%2F2012&s_departTo=31%2F07%2F2012&visibleOperators=&s_hotelIsNotInStop=true&s_hasTickets=true&s_ticketsIncluded=true&debug=0&filter=0&f_to_id=&requestId=19198631&pageSize=20&pageNumber=1&updateResult=1&includeDescriptions=1&includeOilTaxesAndVisa=1&userId=&jskey=1&callback=_jqjsp&_1340633427022=");
        result = result.Substring(result.IndexOf("{"), result.LastIndexOf("}") - result.IndexOf("{") + 1);
        JavaScriptSerializer js = new JavaScriptSerializer();
        dynamic json = js.DeserializeObject(result);
        var prices = json["GetToursResult"]["Data"]["aaData"] as object[];
        // var operators = ((object[])json["result"]["prices"]).Cast<Dictionary<string, object>>();
        var temp = prices.ToList().Take(20).Select(x => new TableRow
        {
            Operator = (x as object[])[40].ToString(),
            //Price = x["operatorPrice"].ToString(),
            //DepartureDate = x["checkinDate"].ToString(),
            //DestinationRegion = ((Dictionary<string, object>)x["country"])["englishName"].ToString()
        }).ToList();

        string str = "";

        foreach (var tableRow in temp)
        {
            str += tableRow.Operator + "<br />";
        }
        Response.Write(str);

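Done this way, everything works, but the problem is that this link only works for about 30 minutes, and then I have to put in a different link. Is there any way to solve this? (Only the second site has this problem.) Thanks again,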


1 Answer


The data actually comes from here:

http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=3&pageSize=50&_=134013175663

Note that page=# and pageSize=# can be adjusted dynamically.

So instead of parsing the HTML, you can fetch the JSON data from that URL and parse it. For example:

// Namespaces needed: System, System.Collections.Generic, System.Linq, System.Net,
// System.Web.Script.Serialization (from the System.Web.Extensions assembly)
WebClient wc = new WebClient();
string result = wc.DownloadString("http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=1000&_=1340131756631");
// Keep only the text between the first '{' and the last '}' (drops the JSONP wrapper)
result = result.Substring(result.IndexOf("{"), result.LastIndexOf("}") - result.IndexOf("{") + 1);
JavaScriptSerializer js = new JavaScriptSerializer();
dynamic json = js.DeserializeObject(result);
var prices = ((object[])json["result"]["prices"]).Cast<Dictionary<string, object>>();
var data = from p in prices
           select new
           {
               OperatorID = p["operatorID"],
               Price = p["operatorPrice"],
               Country = ((Dictionary<string, object>)p["country"])["englishName"],
               CheckinDate = p["checkinDate"]
           };

Console.WriteLine(data);

In LINQPad, this produces something like the following:

OperatorID Price Country CheckinDate 
0          1,27  Greece  2012-06-28 
0          55,90 Greece  2012-06-28 
0          67,34 Greece  2012-06-28 

There are more rows, depending on how many you ask for...
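Because page and pageSize are ordinary query-string parameters, the request URL can be built per page rather than hard-coded. The following is only a minimal sketch: BronniUrl and Build are hypothetical names, and it assumes the server accepts any callback name in jsonp and does not insist on the trailing _ cache-busting parameter, neither of which is confirmed here.

```csharp
using System;

static class BronniUrl
{
    // Endpoint copied from the URL shown above.
    const string Endpoint = "http://beta.remote.bronni.ru/LazyLoading.ashx/getResult";

    // Hypothetical helper: builds the request URL for one page of results.
    // The default callback name "cb" is an assumption, not something the site documents.
    public static string Build(string id, int page, int pageSize, string callback = "cb")
    {
        return Endpoint
            + "?jsonp=" + Uri.EscapeDataString(callback)
            + "&id=" + Uri.EscapeDataString(id)
            + "&page=" + page
            + "&pageSize=" + pageSize;
    }
}
```

Calling BronniUrl.Build(id, 1, 50), then BronniUrl.Build(id, 2, 50), and so on would walk through the pages, with each result passed to wc.DownloadString as in the code above.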

Note: the reason for the line result = result.Substring(result.IndexOf("{"),result.LastIndexOf("}")-result.IndexOf("{")+1); is that the JSONP result starts with this junk:

jQuery17207647891761735082_1340131755603({"

and ends with }), which makes JavaScriptSerializer choke when it tries to parse it, so the wrapper has to be removed.
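The same stripping step can be pulled into a small helper that also fails clearly when the response contains no JSON object. This is a sketch rather than part of the original code; Jsonp and StripPadding are hypothetical names, and it assumes the payload is a JSON object (not an array) wrapped in a single callback call.

```csharp
using System;

static class Jsonp
{
    // Returns the text from the first '{' to the last '}' inclusive,
    // which drops the "callbackName(" prefix and the ")" suffix.
    public static string StripPadding(string jsonp)
    {
        if (jsonp == null) throw new ArgumentNullException("jsonp");

        int start = jsonp.IndexOf('{');
        int end = jsonp.LastIndexOf('}');
        if (start < 0 || end < start)
            throw new FormatException("No JSON object found in the response.");

        return jsonp.Substring(start, end - start + 1);
    }
}
```

For example, Jsonp.StripPadding("cb({\"a\":1});") returns {"a":1}, which JavaScriptSerializer can then parse.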

Update:

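Interestingly, the ASHX handler that returns the data seems to require a Referer header in the request; otherwise, the response does not contain the operator information. The referer cannot be just anything you like; the handler appears to look specifically for http://agent.bronni.ru.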

Basically, all you need to do is the following:

// Namespaces needed: System.Collections.Generic, System.Linq, System.Net,
// System.Web.Script.Serialization (from the System.Web.Extensions assembly)
WebClient wc = new WebClient();
wc.Headers.Add("Referer", "http://agent.bronni.ru"); // MUST BE THIS HEADER
string result = wc.DownloadString("http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=1000&_=1340131756631");
// Keep only the text between the first '{' and the last '}' (drops the JSONP wrapper)
result = result.Substring(result.IndexOf("{"), result.LastIndexOf("}") - result.IndexOf("{") + 1);
JavaScriptSerializer js = new JavaScriptSerializer();
dynamic json = js.DeserializeObject(result);
var prices = ((object[])json["result"]["prices"]).Cast<Dictionary<string, object>>();
var data = from p in prices
           select new
           {
               OperatorID = p["operatorID"],
               Price = p["operatorPrice"],
               Country = ((Dictionary<string, object>)p["country"])["englishName"],
               Hotel = ((Dictionary<string, object>)p["hotel"])["englishName"],
               Operator = ((Dictionary<string, object>)p["operator"])["englishName"], // OPERATOR
               CheckinDate = p["checkinDate"]
           };

OperatorID Price Country Hotel                           Operator          CheckinDate 
19681      1,27  Greece  Julia Hotel                     Mouzenidis Travel 2012-06-28 
19681      1,27  Greece  Forest Park                     Mouzenidis Travel 2012-06-28 
19681      1,27  Greece  Kassandra Mare (п-ов Кассандра) Mouzenidis Travel 2012-06-28

Update 2:

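I decided to compare the performance of the out-of-the-box JavaScriptSerializer with that of the JSON.NET serializer. In all my tests with different record counts (50, 1000, 3000), JSON.NET was at least twice as fast as JavaScriptSerializer, and on the smaller record sets it was in some cases even 10 times faster.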
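The test harness used for that comparison is not shown, so the following is only a sketch of how such a measurement could be set up with Stopwatch. ParserTimer and Measure are hypothetical names; the parser is passed in as a delegate so the same loop can time either library.

```csharp
using System;
using System.Diagnostics;

static class ParserTimer
{
    // Calls parse(json) once as a warm-up, then 'iterations' more times,
    // and returns the time taken by the timed calls only.
    public static TimeSpan Measure(Func<string, object> parse, string json, int iterations)
    {
        parse(json); // warm-up, so one-time JIT cost is not included in the timing

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            parse(json);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

With the stripped response in result, the two candidates could be timed as ParserTimer.Measure(s => new JavaScriptSerializer().DeserializeObject(s), result, 100) and ParserTimer.Measure(s => JObject.Parse(s), result, 100). The iteration count of 100 is arbitrary, and absolute numbers will vary by machine and payload.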

If you decide to use the JSON.NET library, the following code gives you the same results as the JavaScriptSerializer version above:

// Namespaces needed: System, System.Linq, System.Net, Newtonsoft.Json.Linq
WebClient wc = new WebClient();
wc.Headers.Add("Referer", "http://agent.bronni.ru");
string result = wc.DownloadString("http://beta.remote.bronni.ru/LazyLoading.ashx/getResult?jsonp=jQuery17207647891761735082_1340131755603&id=c7a6a33a-174e-426d-b127-828ee612c36e&page=1&pageSize=50&_=1340131756631");
// Keep only the text between the first '{' and the last '}' (drops the JSONP wrapper)
result = result.Substring(result.IndexOf("{"), result.LastIndexOf("}") - result.IndexOf("{") + 1);
JObject o = JObject.Parse(result);
var data = from x in o["result"]["prices"]
           select new
           {
               OperatorID = x["operatorID"],
               Price = x["operatorPrice"],
               Country = x["country"]["englishName"],
               Hotel = x["hotel"]["englishName"],
               Operator = x["operator"]["englishName"],
               CheckinDate = x["checkinDate"]
           };

Console.WriteLine(data);
answered 2012-06-19T18:55:53.127