
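I have been struggling with this for 2 days. I am using C# and HtmlAgilityPack in a .NET 4.5 WinForms project to extract data from a website (the fields I want to extract are $ Flow and B/S ratio). I can reach the field, but I get no value (Flow: \n\t\t\t instead of Flow: 245 M). I don't understand why the query returns no value when I can see the value in the web page. I would like to know whether anyone can find the reason why my query result is nodes == null. This is the URL of the page I am querying: http://finance.avafin.com/tradeFlow?type=BS_RATIO&date=06%2F14%2F2013&alertId=0&symbol=spy&sectorId=0&industryId=0

I use the URL above as the query.

Note that I have used the method below with a different query on another web page and it worked, so either something does not work with the current query or, as I suspect, the fields on the current page are obfuscated.

The method used: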

        /// <summary>
        ///     Gets the data.
        /// </summary>
        /// <param name="url"> The URL. </param>
        /// <returns> </returns>
        public List<string> GetFlowData(string url)
        {
            // ('//a[contains(@href, "genre")]')
            // <td class=" sorting_1">137.27B</td>
            // //*[@id="tf_data"]/tbody/tr[1]/td[8]  => XPath shown in the browser for the first value; returns no value when used as a query (nodes == null)
            // //*[@id="tf_data"]/tbody/tr[1]/td[9]  => XPath shown in the browser for the second value; returns no value when used as a query (nodes == null)
            // //td[@class='']  => nodes is null too

            // With [@id='tf_data']/tbody I can see the B/S ratio node in the body, but it holds \n\t\t\t instead of a value
            var nodes = LoadHtmlDoc(url, "//*[@id='tf_data']/tbody");
            List<string> tickers = new List<string>();
            if (nodes == null)
            {
                return new List<string> { "Ticker not available" };
            }
            int i = 0;
            foreach (var v in nodes)
            {
                i++;

                MessageBox.Show(v.InnerText + " " + i.ToString());
                //// The placement of the data containing bought/sold ratio
                //if (i == 7)
                //{
                //    tickers.Add(v.InnerText);
                //}
                //// The placement of the data containing $ Flow
                //if (i == 8)
                //{
                //    tickers.Add(CleanFlowData(v.InnerText));
                //}
            }

            return tickers;
        }

1 Answer


The page you are querying does not contain any data in the table with id tf_data. If you examine the page markup, you'll see:

<table cellpadding="0" cellspacing="0" border="0" class="display" id="tf_data">
    <thead>
        <tr height="10">
            <th align="center"></th>
            <th align="center" width="90">CHART</th>
            <th align="left" width="70">SYMBOL</th>
            <th align="left">MARKET CAP</th>
            <th align="right" width="65">PRICE</th>
            <th align="center" width="80">CHANGE</th>
            <th align="right">VOL</th>
            <th align="right">B/S RATIO</th>
            <th align="right" width="80">NET CASH FLOW</th>
        </tr>
    </thead>
    <tbody> <!-- empty! -->
    </tbody>
</table>

All of the data is added to this table by the browser via JavaScript after the document has loaded (see the $(document).ready function). So if you fetch the HTML from that URL, there will be no data until a browser runs the JavaScript code. In other words, there is nothing for you to parse, which is why the tbody query yields only whitespace.

I suggest you examine the script that loads the JSON data into the page and simply call the same service from your code.
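
The effect can be reproduced offline. The following is a minimal sketch, assuming the HtmlAgilityPack package is referenced; the markup string is a shortened stand-in for what the server sends, not the real page, and the class name is made up for illustration. It shows that SelectNodes returns null when the tbody has no rows, while the tbody node itself still exists and contains only whitespace:

```csharp
using System;
using HtmlAgilityPack;

class EmptyTableDemo
{
    static void Main()
    {
        // Shortened stand-in for the static markup: the tbody holds no rows,
        // only the whitespace between its opening and closing tags.
        const string html =
            "<table id=\"tf_data\"><thead><tr><th>B/S RATIO</th></tr></thead>" +
            "<tbody>\n\t\t\t</tbody></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var cells = doc.DocumentNode.SelectNodes("//*[@id='tf_data']/tbody/tr/td");
        Console.WriteLine(cells == null ? "no cells in static HTML" : cells.Count + " cells");

        // The tbody element is found, but its text is whitespace only.
        var body = doc.DocumentNode.SelectSingleNode("//*[@id='tf_data']/tbody");
        bool onlyWhitespace = body != null && body.InnerText.Trim().Length == 0;
        Console.WriteLine(onlyWhitespace ? "tbody contains only whitespace" : "tbody has content");
    }
}
```

This matches the symptoms in the question: the td queries give nodes == null, and the tbody query gives a node whose InnerText is just line breaks and tabs.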


It's outside the scope of the question, but to retrieve the data you can use the HttpClient class from the System.Net.Http assembly. Here is a usage sample (it's up to you to work out how the query string should be composed):

HttpClient client = new HttpClient();
client.BaseAddress = new Uri("http://finance.avafin.com");
string url = "data?sEcho=2&iColumns=9&sColumns=&iDisplayStart=0&iDisplayLength=20&mDataProp_0=0&mDataProp_1=1&mDataProp_2=2&mDataProp_3=3&mDataProp_4=4&mDataProp_5=5&mDataProp_6=6&mDataProp_7=7&mDataProp_8=8&sSearch=&bRegex=false&sSearch_0=&bRegex_0=false&bSearchable_0=true&sSearch_1=&bRegex_1=false&bSearchable_1=true&sSearch_2=&bRegex_2=false&bSearchable_2=true&sSearch_3=&bRegex_3=false&bSearchable_3=true&sSearch_4=&bRegex_4=false&bSearchable_4=true&sSearch_5=&bRegex_5=false&bSearchable_5=true&sSearch_6=&bRegex_6=false&bSearchable_6=true&sSearch_7=&bRegex_7=false&bSearchable_7=true&sSearch_8=&bRegex_8=false&bSearchable_8=true&iSortCol_0=4&sSortDir_0=asc&iSortingCols=1&bSortable_0=true&bSortable_1=true&bSortable_2=true&bSortable_3=true&bSortable_4=true&bSortable_5=true&bSortable_6=true&bSortable_7=true&bSortable_8=true&type=BS_RATIO&date=06%2F14%2F2013&categoryName=&alertId=0&alertId2=&industryId=0&sectorId=0&symbol=spy&recom=&period=&perfPercent=";
var response = client.GetStringAsync(url).Result;

The response will contain HTML that you can parse.
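
The long literal above is hard to maintain. As a rough sketch, the query string could be composed from name/value pairs instead. FlowClient, BuildQuery and GetFlowResponseAsync are hypothetical names introduced here, and the sketch sends only three of the parameters; whether the service accepts a request without the remaining DataTables parameters is an assumption you would need to verify against the page's script:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

static class FlowClient
{
    // Hypothetical helper: joins pairs as "k1=v1&k2=v2", percent-encoding each part.
    public static string BuildQuery(IEnumerable<KeyValuePair<string, string>> parameters)
    {
        return string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
    }

    // Assumption: a reduced parameter set is enough; add the rest if the service needs them.
    public static string BuildRelativeUrl(string symbol, string date)
    {
        var parameters = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("type", "BS_RATIO"),
            new KeyValuePair<string, string>("date", date),   // "/" is encoded as %2F
            new KeyValuePair<string, string>("symbol", symbol)
        };
        return "data?" + BuildQuery(parameters);
    }

    // Awaitable variant of the call above; client.BaseAddress must already be set.
    public static Task<string> GetFlowResponseAsync(HttpClient client, string symbol, string date)
    {
        return client.GetStringAsync(BuildRelativeUrl(symbol, date));
    }

    static void Main()
    {
        // Only builds and prints the relative URL; no network request is made here.
        Console.WriteLine(BuildRelativeUrl("spy", "06/14/2013"));
    }
}
```

In a WinForms application it is usually better to call GetFlowResponseAsync from an async event handler and await it than to read .Result, since .Result blocks the UI thread until the request completes.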

Answered 2013-06-18T08:25:32.953