
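I'm trying to parse values out of a large HTML page, and I'm struggling with how to extract the content that sits between two selectors. Here is some sample HTML to illustrate: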

<table class="categories">
<tr class="category">
    <td class="categoryTitle">Category #1</td>
    <td class="categoryDate">12-1-2012</td>
    <td class="categoryFoos">212</td>       
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #1</div></td>
    <td class="catItemColor">Blue</td>
    <td class="catItemSprockets">17</td>
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #2</div></td>
    <td class="catItemColor">Red</td>
    <td class="catItemSprockets">454</td>
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #3</div></td>
    <td class="catItemColor">Purple</td>
    <td class="catItemSprockets">11</td>
</tr>
<tr class="category">
    <td class="categoryTitle">Category #2</td>
    <td class="categoryDate">12-17-2012</td>
    <td class="categoryFoos">311</td>       
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #1</div></td>
    <td class="catItemColor">Yellow</td>
    <td class="catItemSprockets">73</td>
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #2</div></td>
    <td class="catItemColor">Red</td>
    <td class="catItemSprockets">5</td>
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #3</div></td>
    <td class="catItemColor">Purple</td>
    <td class="catItemSprockets">11</td>
</tr>
</table>

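How would I take an ICsqWebResponse and parse out each category, with its title, date and "foos", plus all of the items in that category as a collection? To be clear about what I'm after, the resulting object should look like this: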

Categories = {
    Category #1 { 
       Date: 12-1-2012,
       Foos: 212,
       Items: [
          Category Item #1 {
             Color: Blue,
             Sprockets: 17
          },
          Category Item #2 {
             Color: Red,
             Sprockets: 454
          },
          ... more items ...
       ]
     },
     Category #2 {
        Date: 12-17-2012,
        Foos: 311,
        Items: [
            Category Item #1 {
                Color: Yellow,
                Sprockets: 73
            },
            Category Item #2 {
                Color: Red,
                Sprockets: 5
            },
            Category Item #3 {
                Color: Purple,
                Sprockets: 11
            }
        ]
     }
 }
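Both answers below fill in `Category` and `CategoryItem` objects without ever defining them. A minimal sketch of such model classes, assuming numeric "foos" and "sprockets" and keeping the date as its original text (for example "12-1-2012"), might look like this; the class and property names are illustrative, not something CsQuery provides.

```csharp
using System.Collections.Generic;

// Hypothetical model for one item row (tr.catItem).
public class CategoryItem
{
    public string Name { get; set; }
    public string Color { get; set; }
    public int Sprockets { get; set; }
}

// Hypothetical model for one category row (tr.category) plus the item rows after it.
public class Category
{
    public string Title { get; set; }
    public string Date { get; set; }   // kept as text; parse with DateTime.ParseExact if needed
    public int Foos { get; set; }
    public List<CategoryItem> Items { get; } = new List<CategoryItem>();
}
```

The first category from the sample would then be `new Category { Title = "Category #1", Date = "12-1-2012", Foos = 212 }` with its items added to `Items`.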

2 Answers


You iterate over all of the rows, using the CsQuery library:

CQ dom = "<table> ...your html... </table>"; // or CQ.CreateFromUrl("http://www.jquery.com");
CQ rows = dom["tr"]; // every <tr> in the document, in document order

Whenever you reach a category row, start a new category and add the item rows that follow to it.

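var categoryList = new List<Category>();
Category currentCategory = null; // 'var x = null;' does not compile in C#

foreach (IDomObject r in rows)
{
    // wrap the <tr> in a CQ so selectors can be run inside it;
    // HasClass() replaces extracting the class name with a regex
    CQ row = r.Cq();

    if (currentCategory != null && r.HasClass("catItem"))
    {
        var item = new CategoryItem();
        item.Name = row.Find(".itemName").First().Text();
        item.Color = row.Find(".catItemColor").First().Text();
        // ...

        currentCategory.Items.Add(item);
    }
    else if (r.HasClass("category"))
    {
        // a category row starts a new Category (not a CategoryItem),
        // and it becomes the target for the catItem rows that follow
        currentCategory = new Category();
        currentCategory.Title = row.Find(".categoryTitle").First().Text();
        currentCategory.Date = row.Find(".categoryDate").First().Text();
        currentCategory.Foos = int.Parse(row.Find(".categoryFoos").First().Text());
        // ...

        categoryList.Add(currentCategory);
    }
}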

Disclaimer: this is not production-ready code ;-)
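The same row-walking idea can be tried out without CsQuery. The sketch below swaps in LINQ to XML (`System.Xml.Linq`), which only works here because the sample table happens to be well-formed XML; real-world HTML usually is not, so an HTML parser such as CsQuery is still the right tool for live pages. The `Category`/`CategoryItem` records and `RowWalker.Parse` are hypothetical names, and the class test is an exact match on the whole `class` attribute, which is enough for this sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical model types mirroring the structure asked for in the question.
public record CategoryItem(string Name, string Color, int Sprockets);
public record Category(string Title, string Date, int Foos, List<CategoryItem> Items);

public static class RowWalker
{
    // Text of the first descendant whose class attribute equals cls exactly.
    static string Cell(XElement row, string cls) =>
        row.Descendants().First(e => (string)e.Attribute("class") == cls).Value.Trim();

    // Walks the <tr> children in document order: a "category" row opens a new
    // Category, and every "catItem" row after it is added to that Category.
    // Assumes the rows are direct children of <table> (no <tbody>).
    public static List<Category> Parse(string tableXml)
    {
        var table = XElement.Parse(tableXml);
        var result = new List<Category>();
        Category current = null;

        foreach (var row in table.Elements("tr"))
        {
            var cls = (string)row.Attribute("class");
            if (cls == "category")
            {
                current = new Category(
                    Cell(row, "categoryTitle"),
                    Cell(row, "categoryDate"),
                    int.Parse(Cell(row, "categoryFoos")),
                    new List<CategoryItem>());
                result.Add(current);
            }
            else if (cls == "catItem" && current != null)
            {
                current.Items.Add(new CategoryItem(
                    Cell(row, "itemName"),
                    Cell(row, "catItemColor"),
                    int.Parse(Cell(row, "catItemSprockets"))));
            }
        }
        return result;
    }
}
```

Passed the question's table, `RowWalker.Parse` returns two categories with three items each. Keeping a single "current category" reference is what lets a flat list of sibling rows turn into a nested structure.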

answered 2014-05-05T14:25:35.720

If I understand what you're asking...

    CQ html = "your html here";
    html[".category"].Each((index, dom) => {

        var category = dom.Cq(); // everything below works relative to this row
        // use the .Find() method, NOT '[]' or Select(): those search the whole
        // document, not just the current category row
        string categoryTitle = category.Find(".categoryTitle").Text();
        string categoryDate = category.Find(".categoryDate").Text();
        // and so on...

        // now loop through the catItem rows; they are siblings of the category
        // row, not children, so walk forward until the next category row
        for (var sib = dom.NextElementSibling;
             sib != null && !sib.HasClass("category");
             sib = sib.NextElementSibling)
        {
            var catItem = sib.Cq();
            // the same principle applies here, e.g. catItem.Find(".catItemColor").Text()
        }
    });
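The "rows after this category, up to the next one" idea is what jQuery calls `nextUntil`. As a sketch outside CsQuery, LINQ to XML expresses it as `ElementsAfterSelf("tr").TakeWhile(...)`; this again assumes the markup is well-formed XML, as the sample is, and `SiblingRanges.Parse` plus the record types are hypothetical names. Here the class check splits the attribute on spaces, so a row such as `class="catItem odd"` still counts as an item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical model types.
public record CategoryItem(string Name, string Color, int Sprockets);
public record Category(string Title, string Date, int Foos, List<CategoryItem> Items);

public static class SiblingRanges
{
    // True if the space-separated class attribute contains cls.
    static bool HasClass(XElement e, string cls) =>
        ((string)e.Attribute("class") ?? "")
            .Split(' ', StringSplitOptions.RemoveEmptyEntries)
            .Contains(cls);

    // Trimmed text of the first descendant carrying class cls.
    static string Text(XElement scope, string cls) =>
        scope.Descendants().First(e => HasClass(e, cls)).Value.Trim();

    public static List<Category> Parse(string tableXml)
    {
        var table = XElement.Parse(tableXml);
        return table.Elements("tr")
            .Where(tr => HasClass(tr, "category"))
            .Select(cat => new Category(
                Text(cat, "categoryTitle"),
                Text(cat, "categoryDate"),
                int.Parse(Text(cat, "categoryFoos")),
                // "nextUntil": following <tr> siblings, stopping at the next category row
                cat.ElementsAfterSelf("tr")
                   .TakeWhile(tr => !HasClass(tr, "category"))
                   .Where(tr => HasClass(tr, "catItem"))
                   .Select(tr => new CategoryItem(
                       Text(tr, "itemName"),
                       Text(tr, "catItemColor"),
                       int.Parse(Text(tr, "catItemSprockets"))))
                   .ToList()))
            .ToList();
    }
}
```

Unlike the row-walking loop, each category here computes its own item range independently, so no mutable "current category" state is needed, at the cost of rescanning siblings for every category row.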
answered 2014-05-14T14:30:16.983