
I am parsing this HTML table:

<table align="center">
   <tbody>
      <!-- riadok -->
      <tr>
         <td valign="middle" align="right">
            <form action="130427_0i.htm" method="get">
               <input type="submit" class="button" title="uvedení do první modlitby dne" value="Inv.">
            </form>
         </td>
         <td valign="middle" align="center">
            <form action="130427_0c.htm" method="get">
               <input type="submit" class="button" title="modlitba se čtením" value="Čtení">
            </form>
         </td>
         <td valign="middle" align="left">
            <form action="130427_0r.htm" method="get">
               <input type="submit" class="button" title="ranní chvály" value="Ranní chvály">
            </form>
         </td>
      </tr>
      <!-- riadok -->
      <tr>
         <td valign="middle" align="right">
            <form action="130427_09.htm" method="get">
               <input type="submit" class="button" title="modlitba dopoledne" value="9h">
            </form>
            <form action="130427_09d.htm" method="get">
               <input type="submit" class="button" title="modlitba dopoledne (žalmy z doplňovacího cyklu)" value="(alt)">
            </form>
         </td>
         <td valign="middle" align="center">
            <form action="130427_02.htm" method="get">
               <input type="submit" class="button" title="modlitba v poledne" value="12h">
            </form>
            <form action="130427_02d.htm" method="get">
               <input type="submit" class="button" title="modlitba v poledne (žalmy z doplňovacího cyklu)" value="(alt)">
            </form>
         </td>
         <td valign="middle" align="left">
            <form action="130427_03.htm" method="get">
               <input type="submit" class="button" title="modlitba odpoledne" value="15h">
            </form>
            <form action="130427_03d.htm" method="get">
               <input type="submit" class="button" title="modlitba odpoledne (žalmy z doplňovacího cyklu)" value="(alt)">
            </form>
         </td>
      </tr>
      <!-- riadok -->
      <tr>
         <td align="right">
            <form action="130427_0v.htm" method="get">
               <input type="submit" class="button" title="nešpory" value="Nešpory">
            </form>
         </td>
         <td valign="middle" align="center">
            <form action="130427_0k.htm" method="get">
               <input type="submit" class="button" title="kompletář" value="Kompl.">
            </form>
         </td>
      </tr>
      <!-- riadok -->
      <tr>
         <td align="right"></td>
      </tr>
   </tbody>
</table>

I need to get each form (together with its input) as a single HtmlNode. For example, this one:

<form action="130427_0c.htm" method="get">
               <input type="submit" class="button" title="modlitba se čtením" value="Čtení">
 </form>

With my code, I only get this:

<form action="130427_0c.htm" method="get">

My code:

public static class FromHtmlTableToHtmlNodeList
    {
        static List<List<HtmlNode>> tableOfNode = new List<List<HtmlNode>>();

        public static List<List<HtmlNode>> Do(string htmltable)
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(htmltable);

            HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(".//tr");
            for (int i = 0; i < rows.Count; i++)
            {
                int i2 = tableOfNode.Count;
                HtmlNodeCollection cols = rows[i].SelectNodes("./td");

                for (int j = 0; j < cols.Count; j++)
                {

                    HtmlNodeCollection inCols = cols[j].SelectNodes("./form/descendant-or-self::*");
                    List<HtmlNode> nextRow = new List<HtmlNode>();

                    if (inCols != null)
                    {
                        for (int k = 0; k < inCols.Count; k++)
                        {
                            if (tableOfNode.Count < i2+k + 1)
                            {
                                tableOfNode.Add(nextRow);

                            }
                            if (tableOfNode[i2 + k].Count < j + 1) tableOfNode[i2 + k].Insert(j, inCols[k]);

                        }
                    }                                   
                }


            }

            return tableOfNode;
        }



    }

I know the problem is here:

HtmlNodeCollection inCols = cols[j].SelectNodes("./form/descendant-or-self::*");

What should the XPath I need look like?


2 Answers


You're looking for the XPath expression

./form[input]

This selects every <form> child element that contains at least one <input> child element. Each node returned is the form itself, with its subtree attached.

answered 2013-04-27T10:02:10.217

By default, Html Agility Pack handles FORM elements specially. See here for the reason: HtmlAgilityPack -- Does <form> close itself for some reason?

This code should get all the FORM elements:

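// Remove the special handling of <form> before loading the document,
// so that each form keeps its <input> as a child node.
HtmlNode.ElementsFlags.Remove("form");
HtmlDocument doc = new HtmlDocument();
doc.Load(myTestHtm);

// SelectNodes returns null when nothing matches, so guard before iterating.
var forms = doc.DocumentNode.SelectNodes("//form");
if (forms != null)
{
    foreach (var v in forms)
    {
        Console.WriteLine(v.OuterHtml);
    }
}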
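
Applied to the table from the question, the same idea can be sketched as below. This is only a minimal sketch: the class name FormExtractor and the method GetFormsPerRow are hypothetical, and it groups the forms by table row rather than reproducing the column layout of the original Do method. It assumes the HtmlAgilityPack package is referenced.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack.
public static class FormExtractor
{
    // Returns one list per <tr>, holding the <form> nodes found in that row's cells.
    public static List<List<HtmlNode>> GetFormsPerRow(string html)
    {
        // Drop the special handling of <form> so its <input> stays nested inside it.
        // ElementsFlags is static, so this affects every document parsed afterwards.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<List<HtmlNode>>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null)
        {
            return result;
        }

        foreach (HtmlNode row in rows)
        {
            var formsInRow = new List<HtmlNode>();

            // Only forms that directly contain at least one <input> element.
            HtmlNodeCollection forms = row.SelectNodes("./td/form[input]");
            if (forms != null)
            {
                formsInRow.AddRange(forms);
            }

            result.Add(formsInRow);
        }

        return result;
    }
}
```

Calling FormExtractor.GetFormsPerRow(htmltable) and reading OuterHtml on each returned node should then give the whole form, including its input. A row with no forms, such as the last one in the table, yields an empty list.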
answered 2013-04-27T11:54:17.293