xpath - How to get the value inside a tag using xpath/htmlwebunit

Question

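I am trying to create a Java application that retrieves information from a webpage. This is part of the page source; I am trying to access the value in the first td tag of the second tr tag: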

<TABLE  CLASS="datadisplaytable" width = "100%">
<TR>
    <TD CLASS="dddead">&nbsp;</TD>
    <TH CLASS="ddheader" scope="col" ><SPAN class="fieldlabeltext">Capacity</SPAN></TH>
    <TH CLASS="ddheader" scope="col" ><SPAN class="fieldlabeltext">Actual</SPAN></TH>
    <TH CLASS="ddheader" scope="col" ><SPAN class="fieldlabeltext">Remaining</SPAN></TH>
</TR> 
<TR>
    <TH CLASS="ddlabel" scope="row" ><SPAN class="fieldlabeltext">Seats</SPAN></TH>
    <TD CLASS="dddefault">46</TD>  <!-- the value I want -->
    <TD CLASS="dddefault">46</TD>
    <TD CLASS="dddefault">0</TD>
</TR>

This is what I have right now, but it only returns the td tag's class, not the value inside it:

List<?> table = page.getByXPath("//table[@class='datadisplaytable'][1]//tr[2]/td");

How would I get the value of the td tag rather than its attributes?

Edit: the code above returns this:

HtmlTableDataCell[<td class="dddefault">]

score 10 · Accepted Answer


Assuming that the document is as shown in the question (TABLE is the top element),

Use:

/TABLE/TR[2]/TD[1]/text()

This selects any text-node child of the first TD child of the second TR child of the top element TABLE.
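As a rough illustration, the sketch below evaluates this expression with the JDK's built-in javax.xml.xpath API. It runs against a hand-tidied, well-formed copy of the table from the question: the &nbsp; entity is replaced by a plain space and a closing </TABLE> tag is added, both assumptions made only for this example. The class name XPathTextDemo and its helper methods are illustrative and not part of any library. The second helper selects the TD element itself and calls getTextContent() on it. HtmlUnit's DOM nodes implement org.w3c.dom.Node, so the same call should be usable on the HtmlTableDataCell from the question, though that is not tested here. An HTML parser such as HtmlUnit's typically lower-cases element names, so the lowercase names from the question would apply there.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XPathTextDemo {

    // Well-formed stand-in for the table in the question (assumption: tidied by hand).
    static final String TABLE_XML =
        "<TABLE CLASS=\"datadisplaytable\" width=\"100%\">"
      + "<TR><TD CLASS=\"dddead\"> </TD><TH>Capacity</TH><TH>Actual</TH><TH>Remaining</TH></TR>"
      + "<TR><TH CLASS=\"ddlabel\">Seats</TH>"
      + "<TD CLASS=\"dddefault\">46</TD>"
      + "<TD CLASS=\"dddefault\">46</TD>"
      + "<TD CLASS=\"dddefault\">0</TD></TR>"
      + "</TABLE>";

    // Evaluates the expression as a string: the string value of the first selected node.
    static String textViaXPath(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Selects a single node and reads its text through the W3C DOM API.
    static String textViaNode(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node cell = (Node) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODE);
            return cell == null ? null : cell.getTextContent();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Text node selected directly by the XPath expression.
        System.out.println(textViaXPath(TABLE_XML, "/TABLE/TR[2]/TD[1]/text()"));
        // Element selected by XPath, text read from the node afterwards.
        System.out.println(textViaNode(TABLE_XML, "/TABLE/TR[2]/TD[1]"));
    }
}
```

Both calls read the same cell; the first lets XPath pick the text node, the second keeps the element and extracts its text in Java.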

In case the table is buried in the XML document, but can be uniquely identified by its CLASS attribute, use:

//TABLE[@CLASS='datadisplaytable']/TR[2]/TD[1]/text()

This selects any text-node child of the first TD child of the second TR child of any element TABLE in the XML document (we know there is only one such element) whose CLASS attribute has the string value 'datadisplaytable'.

Finally, in the worst case, where several TABLE elements may have a CLASS attribute with the value 'datadisplaytable' and we want to select from the first such table, use:

(//TABLE[@CLASS='datadisplaytable'])[1]/TR[2]/TD[1]/text()
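The parentheses matter here. Without them, [1] applies to each parent's TABLE children separately, so //TABLE[@CLASS='datadisplaytable'][1] (the form used in the question) can match more than one table. The minimal sketch below shows the difference by counting matches with the JDK's javax.xml.xpath API. The sample document, the class name FirstTableDemo, and the countMatches helper are all made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FirstTableDemo {

    // Hypothetical document: two matching tables, each the first TABLE child of its own DIV.
    static final String XML =
        "<ROOT>"
      + "<DIV><TABLE CLASS=\"datadisplaytable\"><TR><TD>first</TD></TR></TABLE></DIV>"
      + "<DIV><TABLE CLASS=\"datadisplaytable\"><TR><TD>second</TD></TR></TABLE></DIV>"
      + "</ROOT>";

    // Returns how many nodes the expression selects in the sample document.
    static int countMatches(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // [1] is applied per parent: both tables are "first" within their own DIV.
        System.out.println(countMatches("//TABLE[@CLASS='datadisplaytable'][1]"));
        // [1] is applied to the whole node-set: only the first table in document order.
        System.out.println(countMatches("(//TABLE[@CLASS='datadisplaytable'])[1]"));
    }
}
```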

score 1 · Accepted Answer

To get the text content of an element, XPath provides the text() node test (often loosely called a function), which you can use.

Element containing text 't' exactly         //*[.='t']  
Element <E> containing text 't'             //E[contains(text(),'t')]
<a> containing text 't'                     //a[contains(text(),'t')]
<a> with target link 'url'                  //a[@href='url']
Link URL labeled with text 't' exactly      //a[.='t']/@href
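To make a few of these patterns concrete, here is a small sketch that runs them with the JDK's javax.xml.xpath API against a made-up two-link document. The document, the class name XPathPatternsDemo, and both helpers are assumptions for illustration only.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathPatternsDemo {

    // Hypothetical sample document with two links.
    static final String XML =
        "<links>"
      + "<a href=\"/docs\">Docs</a>"
      + "<a href=\"/archive\">Docs archive</a>"
      + "</links>";

    // Number of nodes selected by the expression.
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // String value of the first node selected by the expression.
    static String value(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//*[.='Docs']"));                 // exact text match
        System.out.println(count("//a[contains(text(),'Docs')]"));  // substring match
        System.out.println(value("//a[.='Docs']/@href"));           // URL of the link labeled 'Docs'
        System.out.println(value("//a[@href='/archive']"));         // text of the link with that target
    }
}
```

The exact match finds only the first link, while the contains() form matches both, which is the practical difference between the first and third rows of the table.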

If you are also using JWebUnit, there is a method "getElementTextByXPath" that can also be used to get the text. It belongs to net.sourceforge.jwebunit.junit.WebTestCase:

getElementTextByXPath

public String getElementTextByXPath(String xpath)
Deprecated. Get text of the given element.
Parameters: xpath - xpath of the element.

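    // For i = 1..5, read the text of a td that is the i-th td child of its parent
    // and has at least one text-node child.
    for (int i = 1; i < 6; i++) {
        String result = getElementTextByXPath("//td[" + i + "][text()]");
        System.out.println("The Content of TD is " + result);
    }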
