I am trying to parse an HTML table using DOM. It works fine, except when a cell contains HTML instead of plain text.

Here is a sample of the HTML table:

<tr>
<td>Razon Social: </td>
<td>Circulo Inmobiliaria Sur (Casa Central)</td>
</tr>

<tr>
<td>Email: </td>
<td> <img src="generateImage.php?email=myemail@domain.com"/> </td>
</tr>

And the PHP code:

$rows = $dom->getElementsByTagName('tr');

foreach ($rows as $row)   
{
    $cells = $row->getElementsByTagName('td');

    if(strpos($cells->item(0)->textContent, "Razon") > 0)
    {
        $_razonSocial = $cells->item(1)->textContent;
    }
    else if(strpos($cells->item(0)->textContent, "Email") > 0)
    {
        $_email = $cells->item(1)->textContent;
    }
}   

echo "Razon Social: $_razonSocial<br>Email: $_email";

Output:

Razon Social: Circulo Inmobiliaria Sur (Casa Central) 
Email: 

The email is empty, but it should be:

<img src="generateImage.php?email=myemail@domain.com"/>

I even tried

$cells->item(1)->nodeValue;

instead of

$cells->item(1)->textContent;

but that does not work either. How can I make it return the HTML of the cell?

2 Answers

Give your table the id item_specification, then select its rows with XPath:

$dom = new DOMDocument();
@$dom->loadHTML($html);
$x = new DOMXPath($dom);

// Select every row of the table whose id is "item_specification"
$rows = $x->query("//*[@id='item_specification']/tr");
foreach ($rows as $row) {
    $cells = $row->getElementsByTagName('td');
    $atr_name = $cells->item(0)->nodeValue;
    $atr_val = $cells->item(1)->nodeValue;
    // Print inside the loop so every row is shown, not only the last one
    echo " {$atr_name} - {$atr_val} <br />";
}

It's working fine for me.
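
Note that nodeValue, like textContent, returns only the text inside a node, so a cell that holds nothing but an img tag still comes back without its markup. If the markup itself is what you need, one option is to serialize the cell's child nodes with DOMDocument::saveHTML(). The following is only a minimal sketch, assuming PHP 7 or later and the sample table from the question; innerHtml() is a hypothetical helper name, not part of the DOM extension.

```php
<?php
// Sample markup mirroring the table from the question.
$html = '<table id="item_specification">'
      . '<tr><td>Razon Social: </td><td>Circulo Inmobiliaria Sur (Casa Central)</td></tr>'
      . '<tr><td>Email: </td><td> <img src="generateImage.php?email=myemail@domain.com"/> </td></tr>'
      . '</table>';

// Hypothetical helper: concatenate the serialized child nodes of $node.
function innerHtml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes only that node.
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return trim($out);
}

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);

$values = [];
foreach ($dom->getElementsByTagName('tr') as $row) {
    $cells = $row->getElementsByTagName('td');
    // Use the first cell, without the trailing colon, as the array key.
    $label = rtrim(trim($cells->item(0)->textContent), ':');
    $values[$label] = innerHtml($cells->item(1));
}

echo $values['Razon Social'], "\n";
echo $values['Email'], "\n";
```

The Email entry then contains the serialized img tag rather than an empty string. The serializer may normalize the markup, so do not expect a byte-for-byte copy of the input.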

answered 2013-11-14T06:50:07.200

As I already mentioned, <img src="generateImage.php?email=myemail@domain.com"/> is not text. It is another HTML element, a child node of the td. So try this:

if (strpos($cells->item(0)->textContent, "Razon") !== false) {
    $_razonSocial = $cells->item(1)->textContent;
} else if (strpos($cells->item(0)->textContent, "Email") !== false) {
    // Here we walk all child nodes of the td.
    // The space before the img tag is also a child node, but it is a DOMText,
    // so we skip everything that is not an element.
    foreach ($cells->item(1)->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $_email = $child->getAttribute('src');
            break;
        }
    }
    // $_email now holds the full src value; the address can be extracted from it
}
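
One possible way to do that last step, pulling the address out of the src value, is PHP's built-in parse_url() and parse_str(). This is only a sketch: it assumes the src always has the shape shown in the question, with the address in a query parameter named email.

```php
<?php
// The src value as it appears in the question's sample HTML.
$src = 'generateImage.php?email=myemail@domain.com';

// Take only the query-string part of the URL ("email=myemail@domain.com").
$query = parse_url($src, PHP_URL_QUERY);

// Decode the query string into an associative array.
$params = [];
if (is_string($query)) {
    parse_str($query, $params);
}

// "email" is the parameter name used in the sample; adjust it if yours differs.
$email = $params['email'] ?? null;

echo $email, "\n"; // prints "myemail@domain.com"
```

If the parameter is missing, $email is null, so check for that before using it.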
answered 2013-11-14T06:51:16.293