
I've tried many ways to extract the table from this page:

https://secure.tickertech.com/bnkinvest/cgi/?a=historical&ticker=IVV&w=dividends

I'm using DOM, XPath and everything else I found on Stack Overflow, and none of it works :/

Can anyone give me some ideas on how to get that table?

It's nested... and there are no IDs to use as selectors, so I've run out of ideas...

<?php
$ch = curl_init("https://secure.tickertech.com/bnkinvest/cgi/?a=historical&ticker=IVV&w=dividends");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Skips TLS certificate verification; acceptable for a quick test, not for production
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$content = curl_exec($ch);
curl_close($ch);

$doc = new DOMDocument();

// It's rare you'll have valid XHTML, so suppress any errors; it'll do its best.
@$doc->loadHTML($content);

$xpath = new DOMXPath($doc);

// Modify the XPath query to match the content
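// item(1) is the *second* <table> in the document (item() is zero-based)
foreach($xpath->query('//table')->item(1)->getElementsByTagName('tr') as $rows) {
    $cells = $rows->getElementsByTagName('td');
    // length is a property of DOMNodeList, not a method
    if($cells->length == 2)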
    {
        print_r($cells);
    }
}



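I've adjusted the XPath to try to make sure you get the right table, but, as you said, there's no id or class to tell it apart. This query looks for a table nested inside two other tables and selects only its tr rows that contain td cells. It then uses almost the same code you already have to check for exactly 2 columns, and outputs the data...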

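// tr rows that contain td cells, in a table nested inside two other tables
foreach( $xpath->query('//table[1]//table//table/tr[td]') as $rows) {
    $cells = $rows->getElementsByTagName('td');
    // Keep only two-column rows and print them as "first=>second"
    if($cells->length ==2)
    {
        echo $cells[0]->textContent."=>".$cells[1]->textContent.PHP_EOL;
    }
}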
Answered 2019-11-11T20:42:55.493
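To check the answer's XPath without fetching the live page, here is a minimal sketch that runs the same query against a small inline HTML string. The markup is a hypothetical stand-in for the page's nested tables (the real page's structure is an assumption), and the dates and amounts are invented placeholders, not real IVV dividend data. It also swaps the `@` error suppression for `libxml_use_internal_errors()`.

```php
<?php
// Hypothetical markup: a data table nested inside two layout tables.
// The rows below are invented placeholders, not real dividend data.
$html = <<<'HTML'
<html><body>
<table><tr><td>
  <table><tr><td>
    <table>
      <tr><th>Date</th><th>Amount</th></tr>
      <tr><td>2019-01-01</td><td>1.00</td></tr>
      <tr><td>2019-04-01</td><td>2.00</td></tr>
    </table>
  </td></tr></table>
</td></tr></table>
</body></html>
HTML;

$doc = new DOMDocument();
// Collect libxml parse warnings instead of silencing them with @
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$pairs = [];
// Same query as the answer: the header row (th only) is skipped by [td]
foreach ($xpath->query('//table[1]//table//table/tr[td]') as $row) {
    $cells = $row->getElementsByTagName('td');
    if ($cells->length === 2) {
        $pairs[] = [trim($cells->item(0)->textContent), trim($cells->item(1)->textContent)];
    }
}
print_r($pairs);
```

libxml's HTML parser does not insert `<tbody>` the way browser dev tools display it, so `table/tr` matches here; if the real page contains explicit `<tbody>` elements, the path would need a `tbody/` step before `tr`.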