
I'm trying to figure out how to get only the movie titles from this page (http://www.imdb.com/movies-in-theaters/).

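This is what I have, but I can't get it to work, and I don't know DomDocument very well. It currently gets every link on the page, but I only need the links for the listed movie titles.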

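// Download the page HTML as a string.
$content = file_get_contents("http://www.imdb.com/movies-in-theaters/");

$dom = new DomDocument();
$dom->loadHTML($content);
// Returns every <a> element on the page, not just the movie-title links.
$urls = $dom->getElementsByTagName('a');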

1 Answer

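$dom = new DomDocument();
// Load the page directly; "@" suppresses libxml warnings about malformed HTML.
@$dom->loadHTMLFile('http://www.imdb.com/movies-in-theaters/');
// Start from every <a> element on the page.
$urls = $dom->getElementsByTagName('a');
$titles = array();

foreach ($urls as $url)
{
    // Keep only links whose grandparent element has class "overview-top",
    // which wrapped each movie title in the page's markup at the time.
    if ('overview-top' === $url->parentNode->parentNode->getAttribute('class'))
        $titles[] = $url->nodeValue;
}

print_r($titles);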

This outputs:

Array
(
    [0] =>  Star Trek Into Darkness (2013)
    [1] =>  Frances Ha (2012)
    [2] =>  Stories We Tell (2012)
    [3] =>  Erased (2012)
    [4] =>  The English Teacher (2013)
    [5] =>  Augustine (2012)
    [6] =>  Black Rock (2012)
    [7] =>  State 194 (2012)
    [8] =>  Iron Man 3 (2013)
    [9] =>  The Great Gatsby (2013)
    [10] =>  Pain & Gain (2013)
    [11] =>  Peeples (2013)
    [12] =>  42 (2013)
    [13] =>  Oblivion (2013)
    [14] =>  The Croods (2013)
    [15] =>  The Big Wedding (2013)
    [16] =>  Mud (2012)
    [17] =>  Oz the Great and Powerful (2013)
)
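Since IMDb's live markup may have changed since this answer was written, here is a minimal sketch that runs the same grandparent-class filter offline. The HTML fragment is hypothetical, modeled on the "overview-top" structure the loop above assumes, not copied from IMDb. It also adds trim() to drop the leading whitespace visible in the output above, guards against a non-element grandparent, and uses libxml_use_internal_errors() instead of "@" to silence parser warnings.

```php
<?php
// Hypothetical fragment: each title link sits in an element whose
// parent has class "overview-top" (assumed structure, not real IMDb HTML).
$html = <<<HTML
<html><body>
<a href="/">Home</a>
<div class="overview-top">
  <h4><a href="/title/tt1/"> Star Trek Into Darkness (2013)</a></h4>
</div>
<div class="overview-top">
  <h4><a href="/title/tt2/"> Frances Ha (2012)</a></h4>
</div>
</body></html>
HTML;

$dom = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$titles = array();
foreach ($dom->getElementsByTagName('a') as $link) {
    $grandparent = $link->parentNode->parentNode;
    // Only DOMElement has getAttribute(); skip anything else.
    if ($grandparent instanceof DOMElement
        && $grandparent->getAttribute('class') === 'overview-top') {
        // trim() removes the whitespace around the link text.
        $titles[] = trim($link->nodeValue);
    }
}

print_r($titles);
```

The "Home" link is skipped because its grandparent is the html element, which has no "overview-top" class.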

You could also do this with XPath, but I'm not very familiar with that approach.
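For reference, the XPath route could look roughly like this minimal sketch using PHP's built-in DOMXPath class. It assumes the same structure the loop above relies on (title links whose grandparent has class "overview-top") and runs against a small hypothetical HTML fragment rather than the live IMDb page.

```php
<?php
// Hypothetical fragment mirroring the assumed "overview-top" structure.
$html = <<<HTML
<html><body>
<a href="/">Home</a>
<div class="overview-top">
  <h4><a href="/title/tt1/"> Star Trek Into Darkness (2013)</a></h4>
</div>
<div class="overview-top">
  <h4><a href="/title/tt2/"> Frances Ha (2012)</a></h4>
</div>
</body></html>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Select <a> elements whose grandparent's class is exactly "overview-top".
// If the element may carry several classes, a more tolerant test is:
// contains(concat(' ', normalize-space(@class), ' '), ' overview-top ')
$nodes = $xpath->query('//*[@class="overview-top"]/*/a');

$titles = array();
foreach ($nodes as $node) {
    $titles[] = trim($node->nodeValue);
}

print_r($titles);
```

The XPath expression encodes the parent/grandparent check in a single query, so the PHP loop only has to collect the text.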

answered 2013-05-14T02:31:19.380