php - URL matching with preg_match_all and regular expressions in PHP

Question

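I'm trying to build a crawler that fetches movie URLs from an IMDb list. I was able to put all the links on the page into an array, and I only want to select the ones that contain "title".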

preg_match_all($pattern, "[125] => href=\"/chart/2000s?mode=popular\" [126] => href=\"/title/tt0111161/\" ", $matches);

where $pattern = '/title/'.

I get the following error:

Warning: preg_match_all() [function.preg-match-all]: Delimiter must not be alphanumeric or backslash in C:\xampp\htdocs\phpProject1\index.php on line 53

Any ideas on how to go about this? Thanks a lot.

score 1 · Accepted Answer

Are you sure $pattern is actually '/title/' at the moment preg_match_all is called?

You get that error when the pattern passed to preg_match_all (its first argument) is not properly delimited.
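A minimal sketch of the difference, using the subject string from the question. It assumes IMDb title IDs look like "tt" followed by digits (as in tt0111161), and the $links array at the end is a hypothetical stand-in for the array of links the question mentions.

```php
<?php
// Subject string copied from the question (a print_r-style dump of links).
$subject = '[125] => href="/chart/2000s?mode=popular" [126] => href="/title/tt0111161/" ';

// Wrong: with no delimiters, the first character 't' is taken as the
// delimiter, PHP warns "Delimiter must not be alphanumeric ..." and
// preg_match_all() returns false (@ only hides the warning here).
$bad = @preg_match_all('title', $subject, $unused);
var_dump($bad); // bool(false)

// Right: '~' delimiters, so the slashes in the URL need no escaping.
// Assumption: IMDb title IDs are "tt" followed by digits.
$count = preg_match_all('~href="(/title/tt\d+/)"~', $subject, $matches);
print_r($matches[1]); // one captured path: /title/tt0111161/

// If the links are already in an array, preg_grep() filters it directly
// and preserves the original keys. Hypothetical array for illustration.
$links = [125 => '/chart/2000s?mode=popular', 126 => '/title/tt0111161/'];
$titleLinks = preg_grep('~^/title/~', $links);
print_r($titleLinks); // key 126 => /title/tt0111161/
```

Choosing a delimiter that does not occur in the pattern (here '~') avoids having to write '/\/title\//'.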

score 1 · Accepted Answer

Use a DOM parser:

// Requires the Simple HTML DOM library (simple_html_dom.php)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.example.com/');

// Find all links containing "title" as part of their href
$links = $html->find('a[href*=title]');

// Loop through the links and do stuff
foreach ($links as $link) {
    echo $link->href . '<br>';
}

http://simplehtmldom.sourceforge.net/manual.htm
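If you would rather not depend on Simple HTML DOM, PHP's bundled DOM extension can do the same filtering. This is a swapped-in alternative, not what the answer above uses: the HTML string is a hypothetical stand-in for a fetched IMDb page, and the XPath expression contains(@href, "/title/") plays the role of the a[href*=title] selector.

```php
<?php
// Hypothetical HTML standing in for a downloaded IMDb list page.
$html = '<a href="/chart/2000s?mode=popular">Chart</a>'
      . '<a href="/title/tt0111161/">Movie</a>';

$doc = new DOMDocument();
// Collect libxml warnings instead of printing them (real pages are rarely valid HTML).
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select <a> elements whose href contains "/title/" and keep the href values.
$titleLinks = [];
foreach ($xpath->query('//a[contains(@href, "/title/")]') as $a) {
    $titleLinks[] = $a->getAttribute('href');
}

print_r($titleLinks); // one entry: /title/tt0111161/
```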
