php - Get specific URLs from a web page using simplehtmldom

Question

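I am trying to build a simple PHP crawler.

For this purpose, I am using http://simplehtmldom.sourceforge.net/ to fetch the web page contents.

After fetching the page, I print its links like this: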

include('simplehtmldom/simple_html_dom.php');
$html = file_get_html('http://www.mypage.com');
foreach($html->find('a') as $e)
    echo $e->href . '<br>';

This works perfectly and prints all the links on that page.

I only want to get the links that look like

/view.php?view=open&id=

I wrote a function for this purpose

function starts_text_with($s, $prefix){
    return strpos($s, $prefix) === 0;
}

and used it like this

include('simplehtmldom/simple_html_dom.php');
$html = file_get_html('http://www.mypage.com');
foreach($html->find('a') as $e) {
    if (starts_text_with($e->href, "/view.php?view=open&id=")))
    echo $e->href . '<br>';
}

But this prints nothing.

I hope you understand what I need.

I need to print only the URLs that match that criterion.

Thanks

score 1 · Accepted Answer

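include('simplehtmldom/simple_html_dom.php');
$html = file_get_html('http://www.mypage.com');
foreach($html->find('a') as $e) {
    // preg_match() takes the pattern first, then the subject.
    // The pattern needs delimiters (~ here), and "." and "?" must be escaped.
    if (preg_match('~view\.php\?view=open&id=~', $e->href))
         echo $e->href . '<br>';
}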

Try this.

See the preg_match reference.
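
For a fixed piece of text like this, a plain substring check may be enough and avoids regex escaping altogether. Below is a minimal sketch of that idea. It is not the code from the question: it uses PHP's built-in DOMDocument instead of simplehtmldom so that it runs on its own (assuming the DOM extension is enabled), and the sample markup and the helper name find_links_containing are made up for illustration. Note also that the `if` line in the question's second snippet has one closing parenthesis too many, which is a parse error and could by itself explain the empty output.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$sampleHtml = <<<HTML
<html><body>
<a href="/view.php?view=open&amp;id=1">one</a>
<a href="http://www.mypage.com/view.php?view=open&amp;id=2">two</a>
<a href="/about.php">about</a>
<a>no href</a>
</body></html>
HTML;

/**
 * Return every href in $html that contains $needle.
 * (Illustrative helper, not part of any library.)
 */
function find_links_containing(string $html, string $needle): array
{
    $doc = new DOMDocument();
    // Suppress the warnings that imperfect real-world HTML tends to trigger.
    @$doc->loadHTML($html);

    $matches = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // getAttribute() returns '' when the attribute is missing.
        $href = $a->getAttribute('href');
        if ($href !== '' && strpos($href, $needle) !== false) {
            $matches[] = $href;
        }
    }
    return $matches;
}

$links = find_links_containing($sampleHtml, 'view.php?view=open&id=');
foreach ($links as $link) {
    echo $link, "\n";
}
```

With this sample, both the relative link and the absolute one are printed. A strict prefix test such as `strpos($href, $prefix) === 0` would only accept the first, so if the real page uses absolute URLs or omits the leading slash, that alone would make the original filter match nothing.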
