php - Extracting page content and images with regular expressions

Question

I am fetching the page content like this:

$data = file_get_contents($url);

Now I want to extract:

the images, and
the data part, leaving out the scripts and HTML code.

This is the regular expression I am using for the images:

function get_logo($data) 
{
    return preg_match("/<img(.*?)src=(\"|\')(.+?)(gif|jpg|png|bmp)(\"|\')(.*?)(\/)?>(<\/img>)?/", $html, $matches) ? $matches[1] : '';
}

It returns nothing.

score 2 · Accepted Answer

Don't use regular expressions to parse HTML!

I suggest you use an HTML DOM parser such as PHP Simple HTML DOM Parser.
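
As a rough sketch of the DOM approach: the snippet below uses PHP's built-in DOMDocument class instead of PHP Simple HTML DOM Parser (a substitution made here so that no extra library is needed). The function name extract_image_urls and the sample markup are made up for illustration; in the question, the input would be $data.

```php
<?php
// Hypothetical helper: collect the src attribute of every <img> element.
function extract_image_urls(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    // Real-world pages are rarely valid HTML; keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $urls[] = $src;
        }
    }
    return $urls;
}

// Sample markup (an assumption, standing in for the fetched page).
$html = '<html><body><p>Hello</p><img src="logo.png" alt="Logo"><img src=\'photo.jpg\'></body></html>';
print_r(extract_image_urls($html));
```

Because the parser handles quoting and attribute order itself, this does not care whether src comes first or which quote style the page uses.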

score 1 · Accepted Answer

1) We can't see the HTML, so it is hard to tell what exactly you need.

2) preg_match_all("/<img[^>]+src=[\"|\'](.+\.(gif|jpg|png|bmp))[\"|\']/im", $html, $matches) returns every img tag on the page, together with the image file names and extensions.
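
To make the layout of $matches concrete, here is a small sketch that wraps a pattern of this kind in a function. Note that the function in the question declares its parameter as $data but passes an undefined $html to preg_match, which on its own is enough to make it return an empty string. The name get_image_files and the sample HTML are hypothetical, and the pattern is tightened slightly compared with the one above: no | inside the character classes, and [^"']+? instead of .+ so that one match cannot run across several tags.

```php
<?php
// Hypothetical helper: return the image paths found in src attributes.
function get_image_files(string $data): array
{
    $pattern = '/<img[^>]+src=["\']([^"\']+?\.(gif|jpg|png|bmp))["\']/i';

    // $matches[0]: matched tag fragments
    // $matches[1]: image paths (first capture group)
    // $matches[2]: file extensions (second capture group)
    if (!preg_match_all($pattern, $data, $matches)) {
        return []; // no match, or a regex error
    }
    return $matches[1];
}

// Sample input (an assumption, standing in for the fetched page).
$data = '<img class="a" src="/img/logo.png" alt=""> <img src=\'pic.gif\'> <img src="data.svg">';
print_r(get_image_files($data));
```

The .svg image is skipped because the pattern only accepts the four listed extensions.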

score 1 · Accepted Answer

The following regular expression will extract the image URLs from the $data variable:

preg_match_all('/<img[^>]+src=([\'"])([^"\']+)\1/i', $data, $matches);
var_dump($matches[2]);

The array $matches[2] will contain all the image links found in $data.
