php - Extracting the image SRC from a string with preg_match_all

Question

I have a string of data stored in $content. An example of the data is as follows:

This is some sample data which is going to contain an image in the format <img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg">.  It will also contain lots of other text and maybe another image or two.

I am trying to grab only <img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg"> and save it as another string, for example $extracted_image.

So far I have this:

if( preg_match_all( '/<img[^>]+src\s*=\s*["\']?([^"\' ]+)[^>]*>/', $content, $extracted_image ) ) {
$new_content .= 'NEW CONTENT IS '.$extracted_image.'';

All it returns is:

NEW CONTENT IS Array

I realize my attempt may be completely wrong, but can someone tell me where I am going wrong?

score 1 · Accepted Answer

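Your first problem is that preg_match_all (http://php.net/manual/en/function.preg-match-all.php) puts an array into $matches, so you need to output a single item from that array rather than the array itself. Start by looking at $extracted_image[0]. Note that with the default flags this is itself an array holding every full match, so the first complete <img> tag is $extracted_image[0][0] and its captured src value is $extracted_image[1][0].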
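The indexing described above can be sketched as follows. This is a minimal illustration that assumes the default PREG_PATTERN_ORDER flag and a shortened version of the sample string from the question; $pattern, $first_tag and $first_src are names introduced here for illustration only.

```php
<?php
// Shortened sample input, modelled on the question's example.
$content = 'This is some sample data with an image in the format '
    . '<img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg">. '
    . 'It will also contain lots of other text.';

// Same pattern as in the question.
$pattern = '/<img[^>]+src\s*=\s*["\']?([^"\' ]+)[^>]*>/';

if (preg_match_all($pattern, $content, $extracted_image)) {
    // With the default flags:
    //   $extracted_image[0] is an array of every full match (whole <img> tags)
    //   $extracted_image[1] is an array of every first capture group (src values)
    $first_tag = $extracted_image[0][0];
    $first_src = $extracted_image[1][0];

    echo 'NEW CONTENT IS ' . $first_tag . "\n";
    echo 'SRC IS ' . $first_src . "\n";
}
```

Concatenating $extracted_image or $extracted_image[0] directly still prints "Array", because both are arrays; only the innermost elements are strings.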

score 1 · Accepted Answer

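If you only want one result, you need to use a different function:

preg_match() stops at the first match and stores only that one match in $matches. preg_match_all() stores every match in an array.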
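As a rough sketch of the single-result case, assuming the same pattern as in the question and a sample string with two images ($match and $src are illustrative names, and the second URL is made up for the example):

```php
<?php
// Sample input containing two images; only the first should be extracted.
$content = 'Text <img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg"> '
    . 'more text <img src="http://www.example.com/other.jpg"> the end.';

$pattern = '/<img[^>]+src\s*=\s*["\']?([^"\' ]+)[^>]*>/';

$extracted_image = '';
$src = '';

// preg_match() returns 1 when the pattern matches and fills $match with
// the first match only: index 0 is the full match, index 1 the capture group.
if (preg_match($pattern, $content, $match) === 1) {
    $extracted_image = $match[0];
    $src = $match[1];
}

echo 'NEW CONTENT IS ' . $extracted_image . "\n";
```

Here $match is a flat array, so no second level of indexing is needed.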

score 0 · Accepted Answer

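Using regular expressions to parse valid HTML is not recommended. Because there may be unexpected attributes before the src attribute, because non-img tags can trick the regex into false-positive matches, and because attribute values can be wrapped in either single or double quotes, you should use a DOM parser. It is clean, reliable, and easy to read.

Code: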

$string = <<<HTML
This is some sample data which is going to contain an image
in the format <img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg">.
It will also contain lots of other text and maybe another image or two
like this: <img alt='another image' src='http://www.example.com/randomfolder/randomimagename.jpg'>
HTML;

$srcs = [];
$dom = new DOMDocument();
$dom->loadHTML($string);
foreach ($dom->getElementsByTagName('img') as $img) {
    $srcs[] = $img->getAttribute('src');
}

var_export($srcs);

Output:

array (
  0 => 'http://www.randomdomain.com/randomfolder/randomimagename.jpg',
  1 => 'http://www.example.com/randomfolder/randomimagename.jpg',
)
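The question asked for the whole <img> tag rather than just the src value. One possible way to get that with the same DOM approach is to serialize the node with DOMDocument::saveHTML(). This is a sketch under assumptions: the sample string is shortened, $first is an illustrative name, and the serialized tag may not be byte-for-byte identical to the original markup (quoting and attribute formatting are normalized by the parser).

```php
<?php
// Shortened sample input, modelled on the question's example.
$content = 'Some text <img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg"> '
    . 'and some more text.';

$dom = new DOMDocument();
$dom->loadHTML($content);

$extracted_image = '';

// item(0) returns the first <img> element, or null if there is none.
$first = $dom->getElementsByTagName('img')->item(0);
if ($first !== null) {
    // Passing a node to saveHTML() serializes just that node.
    $extracted_image = $dom->saveHTML($first);
}

echo 'NEW CONTENT IS ' . $extracted_image . "\n";
```

For real-world input that is not well-formed, loadHTML() may emit warnings; calling libxml_use_internal_errors(true) beforehand is a common way to keep them out of the output.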
