php - Problem with preg_match

Question

I made a simple application to grab recipe information from allrecipes.com. I'm using preg_match, but something isn't working.

$geturl = file_get_contents("http://allrecipes.com/Recipe/Brown-Sugar-Smokies/Detail.aspx?src=rotd");
preg_match('#<title>(.*) - Allrecipes.com</title>#', $geturl, $match);
$name = $match[1];
echo $name;

I'm just trying to grab the page's title (minus the - Allrecipes.com part) and put it in a variable, but all that comes out is blank.

score 3 · Accepted Answer

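If you look at the page's source, you'll notice that the <title> tag contains some whitespace padding around the actual text. You need to compensate for that: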

'#<title>\s*(.*) - Allrecipes.com\s*</title>#'
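To see why the padding matters, here is a minimal sketch that runs both patterns against a made-up stand-in for the page's markup. The sample HTML string is an assumption, not the real page source.

```php
<?php
// Hypothetical sample: a <title> padded with a newline and a tab,
// standing in for the real page source (an assumption).
$html = "<html><head><title>\n\tBrown Sugar Smokies Recipe - Allrecipes.com\n</title></head></html>";

// Original pattern: '.' does not match "\n", so nothing matches.
$original = preg_match('#<title>(.*) - Allrecipes.com</title>#', $html, $m1);

// Padded pattern: \s* absorbs the whitespace on both sides of the text.
$padded = preg_match('#<title>\s*(.*) - Allrecipes.com\s*</title>#', $html, $m2);

var_dump($original);   // int(0)
echo $m2[1], PHP_EOL;  // Brown Sugar Smokies Recipe
```

Because the original pattern returns 0, $match[1] is never set in the question's code, which is why the output is blank.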

score 2 · Accepted Answer

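There are two problems with this pattern. First, there is a line break right after <title> that is not matched by . (without the /s modifier, . effectively means "any character except EOL"). Second, the Allrecipes.com text is not immediately followed by the </title> substring: there is a line break between them.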

Since \s covers both ordinary spaces and line separators, you can change your regex like this:

'#<title>\s*(.*?) - Allrecipes.com\s*</title>#s'

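The /s modifier isn't actually relevant here (thanks to minitech for noticing this), because the title of this recipe is on a single line and all the "\n" characters are covered by the \s* subexpressions. But I'd still recommend leaving it in, so that a multi-line title doesn't catch you off guard.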
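A small sketch of what the /s modifier changes, using an invented title whose text wraps across two lines (the sample string is an assumption for illustration only):

```php
<?php
// Hypothetical title whose text itself spans two lines (an assumption).
$html = "<title>\nBrown Sugar\nSmokies Recipe - Allrecipes.com\n</title>";

$pattern = '#<title>\s*(.*?) - Allrecipes.com\s*</title>#';

// Without /s, '.' cannot cross the newline inside the title text: no match.
$withoutS = preg_match($pattern, $html, $m1);

// With /s, '.' also matches "\n", so the two-line title is captured whole.
$withS = preg_match($pattern . 's', $html, $m2);

var_dump($withoutS);  // int(0)
var_dump($withS);     // int(1)
```

With /s the capture group holds both lines of the title, including the newline between them.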

For efficiency, I also replaced .* with .*? here: since the string you're looking for is short, a non-greedy quantifier makes sense.
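Besides efficiency, the two quantifiers can capture different text once /s is on. The sketch below assumes a contrived input with two matching <title> elements, chosen only to make the difference visible; it is not taken from the real page.

```php
<?php
// Hypothetical input with two matching <title> elements (an assumption).
$html = "<title>First - Allrecipes.com</title>\n<title>Second - Allrecipes.com</title>";

// Greedy: .* grabs as much as possible, then backtracks to the LAST suffix.
preg_match('#<title>\s*(.*) - Allrecipes.com\s*</title>#s', $html, $greedy);

// Lazy: .*? grows one character at a time and stops at the FIRST suffix.
preg_match('#<title>\s*(.*?) - Allrecipes.com\s*</title>#s', $html, $lazy);

echo $lazy[1], PHP_EOL;  // First
```

Here the greedy capture runs across the first closing tag and into the second element, while the lazy one stays inside the first <title>.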

score 1 · Accepted Answer

You should first grab the whole title and then strip it with PHP, like this:

<?php

$raw_html=file_get_contents('http://www.allrecipes.com');
if (empty($raw_html)) {
    throw new \RuntimeException('Fetch empty');
}

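$matches = array();
$result = preg_match('/<title>(.*)<\/title>/s', $raw_html, $matches);
if ($result === false) {
    throw new \RuntimeException('Regex error');
}
if ($result === 0) {
    // preg_match returns 0 when nothing matched, so $matches[1] would be undefined
    throw new \RuntimeException('No <title> found');
}

$title = trim($matches[1]);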

// you should strip your title here
echo $title;
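One possible way to do the stripping step is sketched below. The stripSiteSuffix helper and its default suffix are hypothetical names introduced here for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): removes a trailing
// " - Allrecipes.com" from an already-extracted title.
function stripSiteSuffix(string $title, string $suffix = ' - Allrecipes.com'): string
{
    $title = trim($title);
    $len = strlen($suffix);
    // Only cut when the title really ends with the suffix.
    if ($len > 0 && substr($title, -$len) === $suffix) {
        $title = rtrim(substr($title, 0, -$len));
    }
    return $title;
}

echo stripSiteSuffix("\n\tBrown Sugar Smokies Recipe - Allrecipes.com\n"), PHP_EOL;
// Brown Sugar Smokies Recipe
```

Plain string functions keep the regex simple and leave the title untouched when the suffix is missing.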
