php - Function doesn't work in all cases

Question

When scraping web pages for certain content (links, etc.), I use a function that helps me:

function list_tags($html, $start, $end)
{
    preg_match_all("($start(.*)$end)siU", $html, $matching_data);
    return $matching_data[0];
}

Example usage:

$open_tag  = '<a';
$close_tag = '>';
$links     = list_tags($html, $open_tag, $close_tag);

So print_r($links); results in:

Array
(
    [0] => <a href="blah.html">
    [1] => <a href="other_blah.html">
    Etc...
    Etc...
)

I can do the same thing with $open_tag = '<script'; or $open_tag = '<div';, etc. But when I try $open_tag = '<input';, my array is completely empty, even though there are several <input> tags on the page. Any ideas?

Edit:

The specific page I'm trying to scrape is http://www.pcsoweb.com/inmatebooking/Inquiry.aspx. I used the same function on a page I made myself, and it did find every <input ... /> I created.

I'll have to dig deeper into what is preventing me from scraping the <input /> tags on this particular site.

I'll also look into the DOMDocument class to see whether it gives better results.
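For reference, a DOMDocument version of this scrape might look like the sketch below. It is not the asker's code. It assumes the page markup is already in a string (fetching is not shown) and that PHP's dom extension is enabled. The helper name list_inputs and the choice of attributes (name, type, value) are hypothetical and were picked only for illustration.

```php
<?php
// Sketch only: collect <input> tags with DOMDocument instead of a regex.
// Assumption: $html already holds the page markup; list_inputs is a hypothetical helper.
function list_inputs(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so buffer libxml warnings while
    // parsing, then discard them and restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $inputs = [];
    // The HTML parser lowercases tag names, so <INPUT> is found too,
    // and self-closing <input ... /> is handled like any other input.
    foreach ($doc->getElementsByTagName('input') as $input) {
        // getAttribute() returns '' when the attribute is absent.
        $inputs[] = [
            'name'  => $input->getAttribute('name'),
            'type'  => $input->getAttribute('type'),
            'value' => $input->getAttribute('value'),
        ];
    }
    return $inputs;
}

$html = '<form><input type="text" name="q" value="x"><INPUT type="hidden" name="id" value="1" /></form>';
print_r(list_inputs($html));
```

Because the parser builds a tree, quoting, case, and self-closing syntax stop mattering. These are exactly the details that make regex-based scraping fragile.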

Thanks for the suggestions, doublesharp and feeela. I'll look into it further and find out what the real problem is.

score 2 · Accepted Answer

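Using a DOM parser is preferable. If you do need to parse the data with a regular expression, try using / as the delimiter instead of ( to make the code more readable. Also make the matching group lazy with ? (and remove the U modifier):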

function list_tags($html, $start, $end)
{
    // escape forward slashes in your pattern start and end
    $start = str_replace("/", "\/", $start);
    $end   = str_replace("/", "\/", $end);
    preg_match_all("/{$start}(.*?){$end}/si", $html, $matching_data);
    return $matching_data[0];
}

$html = "<input test='test'><a href='asdf'>";
$open_tag  = '<(input|a)';
$close_tag = '>';
$links     = list_tags($html, $open_tag, $close_tag);
print_r($links);

Running this code results in:

Array
(
    [0] => <input test='test'>
    [1] => <a href='asdf'>
)
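The comparison below shows why the ? matters. It is a small illustrative sketch on the same test string and was not part of the original answer. Without ? or U, .* is greedy and runs to the last > in the string. With the U modifier, PCRE inverts greediness, so (.*) behaves like (.*?).

```php
<?php
// Compare greedy, lazy, and U-modified quantifiers on the same input.
$html = "<input test='test'><a href='asdf'>";

// Greedy: .* runs to the LAST '>', so both tags come back as one match.
preg_match_all('/<(input|a)(.*)>/si', $html, $greedy);

// Lazy: .*? stops at the FIRST '>' after each opening tag.
preg_match_all('/<(input|a)(.*?)>/si', $html, $lazy);

// The U modifier inverts greediness, so .* behaves like .*? here.
preg_match_all('/<(input|a)(.*)>/siU', $html, $ungreedy);

print_r($greedy[0]);   // one match: the whole string
print_r($lazy[0]);     // two matches, one per tag
print_r($ungreedy[0]); // the same two matches as $lazy
```

Writing .*? explicitly keeps the laziness visible in the pattern itself, whereas U silently flips the meaning of every quantifier.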

score 0 · Accepted Answer

If I paste your regex (<input(.*)>)siU into http://www.functions-online.com/preg_match_all.html along with

<a>dfg</a><input type="sdgf"/>

One thing to watch for is inputs that end with /> (self-closing). Could something in your setup be causing those not to be found?

Without an HTML sample, it's hard to say.
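If self-closing tags are the concern, a pattern that forbids > inside the tag body matches both forms. This is a sketch under one loud assumption: attribute values never contain a literal >, which real pages do not guarantee.

```php
<?php
// Sketch: match <input> tags whether or not they are self-closing.
// Assumption: no attribute value contains a literal '>'.
// [^>]* cannot cross a '>', so each match ends at its own tag and
// also consumes the trailing '/' of a self-closing tag.
// \b stops '<input' from matching longer tag names such as '<inputx'.
$html = '<a>dfg</a><input type="sdgf"/><input name="plain">';

preg_match_all('/<input\b[^>]*>/i', $html, $matches);
print_r($matches[0]);
```

Using a negated character class instead of .* avoids the greedy/lazy question entirely, because the match can never run past the first >.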
