php - How do I find the rest of a word from a string in PHP?

Question

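Say I have a page and I want to grab the words on it that contain "ice". How can I do that easily? I see a lot of scrapers that break things down into source code, but I don't need that. I just need something that searches through the plain text on a web page.

Edit: I basically need something that searches for .jpeg and finds the whole filename. (It is plain text on the site, not hidden inside a tag.)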

score 1 · Accepted Answer

Anything that matches the following is a word containing ice:

/(\w*)ice(\w*)/i

(Note that \w also matches 0-9 and _. The following may give better results:)

/\b.*?ice\b.*?/i
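
To illustrate the note about \w, here is a minimal sketch that compares it with a letters-only class. It assumes that a "word" should contain letters only; the sample text and variable names are made up for illustration, and \p{L} is a swapped-in alternative, not one of the patterns from the answer.

```php
<?php
// Hypothetical sample text, not taken from the question.
$text = 'Nice ice_cream, price99 and office.';

// \w matches letters, digits and underscores.
preg_match_all('/\w*ice\w*/i', $text, $withWordChars);

// \p{L} matches letters only; the u modifier enables Unicode mode.
preg_match_all('/\p{L}*ice\p{L}*/iu', $text, $lettersOnly);

print_r($withWordChars[0]); // Nice, ice_cream, price99, office
print_r($lettersOnly[0]);   // Nice, ice, price, office
```

Which variant is preferable depends on whether digits and underscores should count as part of a word in your text.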

Update
To match a filename (which must not contain whitespace):

/\S+\.jpeg/i

Example:

<?php
$str = 'Picture of me: 238484534.jpeg and someone else img-of-someone.jpeg here';
$cnt = preg_match_all('/\S+\.jpeg/i', $str, $matches);
print_r($matches);
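
If more than one extension needs to be searched, the same pattern could be wrapped in a small function. This is only a sketch: findFilenames() is a hypothetical helper name, and it assumes, as above, that a filename is any run of non-whitespace characters ending in the extension.

```php
<?php
/**
 * Hypothetical helper: return every whitespace-free token in $text
 * that ends with the given extension (matched case-insensitively).
 */
function findFilenames(string $text, string $extension): array
{
    // preg_quote() escapes any regex metacharacters in the extension.
    $pattern = '/\S+\.' . preg_quote($extension, '/') . '/i';
    preg_match_all($pattern, $text, $matches);

    // Index 0 holds the full matches.
    return $matches[0];
}

$str = 'Picture of me: 238484534.jpeg and someone else img-of-someone.jpeg here';
print_r(findFilenames($str, 'jpeg'));
```

Because \S+ accepts any non-whitespace character, punctuation that touches a filename (an opening parenthesis, for instance) would be included in the match.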

score 0 · Accepted Answer

You will need some regular expressions for this. Below I use PCRE http://www.php.net/manual/en/ref.pcre.php and the function preg_match_all http://www.php.net/manual/en/function.preg-match-all.php

<?php

$html = <<<EOF
<html>
    <head>
        <title>Test</title>
    </head>
    <body>List of files:
        <ul>
            <li>test1.jpeg</li>
            <li>test2.jpeg</li>
        </ul>
    </body>
</html>
EOF;
$matches = array();
// Use / as the delimiter; preg_match_all() stores the full matches in $matches[0].
$count = preg_match_all('/[0-9a-zA-Z_-]+\.jpeg/', $html, $matches);
if ($count > 0) {
    foreach ($matches[0] as $filename) {
        print "Filename: {$filename}\n";
    }
}
?>

score 0 · Accepted Answer

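1. Do you also want to read the words inside HTML tags, such as attributes and names? 2. Or only the visible part of the web page?

For #1: the solution is simple and already exists, as described in the other answers.

For #2: use the PHP DOMDocument class to extract the innerHTML and search only within it. Documentation here: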

http://php.net/manual/en/class.domdocument.php

For example, see this:

PHP DOMDocument stripping HTML tags
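
A minimal sketch of the second approach, assuming the DOM extension is available. The sample markup and file names are made up, and reading textContent from the body element is one possible way to get at the text without the tags, not necessarily what the linked question does.

```php
<?php
// Hypothetical sample page; hidden.jpeg appears only inside an attribute.
$html = '<html><head><title>Test</title></head>'
      . '<body><img src="hidden.jpeg" alt="">'
      . '<p>List of files: test1.jpeg and test2.jpeg</p></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// textContent concatenates text nodes only, so attribute values are excluded.
$body = $doc->getElementsByTagName('body')->item(0);
$text = $body !== null ? $body->textContent : '';

preg_match_all('/\S+\.jpeg/i', $text, $matches);
print_r($matches[0]); // test1.jpeg, test2.jpeg
```

Note that textContent also includes the contents of script and style elements inside the body, so those may need to be removed first on real pages.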

score 0 · Accepted Answer

Try this:

preg_match_all('/\w*ice\w*/', 'abc icecream lice', $matches);

print_r($matches);

answered 2011-04-14T10:03:22.827
