php - Parse HTML code and print it out

Question

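Given this HTML page (partial code shown) containing multiple (a href="https://twitter.com/$name") links, I need to parse out all the $names and print them on the page. How can I do that?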

 <td>Apr 01 2011<br><b>527
  </b> 
</td>
<td>
                                            <a href="https://twitter.com/al_rasekhoon" class="twitter-follow-button" data-show count="false" data-lang="" data-width="60px" > al_rasekhoon</a>
</td>                                   
</tr>
   <tr class="rowc"><td colspan="11"></td></tr>

score 2 · Accepted Answer

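You need to loop over the $names array and print a proper <a> tag for each entry in that array. Like this:

<?php foreach ($names as $name) { ?>
    <a href="https://twitter.com/<?php echo htmlspecialchars($name); ?>"><?php echo htmlspecialchars($name); ?></a>
<?php } ?>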

score 0 · Accepted Answer

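This sounds like screen scraping, and for that you need to traverse the DOM. Regular expressions would be very unreliable here.

DOMDocument may help you, but you might want to look at a library built for screen scraping, such as BeautifulSoup (or some PHP equivalent).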
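
As a rough illustration of walking the parsed markup instead of pattern-matching on raw text, here is a minimal Python sketch. It uses the standard library's html.parser rather than BeautifulSoup, so it is a stand-in for the idea, not the library mentioned above. The class name TwitterLinkParser, the helper extract_twitter_names, and the sample markup are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class TwitterLinkParser(HTMLParser):
    """Collects the first path segment of every <a href> that points at twitter.com."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        href = dict(attrs).get("href")
        if not href:
            return
        url = urlparse(href)
        if url.netloc.lower() in ("twitter.com", "www.twitter.com"):
            # "/al_rasekhoon" -> "al_rasekhoon"
            name = url.path.strip("/").split("/")[0]
            if name:
                self.names.append(name)


def extract_twitter_names(html):
    """Return the names found in twitter.com links, in document order."""
    parser = TwitterLinkParser()
    parser.feed(html)
    parser.close()
    return parser.names


# Sample markup modelled on the snippet in the question (assumed, shortened)
sample = '''
<td><a href="https://twitter.com/al_rasekhoon" class="twitter-follow-button"> al_rasekhoon</a></td>
<td><a href="https://example.com/other">not twitter</a></td>
'''
print(extract_twitter_names(sample))
```

Because the parser reads the href attribute itself, the result does not depend on attribute order, quoting style, or extra whitespace inside the tag.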

score 0 · Accepted Answer

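If I understand correctly, you get an HTML page from somewhere and want to extract all the linked Twitter users? You can parse the HTML, or you can do it with some string splitting. This code is untested, but it should give you the idea: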

$input = '(the html code)';
//prefix of the links we care about (the page uses full https:// URLs)
$needle = 'href="https://twitter.com/';
$links = explode('<a ', $input); //split input by start of link tags
for ($i = 0; $i < count($links); $i++) {
    //cut off everything after the closing '>'
    $links[$i] = explode('>', $links[$i], 2)[0];
    //skip this link if it doesn't go to twitter.com
    if (strpos($links[$i], $needle) === false) { continue; }
    //split by the 'href' attribute and keep everything after 'twitter.com/'
    $links[$i] = explode($needle, $links[$i], 2)[1];
    //cut off everything after the " ending the href attribute
    $links[$i] = explode('"', $links[$i], 2)[0];
    //now $links[$i] should contain the twitter username
    echo $links[$i], "\n";
}

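Note: if the page has other links to Twitter that are not a user's home page (for example, a link to the Twitter FAQ), they will be printed too. You will need to filter those out yourself.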

PHP sucks, so let's do this in Python!

html = '(the html code)'
# prefix of the links we care about (the page uses full https:// URLs)
needle = 'href="https://twitter.com/'
links = [l.split(">", 1)[0] for l in html.split("<a ")]
twitter_links = [l for l in links if needle in l]
twitter_hrefs = [l.split(needle, 1)[1] for l in twitter_links]
users = [l.split('"', 1)[0] for l in twitter_hrefs]
print('\n'.join(users))
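
The note above about Twitter links that are not user pages could be handled with a small filter over the extracted strings. This is only a sketch: RESERVED_PATHS is a hypothetical, incomplete list, and the handle pattern (1 to 15 letters, digits, or underscores) is an assumption about what a username looks like. The function name filter_usernames is likewise invented for this example.

```python
import re

# Hypothetical, incomplete list of twitter.com paths that are not user profiles
RESERVED_PATHS = {"home", "search", "share", "intent", "about", "tos", "privacy"}

# Assumed shape of a handle: 1-15 letters, digits, or underscores
HANDLE_RE = re.compile(r"[A-Za-z0-9_]{1,15}")


def filter_usernames(candidates):
    """Keep only strings that look like a user handle and are not reserved paths."""
    users = []
    for candidate in candidates:
        name = candidate.strip("/")
        if HANDLE_RE.fullmatch(name) and name.lower() not in RESERVED_PATHS:
            users.append(name)
    return users


print(filter_usernames(["al_rasekhoon", "share?url=x", "about", "intent/follow"]))
```

Anything containing a slash or a query string is dropped by the pattern, and the reserved list catches single-segment paths that happen to look like a handle.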
