我在找工作。我正在编写一个每天 cron 一次的脚本。它从网站中提取文本和链接。当谈到正则表达式模式时,我很无助。
这是我从中提取数据的示例:
<div class="cat-list-item job-list-item">
<h3 class="expressway full-width"><a href="/about/careers/network_engineer_voip_telephony">Network Engineer - VoIP Telephony</a></h3>
<div class="career-summary">
<p>
Provide daily support, proactive maintenance and independent troubleshooting, and identify capacity/performance issues to ensure
</p>
</div>
<p class="locations-heading"><b>Locations</b></p>
<ul class="locations-list normal">
<li>
Elizabethtown Headquarters
</li>
</ul>
<div class="list-bottom">
<a class="learn-more replace" href="/about/careers/network_engineer_voip_telephony">Learn More</a>
</div>
这是我到目前为止所拥有的:
<?php
$url = "http://bluegrasscellular.com/about/careers/";
$input = @file_get_contents($url) or die("Could not access file: $url");
$regexp = "<h3 class=\"expressway full-width\"><a\s[^>]*href=\"\/about\/careers\/(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
if (preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
foreach ($matches as $match) {
// $match[2] = link address
// $match[3] = link text
echo "<a href='http://bluegrasscellular.com/about/careers/{$match[2]}'>{$match[3]}</a><br>";
}
}
?>
然而,所做的只是将文本和 href 从 . 我还想抓住以下内容:
- 提供日常支持、主动维护和独立故障排除,并识别容量/性能问题以确保
- 伊丽莎白镇总部
我最终希望将这些存储在数据库中并通知我任何新职位。我不知道该怎么做。任何帮助是极大的赞赏。