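Definitely use a DOM parser. XPath with DOMDocument will cleanly and reliably replace script tags that:
- have a src attribute and
- whose src attribute does not start with http.
I could have developed the XPath query expression further so that it checks for the leading http
substring itself, but I didn't want to scare you off with more syntax.
Code: (Demo)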
$html = <<<HTML
<html>
<head>
<script type='text/javascript' src='/wp-includes/js/jquery/jquery.js?ver=1.8.3'></script>
<script language="JavaScript">
window.moveTo(0,0);
window.resizeTo(screen.width,screen.height);
</script>
</head>
</html>
HTML;
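$workingUrl = 'https://www.example.com';
$dom = new DOMDocument;
// The flags stop libxml from adding a doctype or implied wrapper tags
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
// Select every script tag that has a src attribute
foreach ($xpath->query("//script[@src]") as $node) {
    // Replace the src only when it does not start with "http"
    if (strpos($node->getAttribute('src'), 'http') !== 0) {
        $node->setAttribute('src', $workingUrl);
    }
}
echo $dom->saveHTML();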
Output:
<html>
<head>
<script type="text/javascript" src="https://www.example.com"></script>
<script language="JavaScript">
window.moveTo(0,0);
window.resizeTo(screen.width,screen.height);
</script>
</head>
</html>
The only slightly "scarier" XPath-only version: (Demo)
// Reuses $html and $workingUrl from the first snippet
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
// The predicate does the filtering: src exists and does not start with "http"
foreach ($xpath->query("//script[@src and not(starts-with(@src,'http'))]") as $node) {
    $node->setAttribute('src', $workingUrl);
}
echo $dom->saveHTML();
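To see which tags that XPath predicate actually picks, here is a minimal sketch that only collects the matching src values instead of changing them. The markup in $sample and its URLs are made up for illustration and are not from the question; under that assumption, only the relative /js/local.js should be selected, while the absolute URL and the inline script are skipped.

```php
<?php
// Hypothetical sample markup: a relative src, an absolute src, and an inline script.
$sample = <<<HTML
<html>
<head>
<script src="/js/local.js"></script>
<script src="https://cdn.example.com/lib.js"></script>
<script>console.log('inline');</script>
</head>
</html>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($sample, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Collect the src of every script tag the predicate selects.
$matched = [];
foreach ($xpath->query("//script[@src and not(starts-with(@src,'http'))]") as $node) {
    $matched[] = $node->getAttribute('src');
}
print_r($matched);
```

Note that starts-with is case-sensitive and that a protocol-relative src such as //cdn.example.com/lib.js does not begin with http, so it would be selected as well.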