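I want to replace a list of words (held in an array) with a list of links (hrefs held in an array) in an HTML page.
I think there are two main options:
Do it with regular expressions (parsing and modifying HTML this way is strongly discouraged).
Use an HTML parser and walk the DOM for every word and link in the lists.
The problems with the second option are as follows:
I don't want to replace words inside links that already exist in the HTML page, so for every word from the list that I find, I have to know whether it sits inside an <a> tag.
I don't want to replace words in every node of the DOM, only in nodes that have no children, i.e. only in the leaves.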
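To make those two constraints concrete, here is a minimal sketch of the filtering I have in mind, assuming DOMXPath is an acceptable way to pick the nodes. The helper name findLinkableTextNodes and the sample markup are hypothetical and only serve as illustration.

```php
<?php
// Hypothetical helper: collects the text nodes that are candidates for
// replacement, i.e. leaf text that is not inside an existing <a> element.
function findLinkableTextNodes(DOMDocument $dom): array
{
    $xpath = new DOMXPath($dom);
    // text()                    selects text nodes, which are always leaves
    // not(ancestor::a)          skips text that already sits inside a link
    // normalize-space(.) != ''  skips whitespace-only nodes
    $query = "//text()[not(ancestor::a)][normalize-space(.) != '']";

    // Copy the result into a plain array so it is easy to loop over
    // while the DOM is being modified.
    return iterator_to_array($xpath->query($query), false);
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><div>Google Inc.</div><p><a href="x">Facebook</a> is a site</p></body></html>');

foreach (findLinkableTextNodes($dom) as $textNode) {
    // Only "Google Inc." and " is a site" qualify; "Facebook" is inside <a>.
    echo trim($textNode->nodeValue), PHP_EOL;
}
```

With this kind of query the walk would not need to carry a flag such as $bLink down the recursion, since the ancestor check is done per text node.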
A simple example:
$aURLlist = array('www.google.com','www.facebook.com');
$aWordList = array('Google', 'Facebook');
$htmlContent='<html><body><div>Google Inc. is an American multinational corporation specializing in Internet-related services and products.</div><div>Facebook is an online social networking service, whose name stems from the colloquial name for the book given to students at the start of the academic year by some university administrations in the United States to help students get to know each other.</div></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($htmlContent);
$htmlContent=walkingDom($dom,$aURLlist,$aWordList); //replace all words of $aWordList found in text nodes of $dom TO links with href equal to URL in $aURLlist
Result:
$htmlContent="<html><body><div><a href='www.google.com'>Google</a> Inc. is an American multinational corporation specializing in Internet-related services and products.</div><div><a href='www.facebook.com'>Facebook</a> is an online social networking service, whose name stems from the colloquial name for the book given to students at the start of the academic year by some university administrations in the United States to help students get to know each other.</div></body></html>";
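I have a recursive function that walks the DOM using the DOMDocument library, but I can't insert an "anchor" node in place of the word found in a leaf "text" node.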
function walkDom($dom, $node, $element, $sRel, $sTarget, $iSearchLinks, $iQuantityTopics, $level = 0, $bLink = false) {
$indent = '';
if ($node->nodeName == 'a') {
$bLink = true;
}
for ($i = 0; $i < $level; $i++)
$indent .= ' ';
if ($node->nodeType != XML_TEXT_NODE) {
//echo $indent . '<b>' . $node->nodeName . '</b>';
//echo $indent . '<b>' . $node->nodeValue . '</b>';
if ($node->nodeType == XML_ELEMENT_NODE) {
$attributes = $node->attributes;
foreach ($attributes as $attribute) {
//echo ', ' . $attribute->name . '=' . $attribute->value;
}
//echo '<br>';
}
} else {
if ($bLink || $node->nodeName == 'img' || $node->nodeName == '#cdata-section' || $node->nodeName == '#comment' || trim($node->nodeValue) == '') {
return; // nothing to replace in this node ("continue" is not valid outside a loop)
//echo $indent;
//echo 'NO replace: ';
//var_dump($node->nodeValue);
//echo '<br><br>';
} elseif (!$bLink && $node->nodeName != 'img' && trim($node->nodeValue) != '') {
//echo $indent;
//echo "TEXT TO REPLACE: $element, $replace, $node->nodeValue, $iSearchLinks <br>";
$i = 0;
$n = 1;
while ($i != $iSearchLinks && $n > 0) {
//echo "Create link? <br>";
$node->nodeValue = preg_replace('/'.$element->name.'/', '', $node->nodeValue, 1, $n);
if ($n > 0) {
//echo "Creating link with $element->name <br>";
$link = $dom->createElement("a", $element->name);
$link->setAttribute("class", "nl_tag");
$link->setAttribute("id", "@@ID@@");
$link->setAttribute("hreflang", $element->type);
$link->setAttribute("title", $element->altname);
$link->setAttribute("href", $element->resource);
if ($sRel == "nofollow") $link->setAttribute("rel", $sRel);
if ($sTarget == "_blank") $link->setAttribute("target", $sTarget);
$node->parentNode->appendChild($link);
//var_dump($node->parentNode);
$dom->encoding = 'UTF-8';
$dom->saveHTML();
$iQuantityTopics++;
}
$i++;
//saveHTML?
//echo '<br><br>';
}
}
}
}
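This solution doesn't work because appendChild only adds the new child at the end of the parent's children, whereas I want to insert it at the position where the word being replaced was found.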
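To show the placement I am after, here is a minimal sketch that rebuilds a text node as a sequence of text and anchor nodes and puts them where the original text node was, using insertBefore instead of appendChild. The helper name linkWordInTextNode and the sample markup are hypothetical, and unlike my function above it links every occurrence rather than stopping at $iSearchLinks.

```php
<?php
// Hypothetical helper: replaces each occurrence of $word inside $textNode
// with an <a href="$href"> element, keeping the surrounding text in place.
function linkWordInTextNode(DOMText $textNode, string $word, string $href): void
{
    $dom = $textNode->ownerDocument;

    // Split the text around the word; the capture group keeps the word
    // itself in the result, so odd indexes hold the matched word.
    $pattern = '/(' . preg_quote($word, '/') . ')/u';
    $pieces = preg_split($pattern, $textNode->data, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($pieces === false || count($pieces) < 2) {
        return; // word not present in this text node
    }

    $fragment = $dom->createDocumentFragment();
    foreach ($pieces as $index => $piece) {
        if ($index % 2 === 1) {
            // A matched word: wrap it in a real anchor element.
            $link = $dom->createElement('a');
            $link->setAttribute('href', $href);
            $link->appendChild($dom->createTextNode($piece));
            $fragment->appendChild($link);
        } elseif ($piece !== '') {
            // Plain text between matches stays a text node.
            $fragment->appendChild($dom->createTextNode($piece));
        }
    }

    // Insert the new nodes at the old text node's position, then drop it.
    $parent = $textNode->parentNode;
    $parent->insertBefore($fragment, $textNode);
    $parent->removeChild($textNode);
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><div>Google Inc. is an American corporation.</div></body></html>');
$div = $dom->getElementsByTagName('div')->item(0);

linkWordInTextNode($div->firstChild, 'Google', 'www.google.com');
echo $dom->saveHTML($div), PHP_EOL;
```

Building the replacement from split pieces avoids computing character offsets inside the text node, which seemed the most error-prone part with multibyte text.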
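I also tried using preg_replace to put the link markup directly into the leaf text node, but the anchor ends up in the text node as plain text; I need it to be added as a link node that replaces the word at its position in the leaf text node.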
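A minimal reproduction of that second attempt, assuming a single div with sample text (the markup is made up for illustration): the markup written into nodeValue is stored as character data, so no <a> element is created.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div>Google Inc.</div></body></html>');
$div = $dom->getElementsByTagName('div')->item(0);
$textNode = $div->firstChild;

// Writing markup into a text node's value does not create elements;
// the string is kept as text and escaped when the document is serialized.
$textNode->nodeValue = preg_replace(
    '/Google/',
    '<a href="www.google.com">Google</a>',
    $textNode->nodeValue,
    1
);

$serialized = $dom->saveHTML($div);
$anchorCount = $div->getElementsByTagName('a')->length;

echo $serialized, PHP_EOL;   // the "<" of the anchor shows up as &lt;
echo $anchorCount, PHP_EOL;  // no <a> element exists in the DOM
```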
My question is: can this be done with an HTML parser in PHP, or do I have to fall back on regular expressions? Thanks in advance!