php - Get anchor tags from multiple HTML files

Question

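I'm not sure whether this is possible, but I'm trying to extract all anchor tag links from several HTML files on my website. So far I've written a PHP script that scans a few directories and their subdirectories and builds an array of links to the HTML files. Here is that code: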

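$di = new RecursiveDirectoryIterator('Migration');
$migrate = array();
foreach (new RecursiveIteratorIterator($di) as $filename => $file) {
    // eregi() was removed in PHP 7; preg_match() with the /i flag is the
    // case-insensitive equivalent. The pattern is unanchored, so names such
    // as "Billing.htm.mno" match as well.
    if (preg_match('/\.html?/i', $file)) {
        $migrate[] = $filename;
    }
}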

This approach successfully produces the HTML file links I need. For example:

Migration/administration/billing/Billing.htm
Migration/administration/billing/_notes/Billing.htm.mno
Migration/administration/new business/_notes/New Business.htm.mno
Migration/administration/new business/New Business.htm
Migration/account/nycds/_notes/NYCDS Index.htm.mno
Migration/account/nycds/NYCDS Index.htm

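There are many more links, but this gives you the idea. The next part is where I'm stuck. I think I need a for loop that walks through each array element, opens the file, extracts the links, and stores them somewhere. I'm just not sure how to go about it. I've tried Googling this, but I never seem to get results that match what I'm trying to do. Here is the simplified for loop I have: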

var obj = <?php echo json_encode($migrate); ?>;
for(var i=0;i< obj.length;i++){ 
// alert(obj[i]);
}

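The code above is JavaScript. From what I've been reading, it seems I shouldn't use JavaScript and should stay with PHP instead. I'm confused about what the next step should be. If anyone could point me in the right direction, I'd really appreciate it. Thank you very much for your time.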

score 1 · Accepted Answer

Use DOMDocument::getElementsByTagName to retrieve all <a> tags.

http://www.php.net/manual/en/domdocument.getelementsbytagname.php

Example:

$doc = new DOMDocument();
$doc->loadHTMLFile("filename.html");
$anchors = $doc->getElementsByTagName('a'); // retrieve all anchor tags
foreach ($anchors as $a) { // loop over the anchors
    echo $a->nodeValue, PHP_EOL;            // the link text
    echo $a->getAttribute('href'), PHP_EOL; // the link target
}
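
To apply this to every file in the array from the question, the same calls can be wrapped in a loop. The following is only a minimal sketch, not part of the original answer: the function name collectAnchorLinks is hypothetical, it assumes the input is an array of file paths like $migrate, and it assumes that only files whose final extension is .htm or .html should be parsed (so entries such as Billing.htm.mno are skipped).

```php
<?php
// Hypothetical helper (an illustration, not from the original answer):
// returns an array mapping each HTML file path to the href values of its
// <a> tags.
function collectAnchorLinks(array $files): array
{
    $links = array();
    foreach ($files as $path) {
        // Assumption: only the final extension counts, so
        // "Billing.htm.mno" is skipped.
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if ($ext !== 'htm' && $ext !== 'html') {
            continue;
        }

        $doc = new DOMDocument();
        // Real-world HTML is often malformed; buffer libxml's parser
        // warnings instead of printing them, then restore the old setting.
        $previous = libxml_use_internal_errors(true);
        $loaded = $doc->loadHTMLFile($path);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        if (!$loaded) {
            continue; // the file could not be loaded
        }

        $links[$path] = array();
        foreach ($doc->getElementsByTagName('a') as $a) {
            // Named anchors (<a name="...">) have no href; skip them.
            if ($a->hasAttribute('href')) {
                $links[$path][] = $a->getAttribute('href');
            }
        }
    }
    return $links;
}
```

Calling collectAnchorLinks($migrate) and passing the result to print_r() would then list the links per file. Keying the result by file path is a design choice made here so that each link can be traced back to its source page; a flat list would work just as well if that is not needed.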
