
I am trying to load a simple HTML string, but DOMDocument will not give me access to its nodes, whether or not the HTML has been run through tidy.

Here is the instantiation:

    $doc = new DOMDocument(/*'1.0', 'utf-8'*/);
    $doc->recover = true;
    $doc->strictErrorChecking = false;
    $doc->formatOutput = true;
    $doc->load($content);

    $node_array = $doc->getElementsByTagName("body");
    print_r($node_array);

...or $node_array->item(0);

I get:

DOMNodeList Object
(
)

DOMDocument returns the string just fine through its save function, so it is not a resource. Could I be missing a dependency or some extra PHP configuration?

Update: the DOMDocument objects do not implement any toString conversion at all:

    print_r( (string)$node_array );

Object of class DOMNodeList could not be converted to string in ...


The HTML code is here: http://pastebin.com/11V92Dup (intentionally malformed; this is to demonstrate "tidy" correctly closing the tags in the code)

I simply want to iterate over the nodes and output their contents:

    $node_array = $doc->getElementsByTagName("html");//parent_node();
    $x = $doc->documentElement;
    foreach ($x->childNodes AS $item)
      {
      print $item->nodeName . " = " . $item->nodeValue . "<br />";
      }

Update 2: I got this result! It makes no sense. (Where does all the whitespace come from?)

 body = 







                  COMPOUND: C05441

1 Answer


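I'm not quite sure what you expect as an answer, but I'll give it a try anyway. Below is some code that recursively iterates over the HTML tree and outputs the textContent value of each element.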

<?php

$contents = <<<HTML
<html><head>
<title>KEGG COMPOUND: C05441</title>
<link type="text/css" rel="stylesheet" href="/css/gn2.css">
<link rel="stylesheet" href="/css/bget.css" type="text/css">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<meta http-equiv="Content-Script-Type" content="text/javascript">
</head>
<body onload="window.focus();init();" bgcolor="#ffffff">
<table border=0 cellpadding=0 cellspacing=0><tr><td>
<table border="0" cellspacing="0" cellpadding="0" width="100%"><tr><td width="70"><a href="/kegg/kegg2.html"><img align="middle" border="0" src="/Fig/bget/kegg2.gif" alt="KEGG"></a></td><td>&nbsp;&nbsp;&nbsp;</td><td><a name="compound:C05441"></a><font class="title2">COMPOUND: C05441</font></td><td align="right" valign="bottom"><a href="javascript:void(window.open('/kegg/document/help_bget_compound.html','KEGG_Help','toolbar=no,location=no,directories=no,width=720,height=640,resizable=yes,scrollbars=yes'))"><img onmouseup="btn(this,'Hb')" align="middle" onmouseout="btn(this,'Hb')" onmousedown="btn(this,'Hbd')" onmouseover="btn(this,'Hbh')" alt="Help" name="help" border="0" src="/Fig/bget/button_Hb.gif"></a></td></tr></table>
<form method="post" action="/dbget-bin/www_bget" enctype="application/x-www-form-urlencoded" name="form1">
<table border=0 cellpadding=1 cellspacing=0>
<tr>
<td class="fr2">
<table border=0 cellpadding=2 cellspacing=0 style="border-bottom:#000 1px solid">

</table>
</body></html>
HTML;

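// Keep libxml's warnings about the malformed markup out of the output,
// so they cannot interfere with the header() call below.
libxml_use_internal_errors(true);

$doc = new DOMDocument("1.0", "UTF-8");
// loadHTML() parses a string of markup; load() expects a file name.
$doc->loadHTML($contents);
libxml_clear_errors();

header("Content-Type: text/plain; charset=utf-8");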

// Print every child element with its text content, one dash per nesting level.
function recursivelyEchoChildNodes (DOMElement $parent, $depth = 1) {
    foreach ($parent->childNodes as $node) {
        if ($node instanceof DOMElement) {
            echo str_repeat("-", $depth) . " " . $node->localName . " = " . $node->textContent . "\n";
            if ($node->hasChildNodes()) {
                recursivelyEchoChildNodes($node, $depth + 1);
            }
        }
    }
}

$html = $doc->getElementsByTagName("html")->item(0);
recursivelyEchoChildNodes($html);
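
A note on the code in the question, offered as a minimal sketch rather than a definitive fix: DOMDocument::load() expects a file name or URL, while DOMDocument::loadHTML() parses markup held in a string, which may be why the first attempt returned an empty DOMNodeList. The snippet below assumes $content holds the markup itself; the sample string is hypothetical and only stands in for the pastebin page. It also shows one way to serialize a single node with saveHTML() (a DOMNodeList has no string conversion) and to collapse the whitespace that nodeValue and textContent carry over from the source.

```php
<?php
// Hypothetical sample markup standing in for the pastebin page (deliberately malformed).
$content = '<html><body><table><tr><td><font class="title2">  COMPOUND:   C05441 </font></td></tr></body></html>';

// Collect parser warnings internally instead of printing them for malformed markup.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// loadHTML() parses a string; load() would treat $content as a file name or URL.
$doc->loadHTML($content);
libxml_clear_errors();

// getElementsByTagName() returns a DOMNodeList; item(0) picks the first match.
$body = $doc->getElementsByTagName('body')->item(0);

// A DOMNodeList cannot be cast to string; serialize a single node instead.
$markup = $doc->saveHTML($body);

// textContent concatenates every descendant text node, whitespace included,
// so collapse runs of whitespace before printing.
$text = trim(preg_replace('/\s+/', ' ', $body->textContent));

echo $body->nodeName, " = ", $text, "\n";
```

The whitespace in the "Update 2" output is most likely the line breaks and indentation between tags in the source page: they are text nodes too, so they end up in the body's nodeValue along with the visible text.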
answered 2012-05-24T22:18:50.773