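Can someone show me how to use DOMDocument() to extract content from the form below? I can already extract all the links (../index.html, descriptions/page001, and so on) and save the extracted data to a MySQL database, but I am stuck on how to get the option text (Accounting, Adult Continuing Education, and so on) and save that information to the database as well.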

<HTML>
<HEAD></HEAD>
<BODY>
<FORM ACTION="#">
<SELECT ONCHANGE="MM_jumpMenu('parent',this,0)" NAME="menu1"> 
<OPTION VALUE="../index.html" SELECTED="SELECTED"></OPTION> 
<OPTION VALUE="descriptions/page001.html">Accounting</OPTION> 
<OPTION VALUE="descriptions/page122.html">Adult Continuing Education</OPTION>
<OPTION VALUE="descriptions/page115.html">Energy Engineering</OPTION> 
</SELECT>
</P></FORM> 
</BODY>
</HTML>


MY CURL SCRIPT
// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);

// build an XPath helper for querying the parsed document
$xpath = new DOMXPath($dom);

// GET AND LOOP THROUGH LINKS
$values = $xpath->evaluate("/html/body//option");
for ($cnt = 0; $cnt < $values->length; $cnt++) {
    $value = $values->item($cnt);
    $url = $value->getAttribute('value');
    // store the extracted link and its source page in the database
    storeLink($url, $target_url);
    echo "Link stored: $url";
}

Any help would be greatly appreciated. Thanks.

2 Answers

For the value between the tags, e.g. Accounting:

<OPTION VALUE="descriptions/page001.html">Accounting</OPTION>

you need ->nodeValue

...
$options = $document->getElementsByTagName('option');

foreach ($options as $option) {
  storeLink($option->getAttribute('value'), $option->nodeValue);
}
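
To make the snippet above concrete, here is a minimal self-contained sketch of the same idea, assuming the markup from the question. The `$rows` array and the skipping of the blank first option are illustrative choices, not part of the original answer; `storeLink()` is the asker's own function and is left out so the example runs on its own.

```php
<?php
// Sample markup copied from the question (the stray </P> is kept on purpose).
$html = <<<'MARKUP'
<HTML>
<HEAD></HEAD>
<BODY>
<FORM ACTION="#">
<SELECT ONCHANGE="MM_jumpMenu('parent',this,0)" NAME="menu1">
<OPTION VALUE="../index.html" SELECTED="SELECTED"></OPTION>
<OPTION VALUE="descriptions/page001.html">Accounting</OPTION>
<OPTION VALUE="descriptions/page122.html">Adult Continuing Education</OPTION>
<OPTION VALUE="descriptions/page115.html">Energy Engineering</OPTION>
</SELECT>
</P></FORM>
</BODY>
</HTML>
MARKUP;

// Collect any libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();

// $rows is a hypothetical holding area; in the asker's script each pair
// would be passed to storeLink() instead.
$rows = [];
foreach ($document->getElementsByTagName('option') as $option) {
    $url   = $option->getAttribute('value'); // e.g. descriptions/page001.html
    $label = trim($option->nodeValue);       // e.g. Accounting
    if ($label === '') {
        continue; // skip the blank placeholder option
    }
    $rows[] = ['url' => $url, 'label' => $label];
}

foreach ($rows as $row) {
    echo $row['url'], ' => ', $row['label'], "\n";
}
```

With the sample markup this should print one "url => label" line for each of the three named options.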
answered 2012-06-12T18:56:02.480
Here is the solution:

$html = '<HTML>
  <HEAD></HEAD>
  <BODY>
  <FORM ACTION="#">
  <SELECT ONCHANGE="MM_jumpMenu(\'parent\',this,0)" NAME="menu1"> 
  <OPTION VALUE="../index.html" SELECTED="SELECTED"></OPTION> 
  <OPTION VALUE="descriptions/page001.html">Accounting</OPTION> 
  <OPTION VALUE="descriptions/page122.html">Adult Continuing Education</OPTION>
  <OPTION VALUE="descriptions/page115.html">Energy Engineering</OPTION> 
  </SELECT>
  </P></FORM> 
  </BODY>
  </HTML>';

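// the sample markup is not well-formed (stray </P>), so keep any libxml
// parse warnings out of the output
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
$options = $document->getElementsByTagName('option');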

foreach ($options as $option) {
  echo $option->getAttribute('value');
  echo "\n";
}
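
The loop above prints only the value attribute. If the option text is wanted as well, one possible variant stays with DOMXPath, as in the question's script, and reads `textContent` alongside the attribute. This is a sketch under assumptions: the markup is a shortened copy of the question's form, the XPath expression is just one way to select the options, and `$pairs` is a hypothetical name for the result.

```php
<?php
// Shortened copy of the question's markup.
$html = '<html><body><form action="#"><select name="menu1">'
      . '<option value="../index.html" selected="selected"></option>'
      . '<option value="descriptions/page001.html">Accounting</option>'
      . '<option value="descriptions/page122.html">Adult Continuing Education</option>'
      . '</select></form></body></html>';

// Collect any libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// The [normalize-space()] predicate keeps only options whose text is not
// blank, so the empty placeholder option is filtered out by the query itself.
$pairs = [];
foreach ($xpath->query('//select[@name="menu1"]/option[normalize-space()]') as $option) {
    // Map each link (value attribute) to its visible label (text content).
    $pairs[$option->getAttribute('value')] = trim($option->textContent);
}

print_r($pairs);
```

Filtering in the XPath expression rather than in PHP keeps the loop body down to the single line that pairs a link with its label.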
answered 2012-06-12T18:42:55.937