Say I have this in my HTML:

<strong class="top">Contact Person: </strong>
<br>
Shan
<strong class="top">Email-id: </strong>
<br>
<span>abshanai@gmail.com</span>
<br>
<strong class="top">Website:</strong>
www.absgym.co.in

Is it possible to get these values using Simple HTML DOM?

1 Answer

<?php

$sourcelink = 'http://en.wikipedia.org/wiki/Document_Object_Model'; // Change this: the URL of your page.
$retriever = curl_init();
curl_setopt($retriever, CURLOPT_URL, $sourcelink);
curl_setopt($retriever, CURLOPT_REFERER, 'http://www.google.com');
curl_setopt($retriever, CURLOPT_USERAGENT, 'MozillaXYZ/1.0');
curl_setopt($retriever, CURLOPT_HEADER, 0);
curl_setopt($retriever, CURLOPT_RETURNTRANSFER, true);
curl_setopt($retriever, CURLOPT_TIMEOUT, 10);
$source_content = curl_exec($retriever); // false if the request failed
curl_close($retriever);

// Change this: preg_match('/<opening tag of the parent block>(.*?)<tag that ends the block>/s', $source, $matches)
$matched = preg_match('/<h1 id="firstHeading" class="firstHeading" lang="en">(.*?)<div id="bodyContent" class="mw-body-content">/s', (string) $source_content, $selected_area);

if (!$matched) {
    exit('Could not fetch the page or find the expected block.');
}

$needed_content = $selected_area[0];

$dom_class = new DOMDocument();

// Collect warnings about malformed HTML internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom_class->loadHTML($needed_content);
libxml_clear_errors();

$processor = new DOMXPath($dom_class);

// Change this: the tag that encloses the targeted content to extract, in the form
// $processor->query('//tag[@attribute="value"]');
$process_selector = $processor->query('//span[@dir="auto"]');

$accumulator = [];
foreach ($process_selector as $node) {
    $value = trim($node->nodeValue);
    echo $value, '<br>';
    $accumulator[] = $value;
}

?>

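Steps to follow:

  1. Three parts of the code above must be changed to suit the DOM of your document: the $sourcelink URL, the preg_match() pattern, and the $processor->query() expression.

  2. $sourcelink: use your own URL.

  3. preg_match() pattern: the HTML parent tag that contains the part of the DOM you are interested in.

  4. $processor->query() expression: this must be the HTML tag that encloses the targeted content to extract. In your case that is <strong class="top">Targeted content to extract</strong>, so

$process_selector = $processor->query('//strong[@class="top"]');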
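
One caveat with that selector: //strong[@class="top"] matches the label elements ("Contact Person:", "Email-id:", "Website:"), while in the markup from the question the values sit in the nodes that follow each label. Below is a minimal sketch of one way to pair each label with its value. It assumes the HTML fragment is already available as a string; extractFields() and $html are hypothetical names introduced only for this illustration.

```php
<?php
// Sketch only: extractFields() is a hypothetical helper, not part of any library.

function extractFields(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so every label and value shares one parent.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    $xpath  = new DOMXPath($dom);
    $fields = [];

    foreach ($xpath->query('//strong[@class="top"]') as $label) {
        // "Contact Person: " becomes "Contact Person".
        $key   = rtrim(trim($label->textContent), ':');
        $value = '';

        // Collect the text of the siblings that follow this label,
        // stopping at the next <strong> label.
        for ($node = $label->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeName === 'strong') {
                break;
            }
            $value .= $node->textContent;
        }

        $fields[$key] = trim($value);
    }

    return $fields;
}

// The markup from the question.
$html = <<<'HTML'
<strong class="top">Contact Person: </strong>
<br>
Shan
<strong class="top">Email-id: </strong>
<br>
<span>abshanai@gmail.com</span>
<br>
<strong class="top">Website:</strong>
www.absgym.co.in
HTML;

print_r(extractFields($html));
```

With the question's markup this should give an array keyed by label: Contact Person => Shan, Email-id => abshanai@gmail.com, Website => www.absgym.co.in. Walking the siblings rather than reading the label's nodeValue is the design choice here, because the values are not children of the strong elements.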

answered 2014-12-11T10:44:54.483