Say I have this in my HTML:
<strong class="top">Contact Person: </strong>
<br>
Shan
<strong class="top">Email-id: </strong>
<br>
<span>abshanai@gmail.com</span>
<br>
<strong class="top">Website:</strong>
www.absgym.co.in
Is it possible to get these values using Simple HTML DOM?
<?php
$sourcelink = 'http://en.wikipedia.org/wiki/Document_Object_Model';

// Fetch the page with cURL
$retriever = curl_init();
curl_setopt($retriever, CURLOPT_URL, $sourcelink);
curl_setopt($retriever, CURLOPT_REFERER, "http://www.google.com");
curl_setopt($retriever, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
curl_setopt($retriever, CURLOPT_HEADER, 0);
curl_setopt($retriever, CURLOPT_RETURNTRANSFER, true);
curl_setopt($retriever, CURLOPT_FOLLOWLOCATION, true); // follow redirects, e.g. http to https
curl_setopt($retriever, CURLOPT_TIMEOUT, 10);
$source_content = curl_exec($retriever);
curl_close($retriever);
if ($source_content === false) {
    exit('Could not fetch ' . $sourcelink);
}

/*
 * preg_match('/<tag where the content starts>(.*?)<tag where it ends>/s', $source, $matches)
 * cuts the page down to the part that contains the target elements.
 */
if (!preg_match('/<h1 id="firstHeading" class="firstHeading" lang="en">(.*?)<div id="bodyContent" class="mw-body-content">/s', $source_content, $selected_area)) {
    exit('Start/end markers not found in the page');
}
$needed_content = $selected_area[0];

// Parse the fragment, collecting libxml warnings about malformed HTML instead of printing them
libxml_use_internal_errors(true);
$dom_class = new DOMDocument();
$dom_class->loadHTML($needed_content);
libxml_clear_errors();
$processor = new DOMXPath($dom_class);

/*
 * This must be the HTML tag that encloses the targeted content to extract; syntax:
 * $processor->query('//html tag[@html_attribute="value"]');
 */
$process_selector = $processor->query('//span[@dir="auto"]');
$accumulator = [];
foreach ($process_selector as $node) {
    $value = trim($node->nodeValue);
    echo $value, '<br>';
    $accumulator[] = $value;
}
?>
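The listing above needs network access and depends on the markup Wikipedia served at the time, which may have changed since. As a minimal offline sketch of the same three steps (narrow with preg_match(), parse with DOMDocument, select with DOMXPath), here they are run on a small inline string. The sample markup is made up for illustration and only mimics the two marker tags used in the pattern.

```php
<?php
// Hypothetical sample page standing in for the cURL result.
$source_content = '<html><body>'
    . '<h1 id="firstHeading" class="firstHeading" lang="en"><span dir="auto">Document Object Model</span></h1>'
    . '<div id="bodyContent" class="mw-body-content"><span dir="auto">not selected</span></div>'
    . '</body></html>';

// Step 1: keep only the text between the start and end markers (/s lets . match newlines).
if (!preg_match('/<h1 id="firstHeading" class="firstHeading" lang="en">(.*?)<div id="bodyContent" class="mw-body-content">/s', $source_content, $selected_area)) {
    exit("Markers not found\n");
}

// Step 2: parse the fragment; suppress libxml warnings about the unclosed <div>.
libxml_use_internal_errors(true);
$dom_class = new DOMDocument();
$dom_class->loadHTML($selected_area[0]);
libxml_clear_errors();

// Step 3: select the target elements and collect their trimmed text.
$processor = new DOMXPath($dom_class);
$accumulator = [];
foreach ($processor->query('//span[@dir="auto"]') as $node) {
    $accumulator[] = trim($node->nodeValue);
}
print_r($accumulator);
```

Only the span inside the matched region is collected; the second span lies after the end marker, so step 1 has already cut it away.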
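How to adapt it:
Only three statements in the code above need to change to match the DOM of your page:
$sourcelink: use your own URL.
The preg_match() pattern: replace the start and end markers around (.*?) with the HTML tags that enclose the part of the DOM you identified.
The $processor->query() call: this must select the HTML tag that encloses the targeted content to extract. In your case that is <strong class="top">Targeted content to extract</strong>, so: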
$process_selector = $processor->query('//strong[@class="top"]');
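Note that //strong[@class="top"] matches the label elements themselves, so their nodeValue is "Contact Person:", "Email-id:" and "Website:", not the values written after them. A minimal offline sketch, assuming the question's HTML is available as a string, that pairs each label with the first non-empty sibling node following it. The following-sibling::node()[normalize-space()][1] expression is an assumption about how the values are laid out (a text node or a <span> after a <br>), not part of the code above; $html and $pairs are illustrative names.

```php
<?php
// The HTML from the question, as an inline string.
$html = <<<'HTML'
<strong class="top">Contact Person: </strong>
<br>
Shan
<strong class="top">Email-id: </strong>
<br>
<span>abshanai@gmail.com</span>
<br>
<strong class="top">Website:</strong>
www.absgym.co.in
HTML;

libxml_use_internal_errors(true); // silence warnings about the missing <html>/<body>
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$pairs = [];
foreach ($xpath->query('//strong[@class="top"]') as $label) {
    // First following sibling (text node or element) whose text is not just whitespace;
    // normalize-space() skips the <br> elements and the newline-only text nodes.
    $value = $xpath->query('following-sibling::node()[normalize-space()][1]', $label)->item(0);
    $pairs[trim($label->nodeValue)] = $value ? trim($value->nodeValue) : '';
}
print_r($pairs);
```

Keying the result by label keeps each value next to the field it belongs to, which a flat list of values would lose if a field were ever missing.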