I want to get my HTML data as an array or in XML format so that I can easily save it in a database. This is what I have so far:

$url = "http://www.example.com/";

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
if($html = curl_exec($ch)){

    // parse the html into a DOMDocument
    $dom = new DOMDocument();

    $dom->recover = true;
    $dom->strictErrorChecking = false;

    @$dom->loadHTML($html);

    $hrefs = $dom->getElementsByTagName('div');

    curl_close($ch);

}else{
    echo "The website could not be reached.";
}

What should I do to get the HTML as an array or in XML format? The HTML that comes back looks like this:

<div>
 <ul>
   <li>Product Name</li>
   <li>Category</li>
   <li>Subcategory</li>
   <li>Product Price</li>
   <li>Product Company</li>
 </ul>
</div>

1 Answer

For XML output, just do the following:

function download_page($path){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $path);
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    $retValue = curl_exec($ch);   // false if the request failed
    curl_close($ch);
    return $retValue;
}

$sXML = download_page('http://example.com');
if ($sXML === false) {
    die("The page could not be downloaded.");
}

// SimpleXMLElement only accepts well-formed XML (e.g. a feed with <entry> elements)
$oXML = new SimpleXMLElement($sXML);

// Send the header once, before any output
header('Content-type: application/xml');
foreach($oXML->entry as $oEntry){
    echo $oEntry->title . "\n";
}
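
The markup in the question is plain HTML rather than an XML feed, so SimpleXMLElement may reject it if it is not well-formed. A minimal sketch of one alternative, assuming PHP 7 or later: load the HTML with DOMDocument, collect the <li> text into an array, and build a small XML document from that array. The helper names extract_list_items and items_to_xml, and the element names product and field, are made up for illustration; the sample string stands in for whatever curl_exec() returns.

```php
<?php
// Hypothetical helper: collect the text of every <li> under div/ul into an array.
function extract_list_items(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $items = [];
    foreach ($xpath->query('//div/ul/li') as $li) {
        $items[] = trim($li->textContent);
    }
    return $items;
}

// Hypothetical helper: wrap each value in a <field> element under a <product> root.
function items_to_xml(array $items): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $root = $doc->createElement('product');
    $doc->appendChild($root);
    foreach ($items as $value) {
        $field = $doc->createElement('field');
        // createTextNode escapes special characters such as & and <.
        $field->appendChild($doc->createTextNode($value));
        $root->appendChild($field);
    }
    return $doc->saveXML();
}

// Sample input copied from the question; in practice this would be the curl response.
$html = <<<HTML
<div>
 <ul>
   <li>Product Name</li>
   <li>Category</li>
   <li>Subcategory</li>
   <li>Product Price</li>
   <li>Product Company</li>
 </ul>
</div>
HTML;

$items = extract_list_items($html);
print_r($items);              // the array form
echo items_to_xml($items);    // the XML form
```

The array can be bound directly to a database insert, and the XPath expression would need adjusting if the real page nests the list differently.
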
answered 2013-09-26T06:15:36.867