
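I'm trying to fetch the latest news from a website and include it on my own site. The site uses Joomla (ugh), and the hrefs in the generated content are missing the base href.

So the links contain only contensite.php?blablabla, which results in the link http://www.example.com/contensite.php?blablabla

So I want to replace http:// with http://www.basehref.com before echoing the content, but that is where my knowledge ends.

Which should I use: preg_replace or str_replace? I'm not sure.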
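As a rough guide, str_replace does literal substring replacement, while preg_replace matches a pattern. The sketch below contrasts the two on a made-up HTML snippet. It assumes double-quoted href attributes and uses http://www.basehref.com from the question as the base; the sample links are hypothetical.

```php
<?php
// Made-up sample: one relative link and one link that is already absolute.
$html = '<a href="contensite.php?blablabla">news</a> '
      . '<a href="http://www.example.org/page">other</a>';
$base = 'http://www.basehref.com/';

// str_replace is literal: it prefixes every href, including the absolute one.
$literal = str_replace('href="', 'href="' . $base, $html);

// preg_replace can skip hrefs that already start with http:// or https://.
$pattern = '/href="(?!https?:\/\/)/i';
$selective = preg_replace($pattern, 'href="' . $base, $html);

echo $literal, "\n", $selective, "\n";
```

With this input, only the preg_replace variant leaves the already absolute link untouched, which is why a pattern is probably the better fit for this task.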


2 Answers

include_once('db_connect.php');
// connect to my db
require_once('Net/URL2.php');
// PEAR Net_URL2, used to resolve relative URLs
include_once('dom.php');
// include PHP Simple HTML DOM

$dom = file_get_html('http://www.targetsite.com');
// fetch the page and parse it with Simple HTML DOM

$elem2 = $dom->find('div[class=blog]', 0);
// the div to extract

$uri = new Net_URL2('http://www.svvenray.nl'); // URI of the resource (the page fetched above)
$baseURI = $uri;
// a <base href> lives in the document head, so look for it in the whole document
foreach ($dom->find('base[href]') as $elem) {
    $baseURI = $uri->resolve($elem->href);
}

// make every src attribute inside the div absolute
foreach ($elem2->find('*[src]') as $elem) {
    $elem->src = $baseURI->resolve($elem->src)->__toString();
}
// make every href attribute inside the div absolute
foreach ($elem2->find('*[href]') as $elem) {
    if (strtoupper($elem->tag) === 'BASE') continue;
    $elem->href = $baseURI->resolve($elem->href)->__toString();
}

echo $elem2;

This fixes all of the broken links. It requires the PHP PEAR package Net_URL2 (Net/URL2.php).

answered 2013-06-25T09:32:48.807

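So I was unable to fix the broken links (I don't know enough about preg_match). Instead I replace them with another link, and I swap the links' class for my fancybox class so that they open in a fancybox.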

include_once('db_connect.php');
// connect to my db

include_once('dom.php');
// include PHP Simple HTML DOM

$dom = file_get_html('http://www.remotesite.com');
// fetch the page and parse it with Simple HTML DOM

$elem = $dom->find('div[class=blog]', 0);
// the div to extract



$pattern = '/(?<=href=")[^"]+?(?=")/';
$replacement = 'http://www.remotesite.com';
$replacedHrefHtml = preg_replace($pattern, $replacement, (string) $elem);
// replacement 1
// overwrite every href value with the remote site's home page
// (the base href is missing, Joomla sucks, period!)
// I'm too lazy to write a better pattern, feel free to improve this!

$pattern2 = '/contentpagetitle/';
$replacement2 = 'fancybox fancybox.iframe';
$replacedHrefHtml2 = preg_replace($pattern2, $replacement2, $replacedHrefHtml);
// replacement 2
// replace the Joomla class contentpagetitle on the links with my fancybox class! fancy innit!

$pattern2 = '/readon/';
$replacement2 = 'fancybox fancybox.iframe';
$replacedHrefHtml2 = preg_replace($pattern2, $replacement2, $replacedHrefHtml2);
// replacement 3
// replace the Joomla class readon on the links with my fancybox class! fancy innit!
// the input is $replacedHrefHtml2, so the result of replacement 2 is not thrown away

$replacedHrefHtml3 = preg_replace('/<img[^>]+>/i', '<br />(Plaatje)<br /><br /> ', $replacedHrefHtml2);
// finally replace each image in the string with a "(Plaatje)" placeholder


$replacedHrefHtml4 = base64_encode($replacedHrefHtml3);
// encode the HTML with base64 before storing it in MySQL
// real escape won't work since it will break the links!

$lastitem = null;
// stays null when the table is empty or the query fails

try {
    $conn = new PDO($link, $pdo_username, $pdo_password);
    $conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $data222 = $conn->query('SELECT * FROM svvnieuws ORDER BY id DESC LIMIT 1');

    foreach ($data222 as $row) {
        $lastitem = $row['inhoud'];
    }
} catch (PDOException $e) {
    echo 'ERROR: ' . $e->getMessage();
}
// get the last stored item from the db to compare it with the current result

if ($replacedHrefHtml4 == $lastitem) {
    // if the last item from the db is the same, do not store a new item! important to prevent clutter!

}
else {
    // if it's not the same, store a new item!

    $conn = new PDO($link, $pdo_username, $pdo_password);
    $conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    // set up the connection to the db

    $sql = 'INSERT INTO svvnieuws (inhoud) VALUES (:inhoud)';
    // the query, with a named placeholder for the content;
    // id is left out on the assumption that it is an AUTO_INCREMENT column

    $rip = $conn->prepare($sql);
    $rip->execute(array(':inhoud' => $replacedHrefHtml4));
    // insert into the db!

}
// close the else!

// place this file outside of the docroot, and let cron run it every 4 hours or so.
// of course, make sure you also place dom.php in the same directory!
// dom.php is my short name for PHP Simple HTML DOM.

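So: replacement 1 turns <a href="whatever"> into <a href="http://www.remotesite.com">. Replacement 2 swaps the contentpagetitle class on the links for the fancybox class. Replacement 3 does the same for the readon links. The result is then compared with the last stored item and is stored only if it differs.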
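Since both class names are fixed strings, the two class swaps could also be done with a single str_replace call instead of preg_replace. A minimal sketch, assuming the class names appear nowhere else in the markup (str_replace replaces every occurrence, not only those inside class attributes); the sample HTML is made up:

```php
<?php
// Made-up sample markup using the two Joomla class names from the answer.
$html = '<a class="contentpagetitle" href="#">Title</a>'
      . '<a class="readon" href="#">Read more</a>';

// Both needles are literal strings, so str_replace with an array of
// needles handles them in one call; no regular expression is needed.
$swapped = str_replace(
    ['contentpagetitle', 'readon'],
    'fancybox fancybox.iframe',
    $html
);

echo $swapped, "\n";
```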

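I would love to know how to fix the broken links instead of replacing them. The link markup on that site looks like this: <a href="/index.php?blabla">. If it is possible, how can I inject www.remotesite.com into <a href="/index.php?blabla"> to get <a href="www.remotesite.com/index.php?blabla">?

answered 2013-06-25T08:54:22.227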
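One possible way to keep the original path and only prepend the host is a regular expression with a capture group. The sketch below is an assumption-laden illustration, not part of the original answer: absolutize_hrefs is a hypothetical helper name, it only handles double-quoted href attributes, and it treats every relative path as relative to the site root rather than to the current page.

```php
<?php
/**
 * Hypothetical helper: prefix relative href values with a site root.
 * Absolute URLs (http:, mailto:, ...), protocol-relative URLs (//host/...)
 * and fragment links (#...) are left untouched.
 */
function absolutize_hrefs(string $html, string $siteRoot): string
{
    $siteRoot = rtrim($siteRoot, '/');
    $pattern = '/href="(?![a-z][a-z0-9+.\-]*:|\/\/|#)([^"]*)"/i';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($siteRoot): string {
            // Simplification: every relative path is resolved against the site root.
            return 'href="' . $siteRoot . '/' . ltrim($m[1], '/') . '"';
        },
        $html
    );
}

// Made-up sample: one root-relative link and one absolute link.
$html = '<a href="/index.php?blabla">a</a>'
      . '<a href="http://www.example.org/">b</a>';
echo absolutize_hrefs($html, 'http://www.remotesite.com'), "\n";
```

The scheme (http://) is included in the prefix because an href of www.remotesite.com/index.php?blabla without a scheme would itself be treated as a relative path by the browser.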