
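I am creating an RSS feed aggregator that retrieves not only the description of each post but also its full content, by visiting each link. I am using stristr to filter unwanted information out of the posts, such as Facebook and Twitter follower widgets and other content. It works well for one feed but not for another. Here is my code: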

<?php
function getcontent($l,$b,$c)
{
    $dom=file_get_html($l);
    $atitle=$dom->find($b);
    $content=$dom->find($c);
    $contents=implode(" ",$content);
    foreach($atitle as $t)
    {
        echo "<b>".$t."</b>";
    }
    echo "<br /><br />";
    echo $contents;
    echo "<br />";
}
function filtercontent($strip,$l,$b,$c)
{
    $dom=file_get_html($l);
    $atitle=$dom->find($b);
    $content=$dom->find($c);
    $contents=implode(" ",$content);
    $contents=stristr($contents,$strip,true);
    foreach($atitle as $t)
    {
        echo "<b>".$t."</b>";
    }
    echo "<br />";
    echo $contents;
    echo "<br /><br />";

}
ini_set('default_charset', 'UTF-8');
ini_set('max_execution_time',0);
ini_set('memory_limit', -1);
include("simple_html_dom.php");

$url=array("http://www.deccanherald.com/rss/news.rss","http://syndication.indianexpress.com/rss/798/latest-news.xml");

$atitle=NULL;
$content=NULL;
foreach($url as $feed)
{
    $f=$feed;
    $feed=simplexml_load_file($feed);
    //echo $feed;
    if($feed)
    {
        //$feed_title=$feed->channel->title;
        //echo "<br />".$feed_title."<br />";
        $items=$feed->channel->item;
        foreach($items as $item)
        {
            //foreach($keywords as $key)
            //{
            //if(strtolower($item->description)==$key || strtolower($item->title)==$key)
            //{

            $title=$item->title;
            //echo "<h1><b>".$title."</b></h1><br />";
            $link=$item->link;
            //echo "<a href='".$link."'>".$link."</a><br />";
            $des=$item->description;
            //echo "<br />".$des."<br />";

            if($f=="http://beta.thehindu.com/news/?service=rss")
            {
                $title_class=".detail-title";
                $content_class=".body";
                getcontent($link,$title_class,$content_class);
            }
            if($f=="http://in.news.yahoo.com/rss/national/")
            {
                $title_class=".headline";
                $content_class=".yom-art-content";
                getcontent($link,$title_class,$content_class);
            }
            if($f=="http://syndication.indianexpress.com/rss/798/latest-news.xml")
            {
                $link=$link."0";
                $title_class=".headstory";
                $content_class=".contentLeftbigstory";
                $strip='<div class="paginationNew">';
                filtercontent($strip,$link,$title_class,$content_class);
            }
            if($f=="http://www.indiatvnews.com/rssfeed/india_news.xml")
            {
                $title_class=".topstorytitsub";
                $content_class=".standard";
                foreach($link as $post)
                {
                    $dom=file_get_html($link);
                    $title=$dom->find($title_class);
                    $content=$dom->find('div[style=min-height:350px]');
                    foreach($title as $t)
                        echo "<b>".$t."</b><br />";
                    foreach($content as $c)
                    {
                        echo $c;
                    }
                }
            }
            if($f=="http://beta.thehindu.com/news/?service=rss")
            {
                $title_class=".detail-title";
                $content_class=".body";
                getcontent($link,$title_class,$content_class);
            }
            if($f=="http://www.deccanherald.com/rss/news.rss")
            {
                $title_class=".newsText";
                $content_class=".postedBy";
                $strip='<a href="#top" class="gototop">Go to Top</a>';
                filtercontent($strip,$link,$title_class,$content_class);
            }
        }
    }
}


?> 

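I am using the Simple HTML DOM parser to parse the HTML. The filtercontent function takes a string as input in addition to its other inputs. This string, called strip, is used to filter the content: the function returns everything before the first occurrence of the strip string. It works well with the Indian Express feed (syndication.indianexpress.com) but fails with the deccanherald.com feed. For ease of understanding I have left out the other feeds; they use the getcontent function and work well. A sample of the source of a Deccan Herald post is: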

<h1>Crazy star Ravichandran takes potshots at TV channels</h1>

                                                            <div class="postedBy">Mysore, September 28, 2012, DHNS:
                                                                                            <p>Actor opens ‘Conflux 2012’ media fest at Mahajana’s college in city</p>
                                                        <a name="top"></a>

                                                        <p><p><strong>When actor, director and producer of Kannada filmdom V&#8200;Ravichandran was invited to inaugurate &lsquo;Conflux 2012&rsquo; a two-day inter-collegiate media and communication fest of&#8200;SBRR&#8200;Mahajana First&#8200;Grade College in the city on Friday, many would have thought it contrasting.</strong><br /><br />However, when Ravi as he is popular among his acolytes, took over the dais and addressed the gathering where youngsters topped others, the choice of selecting Ravichandran to open the fest seemed apt. <br /><br />Mincing no words, the actor nick named &lsquo;Crazy Star&rsquo; made a relevant remark taking potshots at the electronic media for opting negativism rather than positive aspects to up their television rating points (TRP). Taking the names of two channels in Kannada, the actor said they are indulging in taking the people for a ride with concocted facts.<br /><br /> More than that, almost all the channels are airing moribund programmes. Said&#8200;Ravichandran; &ldquo; Pen is mightier than sword and show your talent in reaching the people and guide them.&rdquo;<br /><br />On filmdom, Ravichandran said that the fans still want him to romance heroines like what he did in Premaloka and other flicks. &ldquo;&#8200;I have already turned 50&rdquo;, said&#8200;Ravichandran making it clear that he cannot redo what he did in the past.&#8200;Referring to &lsquo;Manjina Hani&rsquo; the most awaited movie from his banner from the past several years, the actor said &lsquo;he is discovering the man in him&rsquo;.  <br /><br />Earlier, it was a filmy welcome to the actor. No sooner he entered the hall, pat filled the air an all time hit song from Ranadheera; baa baaro ranadheera...  
<br /><br />Principal of the college&#8200;Prof K&#8200;V&#8200;Prabhakar said students from as many as 18 colleges from several parts of the State are participating in the fest.</p><p>To avoid chaos, the management had prohibited the entry of outsiders (especially students). <br /><br />Barring the participants, dignitaries and media, others were not allowed with students of the college keeping a tab on the visitors at the main gate of Vivekananda Hall of the college.<br /><br />Jayalakshmipuram police had to disperse the mad crowd who had dared to assemble in front of the hall.<br /><br />Chairman of&#8200;Mahajana Education Society R&#8200;Vasudevamurthy, HoD, mass communication and journalism Nivedita and others were present.<br /><br /><strong>Supports Cauvery stir</strong><br /><br />Actor&#8200;Ravichandran on&#8200;Friday extended support to ongoing agitation against the centre&rsquo;s directive to State to release 9,000 cusec of water to Tamil Nadu. On Karnataka bandh call given by various organisations on October 6 over the same issue, the actor said he too will support following Karnataka&#8200;Film&#8200;Chamber of Commerce&rsquo;s (KFCC) similar announcement. &ldquo;When the State itself is facing acute water shortage, how can we release water to them&rdquo;, the actor asserted. He also denied any interests to join politics saying; nange rajakeeya barolla (I don&rsquo;t know politics).</p></p>

                            <p class="gotoTop"><a href="#top" class="gototop">Go to Top</a></p>


                            <div class="socialNetworkingLinks">
                                 <a href="http://www.deccanherald.com/tell_a_friend.php?id=281782" style="margin-left:-5px;"><img src="http://www.deccanherald.com/images/email.jpg" alt="" border="0" /></a> 
                                <a href="#" onClick="javascript:window.print();"><img src="http://www.deccanherald.com/images/print.jpg" alt="" border="0" onClick="javascript:window.print();" /></a> 
                                <a href="javascript:addToFavorites()"><img src="http://www.deccanherald.com/images/bookmark.jpg" alt="" border="0" /></a>

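I have also tried $strip='<p class="gotoTop">', $strip='<div class="socialNetworkingLinks">' and $strip="Go to Top", but none of them has any effect: the result always comes back with the "Go to Top" link and the social toolbar included. Why is it not working? What is wrong with my code? It works for one feed but not for the other. Please help me solve this.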


I want to remove everything from "Go to Top" onwards.


1 Answer


I think the problem is with $content_class=".postedBy";. The only thing in that class is "Mysore, September 28, 2012, DHNS:", which does not match $strip.
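One way to check this without fetching the page is sketched below with plain strings. `$selected` is a stand-in for what the `.postedBy` selector would return under the assumption above (it is not real parser output), and `$markers` lists the strip strings tried in the question. `stripos()` reports whether a marker occurs at all, and `stristr()` with its third argument set to `true` returns `false` when the needle is missing.

```php
<?php
// Stand-in for the imploded result of $dom->find('.postedBy');
// assumed here to hold only the dateline, as described above.
$selected = '<div class="postedBy">Mysore, September 28, 2012, DHNS:</div>';

// The strip strings tried in the question.
$markers = array(
    '<a href="#top" class="gototop">Go to Top</a>',
    '<p class="gotoTop">',
    '<div class="socialNetworkingLinks">',
    'Go to Top',
);

$found = array();
foreach ($markers as $marker) {
    // stripos() returns false when the marker does not occur (case-insensitive).
    $found[$marker] = (stripos($selected, $marker) !== false);
}

// With before_needle = true, stristr() returns false if the needle is missing.
$cut = stristr($selected, 'Go to Top', true);

var_dump($found);
var_dump($cut);
```

If none of the markers is found in the selected text, no choice of `$strip` can help; the selector itself has to change.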

Edit:

The postedBy DIV looks like:

<div class="postedBy">Mysore, September 28, 2012, DHNS:</div>

It does not include the body of the article.
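Once a selector is used whose element really does contain both the article body and the marker, the cut itself could look like the minimal sketch below. The helper name `cut_before_marker` and the sample `$html` are hypothetical, made up for illustration; which selector fits the Deccan Herald pages is not settled here. Unlike a bare `stristr(..., true)`, the helper returns the input unchanged when the marker is absent, so a wrong marker is easier to spot.

```php
<?php
// Hypothetical helper: return everything before the first occurrence of
// $marker (case-insensitive), or the whole string if $marker is absent.
function cut_before_marker($html, $marker)
{
    $pos = stripos($html, $marker);
    return ($pos === false) ? $html : substr($html, 0, $pos);
}

// Made-up sample in which the selected element does contain the marker.
$html = '<p>Article body.</p>'
      . '<p class="gotoTop"><a href="#top" class="gototop">Go to Top</a></p>';

// Marker present: only the part before it is kept.
$body = cut_before_marker($html, '<p class="gotoTop">');

// Marker absent: the input comes back unchanged.
$same = cut_before_marker($html, '<div class="paginationNew">');

echo $body, "\n";
```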

Answered 2012-09-28T19:13:57.217