I'm using Simple HTML DOM to scrape a page for the latest news, and then generating an RSS feed with this PHP class (FeedWriter).

This is what I have now:

<?php

 // This is a minimum example of using the class
 include("FeedWriter.php");
 include('simple_html_dom.php');

 $html = file_get_html('http://www.website.com');

foreach($html->find('td[width="380"] p table') as $article) {
$item['title'] = $article->find('span.title', 0)->innertext;
$item['description'] = $article->find('.ingress', 0)->innertext;
$item['link'] = $article->find('.lesMer', 0)->href;     
$item['pubDate'] = $article->find('span.presseDato', 0)->plaintext;     
$articles[] = $item;
}


//Creating an instance of FeedWriter class. 
$TestFeed = new FeedWriter(RSS2);


 //Use wrapper functions for common channel elements

 $TestFeed->setTitle('Testing & Checking the RSS writer class');
 $TestFeed->setLink('http://www.ajaxray.com/projects/rss');
 $TestFeed->setDescription('This is test of creating a RSS 2.0 feed Universal Feed Writer');

  //Image title and link must match with the 'title' and 'link' channel elements for valid RSS 2.0

  $TestFeed->setImage('Testing the RSS writer class','http://www.ajaxray.com/projects/rss','http://www.rightbrainsolution.com/images/logo.gif');


foreach($articles as $row) {

    //Create an empty FeedItem
    $newItem = $TestFeed->createNewItem();

    //Add elements to the feed item    
    $newItem->setTitle($row['title']);
    $newItem->setLink($row['link']);
    $newItem->setDate($row['pubDate']);
    $newItem->setDescription($row['description']);

    //Now add the feed item
    $TestFeed->addItem($newItem);
}

  //OK. Everything is done. Now genarate the feed.
  $TestFeed->genarateFeed();

?>

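How can I make this code simpler? Right now there are two foreach statements; how can I combine them?

Because the scraped news is in Norwegian, I need to apply html_entity_decode() to the title. I tried it here, but I couldn't get it to work: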

foreach($html->find('td[width="380"] p table') as $article) {
$item['title'] = html_entity_decode($article->find('span.title', 0)->innertext, ENT_NOQUOTES, 'UTF-8');
$item['description'] = "<img src='" . $article->find('img[width="100"]', 0)->src . "'><p>" . $article->find('.ingress', 0)->innertext . "</p>";    
$item['link'] = $article->find('.lesMer', 0)->href;     
$item['pubDate'] = unix2rssdate(strtotime($article->find('span.presseDato', 0)->plaintext));
$articles[] = $item;
} 

Thanks :)

4 Answers

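It seems you loop through $html to build an array of articles, and then loop through those to add them to the feed. You can skip the second loop entirely by adding each item to the feed as it is found. To do this, you need to move the FeedWriter constructor up a bit in the execution flow.

I'd also add a couple of functions to help with readability, which may help maintainability in the long run. Encapsulating feed creation, item modification, and so on should make it easier if you need to plug in a different provider class for the feed, change the parsing rules, etc. Further improvements can be made to the code below (for example, html_entity_decode is applied in a separate assignment from the initial $item['title'] one), but you get the general idea.

What problem are you having with html_entity_decode? Do you have sample input/output?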

<?php

 // This is a minimum example of using the class
 include("FeedWriter.php");
 include('simple_html_dom.php');

 // Create new instance of a feed
 $TestFeed = create_new_feed();

 $html = file_get_html('http://www.website.com');

 // Loop through html pulling feed items out
 foreach($html->find('td[width="380"] p table') as $article) 
 {
    // Get a parsed item
    $item = get_item_from_article($article);

    // Get the item formatted for feed
    $formatted_item = create_feed_item($TestFeed, $item);

    //Now add the feed item
    $TestFeed->addItem($formatted_item);
 }

 //OK. Everything is done. Now generate the feed.
 // (the method name is spelled genarateFeed() in the question's code, so it is kept that way here)
 $TestFeed->genarateFeed();


// HELPER FUNCTIONS

/**
 * Create new feed - encapsulated in method here to allow
 * for change in feed class etc
 */
function create_new_feed()
{
     //Creating an instance of FeedWriter class. 
     $TestFeed = new FeedWriter(RSS2);

     //Use wrapper functions for common channel elements
     $TestFeed->setTitle('Testing & Checking the RSS writer class');
     $TestFeed->setLink('http://www.ajaxray.com/projects/rss');
     $TestFeed->setDescription('This is test of creating a RSS 2.0 feed Universal Feed Writer');

     //Image title and link must match with the 'title' and 'link' channel elements for valid RSS 2.0
     $TestFeed->setImage('Testing the RSS writer class','http://www.ajaxray.com/projects/rss','http://www.rightbrainsolution.com/images/logo.gif');

     return $TestFeed;
}


/**
 * Take in html article segment, and convert to usable $item
 */
function get_item_from_article($article)
{
    $item['title'] = $article->find('span.title', 0)->innertext;
    $item['title'] = html_entity_decode($item['title'], ENT_NOQUOTES, 'UTF-8');

    $item['description'] = $article->find('.ingress', 0)->innertext;
    $item['link'] = $article->find('.lesMer', 0)->href;     
    $item['pubDate'] = $article->find('span.presseDato', 0)->plaintext;     

    return $item;
}


/**
 * Given an $item with feed data, create a
 * feed item
 */
function create_feed_item($TestFeed, $item)
{
    //Create an empty FeedItem
    $newItem = $TestFeed->createNewItem();

    //Add elements to the feed item    
    $newItem->setTitle($item['title']);
    $newItem->setLink($item['link']);
    $newItem->setDate($item['pubDate']);
    $newItem->setDescription($item['description']);

    return $newItem;
}
?>
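
The helpers above pass the scraped date text straight to setDate(). If that text is not in a form the feed class can interpret, one option is to normalize it first. The following is only a sketch: to_rss_date() is a hypothetical helper, and the 'd.m.Y' input format is an assumption about how the page prints dates, not something taken from the page itself.

```php
<?php
// Hypothetical helper: turn a scraped date string into an RFC 822 date
// suitable for an RSS pubDate. The default 'd.m.Y' format is an assumption;
// the leading "!" resets unparsed fields (time of day) to zero.
function to_rss_date(string $text, string $format = '!d.m.Y'): ?string
{
    $date = DateTime::createFromFormat($format, trim($text), new DateTimeZone('UTC'));
    if ($date === false) {
        return null; // text did not match the expected format
    }
    return $date->format(DATE_RSS);
}

echo to_rss_date('17.02.2009'), "\n"; // prints "Tue, 17 Feb 2009 00:00:00 +0000"
```

In get_item_from_article() this could wrap the plaintext value before it is stored in $item['pubDate'], with a fallback for the null case.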
answered 2009-02-17T16:43:00.077

Well, for a simple combination of the two loops, you could build the feed as you parse through the HTML:

<?php
include("FeedWriter.php");
include('simple_html_dom.php');

$html = file_get_html('http://www.website.com');

//Creating an instance of FeedWriter class. 
$TestFeed = new FeedWriter(RSS2);
$TestFeed->setTitle('Testing & Checking the RSS writer class');
$TestFeed->setLink('http://www.ajaxray.com/projects/rss');
$TestFeed->setDescription(
  'This is test of creating a RSS 2.0 feed Universal Feed Writer');

$TestFeed->setImage('Testing the RSS writer class',
                    'http://www.ajaxray.com/projects/rss',
                    'http://www.rightbrainsolution.com/images/logo.gif');

//parse through the HTML and build up the RSS feed as we go along
foreach($html->find('td[width="380"] p table') as $article) {
  //Create an empty FeedItem
  $newItem = $TestFeed->createNewItem();

  //Look up and add elements to the feed item   
  $newItem->setTitle($article->find('span.title', 0)->innertext);
  $newItem->setDescription($article->find('.ingress', 0)->innertext);
  $newItem->setLink($article->find('.lesMer', 0)->href);     
  $newItem->setDate($article->find('span.presseDato', 0)->plaintext);     

  //Now add the feed item
  $TestFeed->addItem($newItem);
}

$TestFeed->genarateFeed();
?>

What is the problem you're seeing with html_entity_decode? It might help if you gave us a link to a page where it doesn't work.
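
For comparison, here is a minimal sketch of what the call from the question does on its own, outside the scraper. decode_title() is a hypothetical wrapper and the sample string is made up for illustration. If this works on your machine but the feed still shows broken characters, the cause may lie elsewhere, for example a mismatch between the page's encoding and the feed's.

```php
<?php
// Hypothetical wrapper around the call used in the question:
// decode HTML entities into UTF-8 characters, leaving quote entities alone.
function decode_title(string $raw): string
{
    return html_entity_decode($raw, ENT_NOQUOTES, 'UTF-8');
}

// Made-up sample input containing Norwegian letters as named entities.
$raw = 'Bl&aring;b&aelig;r og fl&oslash;te';
echo decode_title($raw), "\n"; // prints "Blåbær og fløte"
```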

answered 2009-02-17T16:31:21.540

"How can I make this code simpler?"

I know it's not exactly what you're asking, but do you know about Yahoo! Pipes (http://pipes.yahoo.com/pipes/)?

answered 2009-02-17T17:10:38.997

Maybe you could use something like Feedity (http://feedity.com), which already solves the problem of generating an RSS feed from any web page.

answered 2009-12-10T01:15:32.837