I am trying to parse an RSS feed from a link. Here is my code:

$content = file_get_contents($this->feed);
print_r($content);
$rss = new SimpleXmlElement($content);
print_r($rss);
$rss_split = array();
/* foreach ($rss->channel->item as $item) {
    $title = (string) $item->title; // Title
    $link = (string) $item->link; // Url Link
    $description = (string) $item->description; // Description
    $rss_split[] = '<div><a href="' . $link . '" target="_blank" title="" >' . $title . ' </a><hr></div>';
}*/

The full XML is downloaded from here: http://devilsworkshop.org/feed/

Below is an excerpt to illustrate the structure:

<item>
    <title>Windows 8 Appstore resembles a ghost town</title>
    <link>http://devilsworkshop.org/windows-appstore-resembles-ghost-town/</link>
    <comments>http://devilsworkshop.org/windows-appstore-resembles-ghost-town/#comments</comments>
    <pubDate>Tue, 18 Sep 2012 05:30:22 +0000</pubDate>
    <dc:creator>Vibin</dc:creator>
    <category><![CDATA[Analysis]]></category>
    <category><![CDATA[Windows 8]]></category>

    <guid isPermaLink="false">http://devilsworkshop.org/?p=62284</guid>
    <description><![CDATA[<p>Microsoft is all set to release Windows 8 for public in the coming weeks. Apparently, the biggest change in Windows 8 seems to be the Metro UI (I know it&#8217;s no more called Metro, but let&#8217;s keep it like that [...]</p><p>--
            This Post <a href="http://devilsworkshop.org/windows-appstore-resembles-ghost-town/">Windows 8 Appstore resembles a ghost town</a> is Published on <a href="http://devilsworkshop.org">Devils Workshop</a> .
        </p><h3>Related posts:</h3><ul>
            <li><a href='http://devilsworkshop.org/googles-new-look-resembles-yahoo-search/' rel='bookmark' title='Google&#8217;s new look resembles Yahoo Search'>Google&#8217;s new look resembles Yahoo Search</a></li>
        </ul>]]></description>
    <content:encoded><![CDATA[<p>Microsoft is all set to release Windows 8 for public in the coming weeks. Apparently, the biggest change in Windows 8 seems to be the Metro UI (I know it&#8217;s no more called Metro, but let&#8217;s keep it like that for simplicity) and apps.</p>
        <ul>
        <h2>Apps are less advanced</h2>
        <p>Metro is great on tablets, but on desktop, it looks like an OS with dumbed down apps. Take Skitch for example, it is an app for taking and editing screenshots and was previously a Mac-only app but recently came to Windows 8. Just compare these two apps and you&#8217;ll know what I meant.</p>
        <p>Here&#8217;s how Skitch looks in Windows 8:</p>
        <p><a href="http://devilsworkshop.org/files/2012/09/SkitchinWindows8.png"><img style=' display: block; margin-right: auto; margin-left: auto;'  class="aligncenter size-full wp-image-62302" title="SkitchinWindows8" src="http://devilsworkshop.org/files/2012/09/SkitchinWindows8.png" alt="" width="740" height="570" /></a></p>
        <p>And now, this is the Mac version of Skitch:</p>
        <p><a href="http://devilsworkshop.org/files/2012/09/SkitchinMac.png"><img style=' display: block; margin-right: auto; margin-left: auto;'  class="aligncenter size-full wp-image-62301" title="SkitchinMac" src="http://devilsworkshop.org/files/2012/09/SkitchinMac.png" alt="" width="671" height="575" /></a></p>
        <p>Another example can be Newsmix, an app which will let you read stuff that matters to you &#8211; in a Magazine layout. Apparently, this app is a fail for someone like me who subscribe to 50+ blogs.</p>
        <p><a href="http://devilsworkshop.org/files/2012/09/NewsmixinWindows8.png"><img style=' display: block; margin-right: auto; margin-left: auto;'  class="aligncenter size-large wp-image-62305" title="NewsMix in Windows 8" src="http://devilsworkshop.org/files/2012/09/NewsmixinWindows8-1024x640.png" alt="news-mix-windows-8" width="620" height="387" /></a><br />
            Sure, it will be great on a Windows slate, but not really on a PC/laptop.</p>
        <li><a href='http://devilsworkshop.org/how-to-enable-hibernate-option-in-windows-vistawindows-7/' rel='bookmark' title='How to enable Hibernate Option in Windows Vista/Windows 7'>How to enable Hibernate Option in Windows Vista/Windows 7</a></li>
        <li><a href='http://devilsworkshop.org/windows-store/' rel='bookmark' title='Microsoft to Introduce Windows Store with Windows 8 Platform'>Microsoft to Introduce Windows Store with Windows 8 Platform</a></li>
        </ul>]]>
    </content:encoded>          
    <wfw:commentRss>http://devilsworkshop.org/windows-appstore-resembles-ghost-town/feed/</wfw:commentRss>
    <slash:comments>0</slash:comments>
</item>

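When I print $content, it shows the images inside the content:encoded tag. But printing $rss does not show that tag at all, and the description tag shows up as an empty SimpleXMLElement Object().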

I want to parse both tags. What am I doing wrong?

3 Answers

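First, print_r() is not a good way to predict how SimpleXML objects will behave, because they are not "normal" PHP objects. You could try my simplexml_dump() function, which lists the content, children, and attributes of a particular node or list of nodes.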
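
To illustrate the point about print_r(), here is a minimal sketch that reads the same kind of node by casting it to a string instead of dumping the object. The inline XML is a shortened stand-in for the feed in the question, not the real download, and the variable names are only for illustration.

```php
<?php
// Shortened sample modelled on the excerpt in the question (assumption: the
// real feed has the same <channel>/<item> layout).
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <title>Windows 8 Appstore resembles a ghost town</title>
      <description><![CDATA[<p>Microsoft is all set to release Windows 8.</p>]]></description>
    </item>
  </channel>
</rss>
XML;

$rss  = new SimpleXMLElement($xml);
$item = $rss->channel->item[0];

// Casting a node to string returns its text content, CDATA sections included.
$title       = (string) $item->title;
$description = (string) $item->description;

// asXML() serialises the node itself, which is usually more informative
// than print_r() when you want to see what the parser actually holds.
$rawDescription = $item->description->asXML();

echo $title, "\n", $description, "\n", $rawDescription, "\n";
```

If you would rather have CDATA merged into ordinary text nodes at parse time, passing the LIBXML_NOCDATA option as the second constructor argument is another route.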

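Secondly, the content:encoded element is in the namespace with the prefix content, so you need to tell SimpleXML to access nodes in that namespace by passing it to the ->children() method; plain property access such as $item->encoded does not reach it. The second argument, true, says that the first argument is a prefix rather than a namespace URI. For example: echo $item->children('content', true)->encoded;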
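
Put together with the loop from the question, that could look roughly like the sketch below. It runs against an inline sample rather than the live feed; the namespace URIs on the root element are the ones RSS feeds commonly declare for these prefixes and are an assumption here, as is the $posts array.

```php
<?php
// Inline sample; the xmlns declarations are assumed, since the excerpt in the
// question does not show the root <rss> element.
$xml = <<<'XML'
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>Windows 8 Appstore resembles a ghost town</title>
      <dc:creator>Vibin</dc:creator>
      <description><![CDATA[<p>Short summary.</p>]]></description>
      <content:encoded><![CDATA[<p>Full article body.</p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$rss   = new SimpleXMLElement($xml);
$posts = [];

foreach ($rss->channel->item as $item) {
    // children($prefix, true) selects the child elements under that prefix.
    $content = $item->children('content', true);
    $dc      = $item->children('dc', true);

    $posts[] = [
        'title'       => (string) $item->title,        // no namespace
        'description' => (string) $item->description,  // no namespace, CDATA text
        'content'     => (string) $content->encoded,   // <content:encoded>
        'creator'     => (string) $dc->creator,        // <dc:creator>
    ];
}

print_r($posts);
```

The (string) casts matter: without them each array entry would hold a SimpleXMLElement object rather than the text.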

answered 2012-09-18T12:21:53.420

Of course printing $rss did not show the data. It showed exactly what it is, because it really is a SimpleXMLElement object.

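However, as far as I can tell, your XML document cannot be parsed because it is not valid UTF-8. After copying it into my client and combing through it, I found a bunch of \xA0 and \x92 characters.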

After I replaced them with the corresponding characters (space and apostrophe) and saved the document, it parsed fine.

This must be your problem.

The solution to this problem is as follows:

$char_arr = array('/\xa0/','/\x92/','/\x96/');
$rep_arr = array('&nbsp;','\'','-');
$content = preg_replace($char_arr, $rep_arr, $content);

Make sure to put this code before you create your SimpleXML object:

$content = file_get_contents($this->feed);     
print_r($content);
$char_arr = array('/\xa0/','/\x92/','/\x96/');
$rep_arr = array('&nbsp;','\'','-');
$content = preg_replace($char_arr, $rep_arr, $content);
$rss = new SimpleXmlElement($content);

That should solve your problem; I tested it myself and it works for me.
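
As a possible alternative to replacing individual bytes, here is a sketch that converts the whole document to UTF-8 when it fails a UTF-8 validity check. It is not the code from this answer: toUtf8() is a hypothetical helper, it needs the mbstring extension, and it assumes the stray bytes are Windows-1252 (where \x92 is a curly apostrophe and \x96 an en dash). If the feed mixes valid UTF-8 sequences with such bytes, this conversion would mangle the valid ones, so check your data first.

```php
<?php
/**
 * Hypothetical helper: return $raw as UTF-8.
 * Assumption: input that is not valid UTF-8 is Windows-1252.
 */
function toUtf8(string $raw): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already valid, leave untouched
    }
    return mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
}

// Sample input containing the kind of bytes mentioned above.
$raw = "<rss><channel><item><title>It\x92s here \x96 now</title></item></channel></rss>";

$rss   = new SimpleXMLElement(toUtf8($raw));
$title = (string) $rss->channel->item[0]->title;

echo $title, "\n";
```

Unlike inserting &nbsp;, which is not a predefined XML entity and only survives inside CDATA sections, this keeps the characters as real Unicode code points.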

answered 2012-09-18T07:28:11.987

Thanks to IMSoP's answer, I went straight to http://php.net/simplexml, where I found and used the xmlObjToArr($obj) function by xaviered_at gmail_dot_com to solve the same problem.

For those still looking for a simple way to get the content between the content:encoded tags, here is a short and straightforward script:

<?php

echo "<pre>";

$url = "http://devilsworkshop.org/feed/";
$rss = simplexml_load_file($url);

if($rss){

    $items = $rss->channel->item;

    foreach($items as $item){

        $title = $item->title;
        $image = $item->image;
        $link = $item->link;
        $published_on = $item->pubDate;
        $description = $item->description;

        // convert the <content:encoded> element from a SimpleXMLElement object into an array
        $content = xmlObjToArr($item->children('content', true)->encoded);


        echo "

        title: $title
        image: $image
        link: $link
        published on: $published_on
        description: $description
        content: 
        ";

        print_r($content);

    }
}


function xmlObjToArr($obj) {
        $namespace = $obj->getDocNamespaces(true);
        $namespace[NULL] = NULL;

        $children = array();
        $attributes = array();
        $name = strtolower((string)$obj->getName());

        $text = trim((string)$obj);
        if( strlen($text) <= 0 ) {
            $text = NULL;
        }

        // get info for all namespaces
        if(is_object($obj)) {
            foreach( $namespace as $ns=>$nsUrl ) {
                // attributes
                $objAttributes = $obj->attributes($ns, true);
                foreach( $objAttributes as $attributeName => $attributeValue ) {
                    $attribName = strtolower(trim((string)$attributeName));
                    $attribVal = trim((string)$attributeValue);
                    if (!empty($ns)) {
                        $attribName = $ns . ':' . $attribName;
                    }
                    $attributes[$attribName] = $attribVal;
                }

                // children
                $objChildren = $obj->children($ns, true);
                foreach( $objChildren as $childName=>$child ) {
                    $childName = strtolower((string)$childName);
                    if( !empty($ns) ) {
                        $childName = $ns.':'.$childName;
                    }
                    $children[$childName][] = xmlObjToArr($child);
                }
            }
        }

        return array(
            'name'=>$name,
            'text'=>$text,
            'attributes'=>$attributes,
            'children'=>$children
        );
    }


?>
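
If only the text of content:encoded is needed, the array conversion may be more than necessary. The sketch below is a lighter variant under a few assumptions: it uses an inline sample instead of the live feed, looks the namespace URI up with getDocNamespaces() instead of relying on the prefix, and falls back to the URI commonly used for the content module. The names $contentNs and $bodies are made up for the example.

```php
<?php
// Inline sample with two items; the xmlns declaration is assumed.
$xml = <<<'XML'
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>First post</title>
      <content:encoded><![CDATA[<p>Body one.</p>]]></content:encoded>
    </item>
    <item>
      <title>Second post</title>
      <content:encoded><![CDATA[<p>Body two.</p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$rss = new SimpleXMLElement($xml);

// Map of prefix => namespace URI for every namespace declared in the document.
$namespaces = $rss->getDocNamespaces(true);
$contentNs  = $namespaces['content'] ?? 'http://purl.org/rss/1.0/modules/content/';

$bodies = [];
foreach ($rss->channel->item as $item) {
    // With a single argument, children() treats it as a namespace URI.
    $bodies[(string) $item->title] = (string) $item->children($contentNs)->encoded;
}

print_r($bodies);
```

Matching on the URI keeps working if a feed binds the same namespace to a different prefix.
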
answered 2013-04-23T14:16:41.257