php - SimpleXMLElement cast to string does not behave like a string

Question

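I'm using SimpleXML to parse RSS feeds, and since many feeds contain embedded HTML, I'd like to isolate any image addresses contained in that embedded HTML. It sounds simple enough, but I'm having trouble parsing the data in the SimpleXMLElement objects. Here is the relevant code.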

for($i = 0; $i < count($articles); $i++) {
    foreach($articles[$i] as $feedDeet) {
        $str = (string)$feedDeet;
        $result = strpos($str, '"');
        if($result === false) {
            echo 'There are apparently no quotes in this string: '.$str;
        }
        $explodedString = explode('"', $str);
        echo "<br>";
        if($explodedString[0] == $str) {
            echo 'ExplodedString is equal to str. Apparently, once again, the string contains no quotes.';
        }
        echo "<hr>";
    }
}

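Here, $articles is an array of SimpleXMLElement objects, each representing one RSS article and containing a number of child SimpleXMLElement objects that hold that article's attributes and details. Basically, I want to iterate over those attributes one by one, cast each to a string, and then explode the string using any quotation mark as the delimiter (because any image address will be enclosed in quotes). I would then go through the exploded array and look for any string that appears to be an image address. However, neither explode() nor strpos() behaves as I expect. To illustrate what I mean, one of the outputs of the code above is as follows: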

There are apparently no quotes in this string: <p style="text-align: center;"><img class="alignnone size-full wp-image-243922" alt="gold iPhone Shop Le Monde" src="http://media.idownloadblog.com/wp-content/uploads/2013/08/gold-iPhone-Shop-Le-Monde.jpg" width="593" height="515" /></p> <p>Folks still holding out hope that the gold iPhone rumors aren’t true may want to brace themselves, the speculation has just been confirmed by the Wall Street Journal-owned blog AllThingsD. And given the site’s near perfect (perfect?) track record with predicting future Apple plans, and <a href="http://www.idownloadblog.com/2013/08/16/is-this-apples-gold-colored-iphone-5s/">corroborating evidence</a>, we’d say Apple is indeed going for the gold…(...)<br/>Read the rest of <a href="http://www.idownloadblog.com/2013/08/19/allthingsd-gold-iphone-yes/">AllThingsD confirms gold iPhone coming</a></p> <hr /> <p><small> "<a href="http://www.idownloadblog.com/2013/08/19/allthingsd-gold-iphone-yes/">AllThingsD confirms gold iPhone coming</a>" is an article by <a href="http://www.idownloadblog.com">iDownloadBlog.com</a>. <br/>Make sure to <a href="http://twitter.com/iDownloadBlog">follow us on Twitter</a>, <a href="http://www.facebook.com/iPhoneDownloadBlog">Facebook</a>, and <a href="https://plus.google.com/u/0/b/111910843959038324995/">Google+</a>. </small></p>
ExplodedString is equal to str. Apparently, once again, the string contains no quotes.

Sorry if that is a bit hard to read; it is copied verbatim from the output.

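As you can see, there are clearly quotes in the string in question, yet strpos() returns false, indicating that the specified string could not be found, and explode() returns an array containing only the original string, indicating that the specified delimiter could not be found. What is going on here? I've been stumped by this for hours and I feel like I'm losing my mind.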

Thanks!

score 1 · Accepted Answer

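The mistake you're making here is that your debug output is an HTML page, so the messages you print are being interpreted as HTML by the browser. To see their actual content, you need to either view the page source, or use a <pre> tag to preserve whitespace and htmlspecialchars() to add a layer of HTML escaping: echo '<pre>' . htmlspecialchars($str) . '</pre>';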
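A minimal sketch of that debugging step. The value of $str here is a hypothetical stand-in for (string)$feedDeet, chosen on the assumption that the feed content is already entity-escaped:

```php
<?php
// Hypothetical sample standing in for (string)$feedDeet; assumed to be
// entity-escaped, as the answer suspects the real feed content is.
$str = '&lt;p style=&quot;text-align: center;&quot;&gt;';

// htmlspecialchars() escapes the ampersands once more, so the browser
// renders the string's actual characters instead of interpreting them.
$debug = '<pre>' . htmlspecialchars($str) . '</pre>';

echo $debug, "\n";
```

Viewed in a browser, this displays the literal text &lt;p style=&quot;...&quot;&gt;, which makes it obvious that the string holds entities rather than real angle brackets and quotes.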

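If the output in the browser looks like <p style="text-align: center;">, then evidently the input has already been escaped with HTML entities, and actually looks like &lt;p style=&quot;text-align: center;&quot;&gt;. Although &quot; looks like " once the browser renders it, it is not the same string, so strpos() won't find it.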
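The mismatch can be reproduced in isolation. This is only an illustrative sketch: the sample string and the example.com URL are made up, not taken from the feed in the question.

```php
<?php
// Made-up entity-escaped fragment; the URL is a placeholder.
$escaped = '&lt;img src=&quot;http://example.com/a.jpg&quot; /&gt;';

// There is no literal double-quote character in the string...
$quotePos = strpos($escaped, '"');        // false

// ...so explode() has nothing to split on and returns the whole string.
$parts = explode('"', $escaped);          // array with a single element

// The entity text itself, however, is present.
$entityPos = strpos($escaped, '&quot;');  // an integer offset, not false

var_dump($quotePos, count($parts), $entityPos !== false);
```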

To undo this extra layer of escaping, you can run html_entity_decode() on the string before processing it.
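A rough sketch of how decoding could slot into the approach from the question (explode on quotes, then look for things that resemble image addresses). The sample string, the example.com URL, and the regular expression used to decide what "looks like an image address" are all assumptions made for illustration:

```php
<?php
// Made-up entity-escaped fragment; the URL is a placeholder.
$escaped = '&lt;img alt=&quot;demo&quot; src=&quot;http://example.com/a.jpg&quot; /&gt;';

// Undo the extra escaping layer so real quote characters appear.
$decoded = html_entity_decode($escaped, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Now the quote-delimited split from the question works as intended.
$parts = explode('"', $decoded);

// Hypothetical heuristic: keep pieces that are http(s) URLs ending in a
// common image extension. Real feeds may need a looser or stricter rule.
$images = [];
foreach ($parts as $part) {
    if (preg_match('/^https?:\/\/\S+\.(?:jpe?g|png|gif)$/i', $part)) {
        $images[] = $part;
    }
}

print_r($images);
```

For anything beyond a quick check, loading the decoded HTML into DOMDocument and reading the src attribute of each img element would likely be more robust than splitting on quotes.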
