
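I am trying to strip the social media buttons from the description field of the XML here, leaving only the paragraphs (the XML is too large to post here).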

EDIT: Since some of you cannot access the XML, here is part of one of the description tags:

    <description>
 <!-- TWITTER https://twitter.com/about/resources/buttons#tweet --> <script> document.write('<a href="https://www.twitter.com/tst_oficial" class="twitter-follow-button" data-show-count="false" data-lang="pt">Seguir</a>'); !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
 <!-- CURTIR SITE FACEBOOK (Enviar) --> <iframe class="fb_ltr" src="http://www.facebook.com/plugins/like.php?href=https://www.facebook.com/TSTJus&layout=button_count&show_faces=false&action=like&colorscheme=light&width=25&height=25&locale=pt_BR" scrolling="no" frameborder="0" style="border:0px; margin-left:30px; overflow:hidden; width:120px; height:25px;vertical-align:bottom;" allowTransparency="true"></iframe>
 <!-- GOOGLE PLUS +1--> <script type="text/javascript" src="https://apis.google.com/js/plusone.js"></script> 
 <g:plusone size="medium" href="https://plus.google.com/103151838647081346830" style="border-left:-200px"></g:plusone>
 </div> </br></br> 
 <div class="modelo_noticia">
  <div>
   <div style="float: left; width:47%; text-align:center; margin: 0 9px 0 0;"><a href="/image/journal/article?img_id=5733388&t=1377023456174" target="_blank" style="text-decoration:none; color:black;"><img src="/image/journal/article?img_id=5733388&t=1377023456174" style="margin: 0 5px; width:98%;"/><span style="font-style:italic;"></span> </a></div>
   <p> &nbsp;</p>
   <p style="text-align: justify;"> <span style="font-size:12px;">"A CLT continua atual enq...a.</span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">...or.</span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">O min...do".</span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">Ca...as".</span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">Ao enc...izou.</span></p> 
   <p style="text-align: justify;"> <span style="font-size:12px;">Também parti...o.</span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">Ao a...ócio".</span></p> 
   <p style="text-align: justify;"> <span style="font-size:12px;"><strong>Debate: reforma na CLT</strong></span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">O min...s.</span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">Ao...disse.</span></p>
   <p style="text-align: justify;"> <span style="font-size:12px;">O m...o o país". &nbsp;&nbsp;</span></p>  <p style="text-align: justify;"> <span style="font-size:12px;">(Fernanda Loureiro)</span></p>
  </div>
  <div style="clear:both;"></div>
 </div>
 <DIV style="vertical-align:bottom !important">
  <!-- FACEBOOK CURTIR --> <!-- <script src="http://connect.facebook.net/pt_BR/all.js#xfbml=1"></script>
  <fb:like layout="button_count" show_faces="true" width="80"></fb:like>-->
  <iframe class="fb_ltr" src="http://www.facebook.com/plugins/like.php?href=http://www.tst.jus.br/noticias/-/asset_publisher/89Dk/content/{rss=true}&layout=button_count&show_faces=false&action=like&colorscheme=light&width=25&height=25&locale=pt_BR" scrolling="no" frameborder="0" style="border:none;border:0;margin-left:0; overflow:hidden; width:95px; height:25px;horizontal-align:left;vertical-align:bottom;" allowTransparency="true"></iframe>
  <!-- TWITTAR --> <span style="margin-left:20px;"> <script type="text/javascript"> var endereco; endereco = window.location.href; document.write('<a href="http://twitter.com/share?url=' + endereco + '" class="twitter-share-button" data-text="Presidente do TST diz que trabalho precisa ser valorizado sem perda de competitividade" data-count="horizontal" data-via="tst_oficial">Tweet</a>') </script><script type="text/javascript" src="http://platform.twitter.com/widgets.js"></script> </span>
  <!-- OK FACEBOOK Recomendar --> <!--<iframe id="f2ee48257c" name="f1f8d54994" frameborder="0" scrolling="no" style="border: none; overflow: hidden; height: 20px; width: 200px;" title="Like this content on Facebook." class="fb_ltr" src="http://www.facebook.com/plugins/like.php?api_key=228619377180035&amp;locale=pt_BR&sdk=joey&channel_url=http://www.facebook.com/TSTJus?fref=ts&version=18%23cb%3Df360a99c9c&origin=http://www.tst.jus.br/noticias&href=http://www.tst.jus.br/noticias%26relation%3Dparent.parent&node_type=link&width=150&font=arial&layout=button_count&colorscheme=light&show_faces=false&send=true&extended_social_context=false&action=recommend" allowTransparency="true"></iframe>-->
  <iframe border="0" frameborder="0" scrolling="no" class="fb_ltr" id="f2ee48257c" name="f1f8d54994" style="border:none;margin-left:0; overflow:hidden; width:200px; height:25px;horizontal-align:left;vertical-align:bottom;" allowTransparency="true" title="Enviar notícia no Facebook" class="fb_ltr" src="http://www.facebook.com/plugins/like.php?api_key=228619377180035&locale=pt_BR&sdk=joey&channel_url=http://www.tst.jus.br/noticias%3Fversion%3D18%23cb%3Df360a99c9c%26origin%3Dhttp://www.tst.jus.br/noticias%26relation%3Dparent.parent&amp;href=http://www.tst.jus.br/noticias&node_type=link&amp;width=150&amp;font=arial&amp;layout=button_count&amp;colorscheme=light&show_faces=false&send=true&amp;extended_social_context=false&action=recommend"></iframe> 
  <!-- YOUTUBE --> <a href="http://www.youtube.com/tst" target="_blank"> <img src="http://www.tst.jus.br/image/image_gallery?uuid=49d1dfeb-fba6-48be-9984-c2ba7dac709e&groupId=10157&t=1359131490760" border="0" title="Inscrição no Canal Youtube do TST" alt="Inscrição no Canal Youtube do TST"></a>
 </DIV> </br>
</description>

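I have already tried a regular expression, but it only returns the first paragraph ('#<p[^>]*>(.*)</p>#isU'). With SimpleXmlElement and DOM I keep getting errors (I do not know much about them, but they seem to be the best approach). Finally I tried HTMLPurifier, which filters out everything and returns nothing relevant.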
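One likely reason for getting only the first paragraph is a call to `preg_match`, which stops at the first match, whereas `preg_match_all` collects every match. This is a minimal sketch with the same pattern. The `$html` string is a shortened, made-up stand-in for a real description, and the whitespace cleanup is an assumption about the desired output.

```php
<?php
// Shortened, made-up stand-in for the body of one <description>.
$html = '<div><p> &nbsp;</p>'
      . '<p style="text-align: justify;"> <span style="font-size:12px;">First paragraph.</span></p>'
      . '<p style="text-align: justify;"> <span style="font-size:12px;">Second paragraph.</span></p></div>';

// Same pattern as in the question; preg_match_all returns every <p>...</p>.
preg_match_all('#<p[^>]*>(.*)</p>#isU', $html, $matches);

$paragraphs = [];
foreach ($matches[1] as $inner) {
    // Drop inner markup such as <span> and decode entities such as &nbsp;.
    $text = html_entity_decode(strip_tags($inner), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Strip leading and trailing whitespace, including non-breaking spaces.
    $text = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text);
    if ($text !== '') {
        $paragraphs[] = $text;
    }
}

foreach ($paragraphs as $paragraph) {
    echo "<p>{$paragraph}</p>\n";
}
```

A regex like this still matches any tag whose name starts with `p` (for example `<pre>`), so a DOM-based approach is probably more robust for this feed.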

This is my latest approach (following Puggan Se's suggestion):

$i=0;
$feed= '<XML STRING>'; //The whole XML string here
$dom = new DOMDocument(); //declaring DOMDocument
$dom->preserveWhiteSpace = false; //removing spaces
$dom->loadXML($feed, LIBXML_PARSEHUGE); //LIBXML_PARSEHUGE for long XMLs
$dom->formatOutput = true; // for a nice output ??

$xml = new DOMXPath($dom); //declaring the XPath

$xml->registerNamespace('dc','http://purl.org/dc/elements/1.1/'); //registering the dc prefix used in the //item/dc:date query

//evaluates
$source = $xml->evaluate("//channel/title");
$titles = $xml->evaluate("//item/title");
$links = $xml->evaluate("//item/link");
$dates = $xml->evaluate("//item/dc:date");
$descriptions = $xml->evaluate("//item/description");

//echoing channel's title
 if($source->length > 0) {
 $source= $source->item(0)->nodeValue;
 echo $source. '<br /><br />';
 }

//echoing the items
 foreach($titles as $title) {
  echo "{$titles->item($i)->nodeValue}<br /><br />";
  echo "{$links->item($i)->nodeValue}<br /><br />";
  echo "{$dates->item($i)->nodeValue}<br /><br />";
  //filtering only <p><span> text from <description>
  $description = "{$descriptions->item($i)->nodeValue} ";
  //encoding non-ASCII characters as HTML entities so loadHTML reads them correctly
  $description = mb_convert_encoding($description, 'html-entities', 'utf-8');
  unset($domtmp);
  $domtmp = new DOMDocument();
  $domtmp->loadHTML($description );
  $xmltmp = new DOMXPath($domtmp);
  $desc= $xmltmp->evaluate("//p/span");
   foreach($desc as $node) {
    echo "<p>{$node->nodeValue}</p>";
   }
  $i++;
 }
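The description step could perhaps be tightened as follows. This is a minimal sketch, not a tested solution for the whole feed: it assumes the article text always sits inside the `<div class="modelo_noticia">` wrapper shown in the sample, that the `mbstring` extension is available, and the helper name `extractParagraphs` is hypothetical.

```php
<?php
// Hypothetical helper: returns the plain text of every non-empty <p>
// found inside <div class="modelo_noticia"> in one description's HTML.
function extractParagraphs(string $html): array
{
    // The embedded HTML is not valid (custom tags, stray closing tags),
    // so collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // Encode non-ASCII characters as numeric entities so that loadHTML
    // does not misread UTF-8 input as ISO-8859-1.
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    $dom->loadHTML($encoded);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $paragraphs = [];
    foreach ($xpath->query('//div[@class="modelo_noticia"]//p') as $p) {
        // textContent already has tags removed and entities decoded.
        $text = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $p->textContent);
        if ($text !== '') {
            $paragraphs[] = $text;
        }
    }
    return $paragraphs;
}
```

Inside the item loop it would be called as `extractParagraphs($descriptions->item($i)->nodeValue)`. Selecting by the wrapper `div` rather than by `//p/span` leaves the social buttons out without depending on every paragraph containing a `<span>`.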

Do you know how I could improve this?

Thanks a lot for your help!
