
I'm trying to extract text from the page at a URL. Can anyone help me?

$news1 = "http://www.espncricinfo.com/icc-womens-world-cup-2013/content/story/604808.html";     
$a=preg_match_all("/\<p class\=['\"]news-body['\"]\>(.*?)\<\/p\>/",$news1,$b);
echo $a;
print_r($b[1]);

It prints 0 and an empty Array(), i.e. no matches. Any help would be much appreciated.

Here is some of the HTML:

<p class="news-body">
New Zealand captain, Suzie Bates, also spoke of how the sides had played a competitive game but said intensity levels weren't the same after the dispiriting news came in. Bates felt it would have been better to have not known the result of the other match.
</p>
<p class="news-body">
It was a particularly shattering end for the holders England, who went out of the tournament without having had a single really poor game. Their defeats to Sri Lanka and Australia were by one wicket - off the last ball - and two runs. Edwards, however, refused to offer any excuses and said England had paid for their "slow start" to the tournament, beginning with the shock loss to Sri Lanka.
</p>
<p class="news-body">
"We had come here to win this tournament and we haven't. We haven't even got to the final," Edwards said. "That is disappointing for us as a group of players. We were very inconsistent in the first phase of the tournament and are probably now playing our best cricket, which is too late. We prepared well. We have no excuses. We didn't play well. We didn't hold our catches against Sri Lanka."
</p>
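For reference, a minimal sketch of why the code above prints 0. preg_match_all() is run against $news1, which holds the URL string itself, not the page's HTML, so there is nothing to match. Even once the HTML is fetched, each <p class="news-body"> spans several lines, and "." does not match a newline unless the s modifier is set. The snippet below uses a hypothetical hard-coded $html string in place of the downloaded page so it runs offline.

```php
<?php
// Hypothetical stand-in for the downloaded page. In real use this would be
// the result of file_get_contents($url), not the URL string itself.
$html = <<<HTML
<p class="news-body">
First paragraph.
</p>
<p class="news-body">
Second paragraph.
</p>
HTML;

$pattern = '/<p class=[\'"]news-body[\'"]>(.*?)<\/p>/';

// Without the s modifier, "." cannot cross the newline after the opening tag.
$withoutS = preg_match_all($pattern, $html, $m1);

// With s, "." also matches newlines, so each multi-line paragraph is captured.
$withS = preg_match_all($pattern . 's', $html, $m2);

echo $withoutS, "\n"; // 0
echo $withS, "\n";    // 2
print_r(array_map('trim', $m2[1]));
```

Regex is still fragile for HTML (attribute order, extra classes, nested tags), which is why a DOM parser is usually the safer choice.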

1 Answer

// Fetch the content; file_get_contents() returns false on failure
$html = file_get_contents('http://www.espncricinfo.com/icc-womens-world-cup-2013/content/story/604808.html');
if ($html === false) {
  die('Could not fetch the page');
}

// Load the HTML into DOM, collecting libxml warnings about malformed markup
// instead of emitting them
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the previous libxml error setting

// Load the DOM into SimpleXML
$simple = simplexml_import_dom($dom);

// Select every <p class="news-body"> with XPath
$news = $simple->xpath('//p[@class="news-body"]');

// Echo the results, escaping the text for HTML output
foreach ($news as $p)
{
  echo '<p>' . htmlspecialchars(trim((string) $p)) . '</p>';
}
answered 2013-02-14T03:30:51.410
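If SimpleXML is not needed, a similar result can be had with DOMXPath directly. This is a sketch, not the answerer's code: extractNewsBody() is a hypothetical helper name, and the sample markup stands in for the fetched page so it runs offline. One difference worth knowing: DOMNode::textContent includes text inside child elements (such as links), whereas casting a SimpleXMLElement to a string returns only its direct text.

```php
<?php
/**
 * Hypothetical helper: returns the trimmed text of every <p> whose class
 * list contains "news-body".
 *
 * @return string[]
 */
function extractNewsBody(string $html): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely valid; collect libxml warnings instead of
    // emitting them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Matches class="news-body" as well as e.g. class="news-body lead".
    $query = '//p[contains(concat(" ", normalize-space(@class), " "), " news-body ")]';

    $paragraphs = [];
    foreach ($xpath->query($query) as $node) {
        // textContent includes text inside child elements such as <a>.
        $paragraphs[] = trim($node->textContent);
    }
    return $paragraphs;
}

// Sample markup standing in for file_get_contents($url).
$sample = '<html><body>'
    . '<p class="news-body">First paragraph with a <a href="#">link</a>.</p>'
    . '<p class="byline">Not this one.</p>'
    . '<p class="news-body">Second paragraph.</p>'
    . '</body></html>';

foreach (extractNewsBody($sample) as $text) {
    echo htmlspecialchars($text), "\n";
}
```

The concat/normalize-space idiom is a common XPath 1.0 way to test for one class among several; @class="news-body" only matches when that is the attribute's entire value.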