php - Parsing an external site with PHP

Question

I have no experience parsing an external URL to pull data out of it, but today I tried an experiment:

$str1 = file_get_contents('http://www.indiegogo.com/projects/ubuntu-edge');
$test1 = strstr($str1, "amount medium clearfix");
$parts = explode(">",$test1);
$parts2 = vsprintf("%s", $parts[1]);

$str2 = file_get_contents('http://www.indiegogo.com/projects/ubuntu-edge');
$test2 = strstr($str2, "money-raised goal");
$test3 = str_ireplace("money-raised goal", "", "$test2");
$test4 = str_ireplace("\"", "", "$test3");
$test5 = str_ireplace(">", "", "$test4");
$test6 = substr($test5, 0, 29);
$test7 = explode("Raised of", $test6);
$test8 = vsprintf("%s", $test7[1]);

I tried printing the results like this:

print_r($parts2); then print_r($test8); then echo "$parts2 - $test8";

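The Ubuntu Edge campaign is very popular right now, so I tried to grab these two fields from the site (just as an experiment), but without success. Well, it does grab both fields, but I can't get both into the same variable. The output is either $parts2, or $parts2 containing the value of $test8, or only $test8.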

What am I doing wrong, and why? Also, is there a simpler way to do what I want, without so much code?

score 2 · Accepted Answer

Well, it does grab both fields, but I can't get both into the same variable.

Not sure what you mean.

Also, is there a simpler way to do what I want, without so much code?

Without so much code? No. More flexible and (probably) more efficient? Yes.

Try this and tailor it to your liking:

<?php
$page = file_get_contents('http://www.indiegogo.com/projects/ubuntu-edge');

$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($page);

$finder = new DOMXPath($doc);

// find class="money-raised"
$nodes = $finder->query("//*[contains(@class, 'money-raised')]");

// get the children of the first match  (class="money-raised")
$raised_children = $nodes->item(0)->childNodes;

// get the children of the second match (class="money-raised goal")
$goal_children = $nodes->item(1)->childNodes;

// get the amount value
$money_earned = $raised_children->item(1)->nodeValue;

// get the goal value (the first dollar figure in the text)
preg_match('/\$[\d,]+/', $goal_children->item(0)->nodeValue, $m);
$money_earned_goal = $m[0];


echo "Money earned: $money_earned\n";
echo "Goal: $money_earned_goal\n";

?>

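This is 11 lines of code without the echos (compared to your 12), but it only calls the other site once. Scraping a website is a somewhat complicated task. This code gets the values you want from that exact page.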
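
Because the code is tied to that exact page, a lookup can come back empty if the markup changes. A minimal sketch of a guarded lookup is shown here. It assumes a made-up HTML snippet rather than the real page, and firstText is a hypothetical helper, not part of the DOM extension:

```php
<?php
// Hypothetical helper: returns the trimmed text of the first node
// matching $query, or null when the query fails or matches nothing.
function firstText(DOMXPath $xpath, string $query): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Made-up markup for illustration; it has no "goal" element on purpose.
$html = '<div><span class="money-raised">$7,000,000</span></div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument;
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$raised = firstText($xpath, "//*[contains(@class, 'money-raised')]");
$goal   = firstText($xpath, "//*[contains(@class, 'goal')]");

echo 'Raised: ' . ($raised ?? 'not found') . "\n";
echo 'Goal: ' . ($goal ?? 'not found') . "\n";
```

With this shape, a missing element shows up as "not found" instead of a fatal error from calling a method on null.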

If you want to scrape websites, I strongly recommend learning to use DOMDocument and DOMXPath. There is a lot to learn, but it is worth the effort.
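
As a starting point for experimenting, here is a small sketch that runs DOMDocument and DOMXPath against an inline HTML string, so no network request is needed. The markup is invented for illustration and may not match the real page, and classQuery is a hypothetical helper that matches a whole class token rather than a substring:

```php
<?php
// Hypothetical helper: builds an XPath query that matches elements whose
// class attribute contains $class as a complete, space-separated token.
function classQuery(string $class): string
{
    return "//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
}

// Invented markup for illustration only.
$html = <<<'HTML'
<div class="amount medium clearfix">
  <span class="money-raised"><span class="currency">$7,000,000</span></span>
  <p class="money-raised goal">Raised of $32,000,000 Goal</p>
</div>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument;
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// First element (in document order) carrying the "money-raised" class.
$raisedNode = $xpath->query(classQuery('money-raised'))->item(0);
$raised = trim($raisedNode->textContent);

// The element carrying the "goal" class; pull the dollar figure from its text.
$goalNode = $xpath->query(classQuery('goal'))->item(0);
preg_match('/\$[\d,]+/', $goalNode->textContent, $m);
$goal = $m[0];

echo "Raised: $raised\n";
echo "Goal: $goal\n";
```

Matching on the padded class string avoids accidental hits on longer class names that merely contain the same letters, which a plain contains(@class, ...) test would accept.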
