php - Why doesn't this PHP crawler work?

Question

In my localhost document root:

crawl.html

<html>
<body>
<p>
<form action="welcome.php" method="get">
Site to crawl: <input type="text" name="crawlThis">
<input type="submit">
</form>
</p>

</body>
</html>

welcome.php

 <html>
 <body>

 <?php 
 include ("crawler.php");

 echo $crawl = new Crawler($_GET["crawlThis"]);

 $images = $crawl->get("images");

 $links = $crawl->get("links"); 

 echo $links;
 echo $images;

 ?>
 <br>

</body>
</html>

and crawler.php

<?php

class Crawler {

    protected $markup = '';

    public function __construct($uri) {
        $this->markup = $this->getMarkup($uri);
    }

    public function getMarkup($uri) {
        return file_get_contents($uri);
    }

    public function get($type) {
        $method = "_get_{$type}";

        if (method_exists($this, $method)){
            return call_user_method($method, $this);
        }
    }

    protected function _get_images() {
        if (!empty($this->markup)){
            preg_match_all('/<img([^>]+)\/>/i', $this->markup, $images);

            return !empty($images[1]) ? $images[1] : FALSE;
        }
    }

    protected function _get_links() {
        if (!empty($this->markup)){
            preg_match_all('/<a([^>]+)\>(.*?)\<\/a\>/i', $this->markup, $links);

            return !empty($links[1]) ? $links[1] : FALSE;
        }
    }

}


/*$crawl = new Crawler($);

$images = $crawl->get('images');

$links = $crawl->get('links');*/

?>

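The resulting page is just empty. I can't figure out whether I'm unable to echo $images or whether my logic is wrong. I'm expecting a list of images, followed by a list of links.

Also, do I have to include crawler.php, or will PHP search the containing directory for a class of the same name?

Sorry, coming from Java to PHP is a bit baffling.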
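On the include question: unlike Java's classpath lookup, PHP does not search a directory for a file matching a class name on its own. The file has to be pulled in with include/require, or an autoloader has to be registered through spl_autoload_register(). The sketch below shows the second option. The helper name registerDirAutoloader() is hypothetical, and it assumes each class sits in a lowercase file named after it (Crawler in crawler.php) with no namespaces involved.

```php
<?php
// Sketch only. registerDirAutoloader() is a hypothetical helper name.
// Assumption: class Foo is defined in "<dir>/foo.php" (lowercase, no namespaces).

function registerDirAutoloader($dir)
{
    spl_autoload_register(function ($class) use ($dir) {
        // Build the expected file path from the class name.
        $file = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . strtolower($class) . '.php';

        // Load the file only if it exists; otherwise let other autoloaders try.
        if (is_file($file)) {
            require_once $file;
        }
    });
}

// Usage, e.g. at the top of welcome.php (not executed here):
//   registerDirAutoloader(__DIR__);
//   $crawl = new Crawler($_GET['crawlThis']); // crawler.php is loaded on first use
```

For a single class, a plain require_once __DIR__ . '/crawler.php'; does the same job with less machinery.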

score 1 · Accepted Answer

Answered 2012-12-24T21:44:49.143

score 0 · Accepted Answer

I'm all for writing your own, but there are plenty of documented examples that do this. Here is a good one you can follow or use:

Crawler example
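If you do want to keep your own class, here is a rough sketch of how it could be reworked. It is not a definitive fix, just one reading of why the page comes out blank:

- echo $crawl = new Crawler(...) tries to print an object that has no __toString(), which is a fatal error in PHP. With display_errors off, that shows up as an empty page.
- call_user_method() was deprecated long ago and removed in PHP 7. A direct dynamic call does the same thing.
- get() returns arrays, and echoing an array prints only the word "Array".
- The image pattern only matches tags that end in "/>", so a plain <img ...> is skipped.

The helper renderList() is a hypothetical name introduced for this example, and the relaxed regular expressions are my own substitution for the ones in the question.

```php
<?php
// Sketch only: a reworked Crawler under the assumptions listed above.
// renderList() is a hypothetical helper, not part of the original code.

class Crawler
{
    protected $markup = '';

    public function __construct($uri)
    {
        // file_get_contents() returns false on failure; fall back to ''.
        $markup = @file_get_contents($uri);
        $this->markup = ($markup === false) ? '' : $markup;
    }

    public function get($type)
    {
        $method = "_get_{$type}";

        // Direct dynamic call instead of the removed call_user_method().
        // Always return an array so callers can loop without type checks.
        return method_exists($this, $method) ? $this->$method() : array();
    }

    protected function _get_images()
    {
        // Capture the attributes of <img ...> with or without a trailing "/".
        preg_match_all('/<img\b([^>]*?)\s*\/?>/i', $this->markup, $m);
        return $m[1];
    }

    protected function _get_links()
    {
        // Capture the attributes of <a ...>...</a>; \b avoids matching <abbr>.
        preg_match_all('/<a\b([^>]*)>(.*?)<\/a>/is', $this->markup, $m);
        return $m[1];
    }
}

// Hypothetical helper: render an array of matches as an escaped HTML list.
function renderList(array $items)
{
    $out = '';
    foreach ($items as $item) {
        $out .= '<li>' . htmlspecialchars(trim($item)) . "</li>\n";
    }
    return "<ul>\n" . $out . "</ul>\n";
}

// Usage in welcome.php (not executed here):
//   require_once __DIR__ . '/crawler.php';
//   $crawl = new Crawler($_GET['crawlThis']);
//   echo renderList($crawl->get('images'));
//   echo renderList($crawl->get('links'));
```

Two things to keep in mind: file_get_contents() treats a value without a scheme as a local path, so the form input needs the full "http://..." address, and regular expressions are a fragile way to read HTML. DOMDocument is the sturdier choice once this gets past the experiment stage. Turning on display_errors while developing would also have surfaced the fatal error right away.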
