
I have to build a scraper that will scrape about 100 URLs, and it must run as a PHP CLI script called by a cron job. I'm totally lost on how to manage this... for each URL I'm thinking of creating a separate file, just to keep things clear when I have to update the code for a specific URL.

Would this be a good option? And is it possible to call all these files from a single cron job?


2 Answers


You would want those 100 URLs to be easy to manage, so store them in a database or a text file. Then simply load all the URLs, loop through them, and call your scrape function.
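A minimal sketch of the text-file variant, assuming the URLs live one per line in a file named urls.txt and that scrape() is your own fetching/parsing function (both names are placeholders):

$urls = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

foreach ($urls as $url) {
    // scrape() stands in for whatever fetches and parses one page
    scrape($url);
}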

answered 2011-01-04T10:56:08.443

What you can do is:

Maintain a list of all 100 URLs in a database, along with an alias name for each (it can be any name, e.g. "Google" for http://google.com).

Create one file per URL using the naming convention "AliasName.php", and write the code that scrapes that URL in that file.

Now you can set up a single cron job that retrieves all of your URLs from the database, loops through each one, and executes the file with the corresponding alias name.

For example, if your URL is http://google.com and its alias is Google, you would create a file named Google.php and put the scraping code there. In the cron job you would have code like this:

// Fetch all URLs and their aliases from the database
$urls = getAllURLs();

foreach ($urls as $url) {
    // Run the scraper file named after the alias, e.g. Google.php
    include_once($url['alias'] . ".php");
}
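Note that getAllURLs() is not a built-in PHP function; a minimal sketch of what it could look like, assuming a PDO connection and a urls table with url and alias columns (the credentials, database name, and table name are all placeholders):

function getAllURLs() {
    // Connect to the database that holds the url/alias list
    $pdo = new PDO('mysql:host=localhost;dbname=scraper', 'user', 'password');
    // Each row comes back with 'url' and 'alias' keys,
    // matching the $url['alias'] lookup in the loop above
    $stmt = $pdo->query('SELECT url, alias FROM urls');
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}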

Hope this helps.

Thanks!

Hussain

answered 2011-01-04T11:14:10.833