
I have a script that takes a some.txt file, reads the links in it, and reports whether a backlink to my site exists on each of them. The problem is that it is very slow, and I would like to speed it up. Is there any way to make it faster?

<?php
ini_set('max_execution_time', 3000);
$source = file_get_contents("your-backlinks.txt");
$needle = "http://www.submitage.com";   // the backlink URL to look for on each page
$new = explode("\n", $source);
$found = array();
$notfound = array();
foreach ($new as $check) {
    $a = file_get_contents(trim($check));
    if (strpos($a, $needle) !== false) {
        $found[] = $check;
    } else {
        $notfound[] = $check;
    }
}
echo "Matches that were found: \n " . implode("\n", $found) . "\n";
echo "Matches that were not found \n" . implode("\n", $notfound);
?>

2 Answers


Your biggest bottleneck is that you perform the HTTP requests sequentially rather than in parallel. curl can perform multiple requests in parallel. Here is an example from the documentation, adapted to your loop and to actually collecting the results. I can't vouch for it being correct, only that I followed the documentation correctly:

$mh = curl_multi_init();
$handles = array();

foreach ($new as $check) {
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, trim($check));
  curl_setopt($ch, CURLOPT_HEADER, 0);
  // needed so curl_multi_getcontent() returns the body instead of printing it
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_multi_add_handle($mh, $ch);
  $handles[$check] = $ch;
}

// verbatim from the demo
$active = null;
//execute the handles
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
// end of verbatim code

foreach ($handles as $check => $ch) {
  $a = curl_multi_getcontent($ch);
  // same check as in the original script
  if (strpos($a, $needle) !== false) {
    $found[] = $check;
  } else {
    $notfound[] = $check;
  }
  curl_multi_remove_handle($mh, $ch);
  curl_close($ch);
}
curl_multi_close($mh);
Answered 2012-11-26T12:09:56.573

Short of some contrived multi-threading solution, you won't be able to make this run faster by optimizing the PHP itself.

However, you could build a queue system that lets you run the checks as a background task. Instead of checking the URLs as you iterate over them, add them to a queue. Then write a cron script that grabs unchecked URLs from the queue one by one, checks whether they contain a reference to your domain, and saves the result. A minimal sketch of that idea follows.
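This sketch assumes an SQLite file queue.db and a table named url_queue; the file name, table name, and batch size are illustrative and not part of the original answer. One script loads the URLs into the queue, and a second script, run from cron, processes a small batch per invocation:

<?php
// enqueue.php - load the URLs from the text file into the queue (run once)
$db = new PDO('sqlite:queue.db');
$db->exec("CREATE TABLE IF NOT EXISTS url_queue (
    url TEXT PRIMARY KEY,
    checked INTEGER DEFAULT 0,
    found INTEGER DEFAULT 0
)");
$lines = file("your-backlinks.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$insert = $db->prepare("INSERT OR IGNORE INTO url_queue (url) VALUES (?)");
foreach ($lines as $url) {
    $insert->execute(array(trim($url)));
}
?>

<?php
// worker.php - run from cron; checks a small batch of unchecked URLs per run
$needle = "http://www.submitage.com";
$db = new PDO('sqlite:queue.db');
$batch = $db->query("SELECT url FROM url_queue WHERE checked = 0 LIMIT 20")
            ->fetchAll(PDO::FETCH_COLUMN);
$update = $db->prepare("UPDATE url_queue SET checked = 1, found = ? WHERE url = ?");
foreach ($batch as $url) {
    $body = @file_get_contents($url);
    $hit  = ($body !== false && strpos($body, $needle) !== false) ? 1 : 0;
    $update->execute(array($hit, $url));
}
?>

Because each cron run only touches a fixed batch, no single request has to fit inside max_execution_time, and the results accumulate in the table for later reporting.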

Answered 2012-11-26T11:45:27.157