
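I'm working with GoDaddy auction domains, and they provide a few ways to download domain lists. I have developed a cron job that downloads the domain list and dumps (inserts) it into my database table. The whole process, from download to database insert, takes a few seconds. The total number of domains (records) in this case is 34000.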

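Secondly, I need to update the PageRank of each individual domain in the database, 34000 records in total. I have a PHP API for fetching the PageRank in real time. The GoDaddy download doesn't include PageRank details, so I have to fetch and update them separately.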

Now, the problem is that fetching the PageRank in real time and then writing it to the database takes far too long for 34000 domains.

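I recently ran an experiment via a cron job to update the PageRank of the domains in the database. It took 4 hours to update just 13383 of the 34000 domains, because each PageRank has to be fetched first and then written to the database. All of this ran on a dedicated server.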
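As a rough sanity check on those figures, here is a minimal sketch of the arithmetic. It assumes the per-domain cost stays constant and that extra workers scale linearly, which rate limits and network latency will not fully honor. The function names are made up for illustration. Under those assumptions the measured run works out to a little over one second per domain, so roughly ten hours for all 34000 domains when processed one at a time.

```php
<?php
// Back-of-envelope estimate based on the figures quoted in the question.
// Assumption: constant cost per domain and perfectly linear scaling.

/** Average seconds spent per domain in a measured run. */
function seconds_per_domain($domainsDone, $hoursTaken)
{
    return ($hoursTaken * 3600) / $domainsDone;
}

/** Projected wall-clock hours for $totalDomains split across $workers. */
function estimated_hours($totalDomains, $secondsPerDomain, $workers = 1)
{
    return ($totalDomains * $secondsPerDomain) / $workers / 3600;
}

$perDomain = seconds_per_domain(13383, 4.0);      // measured: 13383 domains in 4 hours
$serial    = estimated_hours(34000, $perDomain);  // one task at a time
$parallel  = estimated_hours(34000, $perDomain, 8); // 8 ideal workers

printf("%.2f s per domain\n", $perDomain);
printf("%.1f h for 34000 domains, serially\n", $serial);
printf("%.1f h with 8 ideal workers\n", $parallel);
```

Each domain involves two remote lookups (www and non-www), so the remote API is most likely the dominant cost rather than the database update.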

Is there any way to speed this process up for a large number of domains? The only approach I can think of is multitasking.

Would it be possible to have 100 tasks fetching the PageRank and updating the database simultaneously?
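One way to picture "100 tasks" is to split the domain list into 100 batches and hand each batch to its own worker or CLI process. The sketch below only shows the splitting step. `split_into_batches` is a hypothetical helper, and the generated domain names are stand-ins for rows from the `auctions` table.

```php
<?php
// Hypothetical helper: split the domain list into at most $taskCount batches.
// Each batch could then be processed by a separate worker or process.

function split_into_batches(array $domains, $taskCount)
{
    if ($taskCount < 1) {
        throw new InvalidArgumentException('taskCount must be at least 1');
    }
    if (count($domains) === 0) {
        return array();
    }
    // Round up so that no domain is left without a batch.
    $batchSize = (int) ceil(count($domains) / $taskCount);
    return array_chunk($domains, $batchSize);
}

// Stand-in data; the real job would read these from the auctions table.
$domains = array();
for ($i = 1; $i <= 34000; $i++) {
    $domains[] = "example{$i}.com";
}

$batches = split_into_batches($domains, 100);
printf("%d batches, first has %d domains\n", count($batches), count($batches[0]));
```

Whether 100 concurrent tasks actually helps depends on how many simultaneous requests the remote service tolerates, so the task count is best treated as a tunable value.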

In case you need the code:

$sql = "SELECT domain from auctions";
$mozi_get = runQuery($sql);

while ($results = mysql_fetch_array($mozi_get)) {
    /* PAGERANK API */
    if ($results['domain'] != 'Featured Listings') {
        //echo $results['domain']."<br />";

        // Look up the rank of the www variant of the domain
        try {
            $url = new SEOstats("http://www." . trim($results['domain']));
            $rank = $url->Google_Page_Rank();
            if (!is_integer($rank)) {
                //$rank='0';
            }
        } catch (SEOstatsException $e) {
            $rank = '0';
        }

        // Look up the rank of the non-www variant of the domain
        try {
            $url = new SEOstats("http://" . trim($results['domain']));
            $rank_non = $url->Google_Page_Rank();
            if (!is_integer($rank_non)) {
                //$rank_non='0';
            }
        } catch (SEOstatsException $e) {
            $rank_non = '0';
        }

        // One UPDATE per domain, issued after both remote lookups finish
        $sql = "UPDATE auctions set rank='" . $rank . "', rank_non='" . $rank_non . "' WHERE domain='" . $results['domain'] . "'";
        runQuery($sql);
        echo $sql . "<br />";
    }
}

Here is my updated pthreads code:

<?php
set_time_limit(0);
require_once("database.php");
include 'src/class.seostats.php';


function get_page_rank($domain) {

    try {

        $url = new SEOstats("http://www." . trim($domain));

        $rank = $url->Google_Page_Rank();

        if(!is_integer($rank)){
              $rank = '0';
         }


    } catch (SEOstatsException $e) {

        $rank = '0';
    }

    return $rank;
}

class Ranking extends Worker {
  public function run(){}
}

class Domain extends Stackable {

  public $name;
  public $ranking;

  public function __construct($name) {

    $this->name = $name;

  }

  public function run() {

    $this->ranking = get_page_rank($this->name);

    /* now write the Domain to database or whatever */

    $sql = "UPDATE auctions set rank = '" . $this->ranking . "' WHERE domain = '" . $this->name . "'"; 
    runQuery($sql);

  }

}

/* start some workers */
$workers = array();
while (@$worker++ < 8) {
  $workers[$worker] = new Ranking();
  $workers[$worker]->start();
}

/* select auctions and start processing */

$domains = array();

$sql = "SELECT domain from auctions"; // RETURNS 55369 RECORDS

$domain_result = runQuery($sql);

while($results = mysql_fetch_array($domain_result)) {

  $domains[$results['domain']] = new Domain($results['domain']);
  $workers[array_rand($workers)]->stack($domains[$results['domain']]);

}


/* shutdown all workers (forcing all processing to finish) */
foreach ($workers as $worker)
  $worker->shutdown();

/* we now have ranked domains in memory and database */
var_dump($domains);
var_dump(count($domains));
?>

Any help would be greatly appreciated. Thanks.


1 Answer


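Well, here is a pthreads example that lets you multithread your operations. I chose the worker model and am using 8 workers; how many workers you use depends on your hardware and on the service receiving the requests. I've never used SEOstats or GoDaddy domain auctions, I'm not sure about the CSV fields, and I'll leave fetching the PageRank to you.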

<?php
define ("CSV", "https://auctions.godaddy.com/trpSearchResults.aspx?t=12&action=export");

/* I have no idea how to get the actual page rank */
function get_page_rank($domain) {
  return rand(1,10);
}

class Ranking extends Worker {
  public function run(){}
}

class Domain extends Stackable {
  public $auction;
  public $name;
  public $bids;
  public $traffic;
  public $valuation;
  public $price;
  public $ending;
  public $type;
  public $ranking;

  public function __construct($csv) {
    $this->auction = $csv[0];
    $this->name = $csv[1];
    $this->traffic = $csv[2];
    $this->bids = $csv[3];
    $this->price = $csv[5];
    $this->valuation = $csv[4];
    $this->ending = $csv[6];
    $this->type = $csv[7];
  }

  public function run() {
    /* we convert the time to a stamp here to keep the main thread moving */
    $this->ending = strtotime(
      $this->ending);

    $this->ranking = get_page_rank($this->name);

    /* now write the Domain to database or whatever */
  }
}

/* start some workers */
$workers = array();
while (@$worker++ < 8) {
  $workers[$worker] = new Ranking();
  $workers[$worker]->start();
}

/* open the CSV and start processing */
$handle = fopen(CSV, "r");
$domains = array();
while (($line = fgetcsv($handle))) {
  $domains[$line[0]] = new Domain($line);
  $workers[array_rand($workers)]->stack(
    $domains[$line[0]]);
}

/* cleanup handle to csv */
fclose($handle);

/* shutdown all workers (forcing all processing to finish) */
foreach ($workers as $worker)
  $worker->shutdown();

/* we now have ranked domains in memory and database */
var_dump($domains);
var_dump(count($domains));
?>

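Questions:

  1. Right, 8 workers.
  2. Workers execute stacked objects in the order in which they were stack()ed; the `$workers[array_rand($workers)]->stack(...)` line picks a random worker to execute the Stackable.
  3. You can iterate over the $domains list in the main process during execution and check the status of each Stackable as it runs.
  4. Every worker's entire stack is executed before it shuts down; shutdown() ensures that all work is done by the time the script finishes.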
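
To make the stacking and shutdown behaviour in the list above concrete, here is a single-threaded simulation. It is only a sketch: `SimulatedWorker` is a hypothetical stand-in that mimics the ordering rules and is not the pthreads `Worker` class. No real threads or PageRank lookups are involved.

```php
<?php
// Single-threaded simulation of stacking tasks onto randomly chosen workers.
// SimulatedWorker is a hypothetical stand-in, not the pthreads Worker class.

class SimulatedWorker
{
    private $queue = array();  // pending tasks, in stacking order
    public $done = array();    // tasks in the order they were executed

    public function stack($task)
    {
        $this->queue[] = $task;
    }

    // Drains the queue first-in, first-out, so nothing is left unprocessed.
    public function shutdown()
    {
        while (count($this->queue) > 0) {
            $this->done[] = array_shift($this->queue);
        }
    }
}

$workers = array();
for ($i = 1; $i <= 8; $i++) {
    $workers[$i] = new SimulatedWorker();
}

$tasks = array();
for ($i = 1; $i <= 50; $i++) {
    $tasks[] = "domain{$i}.com";
}

// Same selection rule as the example: pick a random worker for each task.
foreach ($tasks as $task) {
    $workers[array_rand($workers)]->stack($task);
}

// Shutting down every worker forces all stacked tasks to complete.
foreach ($workers as $worker) {
    $worker->shutdown();
}

$completed = 0;
foreach ($workers as $worker) {
    $completed += count($worker->done);
}
printf("%d of %d tasks completed\n", $completed, count($tasks));
```

Random selection keeps the code short but can leave some workers with longer queues than others; a round-robin counter would spread the tasks more evenly.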
answered 2013-09-15T19:25:59.010