php - Sphinx: How can I keep the connection alive even when there is no activity for a long time?

Question

I am doing bulk inserts into a RealTime index using PHP with AUTOCOMMIT disabled, e.g.

// sphinx connection
$sphinxql = mysqli_connect($sphinxql_host.':'.$sphinxql_port,'',''); 

//do some other time consuming work

//sphinx start transaction
mysqli_begin_transaction($sphinxql);

//do 50k updates or inserts

// Commit transaction
mysqli_commit($sphinxql);

I left the script running overnight, and in the morning I saw

PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate
212334 bytes) in

When I checked the nohup.out file more closely, I noticed these lines

PHP Warning: mysqli_query(): MySQL server has gone away in /home/script.php on line 502
Warning: mysqli_query(): MySQL server has gone away in /home/script.php on line 502

Memory usage before these lines was normal, but after them it started to climb until it hit the PHP memory limit, raised the PHP Fatal error, and the script died.

In script.php, line 502 is

mysqli_query($sphinxql,$update_query_sphinx);

So my guess is that the Sphinx server closed the connection or died after a few minutes or hours of inactivity.

I tried setting this in sphinx.conf

client_timeout = 3600

and restarted searchd

systemctl restart searchd

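but I still face the same issue.

So how can I keep the Sphinx server from dying on me when there is no activity for a long time?

More information added:

I fetch data from MySQL in chunks of 50k rows at a time, then run a while loop that fetches each row and updates it in the Sphinx RT index, like this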

//6mil rows update in mysql, so it takes around 18-20 minutes to complete this then comes this following part.

$subset_count = 50000 ;

$total_count_query = "SELECT COUNT(*) as total_count FROM content WHERE enabled = '1'" ;
$total_count = mysqli_query ($conn,$total_count_query);
$total_count = mysqli_fetch_assoc($total_count);
$total_count = $total_count['total_count'];

$current_count = 0;

while ($current_count <= $total_count){

$get_mysql_data_query = "SELECT record_num, views , comments, votes FROM content WHERE enabled = 1  ORDER BY record_num ASC LIMIT $current_count , $subset_count ";

//sphinx start transaction
mysqli_begin_transaction($sphinxql);

if ($result = mysqli_query($conn, $get_mysql_data_query)) {

    /* fetch associative array */
    while ($row = mysqli_fetch_assoc($result)) {

    //sphinx escape whole array
    $escaped_sphinx = mysqli_real_escape_array($sphinxql,$row);

    //update data in sphinx index
    $update_query_sphinx = "UPDATE $sphinx_index  
    SET 
        views       = ".$escaped_sphinx['views']." , 
        comments    = ".$escaped_sphinx['comments']." , 
        votes   = ".$escaped_sphinx['votes']." 
    WHERE 
        id          = ".$escaped_sphinx['record_num']." ";  

    mysqli_query ($sphinxql,$update_query_sphinx);

    }

    /* free result set */
    mysqli_free_result($result);
}
// Commit transaction
mysqli_commit($sphinxql);

$current_count = $current_count + $subset_count ;
}

score 1 · Accepted Answer

You need to reconnect or restart the DB session just before mysqli_begin_transaction($sphinxql).

Something like this.

<?php

//reconnect to sphinx if it is disconnected due to timeout or whatever, or force reconnect
function sphinxReconnect($force = false) {
    global $sphinxql_host;
    global $sphinxql_port;
    global $sphinxql;
    if($force){
        mysqli_close($sphinxql);
        $sphinxql = @mysqli_connect($sphinxql_host.':'.$sphinxql_port,'','') or die('ERROR'); 
    }else{
        if(!mysqli_ping($sphinxql)){
            mysqli_close($sphinxql);
            $sphinxql = @mysqli_connect($sphinxql_host.':'.$sphinxql_port,'','') or die('ERROR'); 
        }
    }
}



//10mil+ rows update in mysql, so it takes around 18-20 minutes to complete this then comes this following part.

//reconnect to sphinx
sphinxReconnect(true);

//sphinx start transaction
mysqli_begin_transaction($sphinxql);

//do your other stuff

// Commit transaction
mysqli_commit($sphinxql);

score 1 · Accepted Answer

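So, there are a couple of issues here, both related to running big processes.

MySQL server has gone away - This usually means that MySQL has timed out, but it could also mean that the MySQL process crashed because it ran out of memory. In short, it means that MySQL has stopped responding and didn't tell the client why (i.e. there is no direct query error). Since you said you are running 50k updates in a single transaction, it's likely that MySQL simply ran out of memory.
Allowed memory size of 134217728 bytes exhausted - This means that PHP ran out of memory. It also lends credence to the idea that MySQL ran out of memory.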
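Since the warning in the log does not stop the script, the loop keeps running against a dead connection. A minimal sketch of how that condition could be detected, assuming the connection reports the usual MySQL client error numbers 2006 (server has gone away) and 2013 (lost connection during query); the constant names and the helper isConnectionLost are hypothetical, not part of mysqli.

```php
<?php

// MySQL client error numbers (constant names here are made up for this sketch).
const ERR_SERVER_GONE_AWAY = 2006; // "MySQL server has gone away"
const ERR_SERVER_LOST      = 2013; // "Lost connection to MySQL server during query"

// Hypothetical helper: true if the error number means the connection is dead.
function isConnectionLost(int $errno): bool
{
    return in_array($errno, [ERR_SERVER_GONE_AWAY, ERR_SERVER_LOST], true);
}

// Possible use inside the update loop (assumes $sphinxql and
// $update_query_sphinx from the question):
//
// if (!mysqli_query($sphinxql, $update_query_sphinx)
//         && isConnectionLost(mysqli_errno($sphinxql))) {
//     // reconnect, or stop the script here instead of looping on
// }
```

Checking the return value of mysqli_query() this way lets the script react at the first failed query rather than hours later.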

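So, what can you do about this?

The initial stop-gap solution is to increase the memory limits for PHP and MySQL. That doesn't really solve the root cause, and depending on how much control you have over your deployment stack (and how well you know it), it may not be possible.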
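For the PHP side of that stop-gap, here is a minimal sketch. The 134217728 bytes in the error message is 128 MB, which is PHP's default memory_limit. The 512M value below is an arbitrary assumption, and the call only affects the current script, not the database server.

```php
<?php

// 134217728 bytes / 1024 / 1024 = 128, i.e. the limit in the error was 128M.
$limitInMb = 134217728 / 1024 / 1024;

// Raise the limit for this script run only (512M is an assumed value).
ini_set('memory_limit', '512M');

echo "limit from the error: {$limitInMb}M, current limit: " . ini_get('memory_limit') . "\n";
```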

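As a few people mentioned, batching the process may help. It's hard to say what the best approach is without knowing the actual problem you are trying to solve. If you can process, say, 10000 or 20000 records per batch instead of 50000, that may solve your problem. If that takes too long in a single process, you could also look into using a message queue (RabbitMQ is a good one that I've used on a number of projects), so that you can run multiple processes at the same time, each handling smaller batches.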
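The smaller-batch idea can be sketched as follows. This is only an illustration: batchRanges is a hypothetical helper, the 20000 batch size is an assumed value, and the database calls are left as comments so the block runs on its own.

```php
<?php

// Yields [offset, limit] pairs that cover $total rows in batches of $batchSize.
function batchRanges(int $total, int $batchSize): Generator
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('batchSize must be at least 1');
    }
    for ($offset = 0; $offset < $total; $offset += $batchSize) {
        // The last batch may be smaller than $batchSize.
        yield [$offset, min($batchSize, $total - $offset)];
    }
}

foreach (batchRanges(45000, 20000) as [$offset, $limit]) {
    // In the real script: begin a transaction, run
    // SELECT ... LIMIT $offset, $limit against MySQL,
    // send the UPDATEs to Sphinx, then commit.
    echo "offset=$offset limit=$limit\n";
}
```

For 45000 rows this produces three batches: two of 20000 rows and a final one of 5000. Because the loop condition is a strict `<`, no extra empty batch is issued when the total is an exact multiple of the batch size.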

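If you're doing something that requires knowledge of all 6 million+ records to perform the calculation, you could split the process into a number of smaller steps, cache the work done "to date" (so to speak), and then pick up the next step in the next process. Doing this cleanly is difficult (again, something like RabbitMQ could simplify it by firing an event when each process finishes, so that the next one can start).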
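One simple way to cache the work done so far is a checkpoint file holding the last processed record_num. The sketch below assumes a local file is an acceptable store; loadCheckpoint and saveCheckpoint are hypothetical helpers, not part of any library.

```php
<?php

// Returns the last processed record_num, or 0 if no checkpoint exists yet.
function loadCheckpoint(string $path): int
{
    if (!is_file($path)) {
        return 0; // no checkpoint yet: start from the beginning
    }
    return (int) trim((string) file_get_contents($path));
}

// Stores the last processed record_num for the next run.
function saveCheckpoint(string $path, int $lastRecordNum): void
{
    // Write to a temporary file first, then rename, so a crash mid-write
    // does not leave a truncated checkpoint behind.
    $tmp = $path . '.tmp';
    file_put_contents($tmp, (string) $lastRecordNum);
    rename($tmp, $path);
}
```

A run would call loadCheckpoint() at startup, select rows with `WHERE record_num > <checkpoint> ORDER BY record_num ASC LIMIT <batch size>`, and call saveCheckpoint() after each committed batch, so a restarted process continues where the previous one stopped.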

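So, in short, these are your two best options:

Throw more resources/memory at the problem wherever you can.
Break the problem down into smaller, self-contained chunks.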
