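We have a tool in our web application that deletes large amounts of data. We do this by paging through all the records found for a given u_id.

The keys we have were designed for the other queries in our application. Ideally, it would be great to have u_id as a primary key, but that would break all of our other queries.

The approach below works well most of the time. However, after deleting roughly 6-8 million records, we get: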

Dse\Exception\RuntimeException: All connections on all I/O threads are busy

We also sometimes get a slightly different error message:

Dse\Exception\ReadTimeoutException: Operation timed out - received only 0 responses

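You will notice usleep(2500000) in the code below, which pauses the script. That has been our workaround, but it would be better to fix the underlying problem, since Cassandra should be able to handle this many deletes.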

$cluster        = \Dse::cluster()
                    ->withDefaultTimeout(3600)
                    ->withContactPoints(env('CA_HOST'))
                    ->build();

$session        = $cluster->connect(env('CONNECT'));
$options        = array('page_size' => 50);
$results        = $session->execute("SELECT * FROM datastore WHERE u_id = $u_id;", $options);
$future_deletes = array();

while (true) {

    foreach ($results as $result) {

      $future_deletes[] = $session->executeAsync("DELETE FROM datastore WHERE record_id = '" . $result['record_id'] . "' AND record_version = " . $result['record_version'] . " AND user_id = " . $result['user_id']);
      $future_deletes[] = $session->executeAsync("UPDATE data_count set u_count = u_count - 1 WHERE u_id = " . $u_id);

    }

    if( !empty($future_deletes) ){
      foreach ($future_deletes as $future_delete) {
          // we will not wait for each result for more than 5 seconds
          $future_delete->get(5);
      }
      //usleep(2500000); //2.5 seconds
    }

    $future_deletes = array();

    if ($results->isLastPage()) {
        break;
    }

    $results = $results->nextPage();

}

//Disconnect
$session = NULL;

Here are our tables for reference:

CREATE TABLE datastore (id uuid,
    record_id varchar,
    record_version int,
    user_id int,
    u_id int,
    column_1 varchar,
    column_2 varchar,
    column_3 varchar,
    column_4 varchar,
    column_5 varchar,
PRIMARY KEY((record_id), record_version, user_id)
);
CREATE INDEX u_id ON datastore (u_id);

CREATE TABLE data_count (u_id int PRIMARY KEY, u_count counter);

We are running servers with 8GB of RAM.

The DSE driver version is 6.0.1.

Thanks in advance!

1 Answer

You need to control how many "in-flight" requests you have at any one time. There is a limit on the number of queries per connection, and on the number of connections. Both are controlled by the corresponding functions of the Cluster class (I couldn't find them quickly in the PHP docs, but they should be similar to the Cluster functions in the C++ driver, because the PHP driver is built on top of the C++ driver).
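
One way to cap in-flight requests on the client side, independently of the page size, is a small windowed helper. This is only a sketch, not something taken from the driver docs: `executeThrottled` and `waitForAll` are hypothetical names, the cap of 32 is an arbitrary placeholder to tune, and the only thing assumed about the driver is what the question's code already uses, namely that `executeAsync()` returns a future with a `get($timeout)` method.

```php
<?php
// Hypothetical helper (not part of the DSE driver): issues statements
// asynchronously but never keeps more than $maxInFlight requests open.
// $executeAsync is any callable that returns an object with get($timeout).
function executeThrottled(iterable $statements, callable $executeAsync, int $maxInFlight = 32, float $timeout = 5.0): int
{
    $inFlight  = [];
    $completed = 0;

    foreach ($statements as $statement) {
        $inFlight[] = $executeAsync($statement);

        // Window is full: block until every outstanding request has finished
        // before sending any more.
        if (count($inFlight) >= $maxInFlight) {
            $completed += waitForAll($inFlight, $timeout);
            $inFlight = [];
        }
    }

    // Drain whatever is left from the last, partially filled window.
    return $completed + waitForAll($inFlight, $timeout);
}

// Waits for each future in turn and returns how many were awaited.
function waitForAll(array $futures, float $timeout): int
{
    foreach ($futures as $future) {
        $future->get($timeout);
    }
    return count($futures);
}
```

In the question's loop, the DELETE and UPDATE strings for a page would be collected into an array and passed as `$statements`, with `function ($cql) use ($session) { return $session->executeAsync($cql); }` as the callable. Waiting for the whole window is the simplest form of throttling; it trades some throughput for a hard upper bound on outstanding requests.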

Answered 2018-06-25T09:56:09.963