
I'm running the following code on my local (Mac) machine and on a remote Unix server:

public void deleteValue(final String id, final String value) {
    log.info("Removing value " + value);
    final Collection<String> valuesBeforeRemoval = getValues(id);
    final MutationBatch m = keyspace.prepareMutationBatch();
    m.withRow(VALUES_CF, id).deleteColumn(value);
    try {
      m.execute();
    } catch (final ConnectionException e) {
      log.error("Unable to delete  location " + value, e);
    }
    final Collection<String> valuesAfterRemoval = getValues(id);
    if (valuesAfterRemoval.size()!=(valuesBeforeRemoval.size()-1)) {
      log.error("value " + value + " was supposed to be removed from list "  + valuesBeforeRemoval + " but it wasn't: " + valuesAfterRemoval);
    }
...
  }

protected Collection<String> getValues(final String id) {
  try {
    final OperationResult<ColumnList<String>> operationResult = keyspace
            .prepareQuery(VALUES_CF).getKey(id).execute();
    final ColumnList<String> result = operationResult.getResult();
    if (result.isEmpty()) {
      log.info("No  value found for id: " + id);
      return new ArrayList<String>();
    }
    return result.getColumnNames();
  } catch (final ConnectionException e) {
    log.error("Unable to retrieve session " + id, e);
  }
  return new ArrayList<String>();
}

Locally, this line is never executed, which makes sense:

log.error("value " + value + " was supposed to be removed from list "  + valuesBeforeRemoval + " but it wasn't: " + valuesAfterRemoval);

But the same line is executed on my dev server:

[ERROR] [main] [nowsdSessionDaoCassandraImpl] [2013-03-08 13:12:24,801] [] - value 3 was supposed to be removed from list [3, 2, 1, 0, 7, 6, 5, 4, 9, 8] but it wasn't: [3, 2, 1, 0, 7, 6, 5, 4, 9, 8]

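  • I'm using com.netflix.astyanax.
  • Both my local machine and the remote dev server connect to the same Cassandra instance.
  • Both my local machine and the remote dev server run the same test, creating a new row family and adding 10 records before deleting one.
  • When the error occurs on dev, log.error("Unable to delete  location " + value, e); is not executed (i.e. the delete command ran without throwing any exception).
  • I'm 100% sure that no other code touches the database contents while I run the test on dev, so this isn't some odd concurrency problem.

What could explain a deleteColumn(value) request that runs without any error yet still doesn't remove the column from the database?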

Additional information

Here is how I created the keyspace:

create keyspace sessiondata
    with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
    and strategy_options = {replication_factor:1};

Here is how I created the column family values, referenced as VALUES_CF in the code above:

create column family values
    with comparator = UTF8Type
;

Here is how the keyspace referenced in the Java code above is defined:

final AstyanaxContext.Builder contextBuilder = getBuilder();
final AstyanaxContext<Keyspace> keyspaceContext = contextBuilder
        .forKeyspace(keyspaceName).buildKeyspace(
                ThriftFamilyFactory.getInstance());
keyspaceContext.start();
keyspace = keyspaceContext.getEntity();

where getBuilder is:

  private Builder getBuilder() {
    final AstyanaxConfigurationImpl conf = new AstyanaxConfigurationImpl()
    .setDiscoveryType(NodeDiscoveryType.NONE)
    .setRetryPolicy(new RunOnce());

    final ConnectionPoolConfigurationImpl poolConf = new ConnectionPoolConfigurationImpl("MyPool")
    .setPort(port)
    .setMaxConnsPerHost(1)
    .setSeeds(value);

    return new AstyanaxContext.Builder()
    .forCluster(cluster)
    .withAstyanaxConfiguration(conf)
    .withConnectionPoolConfiguration(poolConf)
    .withConnectionPoolMonitor(new CountingConnectionPoolMonitor());
  }

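Second update

  • First, these problems aren't limited to deletes. I see similar problems when I update a record in the database and then read it back: I can't read the update I just wrote.

  • Second, I created a test that performs the following 100 times:

    • write a row to Cassandra
    • update that row in Cassandra
    • read the row back from Cassandra and check that it was indeed updated; if not, check again periodically after a delay

    What I observe from that test:

    • Again, when I run the code locally, all 100 iterations pass immediately (no retries needed).
    • When I run the code on the remote server, some iterations pass and some fail. When an iteration fails, it keeps failing no matter how long the delay is (I waited up to 10 seconds).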
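
The read-back step of that test can be sketched roughly as below. This is only an illustration, not the actual test code: ReadBackCheck, awaitValue and the Supplier standing in for the Cassandra read are hypothetical names introduced here.

```java
import java.util.function.Supplier;

/** Hypothetical helper: poll a read until it returns the expected value. */
final class ReadBackCheck {

    /**
     * Calls read.get() up to maxAttempts times, sleeping delayMillis between
     * attempts. Returns the 1-based attempt on which the expected value was
     * seen, or -1 if it never showed up (or the thread was interrupted).
     */
    static int awaitValue(Supplier<String> read, String expected,
                          int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (expected.equals(read.get())) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up.
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

A return value of 1 corresponds to "passes immediately"; -1 corresponds to the iterations that never pass regardless of the delay.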

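At this point I really don't see how any Cassandra setting could explain this behavior: both tests connect to the same server, and the delays I inserted are far larger than any extra latency the test might incur when connecting from my local machine.

The only relevant difference seems to be which machine the code runs on.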

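Third update

If, in the test from the previous update, I insert a delay between the two writes, the code starts passing once the delay is >= 1,000 ms. A delay of, say, 100 ms doesn't help. I also modified the builder to set the default read and write consistency to the most demanding level, ALL, which had no effect on the test results (it still fails about half the time unless the delay between writes is > 1 s):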

final AstyanaxConfigurationImpl conf = new AstyanaxConfigurationImpl()
.setDiscoveryType(NodeDiscoveryType.NONE)
.setRetryPolicy(new RunOnce())
.setDefaultReadConsistencyLevel(ConsistencyLevel.CL_ALL)
.setDefaultWriteConsistencyLevel(ConsistencyLevel.CL_ALL);

1 Answer


To debug, try printing the full row instead of just the column names. By the full row I mean the column name, the column value, and the timestamp. A long shot is that the clock is wrong on one of your test machines and this is throwing off your tests on the other.
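
As a rough sketch of what such a dump could look like, the helper below formats one cell and renders its timestamp as a readable instant so that clock skew between machines stands out. CellDump and describe are hypothetical names, and the code assumes timestamps are microseconds since the epoch, which is the usual client convention but worth verifying for your setup. In Astyanax, I believe each Column exposes getName(), getStringValue() and getTimestamp(), but treat that as an assumption to check against your version.

```java
import java.time.Instant;

/** Hypothetical helper for dumping a cell with its write timestamp. */
final class CellDump {

    /**
     * Formats a column name, value and timestamp on one line.
     * Assumes timestampMicros is microseconds since the Unix epoch.
     */
    static String describe(String name, String value, long timestampMicros) {
        Instant written = Instant.ofEpochSecond(
                Math.floorDiv(timestampMicros, 1_000_000L),
                Math.floorMod(timestampMicros, 1_000_000L) * 1_000L);
        return "name=[" + name + "] value=[" + value + "] ts="
                + timestampMicros + " (" + written + ")";
    }
}
```

Comparing the printed instants against the wall clock of each machine shows whether one client is stamping its writes in the past or the future.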

Another thing to double-check is that ip is indeed what you think it is, in both your application and Cassandra. When you retrieve it, print it between delimiters, like println("-" + ip + "-"). Before and after your try block for the execute in deleteSecureLocation, do a get for only that column, not the entire row. I'm not too sure how to do that in Astyanax; on the CLI it would be get[id][ip].
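
A slightly more thorough version of the delimiter trick, as a sketch: the hypothetical helper below (Visible.show is a name made up here) wraps the string in dashes, escapes anything that is not a printable non-space ASCII character, and appends the length, so trailing spaces or newlines in the column name become obvious.

```java
/** Hypothetical helper that makes hidden characters in a string visible. */
final class Visible {

    /** Wraps s in dashes, escaping spaces, control and non-ASCII characters. */
    static String show(String s) {
        if (s == null) {
            return "<null>";
        }
        StringBuilder sb = new StringBuilder("-");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c <= 0x20 || c >= 0x7f) {
                // Render as a \\uXXXX escape so it cannot be overlooked.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append("- (length ").append(s.length()).append(")").toString();
    }
}
```

Logging Visible.show(...) for the name passed to the delete and for each name read back makes it easy to compare them character by character.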

Something to keep in mind is that a delete won't fail even if there's nothing to delete. To Cassandra it's a write; the only thing that makes it a delete is if, on read, it's the latest timestamped entry against that row/column name.
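
To make that concrete, here is a toy in-memory model of the rule, not Cassandra's actual implementation: every write, including a delete, carries a timestamp, and for each column only the entry with the highest timestamp is visible. LastWriteWinsRow and its methods are hypothetical names, and tie-breaking between equal timestamps is deliberately not modeled.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one row: per column, the write with the highest timestamp wins. */
final class LastWriteWinsRow {

    /** One stored cell; a null value marks a tombstone (a delete). */
    private static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Cell> cells = new HashMap<>();

    /** Writes a non-null value with the given client timestamp. */
    void insert(String column, String value, long timestamp) {
        apply(column, new Cell(value, timestamp));
    }

    /** A delete is just a write of a tombstone; it never fails. */
    void delete(String column, long timestamp) {
        apply(column, new Cell(null, timestamp));
    }

    private void apply(String column, Cell incoming) {
        Cell existing = cells.get(column);
        // An older write is silently shadowed; no error is reported.
        if (existing == null || incoming.timestamp > existing.timestamp) {
            cells.put(column, incoming);
        }
    }

    /** True if the newest entry for the column is a value, not a tombstone. */
    boolean contains(String column) {
        Cell cell = cells.get(column);
        return cell != null && cell.value != null;
    }
}
```

In this model, inserting column "3" at timestamp 2000 and then deleting it at timestamp 1000 leaves the column visible, and nothing throws. That is the shape of the symptom in the question, if the delete somehow carries an older timestamp than the insert.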

answered 2013-04-08T14:40:47.510