I'm experimenting with a 4-node Cassandra (1.2) cluster which I've just set up on CentOS 6.4 across 4 VMs. First, I created a keyspace with a replication factor of 3, created a couple of tables within it, and populated each one with a small number of rows, all using cqlsh. Simple INSERTs, SELECTs and UPDATEs appeared to be working fine.

Then I started disconnecting some of the nodes randomly to see the capabilities of the cluster in action. While two of the nodes were offline, I ran a few SELECTs which returned the correct results. Subsequently, I attempted to update an existing row, which according to "nodetool getendpoints" was hosted on the offline nodes as well as on the local node on which Cqlsh was running. After bringing the two nodes back online, running a SELECT against the updated row did not return the updated data values. I waited a little and tried SELECTing again but that still kept returning the original data. I also tried the following, none of which returned the updated data:

  1. Re-running the UPDATE a few times
  2. UPDATEing a different column in the same row - the field wasn't updated
  3. Restarting all four nodes in the cluster

An UPDATE for the same column in a different row works fine, which along with #2 above leads me to think this is an issue with the row data.

The following snippet shows a SELECT returning the original data before and after a seemingly successful UPDATE:

cqlsh:demo> select email, active from users where email = 'john.doe@bti360.com';

email               | active
--------------------+--------
john.doe@bti360.com |   True

cqlsh:demo> update users set active = false where email = 'john.doe@bti360.com';

cqlsh:demo> select email, active from users where email = 'john.doe@bti360.com';

email               | active
--------------------+--------
john.doe@bti360.com |   True

I am new to Cassandra so I could very well be missing something. Any suggestions or troubleshooting tips (files to check or commands to run) to help uncover what is going on here would be much appreciated.

2 Answers

This could be explained by a clock mismatch between servers. The timestamp of the updates is set by the server receiving the update from the client. If the servers are out of sync, you can get behaviour like this where an old update has a later timestamp so overrides subsequent writes.
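To illustrate the mechanism described above, here is a minimal sketch of last-write-wins reconciliation. It is not Cassandra's code: the `Cell` class, the `reconcile` function, and all the numbers are hypothetical, and tie handling is simplified. It only shows how a write stamped by a fast clock can keep overriding a later update.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical stand-in for a stored column value and its write timestamp."""
    value: object
    timestamp_us: int  # microseconds since the epoch, the unit WRITETIME reports


def reconcile(current: Cell, incoming: Cell) -> Cell:
    """Keep the cell with the higher timestamp (ties keep the current cell here)."""
    return incoming if incoming.timestamp_us > current.timestamp_us else current


# Assumed scenario: the original write was stamped by a node whose clock
# runs 5 minutes fast; the update arrives 1 minute later in real time and
# is stamped by a node with an accurate clock.
true_now_us = 1_371_000_000_000_000
fast_clock_skew_us = 5 * 60 * 1_000_000

original = Cell(True, true_now_us + fast_clock_skew_us)
update = Cell(False, true_now_us + 60 * 1_000_000)

stored = reconcile(original, update)
print(stored.value)  # True: the older write still wins
```

In this sketch the update is discarded because its timestamp is lower, which matches the symptom of an UPDATE that appears to succeed but never shows up in a SELECT.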

To find out, firstly check the clocks on the servers. You should always run NTP between Cassandra servers so the clocks are the same.

You can confirm if this is the actual issue by using WRITETIME to get the timestamp:

select WRITETIME(active) from users where email = 'john.doe@bti360.com';

The result is in microseconds since the epoch. Write a value to a different row and get its timestamp in the same way. If the new write's timestamp is earlier than the one on the row that won't update, a clock mismatch is the cause.
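If the raw numbers are hard to compare by eye, a small helper can turn them into readable times. This is only a sketch: the function names are made up for illustration, and the values you pass in would be whatever WRITETIME returned on your cluster.

```python
from datetime import datetime, timezone


def writetime_to_utc(writetime_us: int) -> datetime:
    """Convert a WRITETIME value (microseconds since the epoch) to a UTC datetime."""
    seconds, micros = divmod(writetime_us, 1_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)


def skew_seconds(stuck_row_us: int, fresh_write_us: int) -> float:
    """Positive result: the stuck row's timestamp is ahead of the fresh write."""
    return (stuck_row_us - fresh_write_us) / 1_000_000
```

For example, `skew_seconds(stuck, fresh)` returning a positive number would mean the row that refuses to update carries a timestamp later than a write made just now, which points to a clock problem.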

Answered 2013-06-12T16:15:46.507

The one other reason I can think of, besides the time synchronization issue Richard mentioned, is a consistency level of ANY or ONE as opposed to QUORUM or ALL. However, if you use QUORUM or ALL while too many nodes are down, you get timeouts on reads and writes.
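As a rough sketch of the arithmetic behind those levels, the snippet below computes how many replicas each level needs and checks it against the scenario in the question (replication factor 3, two replicas offline). The function names are hypothetical, ANY is left out for simplicity, and a single datacenter is assumed.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def required_replicas(level: str, replication_factor: int) -> int:
    """Replicas that must respond for a given consistency level."""
    needed = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return needed[level]


def can_satisfy(level: str, replication_factor: int, live_replicas: int) -> bool:
    """True if enough replicas are up to meet the consistency level."""
    return live_replicas >= required_replicas(level, replication_factor)


# Scenario from the question: RF = 3, only the local replica is online.
for level in ("ONE", "QUORUM", "ALL"):
    print(level, can_satisfy(level, 3, live_replicas=1))
# ONE True
# QUORUM False
# ALL False
```

With only one of three replicas up, ONE can still be met while QUORUM and ALL cannot, which is why the choice of level matters when nodes are being taken offline.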

Even with a consistency level of ONE, though, the data should eventually become consistent. How long that takes is not specified, but in my experience it happens very quickly.

Answered 2016-06-03T18:54:58.147