linux - Cassandra global snapshot

Question

I am running a cluster with 3 nodes (EC2 instances) and a replication factor of 2. I execute a script from the first node that runs nodetool snapshot on all the nodes using the pssh (parallel-ssh) utility. However, each node's snapshot data is stored on that node itself. Is there a way to collect the snapshot data from all nodes on the node where I ran the script, so that the script can copy it to S3 from a single place?

Also, suppose I have a 5-node cluster with a snapshot of each node, and I want to restore this data to a 10-node cluster and to a 2-node cluster, each with a different replication factor. Is the process below correct for the restore?

Copy the snapshot data from all 5 nodes and merge all the files into a single folder.
Run the sstableloader command, passing all the IP addresses (10 or 2 of them) and the single folder location. Will this properly split the data from the 5 nodes across the 10 or 2 nodes after the restore?

score 2 · Accepted Answer

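I strongly recommend using the Medusa tool (doc) to back up and restore your Cassandra cluster. It can back up data to cloud storage, and you can restore that data to a cluster even if it has a different topology.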
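
As a rough illustration of that workflow, here is a minimal Python sketch that only builds the Medusa command lines and prints them. It assumes Medusa is already installed and configured on every node (including the cloud storage bucket), and the backup name and seed host are hypothetical placeholders. The subcommands and flags are the ones I recall from the Medusa README, so check them against `medusa --help` for your installed version before relying on them.

```python
import shlex
from typing import List


def medusa_backup_cmd(backup_name: str) -> List[str]:
    """Command to run on each node; all nodes share one backup name."""
    return ["medusa", "backup", f"--backup-name={backup_name}"]


def medusa_restore_cluster_cmd(backup_name: str, seed_target: str) -> List[str]:
    """Command to restore a named backup onto a target cluster.

    seed_target is one node of the destination cluster, which may have
    a different number of nodes than the cluster that was backed up.
    """
    return [
        "medusa",
        "restore-cluster",
        f"--backup-name={backup_name}",
        "--seed-target",
        seed_target,
    ]


def show(cmd: List[str]) -> str:
    """Render a command as a shell-safe string for review or for pssh."""
    return shlex.join(cmd)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(show(medusa_backup_cmd("nightly-backup")))
    print(show(medusa_restore_cluster_cmd("nightly-backup", "10.0.0.1")))
```

The backup command would be run on every node, for example through the same pssh setup as in the question, while the restore command is issued once against the target cluster.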
