I am running a cluster with 3 nodes (EC2 instances) and a replication factor of 2. From the first node I execute a script that runs nodetool snapshot on all the nodes using the pssh (parallel-ssh) utility. However, each node's snapshot data is stored on that node itself. Is there a way to get the snapshot data from all nodes onto the node where I ran the script, so that the script can copy everything to S3 from a single place?
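For illustration, the setup described above could be sketched roughly as follows. This is only a sketch under assumptions: the hosts file name, snapshot tag, and keyspace are hypothetical placeholders, the binary is assumed to be installed as `parallel-ssh` (some distributions name it `pssh`), and the function only builds the command line rather than running it.

```python
import shlex


def build_snapshot_command(hosts_file, tag, keyspace):
    """Build the argv for one parallel-ssh call that runs
    `nodetool snapshot` for a single keyspace on every listed host.

    hosts_file, tag and keyspace are hypothetical placeholders.
    """
    # Command executed on each remote node; arguments are shell-quoted.
    remote = "nodetool snapshot -t {} {}".format(
        shlex.quote(tag), shlex.quote(keyspace)
    )
    # -h: file with one host per line, -i: print each host's output inline.
    return ["parallel-ssh", "-h", hosts_file, "-i", remote]


if __name__ == "__main__":
    cmd = build_snapshot_command("hosts.txt", "backup_1", "my_keyspace")
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Each node still writes its snapshot to its own local data directory; the sketch only shows the fan-out step, not any transfer of the files.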
Also, suppose I have a 5-node cluster with a snapshot from each node, and I want to restore this data to a 10-node cluster and to a 2-node cluster with different replication factors. Is the following process correct for the restore?
1. Copy the snapshot data from all 5 nodes and merge all the files into a single folder.
2. Run the sstableloader command, passing all the IP addresses of the target cluster (10 or 2 of them) and the location of that single folder.
Will this properly redistribute the data from the 5 nodes across the 10 or 2 nodes after the restore?
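To make the two steps concrete, here is a minimal sketch of them, not a verified restore procedure. It assumes the per-node snapshot directories have already been copied to one machine, and that sstableloader is pointed at a directory whose last two path components are the keyspace and table names. The function names and paths are hypothetical. Because SSTable files from different nodes can share the same file name, the merge refuses to overwrite an existing file instead of silently losing data.

```python
import shutil
from pathlib import Path


def merge_snapshots(node_dirs, target_root, keyspace, table):
    """Copy every file from each node's snapshot directory into
    target_root/keyspace/table and return that directory.

    Raises FileExistsError if two nodes contain a file with the same name.
    """
    dest = Path(target_root) / keyspace / table
    dest.mkdir(parents=True, exist_ok=True)
    for node_dir in node_dirs:
        for src in sorted(Path(node_dir).iterdir()):
            if not src.is_file():
                continue  # skip subdirectories
            out = dest / src.name
            if out.exists():
                # Same file name seen on another node: do not overwrite.
                raise FileExistsError(
                    "{} already copied from another node".format(src.name)
                )
            shutil.copy2(src, out)
    return dest


def build_loader_command(hosts, table_dir):
    """Build the sstableloader argv; -d takes a comma-separated host list."""
    return ["sstableloader", "-d", ",".join(hosts), str(table_dir)]
```

If the merge raises on a name collision, one alternative is to keep one keyspace/table directory per source node and run sstableloader once per directory rather than merging everything.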