We are planning to deploy a Cassandra cluster with 100 virtual nodes. We will store at most 1 TB of (compressed) data on each node. We're going to use (host-)local SSD disks.
Our infrastructure team is used to simply backing up whole partitions. We've now come across Cassandra snapshots.
What is the difference between simply copying the whole disk and using Cassandra snapshots?
- Is there a size difference?
- With whole-partition backups, uncompressed data that is in the middle of being compacted gets saved unnecessarily. Is that the motive behind snapshots?
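There are a few benefits of using snapshots:
- The snapshot command flushes the memtables to SSTables and then creates the snapshot.
- nodetool can be used when restoring snapshots (for example, nodetool refresh loads SSTables that were copied back into a table's data directory).
- The incremental backup functionality can be leveraged.
- Snapshots only create hard links to the data files, so they are much faster than a full copy.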
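
To illustrate the last point, here is a minimal Python sketch of why a hard-link snapshot is fast and takes no extra space at first. It only mimics the idea and is not Cassandra's implementation; the function names and the file name are made up for the example.

```python
import os
import tempfile
from typing import List, Tuple


def snapshot_by_hardlink(data_dir: str, snapshot_dir: str) -> List[str]:
    """Hard-link every regular file in data_dir into snapshot_dir.

    Illustration only: hypothetical helper, not Cassandra code.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):
            # os.link adds a second directory entry for the same inode;
            # no file contents are copied.
            os.link(src, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked


def demo() -> Tuple[bool, int]:
    """Create one fake data file, snapshot it, and inspect the result."""
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        snap_dir = os.path.join(root, "snapshot")
        os.makedirs(data_dir)
        src = os.path.join(data_dir, "example-Data.db")  # hypothetical file name
        with open(src, "wb") as f:
            f.write(b"x" * 1024)
        snapshot_by_hardlink(data_dir, snap_dir)
        dst = os.path.join(snap_dir, "example-Data.db")
        # Both paths refer to the same inode, and its link count is now 2.
        return os.path.samefile(src, dst), os.stat(src).st_nlink


if __name__ == "__main__":
    print(demo())
```

Because both names point at the same data on disk, the snapshot only starts to cost extra space once the original file is deleted (for instance after a compaction) while the snapshot link still exists.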
Note: Cassandra can only restore data from a snapshot when the table schema exists, so it is recommended that you back up the schema as well. In both cases, make sure the operation (snapshot or plain copy) runs at the same time on all nodes.
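
As a rough sketch of the note above, the following Python builds the two command lines you would typically run on each node: one to take a tagged snapshot and one to dump the schema. It only constructs the commands and does not execute them. The helper names, the keyspace name, and the tag are hypothetical; check the options against the nodetool and cqlsh versions you run.

```python
import shlex
from typing import List


def snapshot_command(tag: str, keyspace: str) -> List[str]:
    """Command line for a tagged snapshot of one keyspace."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def schema_dump_command(keyspace: str) -> List[str]:
    """Command line that prints the keyspace schema via cqlsh."""
    return ["cqlsh", "-e", f"DESCRIBE KEYSPACE {keyspace}"]


if __name__ == "__main__":
    # Use the same tag on every node so the snapshots can be matched up later.
    tag = "backup_example"      # hypothetical tag
    keyspace = "my_keyspace"    # hypothetical keyspace
    print(shlex.join(snapshot_command(tag, keyspace)))
    print(shlex.join(schema_dump_command(keyspace)))
```

Running the same snapshot command with the same tag on all nodes at roughly the same time is left to whatever orchestration tool you already use.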