Cassandra backup: plain copy of disk files vs. snapshots


We are planning to deploy a Cassandra cluster with 100 virtual nodes. We will store at most 1 TB of (compressed) data on each node. We're going to use (host-)local SSD disks.

Our infrastructure team is used to plainly backing up whole partitions. We've come across Cassandra snapshots.

What is the difference between plainly copying the whole disk and using Cassandra snapshots?

- Is there a size difference?

- Whole-partition backups also unnecessarily save uncompressed data that is still being compacted. Is that the motive behind snapshots?

There are a few benefits of using snapshots:

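  1. The snapshot command flushes the memtable to SSTables first and then creates the snapshot.
  2. nodetool can be used when restoring snapshots (for example, `nodetool refresh` loads SSTables that were copied back into a table's data directory).
  3. The incremental backup functionality can be leveraged.
  4. Snapshots only create hard links to the existing data files, so taking one is much faster than copying the data.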
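
To illustrate point 4 and the size question, here is a minimal Python sketch of what hard-linking means at the filesystem level. It is not Cassandra's code; the directory layout and the file name `example-Data.db` are made up for the demonstration. It shows that the link shares an inode with the original file (so no data is copied and no extra space is used at snapshot time) and that the content stays readable after the original name is deleted, which is what happens when compaction later removes an SSTable.

```python
import os
import tempfile


def hardlink_snapshot(data_dir, snapshot_dir):
    """Hard-link every regular file in data_dir into snapshot_dir."""
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):  # skips sub-directories such as snapshots/
            dst = os.path.join(snapshot_dir, name)
            os.link(src, dst)  # new directory entry, same inode, no copy
            linked.append(dst)
    return linked


def demo():
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        os.makedirs(data_dir)

        # Hypothetical stand-in for an SSTable data file.
        sstable = os.path.join(data_dir, "example-Data.db")
        with open(sstable, "wb") as f:
            f.write(b"x" * 4096)

        snapshot_dir = os.path.join(data_dir, "snapshots", "demo")
        (link,) = hardlink_snapshot(data_dir, snapshot_dir)

        same_inode = os.stat(sstable).st_ino == os.stat(link).st_ino
        link_count = os.stat(sstable).st_nlink

        # Simulate compaction deleting the original file name.
        os.remove(sstable)
        survives = os.path.getsize(link) == 4096

        return same_inode, link_count, survives


if __name__ == "__main__":
    print(demo())
```

The flip side is that a snapshot starts to cost real disk space once compaction removes the original files, because the snapshot's links then keep the old data alive. Hard links also only work within one filesystem, so a snapshot still has to be copied elsewhere to protect against losing the local SSD.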

Note: Cassandra can only restore data from a snapshot when the table schema exists, so it is recommended that you back up the schema as well. In both cases, make sure the operation (snapshot or plain copy) runs at the same time on all nodes.
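
As a rough sketch of the note above, the following Python helper only builds the two commands one might run on each node: a schema dump via `cqlsh -e "DESCRIBE KEYSPACE ..."` and a tagged snapshot via `nodetool snapshot -t`. The keyspace name `my_keyspace` and the tag are placeholders, and the helper itself is hypothetical. Running the commands on every node at the same moment is left to whatever orchestration you already use.

```python
import shlex


def backup_commands(keyspace, tag):
    """Return the per-node commands for a schema dump plus a tagged snapshot.

    keyspace and tag are caller-supplied placeholders, not fixed names.
    """
    return [
        # Dump the schema so the tables can be recreated before a restore.
        ["cqlsh", "-e", f"DESCRIBE KEYSPACE {keyspace}"],
        # Take a snapshot of the keyspace under a common tag.
        ["nodetool", "snapshot", "-t", tag, keyspace],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; redirect the cqlsh output
    # to a file and dispatch both to all nodes with your own tooling.
    for cmd in backup_commands("my_keyspace", "backup_tag"):
        print(shlex.join(cmd))
```

Using the same tag on every node makes it easier to find, copy off, and later clear the matching snapshot directories across the cluster.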