Does a de-duplicating backup program make sense in the age of ZFS et al?

published Feb 19, 2016 06:40 by admin (last modified Oct 01, 2016 11:15)

Disclaimer: I haven't tested a de-duplicating FS yet. However, I have tested Obnam, a de-duplicating backup system, and it did not work that well (which could be operator error, but still). I could also test Attic and Borg, two other de-duplicating backup systems.

But in a way I'd be happy to just use Rsync. The upside of Obnam, Attic and Borg is that they de-duplicate your data, which is great if you have the same files on several computers.

But come to think of it, there are de-duplicating file systems such as ZFS and btrfs. Why not use Rsync, make new copies galore on the backup server and have the file system take care of the de-duplication? My guess is that the code in at least ZFS, and maybe also btrfs (I do not know much about it), is better than the code in the above-mentioned de-duplicating backup systems, not because of any lack of trying on the part of those backup systems, but simply because they have had fewer resources.
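To make the idea concrete, here is a rough sketch of what I have in mind: a script on the backup server that pulls a fresh, full copy into a dated directory on a dedup-enabled dataset and leaves the de-duplication to the file system. The host name, paths and dataset layout below are made up for illustration; this is a sketch, not something I run in production.

    #!/usr/bin/env python3
    # Sketch: pull a full copy of a remote directory into a dated
    # directory on a dedup-enabled filesystem (e.g. a ZFS dataset
    # mounted with dedup=on), and let the filesystem collapse the
    # duplicates. Host name and paths are hypothetical.
    import datetime
    import pathlib
    import subprocess

    SOURCE = "backupuser@client1.example.com:/home/"
    BACKUP_ROOT = pathlib.Path("/tank/backups/client1")  # hypothetical dedup=on dataset

    def run_backup():
        # One directory per day; every run writes a complete copy.
        dest = BACKUP_ROOT / datetime.date.today().isoformat()
        dest.mkdir(parents=True, exist_ok=True)
        # -a: archive mode (permissions, times, symlinks),
        # --numeric-ids: keep uid/gid as numbers on the backup side.
        subprocess.run(
            ["rsync", "-a", "--numeric-ids", SOURCE, str(dest)],
            check=True,
        )

    if __name__ == "__main__":
        run_backup()

Every dated directory would be a complete copy as far as Rsync is concerned; the space savings would come entirely from the file system underneath.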

Update 2016-02-22: At least for ZFS you seem to need about 5 GB of extra RAM or SSD for each TB of data on the volume: ZFS Deduplication: To Dedupe or not to Dedupe...
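To get a feel for what that means in practice, here is the back-of-the-envelope arithmetic for a made-up 8 TB backup volume, using the roughly 5 GB per TB figure from that article:

    # Rough estimate of the extra RAM/SSD that ZFS dedup would need,
    # using the ~5 GB per TB rule of thumb. The 8 TB volume size is
    # a made-up example.
    GB_PER_TB = 5      # approximate dedup-table overhead per TB of data
    volume_tb = 8      # hypothetical backup volume size in TB
    print("Extra RAM/SSD needed: about {} GB".format(GB_PER_TB * volume_tb))

That comes out to about 40 GB of extra RAM or SSD just for the dedup tables.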

That does not work for the servers I have in mind. btrfs seems less mature than ZFS, so I will not use it. Conclusion: It is probably better to just have enough disk space and "eat" the cost of having several copies of data.