Cloud backup & sync for Linux, Android: Comparison table & a winner

published Sep 25, 2018 01:20 by admin (last modified Jun 09, 2019 01:37)

Updated 2019-06-19

Summary

Rsync + EncFS for the simplest backups, Restic as the all-in-one solution, and Syncthing for synchronization. At the end of the article there is a comparison and analysis of Restic vs Borg vs Duplicati.

The comparison table below lists some info on:
DropBox, siacoin, tahoe-lafs, cozy, nextcloud, owncloud, seafile, perkeep (camlistore), Sparkleshare, Syncthing, Rclone, Duplicati, Restic, Borg, Bup, Back in time, IDrive, Amazon Drive, Backblaze B2, Jottacloud, Rsync.net, Hetzner SB, upspin, safe network, Mega, OneDrive, storj, Google Drive, Tarsnap, Rsync + Rsync.net/Hetzner, and Rsync + Rsync.net/Hetzner + EncFS/gocryptfs.

Simplest candidate: Rsync + EncFS + Hetzner/Rsync.net

The rather boring answer for the simplest winner is Rsync + EncFS for personal backup, with the cloud services Rsync.net or a Hetzner Storage Box as snapshotting back ends, or your own server of course. EncFS and Rsync have been around for a long time. I have used rsync + Btrfs previously for snapshotting backups, so it is not a completely new solution for me. The new thing is using the cloud.

The only reason Rsync is even in the running is that there are at least two specialized storage services that accept rsync backups and snapshot them automatically on the server side. This server-side snapshotting means that the client cannot delete old backups, a risk that the other cloud backup solutions carry.

EncFS has some problems with metadata leaking in normal mode, but it does have key stretching (meaning your password is not used directly; a harder-to-crack key is derived from it).

Magic folder (synchronization) winner

The winner here is Syncthing. I have been using it (as of 2019-06-09) for a month on non-critical data, and it works fine. The runner-up, or actually the other candidate, is Sparkleshare. Unfortunately Sparkleshare is not well documented, and when you use it through the GUI it runs into error conditions that it does not tell you about.

All-in-one encrypted remote backup winner

In the category "All-in-one encrypted remote backup", which comprises Restic, Duplicati and Borg, the winner is Restic. It has been somewhat reviewed cryptographically, it is in the standard Ubuntu repos, and it has better key stretching than the competition.

Rationale for cloud backup

I've had some really nice personal backup services in place: rsync, rdiff-backup, Camlistore (now called Perkeep), and time-machine-like contraptions on Btrfs (the jury is still out for me on whether Btrfs is good; I have to check what happened given the error messages. Update: it looks like a hardware fault in the ext4 system SSD, so no fault of Btrfs; it still runs!). The problem is that they have all relied on my own servers, and as life goes by these servers tend to break or be repurposed.

Lists

Comparison table

Preliminary list of software and services for backup, file synchronization and cloud storage that work on and with Linux.

  • "Linux/Android"— If it is available for Linux and Android. "Standard" for Linux means it is already in the normal Ubuntu repositories.
  • "Their storage" means that you do not need to configure your own storage server. They, or a third party, supply server storage for free, or for a fee. Ideally combined with "Least authority". If free, capacity is listed.
  • "Magic folder" Files are transparently updated between devices.
  • "Multi sync"— Sync more than two devices/machines to the same backup or set of files.
  • "Libre" means if the source code is libre. I haven't checked client libraries always.
  • "LA—Least authority" means that you're in control of the encryption on the client side, and servers can't break it (hopefully). Comes in handy if servers get hacked. Some refer to this as "end to end encryption", however that is slightly different in definition. "Zero knowledge" is also used as a term.
  • "Post check"—Means you can verify backups by sampling them
  • "Go back"— That you can go back in time, for example with snapshots or versions

There are most likely mistakes below. Some non-Linux services are included so that you do not need to check them out yourself; others I may simply have missed. "Yes", and anything that is not a "No" in a cell, is good (from my perspective).

Service | Linux/Android | Magic folder/schedule | Multi sync | Their storage | $ Price 1TB/yr | LA/Key stretch | Post check | Integration Frontend/Back | Libre | Redundant | Go back | Conflict res. | Lang/last comm. | Extra features
DropBox | Yes/Yes | Yes | Yes | 2 GB | $120 | No | | Magic folder | No | No | No | Branch
siacoin | Yes/No | ? | | For pay | | Yes/? | | Nextcloud, FS | Yes | Yes | | | | Crypto coin
tahoe-lafs | Yes/No | Yes | | Optional | | Yes/? | | Magic folder, web, sftp | Yes | m of n | Yes | Shaky | Python | m of n redundant
cozy | Yes/Yes | No | | 5GB | $120 | No | | Server | No | No
nextcloud | Yes/Yes | Yes | Yes | 10GB | $420 | Beta | | WebDAV/ | Yes | | | Branch | PHP | Sprycloud price
owncloud | Standard/Yes | Yes | | Optional | | | | WebDAV/ | Yes | | | Branch | PHP
seafile | Standard/Wobbly | Yes | | No | | Yes/? | | Magic folder/Stand-alone | Yes | | | Branch | C | Android app keeps crashing
perkeep (camlistore) | Yes/Upload | | | Optional | | | | Stand-alone | Yes | Replicas | Yes | Branch | Golang | Content addressable
Sparkleshare | Standard/No | Yes | | No | | Yes/? | | Magic folder | Yes | No | Yes | | | "Not for backups"
Syncthing | Standard/Yes | Yes | | No | | No | | Browser | Yes | | Yes | Branch | Golang, days, 3 | "Not for backups"
Rclone | Standard/No | No | No | Optional | | | | 34 backends | Yes | No backend | | No
Duplicati | Yes/No | No | No | Optional | | Yes/sha256 | | 26 backends, incl. Sia, Tahoe | Yes | No backend | Yes | | C#, minutes, 5
Restic | Standard/No | No/No | | Optional | | Yes/Scrypt e.g. n18,r1,p3 | Yes | B2, GDrive, S3, etc. | Yes | No | Yes | | Golang, days | verifiable backups
Borg | Standard/No | No/No | No | | | Yes/PBKDF2 | Yes | Stand-alone | Yes | No | | | Python, days | Mountable backups
Bup | Standard/No | No
Back in time
IDrive | No/Yes
Amazon Drive | Yes/Yes | | | 5GB | $60 | No
Backblaze B2 | | | | | $100 | | | | | | | | | Only pay what you use
Jottacloud | Yes/Yes | No | | 5GB | $84
Rsync.net | Yes/No | | | | $480 | | | | | | | | | ZFS backend
Hetzner SB | | | | | $96 | | | Borg, rsync, WebDAV
upspin | Yes/No | | | | | | | | | | | | | Early stages proj.
safe network | - | - | | - | | - | | - | - | - | - | - | | Beta crypto coin
Mega | Sync/Yes
OneDrive | No
storj | (N/A) | - | - | For pay | | Yes | | - | Yes | Yes | | | | Alpha Crypto coin
Google Drive | Yes/Yes | No | | 15GB | $100 | No | | Browser | No | No | No | | | Editors
Tarsnap | Yes/No | No | | For pay | $3000+ | Yes | | | | | Yes | | | Deduplicates
Rsync + Rsync.net/Hetzner | Standard/Yes | No/No | Yes | No | $480/$96 | No | No | No | Yes | No | Yes | None
Rsync + Rsync.net/Hetzner + EncFS/gocryptfs | Standard/Yes | No/No | Yes | No | $480/$96 | Yes/PBKDF2 (Scrypt for GocryptFS) | No | No | Yes | No | Yes | None

Other offerings (or lack of such)

  • For Syncany, the team has gone missing… Maybe they have been bought to work on some well-funded solution?
  • Filecoin has been funded to the tune of $250 million. I hope to see something produced by them soon!

What I'm looking for

I would like to have full redundancy, all the way from the device. I had this before with two independent systems: a Synology DiskStation and rsync. Fully independent, all the way from the data. I did try to use Obnam at one time, but it did not work reliably for me.

Magical folder

It's probably not a good idea to have two different programs share or nest magical folders; I guess the update algorithms could start fighting each other. It therefore seems like a better idea to use one magical folder service, such as Dropbox, and then apply one or several backup services to that magical folder using a completely different backup system. Or even different systems.

Versioning

Your data could be accidentally overwritten by user processes. In that case you want to be able to go back.

Quick restore

You want to be able to be up and running again quickly, both on user devices and when getting a new backup server up and running.

Redundancy in backups

This means using different systems already at the client, and also monitoring what is going on.

Somebody else's storage

I'd like to try to use remote storage services. One way of doing that more securely is to have things encrypted client side, something called "zero knowledge" on e.g. Wikipedia's comparison page. I prefer the term "least authority", which is the "LA" in Tahoe-LAFS.

Least authority

One way of establishing this separately is to use EncFS and back up the encrypted version. An interesting approach is to keep the encrypted folder with read/write rights on it, so it can be used by a backup client running with low privileges. A downside with EncFS is that you more than double your storage need on the client computer, unless you use the reverse mount option, which actually is pretty handy.
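To make the reverse-mount idea concrete, here is a minimal Python sketch that mounts an encrypted view of a plain directory with encfs --reverse and pushes only the ciphertext to a remote host with rsync. The paths and the remote destination are made-up placeholders, and encfs will prompt for the password interactively:

    import os
    import subprocess

    plain_dir = "/home/me/Documents"                # plaintext data (placeholder path)
    cipher_view = "/home/me/.encrypted-view"        # mount point for the encrypted view (placeholder)
    remote = "user@backup.example.com:documents/"   # placeholder rsync-over-ssh destination

    os.makedirs(cipher_view, exist_ok=True)

    # Mount an encrypted *view* of the plain directory; no second copy is written to disk.
    subprocess.run(["encfs", "--reverse", plain_dir, cipher_view], check=True)

    # Push the ciphertext; only encrypted file contents leave the machine.
    subprocess.run(["rsync", "-az", "--delete", "-e", "ssh",
                    cipher_view + "/", remote], check=True)

    # Unmount the encrypted view again.
    subprocess.run(["fusermount", "-u", cipher_view], check=True)

Remember that without the EncFS configuration file (.encfs6.xml) and your password, the ciphertext cannot be decrypted later, so those need to be kept safe as well.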

One person has tested how well deltas work with EncFS: the further back in the file the change is, the better it works. A project called gocryptfs aims to perform faster than EncFS's paranoia mode.

Some quotes from Tahoe-LAFS which are a bit worrying

It seems to be a really solid system, but as with all complex systems, the behaviour is not always what you'd like. Some quotes from their pages:

"Just always remember that once you create a directory, you need to save the directory's URI, or you won't be able to find it again."

"This means that every so often, users need to renew their leases, or risk having their data deleted." — If you do not periodically renew, things may disappear. If you perish, so does your data. Maybe you can set the lease to infinity?

"If there is more than one simultaneous attempt to change a mutable file or directory […]. This might, in rare cases, cause the file or directory contents to be accidentally deleted."

Deduplication and versioning file systems

It seems like a good idea to use a deduplicating file system to create snapshots, which on such a file system could just be folders. Two file systems that can do snapshots and that are regarded as stable are:

  • ZFS: Has license problems on Linux, but it is possible to use. Needs 8 GB of RAM or thereabouts for efficient deduplication.
  • NILFS: Regarded as stable on Linux; according to Wikipedia, "In this list, this one is [the] only stable and included in mainline kernel." According to the NILFS FAQ it does not support SELinux that well: "At present, NILFS does not fully support SELinux since extended attributes are not implemented yet."

Another way of doing snapshots seems to be on a slightly higher level:

Red Hat has decided not to support Btrfs in the future and is working on something called Stratis, which is similar to LVM thin pools, built on top of the XFS file system.

We may also get an alternative to Btrfs on Linux with bcachefs.

Cloud backups: Rsync, Borg, Restic or Duplicati?

For cloud backup purposes it has narrowed down to four choices, of which I may deploy more than one:

Rsync

Rsync can work with the Rsync.net service. Overall a simple and time-trusted setup. They use ZFS on their side and they do snapshots, and you can decide when those snapshots happen. It can be a bit expensive though. The setup with Rsync.net would be very similar to the setup I already have for local backups, with rsync to Btrfs snapshots, only pushing instead of pulling. It should also work fine with my phone via Termux or Syncopoli. No scheduling built in. A Hetzner Storage Box is a cheaper alternative that does the same as Rsync.net, although probably less reliably, which they are open about. (A minimal sketch of what such a push could look like follows the pros and cons below.)

+ simple, tried and trusted. Available by default on all Linux distributions.

+ with Rsync.net, the client cannot overwrite old backups. This is a truly big point!

+ There are any number of rsync clients for Android, such as Syncopoli.

- no scheduling

- you need to learn rsync syntax (I already know it though)

- No encryption. Although that may be a benefit if you use a good complement. The question is, what is a good one? There is EncFS and a newer competitor, gocryptfs.
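As mentioned above, here is a rough sketch of what the push could look like, wrapped in Python so it can be dropped into a cron job. The source path, destination and excludes are placeholder assumptions rather than recommended settings; the provider's server-side snapshots supply the history:

    import subprocess
    from datetime import date

    source = "/home/me/"                          # what to back up (placeholder)
    dest = "user@backup.example.com:daily/"       # placeholder rsync-over-ssh destination

    # One plain sync per run; old versions live in the provider's server-side
    # snapshots, which the client cannot touch.
    result = subprocess.run(
        ["rsync", "-az", "--delete",
         "--exclude", ".cache/",                  # example exclude, adjust to taste
         "-e", "ssh", source, dest],
        check=False)

    print(f"{date.today()}: rsync exited with code {result.returncode}")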

Borg

I had given up on Borg, since it needs a Borg back end, until I found the Hetzner storage boxes. These work out of the box (pun intended) with Borg. However, do I want to learn yet another configuration language?
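For comparison, a hedged sketch of the basic Borg cycle (init, create, prune) driven from Python. The repository URL is a made-up placeholder and I have not verified this against a Hetzner box; note also that a client that can prune can delete old backups, unlike the server-side snapshot setup above:

    import subprocess

    repo = "ssh://user@backup.example.com/./borg-backups"   # placeholder repository URL

    # One-time initialisation with client-side (repokey) encryption.
    subprocess.run(["borg", "init", "--encryption=repokey", repo], check=True)

    # A backup run: the archive name gets a timestamp, then old archives are thinned out.
    subprocess.run(["borg", "create", "--stats",
                    f"{repo}::documents-{{now}}", "/home/me/Documents"], check=True)
    subprocess.run(["borg", "prune",
                    "--keep-daily", "7", "--keep-weekly", "4", "--keep-monthly", "6",
                    repo], check=True)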

Restic

Restic seems to get the nod from some very intelligent programmers; check for example this review by Filippo Valsorda. However, it has no scheduling or process management of backups. That is kind of important, also with respect to recovering from errors. But maybe the other alternatives have not put too much work into that anyway?

The parameters for scrypt in Restic are something like "N":262144, "r":1, "p":3. This is on the low side, consuming only about 32 MB of RAM I believe. Restic is set up to read whatever values these parameters have, so if you feel adventurous you can change the key files in the repo to higher values; make sure you know what you are doing, of course.
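To sanity-check that 32 MB figure: scrypt's working memory is roughly 128 * N * r bytes, so N=262144 and r=1 gives about 32 MiB. A small Python sketch, using hashlib's scrypt with a made-up password and salt (not anything Restic actually does internally):

    import hashlib

    N, r, p = 262144, 1, 3                     # the parameters quoted above
    print(128 * N * r / 2**20)                 # -> 32.0 (MiB of working memory)

    # Deriving a 32-byte key with those parameters; password and salt are placeholders.
    key = hashlib.scrypt(b"correct horse battery staple",
                         salt=b"0123456789abcdef",
                         n=N, r=r, p=p,
                         maxmem=64 * 1024 * 1024,   # headroom above the ~32 MiB working set
                         dklen=32)
    print(key.hex())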

+ In Ubuntu repo

+ Liked by some smart people

- No scheduling

- Need to learn the language

Duplicati

Duplicati also comes recommended; however, it is the only one of these four that is not in the Ubuntu repositories, and it has slightly less glowing reviews than Restic. Currently one version, 2.0, is in beta, and the old version, 1.3.4, is no longer supported. That is in itself a bit odd.

+ Great user interface

+ Includes scheduling. The only one of the shortlisted candidates that does so

- Not in Ubuntu repos

- Key stretching is there but not as well implemented. See the next section for more info.

How good is the client-side encryption & key stretching in EncFS, gocryptfs, Borg, Duplicati and Restic?

There are at least three components here:

1) The encryption used. They all use AES but there might be subtle differences.

2) Overall design and leaking of metadata

3) Key stretching. Passwords, I believe, can often be the weakest link, and some good key stretching could mitigate that.

Encryption

They all use AES, although Borg has been considering ChaCha20; I am not sure whether that has been implemented yet.

Keystretching

Of the techniques used by the components, the best one is scrypt (as long as it uses enough memory), followed by PBKDF2, and in last place applying sha256 over and over again.

Scrypt is used by Restic and GocryptFS.

Duplicati uses sha256 applied 8192 times as key stretching if you use the AES option. A sha256 miner could make short work of that, I guess, evaluating password candidates in parallel; not sure why they use sha256 8192 times, it seems like a subpar choice. There is however also a GPG encryption option. In the GPG libraries there is a GCRY_KDF_SCRYPT option, though I am not sure how much it is used: https://www.gnupg.org/documentation/manuals/gcrypt/Key-Derivation.html. I can see no mention on the web of using scrypt for generating keys in GPG, so I'm not sure it can even be used in practice.

Borg uses PBKDF2.

EncFS uses PBKDF2 for however many iterations take 0.5 seconds to compute, or 3 seconds in paranoia mode.
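To make the difference concrete, here is a rough Python sketch of the three approaches mentioned above. The password, salt, iteration counts and the way the inputs are combined are placeholders for illustration; they are not the exact constructions these tools use:

    import hashlib

    password = b"hunter2"            # placeholder password
    salt = b"0123456789abcdef"       # placeholder salt

    # Iterated SHA-256 (the style of Duplicati's AES option): cheap for an attacker
    # with sha256 hardware, since there is no memory cost at all.
    key = password + salt
    for _ in range(8192):
        key = hashlib.sha256(key).digest()

    # PBKDF2 (the style Borg and EncFS use): cost is tuned only via the iteration count.
    pbkdf2_key = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000, dklen=32)

    # scrypt (the style Restic and gocryptfs use): cost is tuned via memory (128*N*r bytes)
    # as well as CPU, which is what makes massively parallel guessing expensive.
    scrypt_key = hashlib.scrypt(password, salt=salt, n=2**17, r=1, p=3,
                                maxmem=64 * 1024 * 1024, dklen=32)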

Overall design and leaking of metadata

Taylor Hornby has made audits of both EncFS and GocryptFS, the latter here: https://defuse.ca/audits/gocryptfs.htm

EncFS has some problems with leaking metadata that are widely known. But leaking metadata about files may not be all that bad for my use case?

Restic has actually been reviewed (sort of) by a cryptography specialist (Filippo Valsorda), and he gave a thumbs up, if not an all clear. It also has key stretching, which I see as a requirement more or less! It uses scrypt for key stretching, which I think is a good choice as long as you're not inside the parameters of a scrypt miner. It encrypts with AES-256-CTR plus Poly1305-AES.

And the winner is

Rsync

Rsync only wins because at least two storage providers offer server-side snapshots for it.

Addendum 2019 - file synchronization

Suddenly, in 2019, I have a need for synchronization between laptops. Here are some initial notes from the research I am doing now:

Sparkleshare - seems to use a bog-standard git repository as back end, great! This also ought to mean that there is a git repository on the client side, which means that recovery from a botched central repository ought to be easy. The client is in the Ubuntu repositories. Encryption does exist, but merges will then always fail, which is understandable. The encryption uses a symmetric key that cannot be changed later; the obvious step here would be to encrypt that key with an asymmetric key, which it seems they haven't thought of, and which in itself may indicate a not completely thought-through process. After installing on Ubuntu, there is no man page and basically no command-line help. One thing to remember is that the ".git" suffix needs to be entered when connecting to GitHub and Bitbucket. It does not give any warning, it just churns while doing nothing if you enter an incorrect repository URL.

On Ubuntu 19.04 you get a generic binary installed (I'm not sure whether it's Flatpak or Snap). This means that the config files are in ~/.config/org.sparkleshare.SparkleShare and not under e.g. ~/.config/SparkleShare. Sparkleshare seems to identify a computer by the SSH public key used. It may be that this precludes using the same key for more than one computer.

Overall, the documentation for SparkleShare is lacking, and when it is trying to connect to a repo it gives absolutely no information on what it is doing. In fact, clear unrecoverable errors that you would see when running from the command line are not communicated at all through the GUI.

Syncthing - seems to be truly decentralized, relying on a cluster of discovery servers. However, those servers seem to be shared with others. Is that desirable? The client and discovery servers are in the Ubuntu repositories. Encryption does not seem to be supported; however, if files are never stored outside of your own machines, this may actually be moot. It seems relay servers, which are needed if at least one of your machines is firewalled, must be public. Or actually it is not clear; I am reading different things on GitHub. I guess SSH tunneling to a relay server from all involved parties could take care of running privately, maybe even using a small VPS somewhere for that job.

Seafile - custom-made server, but it comes with high recommendations on the selfhosted subreddit. The client is in the Ubuntu repositories but the server is not. Encryption uses a symmetric key that most likely cannot be changed later. That key is in turn encrypted with a user password that is key stretched (with PBKDF2 and not scrypt, but you cannot get everything). On the whole the encryption workflow indicates a thought-through process, as compared to e.g. Sparkleshare.

Nextcloud - custom-made server, but it comes with high recommendations on the selfhosted subreddit. The client is in the Ubuntu repositories but the server is not.

It's probably not a good idea to run more than one of these on a set of files. Although with filters, you may use different ones for different files in the same directories, come to think of it.

I guess I will just need to install three or four of them and see how they perform! Sparkleshare is a no-brainer here since I can just get a git repo running in no time, so in fact there is no server setup phase!

Syncthing is p2p and goes point to point, and if that does not work (which, due to NAT, it often does not) it relies on a public cluster of relay servers, or you can run your own if you do not trust that the public servers are unable to read the encrypted traffic.

However, the p2p nature of Syncthing becomes a bit of a problem if you want to sync between your own devices, because obviously the sync can only work if the machines are switched on and online at the same time. For your laptops, this is unlikely, and hence Syncthing does not work for that scenario, unless you have an always-on machine in the mix as well (you can sync many machines, not just two).

But what do we call a machine that is always switched on? Yup, a server. Still, the system would be robust, since if you lose that machine you can just fire up another one and everything works again.

Still, Syncthing feels like it is more for synchronizing files between people, and there git may be a contender. Even so, Syncthing looks great and I will see if I can tailor it to my needs. Worst-case scenario, I'll put two servers in the mix: one for relaying and one for making sure syncing always works!

That would be a two-server serverless architecture :) But with great resilience, since the servers can be replaced at any time.