TimeMachine on the Linux box

Wednesday, February 23. 2011

So that headline was made to attract attention. TimeMachine as it comes on a Mac, with that time tunnel interface, looks so easy and intuitive. I found nothing of that sort for the Linux desktop, even though there have been some attempts in that direction. I didn't try those either; a command line interface seemed good enough for me.

Take away the interface and the application awareness, and this remains: create a series of dated backups, with less granularity the further you go into the past, in a surprisingly compact format.

The best idea I could find for accomplishing that lives here; the tools we need are rsync, cp and rm.
It works like so: we have a list of sources we want to include in the backup and a destination directory. When called for the first time, rsync creates a copy of all the backup sources at the destination, in a subdirectory of its own; let's call it backup.00. On the next run, this earlier backup gets renamed to backup.01 and a new backup.00 gets created as a hard-linked copy of it. Now rsync does its job: it compares the content of the source directories with the hard-linked copy and copies only the changed files and folders into the new backup folder.
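To make the mechanics concrete, here is a minimal sketch of that cycle in Python. It is not the original script (which uses rsync, cp and rm); the function names are made up for illustration, the comparison is a plain content check instead of rsync's logic, symlinks are ignored, and only two snapshots are kept.

```python
import filecmp
import os
import shutil


def hardlink_copy(src_dir, dst_dir):
    """Rebuild the directory tree and hard-link every file, like `cp -al`."""
    shutil.copytree(src_dir, dst_dir, copy_function=os.link)


def sync_into(source, snapshot):
    """Update snapshot from source, rewriting only files that differ."""
    for root, _dirs, files in os.walk(source):
        target_root = os.path.join(snapshot, os.path.relpath(root, source))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_root, name)
            if os.path.isfile(dst) and filecmp.cmp(src, dst, shallow=False):
                continue  # unchanged: the hard link stays as it is
            tmp = dst + ".tmp"
            shutil.copy2(src, tmp)
            os.replace(tmp, dst)  # fresh inode; older snapshots keep the old file
    # Drop files that no longer exist in the source.
    for root, _dirs, files in os.walk(snapshot):
        source_root = os.path.join(source, os.path.relpath(root, snapshot))
        for name in files:
            if not os.path.exists(os.path.join(source_root, name)):
                os.remove(os.path.join(root, name))


def take_snapshot(source, backup_root):
    """Rename backup.00 to backup.01, hard-link it back, then sync the source."""
    newest = os.path.join(backup_root, "backup.00")
    previous = os.path.join(backup_root, "backup.01")
    if os.path.isdir(newest):
        shutil.rmtree(previous, ignore_errors=True)  # this sketch keeps two slots only
        os.rename(newest, previous)
        hardlink_copy(previous, newest)
    sync_into(source, newest)
```

The important detail is that a changed file is written to a new file and then moved into place. Overwriting it in place would change the content in every snapshot that shares the hard link.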

Seen from the hard drive's point of view, we now have a bunch of hard links in the backup folder with only a relatively small number of changed files sitting next to them. Looked at with a file manager, however, it is a 1:1 copy of the original source folders at the time of creation. Rinse and repeat and we get a collection of snapshots of the state of our source directories.
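The difference between the two views can be measured. The following sketch (the function name is my own) walks a folder and adds up file sizes twice: once the way a file manager would, and once counting every inode only a single time, which is closer to what the drive actually stores. It ignores directory entries and block rounding.

```python
import os


def apparent_and_real_size(folder):
    """Return (sum of all file sizes, sum with each inode counted once)."""
    seen = set()
    apparent = 0
    real = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            st = os.lstat(os.path.join(root, name))
            apparent += st.st_size
            key = (st.st_dev, st.st_ino)  # identifies the data on disk
            if key not in seen:
                seen.add(key)
                real += st.st_size
    return apparent, real
```

Run on the snapshot folder, the first number grows with every snapshot while the second only grows with the files that actually changed.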

Done in this rather space-efficient way, an hourly cron job can pack a large number of snapshots next to each other, but eventually the drive will still overflow, so we need to teach that digital memory how to forget. We need a range limit: when the backup folder names get shifted up and the oldest one exceeds the given limit, that folder gets deleted.
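A minimal sketch of that shift-and-forget step, assuming two-digit folder names like hourly.00 and a hypothetical function name:

```python
import os
import shutil


def rotate(backup_root, prefix, limit):
    """Shift prefix.NN up by one and delete what would reach `limit`."""
    # Walk from the oldest slot down so no rename overwrites a live folder.
    for index in range(limit - 1, -1, -1):
        current = os.path.join(backup_root, f"{prefix}.{index:02d}")
        if not os.path.isdir(current):
            continue
        if index + 1 >= limit:
            shutil.rmtree(current)  # the oldest snapshot falls off the end
        else:
            shifted = os.path.join(backup_root, f"{prefix}.{index + 1:02d}")
            os.rename(current, shifted)
```

After `rotate(root, "hourly", 24)` the slot hourly.00 is free for the next snapshot, and there are never more than 24 hourly folders.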

Up to now we have only created hourly snapshots. Now we introduce more time classes, i.e. daily, weekly, monthly, seasonal and yearly. Add a range limit for each of them and a script that adds the latest snapshot of one time class as the first of the next (shifting the names of the folders up), and we have a clockwork of backups that remembers the major changes to our sources for a long time while slowly forgetting the details.
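The hand-over between two time classes could look roughly like this. It is a sketch under assumptions: I read "the latest" as the newest folder (`.00`) of the finer class, the promoted copy is made of hard links so it costs almost no space, and the function names are invented. The limits are the ones that appear in the folder listing of this post; the coarser classes would get their own.

```python
import os
import shutil

# Range limits taken from the folder names in this post.
LIMITS = {"hourly": 24, "daily": 7, "weekly": 4}


def shift_up(root, prefix, limit):
    """Rename prefix.NN to prefix.NN+1 and drop the one beyond the limit."""
    for index in range(limit - 1, -1, -1):
        current = os.path.join(root, f"{prefix}.{index:02d}")
        if not os.path.isdir(current):
            continue
        if index + 1 >= limit:
            shutil.rmtree(current)
        else:
            os.rename(current, os.path.join(root, f"{prefix}.{index + 1:02d}"))


def promote(root, finer, coarser):
    """Hard-link the newest `finer` snapshot in as the first `coarser` one."""
    newest = os.path.join(root, f"{finer}.00")
    if not os.path.isdir(newest):
        return False  # nothing to promote yet
    shift_up(root, coarser, LIMITS[coarser])
    first = os.path.join(root, f"{coarser}.00")
    shutil.copytree(newest, first, copy_function=os.link)
    return True
```

Called once a day as `promote(root, "hourly", "daily")` and once a week as `promote(root, "daily", "weekly")`, each class keeps its own, coarser history.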

After it has been running for some time, our snapShot folder contains hourly.00 .. hourly.23, daily.00 .. daily.06, weekly.00 .. weekly.03, etc. Performance depends mostly on hard drive throughput and the size of the source data; the CPU comes only third.

In practice, writing a lot of data onto a huge but somewhat slow external USB drive with an encrypted filesystem, I found that two lines of the original script took most of the time: deleting the oldest folder with rm -rf and creating the hard-linked copy with cp -al. So I modified the script to call rsync with the --link-dest option, which turned out to be much faster. And instead of deleting the recycled backup immediately, I move it into a trash folder, which in turn gets emptied by a clean-up script late at night when it doesn't hurt.
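Both changes are small. The sketch below is not my actual script; the function names and the trash naming scheme are made up, and the rsync flags other than --link-dest (-a and --delete) are just a plausible choice. With --link-dest, rsync itself hard-links unchanged files against the previous snapshot, so the separate cp -al pass disappears. The path is made absolute because rsync resolves a relative --link-dest against the destination directory.

```python
import os
import shutil
import time


def rsync_command(sources, new_snapshot, previous_snapshot):
    """Build the rsync call that replaces `cp -al` followed by a plain rsync."""
    link_dest = os.path.abspath(previous_snapshot)
    return ["rsync", "-a", "--delete", f"--link-dest={link_dest}",
            *sources, new_snapshot.rstrip("/") + "/"]


def move_to_trash(snapshot, trash_dir):
    """Rename the recycled snapshot into the trash; cheap on the same drive."""
    os.makedirs(trash_dir, exist_ok=True)
    name = os.path.basename(os.path.normpath(snapshot))
    target = os.path.join(trash_dir, f"{name}.{time.time_ns()}")
    os.rename(snapshot, target)  # only works within one filesystem
    return target


def empty_trash(trash_dir):
    """The slow part, meant for the nightly clean-up run."""
    for name in os.listdir(trash_dir):
        shutil.rmtree(os.path.join(trash_dir, name))
```

The command list can be handed to `subprocess.run(cmd, check=True)`. The trash folder has to live on the same filesystem as the snapshots, otherwise the rename turns into a full copy and nothing is gained.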


If you have locate/updatedb installed (as many distros do), you might want to adapt the config at /etc/updatedb.conf, since all those snapshots create an ocean of file paths and you really don't want them listed when you try to locate a file. And running over the snapshots makes updatedb take forever...

In /etc/updatedb.conf there is PRUNEPATHS="", a space-delimited list of directories which should not be included in the locate database. Just put the path to your snapShot directory into it.
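If you would rather script that edit than do it by hand, something like the following would work on the text of the config file. It is only a sketch: the function name is mine, it assumes the double-quoted PRUNEPATHS="..." form shown above, and /mnt/backup/snapShot in the usage line is a placeholder for wherever your snapshots live.

```python
import re


def add_prune_path(conf_text, path):
    """Append `path` to the PRUNEPATHS list unless it is already in there."""
    def extend(match):
        paths = match.group(2).split()
        if path not in paths:
            paths.append(path)
        return match.group(1) + '"' + " ".join(paths) + '"'

    return re.sub(r'^(\s*PRUNEPATHS\s*=\s*)"([^"]*)"', extend,
                  conf_text, count=1, flags=re.MULTILINE)
```

Usage would be along the lines of reading /etc/updatedb.conf, passing its content through `add_prune_path(text, "/mnt/backup/snapShot")` and writing the result back as root.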

