Tar (tape archiver) is an old Unix-command that has been largely forgotten among people who are not in touch with the unix world daily today. In several forums I hear people asking how to back up files in Linux in a simple and efficient way and what to use. Most seem to prefer a graphical solution but some are happy with a command line version as well.
Personally I generally distrust graphical backup softwares. It puts a layer between you and what is actually going on that is unnecessary and those that are not just graphical shells on top of programs such as tar are usually proprietary and you can’t rely on that there is something that can read the archives in even five years time.
Tar is a little different, it has been proven over time to be one of the most efficient and well functioning backup solutions. However, people today have generally forgot how to perform full archiving, incremental backups and differential backups using tar properly.
And, as always any backup solution that fails at restoring your data now or in the future is doomed from the start. Tar builds on a format that many archive handlers can read not to mention the source code is open source and freely distributable and not likely to disappear any time soon.
Types of backups
There are generally three type of backups that we will be discussing here.
Full archive
The first and by far simplest one is a full archive. This means that everything is archived. This is generally a very time consuming and space consuming task and not something you would want to do every day. The full archive is however the simplest to restore and does not need any special considerations except that you might have to split it over several volumes depending on your media, be that tapes, CD-R/DVD-R or hard disk volumes.
Personally I prefer hard disk volumes as my backup media. A full archive for me is somewhere clodse to 700 GB so using CD-R is not really feasible (945 volumes ca) and DVD-R is not that much better (170 volumes). Tapes are probably scarce today and their capacity is usually even lower than the optical medias so harddisks is what I use. I get a Western Digital MyBook storage disk (USB2 connection) and just disconnects it between backups. This way data should not be eraseable unless you physically plug it in.
Incremental backups
Then we have incremental backups. Incrementals work like this. First you dump a full archive with everything in it. Then periodically you back up everythin that has changed since the last time. This is a very efficient backup method if you want to make backups often to minimize data loss if there is an accident. The downside of it is that you will quickly come to have lots of files to keep track of and even more important, a restore operations means that you must restore all the files in the order they were created. This is very time consuming and more risky that something goes wrong.
However, incremental backups are incredibly popular also, they are usually fast. The more often you backup the faster the backup goes, at least in theory.
Differential backups
Then the last type we will be talking about are differential backup. They start out just like incrementals with a full archive copy of everything. Then everytime you backup you backup everything that has changed since the full archive was made. The difference here to incrementals is that you only have two active files at any time, the last full archive and the differential archive. A restore operation is therefore very efficient and a two-step operation only.
The downside with differentials is that over time the diff file will grow since more and more files have changed from the time of the last full archive, and therefore the efficiency over time is not great. When the differential has grown to the size of something like 50% of the full archive then it may be better to make a new full archive and start over with the differential.
Using tar
Using tar to perform a full backup is done like this:
tar -c -v -f archive.tar /home/
Using tar in windows under the Cygwin package you would have to change the /home/ path to /cygdrive/c/Documents\ and\ Settings/ or something similar because that is where your personal data will be located on the computer (unless your ”My Documents” has been moved to a different location for some reason.
Using tar to perform incremental backups requiers a two-step process. First you creat a full archive but with a separate date stamp file:
tar -g incremental.snar -c -v -f archive.0.tar /home/
The -g option is the same as –listed-incremental=incremental.snar option and allows the tar to store additional metadata outside the archive that can be used to perform increments later.
tar can also do without the external file, but since this put non-standard meta-data into the tar archive itself it is not recommended since it might break compatibility with non-gnu tools.
The next level or the first increment is thus performed such as:
tar -g incremental.snar -c -v -f archive.1.tar /home/
Since the incremental.snar file already exists only files newer than files referred in the meta-data file will be dumped. The meta-data file incremental.snar will be updated and you will have your first increment.
Keep going like this for each increment. When you want to perform a full restore again use a new incremental.snar file or delete the old one. The meta-data file is not necessary in order to restore the file system.
Restore is done with
tar -g /dev/null -x -v -f archive.0.tar
Repeat this for each increment you have done, i.e. archive.1.tar, archive.2.tar and so on. Remember that when using tar incrementally it will try to recreate the exact file system, i.e. it will delete files that did not exist when the archive was dumped. Therefore you will see the file system change until you have the last increment in place and it will be fully restored.
Differential files are simplest done by dumping files that have changed on or after the date of the full archive. In order to do this, create the full archive first. Then note the time stamp of the archive (I put it in the file name of the archive) thus:
tar -cvf full-archive-2010-05-01.tar /home/
Then to create a differential for all files that changed since the 1st of may 2010 you can perform the following:
tar -N 2010-05-01 -cvf diff-archive-2010-05-05.tar
The new archive will contain all the files that has changed on the date or later dates that you give to the -N option.
The next differential is created in the same way but at a later date. After that you may remove the old differential since it will be superseeded by the new one.
To restore simply untar the full-archive and then the latest differential. When those two operations have finished your file system is up to date again.
This version of the command will however NOT delete any files from the file system as the incremental version will do.
tar -xvf full-archive-2010-05-01.tar
tar -xvf diff-archive-2010-05-05.tar
That’s it for this time. Have fun with tar.