Ichimusai

Photos and other rants


Tag: backup

Borg Backup

The Borg

A very useful utility is the Borg Backup system, or just Borg. It’s a deduplicating backup system, meaning that it scans the files, and when it finds data that is already in the backup, the data in the second and all subsequent files is replaced with a reference to the first instance of that data.

The idea is that the same data is only stored once. Every backup you take after the initial one stores only the differences and the new data accumulated since the last backup. This means that backups after the initial one are very fast and efficient, and save both bandwidth and storage space.
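To make the idea concrete, here is a toy sketch of deduplication in plain shell. It is not how Borg works internally (its real chunker and repository format are more sophisticated); the fixed 4 KiB chunks, the file names and the `store` directory are all assumptions made for the illustration. It only shows that identical data ends up stored once.

```shell
#!/bin/sh
# Toy deduplication: cut files into fixed-size chunks and keep each
# distinct chunk once, named after its SHA-256 hash.
set -eu

work=$(mktemp -d)
cd "$work"
mkdir store

# Two 8 KiB files: the first halves are identical, the second halves differ.
{ head -c 4096 /dev/zero; head -c 4096 /dev/zero | tr '\0' 'A'; } > file1
{ head -c 4096 /dev/zero; head -c 4096 /dev/zero | tr '\0' 'B'; } > file2

total=0
for f in file1 file2; do
    # split writes 4 KiB pieces named <file>.chunk.aa, <file>.chunk.ab, ...
    split -b 4096 "$f" "$f.chunk."
    for c in "$f".chunk.*; do
        total=$((total + 1))
        hash=$(sha256sum "$c" | cut -d' ' -f1)
        # Store the chunk only if this content has not been seen before.
        [ -e "store/$hash" ] || cp "$c" "store/$hash"
        rm "$c"
    done
done

stored=$(ls store | wc -l | tr -d ' ')
echo "chunks read: $total, chunks stored: $stored"
```

Four chunks are read but only three are stored, because the shared first half of the two files is kept once.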

Traditional backup schemes usually take a full backup every month and then increments daily or so. To restore a file you need the latest full backup and then have to apply each increment taken after it. With Borg that is not necessary, as you can view the file system exactly as it looked at each and every backup point.

In fact you can mount the whole backup as a file system and then traverse it from there. It’s very effective. So let’s get started because face it, you don’t back up as much as you should do!

Borg can be used on multiple platforms, but my commands here are for Linux.

The first step is to create a repository. It may sit on a different machine, a NAS, an attached USB drive or even on the same machine. Of course you really want multiple backups, so you can take the borg backup locally and then rsync it to as many locations as you feel is necessary.

Take the backup

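First we create the directory for the backup repo, and then we need to initialize it for use with borg. On Borg 1.1 and later, init also requires an explicit --encryption mode, for example --encryption=repokey. In its simplest form it is done like this: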

$ sudo mkdir /bup
$ sudo borg init /bup

When the repository has been created it is time for the first backup. The format will become clear in a bit; it’s not complicated and can look like this:

$ sudo borg create --progress --info --stats /bup::lenovo-170202_163423 /home /root /boot /etc /var

The command above should be a single line. The first thing we give borg is the command, in this case create, to create a new backup set for us. Then we have some flags: --progress shows a progress indicator while borg is working, which also details the number of bytes read, compressed and deduplicated. Next, --info sets the level of information borg presents to us, and --stats makes borg summarize the operation with some statistics.

The next part of the command, /bup::lenovo-170202_163423, specifies the backup location and backup name. The name is given after the double colon (::). In this case it’s composed of the date (yymmdd) and time (hhmmss) when the backup was started, which makes it easy to find the right set of data later when a restore is needed.

Why did I prefix it with lenovo? Well, my main Linux laptop is a Lenovo and I also have other computers, like an ASUS laptop. The beauty of deduplicating backups is that I can back up multiple machines to the same repo. That way it deduplicates across the machines, and if I have the same files and data in multiple places they are just replaced by references to the data that is already in the backup.

The final part of the command is simply all the paths I want to include in this backup. They can vary from time to time: I might back up /home daily but /root only once a week if I want. No problem at all with borg.
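Typing the time stamp by hand gets old quickly, so the archive name can be generated instead. This is a small sketch, assuming the same /bup repository and lenovo prefix as above; the variable names are only for illustration. It prints the command rather than running it.

```shell
#!/bin/sh
# Build an archive name like lenovo-170202_163423 from a prefix and the clock.
prefix="lenovo"                      # label for this machine, as in the text
repo="/bup"                          # repository path from the example above
stamp=$(date +%y%m%d_%H%M%S)         # yymmdd_hhmmss
archive="${prefix}-${stamp}"

# Only print the command here; drop the "echo" to actually run the backup.
echo sudo borg create --progress --info --stats "${repo}::${archive}" /home /root /boot /etc /var
```

Because the name sorts by date, the newest backup always ends up last in a listing.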

Restoring a backup

No backup system is truly deployed until you have attempted and succeeded in retrieving data from it, so that you know what to do in an emergency, whether that means extracting old data that was mistakenly erased or restoring a full system after a hard drive crash.

Restoring a borg backup works a little differently from what you may be used to. First of all, you can of course extract the data fully, or just single files if you know their paths, just like with any other backup system. The restore command is called extract in borg.

$ sudo borg extract /bup::lenovo-170202_163423

This will extract the entire archive and then you can move the files into their respective locations. You can also extract for example only the etc folder from the archive:

$ sudo borg extract /bup::lenovo-170202_163423 etc

Extraction always writes into the current working directory. You should therefore either extract first and then move the files to their correct location in your file system, or, if all the backups were taken from the root of the file system (/), cd there before extracting. I recommend extracting on a different volume first and then restoring from there, because a backup usually contains a lot of stuff that you may not want to restore.

Mounting the backup as a file system

Borg also offers another way. You can mount a single backup as a volume, or mount the whole repo and see all the backup points made, select the one you want and then just copy the files from there to the live system.

$ sudo borg mount /bup::lenovo-170202_163423 /mnt

This will mount the backup lenovo-170202_163423 in the file system at /mnt. You can then cd to /mnt and then use cp etc to copy the files to their right places.

When done you can unmount it (otherwise other processes can’t back up; the repo is locked while mounted):

$ sudo umount /mnt

Borg uses FUSE to mount archives as local directories.

You may also mount the whole repository:

$ sudo borg mount /bup /mnt

Now when you go into the /mnt folder you will see all your backup names as directories:

$ ls
161204_040001 170101_203409 170113_040001 170117_040001 170121_040001 170125_010344 170128_030332
161206_040001 170108_040001 170114_040001 170118_040001 170122_214910 170125_040001 170128_040001
161218_174848 170111_040001 170115_040002 170119_040001 170123_040001 170126_040001 170129_040001
161225_040001 170112_040001 170116_040001 170120_040001 170124_040002 170127_040001 170201_082851

As you can see I generally name my backups with YYMMDD_HHMMSS just so it’s easy for me to find a specific date.

I can then cd to one of them

$ cd 170112_040001
$ ls
boot etc home root var vmlinuz vmlinuz.old

When done, don’t forget to unmount the archive as no new backups can be taken while it is mounted.

There you go. Start using.

 

Linux snapshots with rsync

A neat trick for getting snapshots when your file system doesn’t support them is to write a local backup script that generates them at regular intervals. To save disk space, though, you would want to store only the changes between snapshots, while still being able to go back, say, three days in time without any trouble.

This can be solved with hard links in Linux, and a fair amount has been written about it online. The interesting thing about hard links is that a file linked this way continues to exist until the last link is removed. There is no difference whatsoever between a hard link and the “actual” file.

If you create file A and then hard-link it to file B, A and B really are the same file. If you change A, B changes too. If you delete A, however, B is not deleted; the link is then broken, and you could say B is just as much the original file as A once was. Now for the really interesting part: if you write a new file A, it exists separately from B. The link between them was broken when you deleted A.
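The behaviour is easy to try out in a scratch directory. The sketch below uses the file names A and B from the paragraph above; it assumes GNU stat for reading the link count.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

echo "first version" > A
ln A B                       # B is a second name for the same data
links=$(stat -c %h A)        # number of hard links to the file (GNU stat)

echo "second version" > A    # writing through A changes what B shows too
b_after_edit=$(cat B)

rm A                         # removes one name; the data lives on as B
b_after_rm=$(cat B)

echo "brand new file" > A    # a new A is a separate file, unrelated to B
a_new=$(cat A)
b_final=$(cat B)

echo "links=$links"
echo "B after editing A: $b_after_edit"
echo "B after removing A: $b_after_rm"
echo "new A: $a_new / B: $b_final"
```

B keeps the second version throughout, even after A has been removed and recreated.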

A file is never really deleted, but once it has 0 hard links it is no longer accessible, and the space it occupied on disk is fair game for other files to use.

An interesting property of rsync is that when it updates a file it never overwrites it in place; it always deletes the old one first! Or rather it does an “unlink”, which is a better name for it. So we can start by doing an rsync of the files we want to preserve. For example, it is common to want to back up /etc, /home and /root on a Linux system.

First we create somewhere to keep it all:

# mkdir /bup
# chown root:root /bup
# chmod 700 /bup

Then we can sync the directories to /bup/snapshot with the command:

# cd /bup
# rsync -a --delete /etc /home /root /bup/snapshot/

If we run the above (as root) we get a backup of the three directories in /bup/snapshot, and they will be copies of the real files. Now for the clever part. When we want to save our snapshot we copy it, but create only hard links from the copy. This takes up hardly any disk space, since we are linking to the same data on disk!

# cp -al snapshot snapshot.1

You can verify this by running

# du -sh *

You will then see that there is considerably less data in snapshot.1 than in snapshot, because all we have stored there is the links themselves.

The next step is to take a new snapshot with rsync. When we do, the links between snapshot and snapshot.1 are broken for the files rsync updates, since it first unlinks the old file and then writes a new one in its place!
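Here is a small sketch of that effect that runs without rsync. The write-to-a-temporary-name-then-rename step is a stand-in for rsync's default way of replacing a changed file; the file names are made up for the example, and GNU cp and stat are assumed.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

mkdir snapshot
echo "unchanged" > snapshot/keep.txt
echo "old contents" > snapshot/edit.txt

# Hard-link copy: a new directory tree pointing at the same file data.
cp -al snapshot snapshot.1

# Stand-in for rsync updating a changed file: write the new version
# under a temporary name, then rename it over the old name.
echo "new contents" > snapshot/edit.txt.tmp
mv snapshot/edit.txt.tmp snapshot/edit.txt

# Inode numbers show which files are still shared (GNU stat).
keep_now=$(stat -c %i snapshot/keep.txt)
keep_old=$(stat -c %i snapshot.1/keep.txt)
edit_now=$(stat -c %i snapshot/edit.txt)
edit_old=$(stat -c %i snapshot.1/edit.txt)

echo "keep.txt inodes: $keep_now and $keep_old"
echo "edit.txt inodes: $edit_now and $edit_old"
cat snapshot.1/edit.txt      # the saved snapshot still has the old contents
```

The untouched file still shares one inode between the two trees, while the updated file now exists as two separate files, so the old snapshot is preserved.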

If we want daily snapshots that rotate, say, three days back in time, we can run this script:

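#!/bin/bash

cd /bup || exit 1

# Drop the oldest snapshot first, otherwise the mv below would move
# snapshot.2 into the existing snapshot.3 directory instead of replacing it
rm -rf snapshot.3

if [ -d snapshot.2 ]; then
    mv snapshot.2 snapshot.3
fi

if [ -d snapshot.1 ]; then
    mv snapshot.1 snapshot.2
fi

if [ -d snapshot ]; then
    cp -al snapshot snapshot.1
fi

rsync -a --delete /etc /home /root /bup/snapshot/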

Then set this up as a cron job by editing the crontab (as root) with

# crontab -e

Then add a line such as:

00 04 * * *     /root/backup-daily

Then save the script above as /root/backup-daily and make it executable, and it will run at 4 every morning, so you always have a snapshot to go back to if you have done something clumsy in your home directory…

 

Using tar for backing up your data

Tar (tape archiver) is an old Unix command that has been largely forgotten by people who are not in daily touch with the Unix world. In several forums I see people asking how to back up files on Linux in a simple and efficient way, and what to use. Most seem to prefer a graphical solution, but some are happy with a command line version as well.

Personally I generally distrust graphical backup software. It puts an unnecessary layer between you and what is actually going on, and the programs that are not just graphical shells on top of tools such as tar are usually proprietary, so you can’t rely on anything being able to read the archives even five years from now.

Tar is a little different: it has proven itself over time to be one of the most efficient and well-functioning backup solutions. However, people today have generally forgotten how to perform full archiving, incremental backups and differential backups properly using tar.

And, as always, any backup solution that fails at restoring your data now or in the future is doomed from the start. Tar builds on a format that many archive handlers can read, not to mention that the source code is open, freely distributable and not likely to disappear any time soon.

Types of backups

There are generally three types of backups that we will be discussing here.

Full archive

The first and by far simplest one is a full archive. This means that everything is archived. This is generally a very time consuming and space consuming task and not something you would want to do every day. The full archive is however the simplest to restore and does not need any special considerations except that you might have to split it over several volumes depending on your media, be that tapes, CD-R/DVD-R or hard disk volumes.

Personally I prefer hard disk volumes as my backup media. A full archive for me is somewhere close to 700 GB, so using CD-R is not really feasible (about 945 volumes) and DVD-R is not much better (170 volumes). Tapes are probably scarce today and their capacity is usually even lower than that of optical media, so hard disks are what I use. I got a Western Digital MyBook storage disk (USB 2 connection) and just disconnect it between backups. This way the data cannot be erased unless you physically plug the disk in.

Incremental backups

Then we have incremental backups. Incrementals work like this: first you dump a full archive with everything in it. Then periodically you back up everything that has changed since the last backup. This is a very efficient method if you want to back up often to minimize data loss in case of an accident. The downside is that you quickly end up with lots of files to keep track of and, even more important, a restore operation means that you must restore all the files in the order they were created. This is very time consuming and increases the risk that something goes wrong.

However, incremental backups are also incredibly popular because they are usually fast. The more often you back up, the faster each backup goes, at least in theory.

Differential backups

The last type we will be talking about is the differential backup. It starts out just like an incremental, with a full archive copy of everything. Then every time you back up, you back up everything that has changed since the full archive was made. The difference from incrementals is that you only have two active files at any time: the last full archive and the latest differential archive. A restore is therefore very efficient and only a two-step operation.

The downside with differentials is that over time the diff file will grow since more and more files have changed from the time of the last full archive, and therefore the efficiency over time is not great. When the differential has grown to the size of something like 50% of the full archive then it may be better to make a new full archive and start over with the differential.

Using tar

Using tar to perform a full backup is done like this:

tar -c -v -f archive.tar /home/

Using tar on Windows under the Cygwin package you would have to change the /home/ path to /cygdrive/c/Documents\ and\ Settings/ or something similar, because that is where your personal data will be located on the computer (unless your ”My Documents” has been moved to a different location for some reason).

Using tar to perform incremental backups requires a two-step process. First you create a full archive, but with a separate metadata file:

tar -g incremental.snar -c -v -f archive.0.tar /home/

The -g option is the short form of --listed-incremental=incremental.snar and lets tar store additional metadata outside the archive, which it uses to perform increments later.

tar can also do without the external file, but since this puts non-standard metadata into the tar archive itself it is not recommended, as it might break compatibility with non-GNU tools.

The next level, or the first increment, is then performed like this:

tar -g incremental.snar -c -v -f archive.1.tar /home/

Since the incremental.snar file already exists, only files newer than those recorded in the metadata file will be dumped. The metadata file incremental.snar will be updated and you will have your first increment.

Keep going like this for each increment. When you want to start over with a new full backup, use a new incremental.snar file or delete the old one. The metadata file is not necessary in order to restore the file system.

Restore is done with

tar -g /dev/null -x -v -f archive.0.tar 

Repeat this for each increment you have done, i.e. archive.1.tar, archive.2.tar and so on. Remember that when using tar incrementally it will try to recreate the exact file system, i.e. it will delete files that did not exist when the archive was dumped. Therefore you will see the file system change until you have the last increment in place and it will be fully restored.
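The whole cycle can be tried safely in a scratch directory before trusting it with real data. This is a sketch assuming GNU tar; the data directory and its file names are made up for the test, and the sleep is only there so the changes get a clearly later time stamp than the level 0 dump.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

mkdir data
echo "one" > data/a.txt
echo "two" > data/b.txt

# Level 0: no metadata file exists yet, so everything is dumped.
tar -g incremental.snar -c -f archive.0.tar data

sleep 1
echo "one, edited" > data/a.txt   # change a file
rm data/b.txt                     # delete a file
echo "three" > data/c.txt         # add a file

# Level 1: only what changed since level 0 is dumped.
tar -g incremental.snar -c -f archive.1.tar data

# Restore into an empty directory, oldest archive first.
mkdir restore
tar -g /dev/null -x -f archive.0.tar -C restore
tar -g /dev/null -x -f archive.1.tar -C restore

restored_a=$(cat restore/data/a.txt)
ls restore/data                   # b.txt should be gone again after level 1
```

After the first extraction b.txt is present in restore/data; the second extraction removes it again, which is the delete-on-restore behaviour described above.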

Differential archives are most simply made by dumping files that have changed on or after the date of the full archive. To do this, create the full archive first and note its time stamp (I put it in the file name of the archive), thus:

tar -cvf full-archive-2010-05-01.tar /home/

Then, to create a differential of all files that changed since the 1st of May 2010, you can run the following:

tar -N 2010-05-01 -cvf diff-archive-2010-05-05.tar /home/

The new archive will contain all the files that have changed on or after the date you give to the -N option.

The next differential is created in the same way but at a later date. After that you may remove the old differential, since it is superseded by the new one.

To restore simply untar the full-archive and then the latest differential. When those two operations have finished your file system is up to date again.

This version of the command will however NOT delete any files from the file system as the incremental version will do.

tar -xvf full-archive-2010-05-01.tar
tar -xvf diff-archive-2010-05-05.tar
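A date-based differential can also be rehearsed in a scratch directory. Note one assumption in this sketch: it uses GNU tar's --newer-mtime instead of -N. Plain -N also considers a file's status-change time, which touch cannot backdate, so in this artificial setup -N would pick up both files. The data directory and file names are made up for the test.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

mkdir data
echo "untouched since April" > data/old.txt
echo "edited in May" > data/new.txt
# Backdate the files so the example does not depend on today's date.
touch -d "2010-04-20" data/old.txt
touch -d "2010-05-03" data/new.txt

# The full archive contains everything.
tar -cf full-archive-2010-05-01.tar data

# The differential contains only files modified on or after 2010-05-01.
tar --newer-mtime=2010-05-01 -cf diff-archive-2010-05-05.tar data

contents=$(tar -tf diff-archive-2010-05-05.tar)
echo "$contents"
```

The listing should show the data directory itself and new.txt, but not old.txt.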

That’s it for this time. Have fun with tar.

 

Backup your Windows Mobile

Dotfred’s Space has free backup software that backs up everything on your Windows Mobile device. The problem here is that WM requires you to back up contacts and so on using an MS Exchange server. This software lets you create a backup file and then restore it, and does not rely on any other software to do it.

Great stuff if you are using a Windows Mobile based PDA/Mobile phone such as the Ericsson X1 or the HTC series of mobiles. Works with both touch-screen and non-touch-screen mobiles.

Storing photos for the future

Ever since the first real camera saw the light of day in 1826, pictures have been stored on a variety of media. Even though the end product, a paper print, often looks much the same today as it did some 200 years ago, the negatives are very different. In the beginning silver-coated glass plates were used, then came a gradual move towards photosensitive emulsions, in more modern times camera film made of cellulose with a light-sensitive silver emulsion, and nowadays most people shoot digitally; film is still around, but digital photography keeps gaining ground.

All along, storing negatives safely has been a problem. Despite developing and fixing, the emulsion and the silver coating age, and it gets harder and harder to make copies from the originals. Even so, 200 years after the first photographs the negatives can still be used, albeit with some difficulty.

If we translate this into digital terms the problem becomes obvious. Hand on heart, how many computers sold today even have a floppy drive? How many of you still have 5 1/4″ (five and a quarter inch) floppies? Have you ever even seen an 8″ (eight inch) floppy in use? Where would you find such an old floppy drive? And we are not talking 200 years back in time here, we are talking barely 20.

Not only do digital media move forward very quickly and age very fast, archival-grade media are also in very short supply. I once talked to an archivist who told me that, among digital media, basically only punched tape and punched cards were counted as archival-grade, with a lifetime of 150 years or more, and they have to be made from special acid-free paper, so-called archival paper. On floppy disks the information is stored as weak magnetic signals that slowly fade with time and become harder and harder to read, until the information can essentially no longer be made out. The time frame here is perhaps twenty years or so. Magnetic tape suffers from a similar problem, and who can even get hold of a wire recorder to play back the magnetic wire that preceded the reel-to-reel tape recorders that came into wide use in the fifties?

Hard disks have the same problem as floppies: they need frequent maintenance if you want to feel sure the information survives. As a rule you have to assume that a hard disk is ready to be scrapped after about five years, partly because interfaces change, but also because the data on it slowly fades and picks up errors. You cannot put a hard disk away for even ten years and expect to be able to use it afterwards.

CDs and DVDs store information optically: a laser shoots small holes in an information-carrying aluminium layer in the disc. This layer sits just beneath the surface on the top side of the disc, and the slightest scratch there can make it completely unusable. On top of the aluminium layer there are various layers of lacquer, protective plastic and other layers, and a special kind of glue is used to bond the layers together, both in discs you burn yourself and in discs pressed at a factory from a glass master for commercial products.

None of these lasts particularly long. I have CDs myself that have started to be destroyed by the ravages of time without having become scratched or anything like it. Some are discs I hardly ever listen to; they mostly sit in a cardboard box. Discs you burn yourself hardly last more than 3-5 years even if they are never played. Stored in the dark and at the right humidity they might last a few years longer, but forget about 200 years. On top of that, technology keeps developing. Right now it is backwards compatible, and an old CD can usually be played in a modern DVD drive in a computer, but nothing says this has to continue for even ten more years.

What media do we actually have left to consider? Old memory types such as ferrite core memory, drums and the like we can forget about, since they have far too little storage capacity and are no longer commercially available. Punched tape and cards are dead; you can hardly find such a reader or punch for love or money, and even if you could, the capacity is very low. A single picture from a modern camera would need bundles of punched cards or hefty rolls of punched tape, so we can rule these out completely despite their archival durability.

The only really sensible storage medium today is probably the hard disk, and I am thinking in particular of the removable USB- or FireWire-attached hard disk that has its own power supply and is easy to move. USB and FireWire have been around as interfaces for some years now and will probably not disappear in a hurry. So even though the interface of the hard disks themselves has moved from IDE and SCSI to SATA and the like, the data interface, USB or FireWire, is still the same now as it was five years ago. When the technology eventually takes its next step, the data has to be moved to the new interface, and that is precisely the point: the data has to be maintained so that formats and media can still be read in the future.

You cannot just lock the media away and expect to be able to use them in 20 years; the data has to be moved to new media all the time and keep up with developments.

Finally, a tip: many hard disks can repair sectors that are starting to go bad by themselves. But for those sectors to be detected, the whole disk sometimes has to be read through. This is one of the reasons it is such a good idea to take full backups at least once a month. That way the system has time to discover bad sectors before they are so bad that data is lost. Modern disks have a number of spare sectors that are ”mapped in” where needed. This extends the life of the disk and improves data integrity.

Backup your photographs

You never know, the accident is closer than you might think. A few days ago my work disk started making funny noises and thrashing about. I quickly checked that backups were running as they should and, just for the heck of it, made sure I got a full backup of the whole thing before it went dead.

Then I went out and bought myself a new USB-attached disk. 1 TB of capacity is a lot, but I am sure me and my camera will fill it soon enough. So I started transferring data, and then the old disk just died. Flat out died. Lucky to have a backup on my other disk, I started transferring the backup onto the new disk. A few percent in, it started sounding bad, and then that disk failed too.

Not knowing what to do, I quickly dismantled my backup disk, another external disk, although attached by FireWire instead of USB. The disk was so hot I could not touch it, so I quickly realized the problem was overheating; it wasn’t really designed for extensive data transfers like this. So I let the disk cool down and rest a bit, then I disassembled it completely so it could run in free air. I used the cabinet fan to cool the hottest circuit and put ice packs around the back and the sides, and right now it is running fine. I have extracted all my photos up to 2005 so far.

This just goes to show that you can’t be too careful. Keeping your disks cool is important, but keeping more than one recent backup is even more important. I have also considered getting a disk to swap with a friend to put backups on, and I will push and see if she is interested. Then we can back up and meet bi-weekly or monthly or so and exchange the disks. I am scared of what would happen to my collection of photographs if there was a mishap, burnout, small fire, accident, electric overload, lightning strike… well, there are many things that may potentially damage more than one disk at a time.

We are talking several hundred GB here, people. CDs are not an option; a full backup would mean close to 600 CDs to burn. The sheer time of it is hard to comprehend, but let’s say a full data CD takes 8 minutes to burn at top speed. That’s 80 hours. Two full working weeks.
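The arithmetic behind that estimate, as a quick shell check. The 600 discs and 8 minutes per disc are the rough figures from the text, and the 40-hour working week is an assumption.

```shell
#!/bin/sh
cds=600               # discs needed for a full backup (rough figure)
minutes_per_cd=8      # burn time per disc at top speed (rough figure)
hours_per_week=40     # assumed length of a working week

total_minutes=$((cds * minutes_per_cd))
hours=$((total_minutes / 60))
weeks=$((hours / hours_per_week))
echo "$cds CDs x $minutes_per_cd min = $total_minutes min = $hours hours = $weeks working weeks"
```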

DVDs are not much better. They have a magnitude or so on CDs, but they are definitely not an option either: a full backup would be 150 DVDs or so, and they take longer to burn than CDs, so I bet the time needed is about the same.

That’s why I use hard disks to store everything and then schedule backups from one to the other. The whole idea is that a single point of failure should never mean a loss of data. My worry after tonight is that if there is a second point of failure I might be toast. I don’t like that.

Yeah, I got much stuff on my Flickr account, but not the negatives, the RAW files, the working copies, the drafts the ORIGINALS and more important my own notes and stuff. I need a solution for this soon. The broadband ISP has a solution with online storage but only 10 GB. I need at least 60 times that in order to feel safe.

It’s not an easy solution, disks do go bad, the only thing you can do to make sure your data stays around is to copy it and copy it again and keep it current. Digital data screams to be copied of course but it scares me sometimes how easily I could lose everything I have done.

And don’t talk to me about RAID systems. I have troubleshot and fixed enough electronic systems to know that when stuff runs together, all it takes is an overvoltage problem in a single power supply and your whole RAID system is fried.

My backup copying is now at May 2005. Perhaps I will make it this time as well. I swear, I will buy a new backup disk on salary day.

I use tar as my backup software almost exclusively and here is the trick:
Every week or two, or whenever you think it is necessary, make a full backup
Whenever there have been changes, or just daily, back up the files changed since the last full backup

Using tar this is pretty simple, in the first case

# A full backup of my entire home directory

tar c -f backup.tar /home/ichi

# I normally never compress my archives. There are two reasons for this and
# the most important is that there is a better chance to save the contents
# of a damaged file if it is not compressed. The second reason is that with
# this much data, compressing / decompressing takes a lot of time, and
# the full backup is already several hours of work.

# A partial (differential) backup of everything changed since my last full backup

tar -c -N 2008-10-21 -f backup-partial-`date -I`.tar /home/ichi

# The -N flag backs up files that were touched after this date (and on this date),
# so in this example the last full backup was on 2008-10-21. All files touched on
# this date and later will then be backed up.
# The `date -I` is a nifty trick that inserts today's date in the file name