Rob van der Woude's Scripting Pages

Backups

In a perfect world, nobody would make mistakes, disaster would never strike, and backups would not be necessary.

In real life, however, computers do sometimes crash or get infected with malware, to name only a few of many threats your data is exposed to.

We all know, of course, that we should make regular backups to prevent the loss of data turning into a disaster.
Well, in fact, just making backups does not suffice: verifying the restore procedure is equally important!
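One way to verify a restore is to restore a backup to a scratch location and compare it with the live data, file by file. The following is only a minimal sketch of that idea in Python; the function names and the two directory arguments are my own assumptions, not part of any backup product.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source: Path, restored: Path) -> list[str]:
    """List files from source that are missing or different in restored."""
    problems = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        restored_file = restored / relative
        if not restored_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_digest(src_file) != file_digest(restored_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every file in the source tree was restored with identical content. Files modified after the backup was made will show up as "differs", so a test like this is best run right after the backup.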

Many true and sad stories exist about backups without restore procedures.

My personal "favorite" is the one about an employee riding 12 kilometres on his bicycle, 5 days a week, to swap backup tapes in a remote unmanned facility, only to find out after 6 years, when the server crashed, that all backup tapes were empty because no one had ever thought about selecting any files to include in the backup.

I recently read a report stating one third of small businesses do not make backups at all.
Though my own guess would have been worse, it still is a shocking number, considering another report stated that 90% of businesses don't survive a major data loss.

The proportion of private computer owners not making backups is, most likely, far higher.
Though the financial impact for private computer owners won't be as dramatic, it really does hurt to lose one's family photo album.

So, look at backups as an investment.
It is like a health insurance policy: consider yourself lucky if in the end you have invested "for nothing".

If we agree on that, the next question is how much (time and money) you are willing to invest.
It isn't realistic to try and prevent every type of mistake or failure, but it is possible to get close.

Apart from clustering, the best option would be multiple full backups (plus at least one image backup for disaster recovery) every day, written on read-only media, and a duplicate stored off-site.

Note: Malware is a special case of disaster, since it may strike weeks or even months after your systems are infected, in which case your (off-site) backup sets may be infected as well.

The table below lists some of the techniques used to protect and back up your data, and what they can and can't do.

I included several techniques that can improve the availability of your data, like RAID, clustering and version control, but using these techniques does not mean you can forget about true backups!

 

Technique (1) Protects data from (2) Advantages/Disadvantages
Deleting Overwriting Disk Failure Computer Crash Disaster
RAID & disk mirroring ✔️ ✔️ 👍 Fast disk recovery, improved performance
👎 Redundancy at the cost of disk space
Server clustering, single site ✔️ ✔️ 👍 High availability, load balancing
👎 Expensive, complex to set up
Server clustering, multiple sites ✔️ ✔️ ✔️ 👍 Even higher availability
👎 Very expensive, complex to set up
Image backup ✔️ ✔️ ✔️ ✔️ ✔️ 👍 Single all-purpose backup, including disaster recovery
👎 Slow backup (much data), slow restore of individual files
Full backup ✔️ ✔️ ✔️ ✔️ ✔️ 👍 Fast restore (fewer media sets) compared to incremental
👎 Slow backup (more data) compared to incremental
Full + incremental/differential backup ✔️ ✔️ ✔️ ✔️ ✔️ 👍 Faster backup (less data) than full backup
👎 Slow restore (more media sets) compared to full backup
External hard disk, full copy (3) ✔️ ✔️ 👍 Simple; fast backup & restore
👎 Only 1 version of each file (4)
External hard disk, incremental copy ✔️ ✔️ ✔️ 👍 Fast backup & restore; keeps last version of deleted files
👎 Only 1 version of each file (4); full restore includes deleted files
Revision Control (a.k.a. File History) ✔️ ✔️ ✔️ 👍 Fast restore of any version of any file
👎 High disk space and CPU demand
Undelete ✔️ ✔️ 👍 Fast restore of the last version of a deleted file
👎 Deleted files may be overwritten soon

 

Notes: (1) Not all techniques listed here are backup techniques, but they all help improve the availability of your data.
For the backups, multiple (rotating) backup media sets, stored off-site, are assumed.
  (2) Protection of data from deleting means you can restore files that have been deleted.
Protection from overwriting means you can restore files that were accidentally modified or overwritten or corrupted by malware.
Protection from disk failure means that your data is safe even when a hard disk fails, or that you can restore your data on a replacement hard disk.
Protection from a computer crash means that your data is safe even if a server fails, or that you can restore your data on a replacement server.
Protection from disaster means that your data is safe even if disaster strikes at your site, or that you can restore your data on a replacement server.
For disaster recovery, backup sets must be stored off-site.
The amount of data you lose depends on the backup frequency. All modifications made since the last backup will be lost (except for Revision Control, where all modifications made since the last time you saved a file will be lost).
The number of file versions you can restore depends on the backup frequency; the maximum age of the files you can restore depends on the media rotation schedule.
  (3) Full copy in this context means that an exact copy of the data on the 'source' drive is kept on the 'target' drive, e.g. using ROBOCOPY.
  (4) One version of each file per backup or disk; it is possible to rotate disks and/or to rotate the destination directory to allow for multiple copies.
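To illustrate the incremental copy and the rotating destination directory from note (4), here is a rough Python sketch. It is not how ROBOCOPY or any other tool works internally; the names rotating_target and incremental_copy, and the choice of one directory per weekday, are assumptions made for the example.

```python
import shutil
from datetime import date
from pathlib import Path

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def rotating_target(base: Path, day: date) -> Path:
    """Pick one destination directory per weekday (assumed scheme)."""
    return base / WEEKDAYS[day.weekday()]


def incremental_copy(source: Path, target: Path) -> list[str]:
    """Copy files that are new or newer than the copy in target.

    Nothing is ever deleted from target, so the last version of a
    file deleted at the source survives there.
    """
    copied = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dst_file = target / relative
        if (not dst_file.exists()
                or src_file.stat().st_mtime > dst_file.stat().st_mtime):
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dst_file)  # copy2 keeps the timestamp
            copied.append(relative.as_posix())
    return copied
```

Calling incremental_copy(source, rotating_target(base, date.today())) once a day would keep up to seven copies of each file, one per weekday directory, at the cost of seven times the disk space.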

 

It is better not to rely on a single backup technique, but to combine several techniques.

For your servers, you'll need at least some disaster recovery techniques (to restore a crashed server from scratch, on new hardware, fast): maybe a combination of mirroring, clustering and image backups of the boot partitions.

For the data on the servers you'll need full backups on rotating media sets, either daily (or even multiple backups daily) or weekly plus daily incremental/differential backups.
A media rotation schedule that is often used: daily incrementals (or differentials, if you don't mind the larger backup size) rotated every two weeks, weekly full backups rotated every two months, and monthly backups rotated every year.
An alternative option found in some enterprise backup systems is to keep only a fixed number of file versions. In practice, I have always found it hard to explain to customers why I could restore their reports from 3 years ago, but not their e-mail from 3 weeks ago.
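The rotation schedule described above can be sketched as a small function that tells you which media set to use on a given date. This is only an illustration under my own assumptions: full backups on Sundays, the monthly backup on the first day of the month, two months approximated as eight weeks, and made-up set labels.

```python
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def media_set(day: date) -> str:
    """Return the (hypothetical) label of the media set to use on day."""
    # Continuous week counter; weeks start on Monday.
    week = (day.toordinal() - 1) // 7
    if day.day == 1:
        # Monthly backup: 12 sets, each reused after a year.
        return f"monthly-{day.month:02d}"
    if day.weekday() == 6:
        # Weekly full backup on Sunday: 8 sets, reused after 8 weeks.
        return f"weekly-full-{week % 8 + 1}"
    # Daily incremental: 2 sets per weekday, reused after 2 weeks.
    return f"daily-incr-{week % 2 + 1}-{WEEKDAYS[day.weekday()]}"
```

With these assumptions, any two Mondays that are 14 days apart share a daily set, and any two Sundays that are 56 days apart share a weekly set, unless one of them falls on the first of the month.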

You probably need to archive some data as well, which requires non-rotating media sets.
Keep in mind that you'll need to test and/or re-backup (restore and back up again) archived data on a regular basis, and with every upgrade of your backup software or replacement of your hardware (tape streamers etc.), as no backup medium or backup file format will live forever.
Backups on CD/DVD may be easy to restore on other hardware, but the expected 30-year lifetime of (pressed) music CDs does not apply to (recorded) CD-Rs and DVD±Rs (3 years is a more realistic figure).

Notes: Database servers (including e-mail servers) will often require special backup software, because of the huge size of the modified files and (fulltime) availability requirements.
  Virtualization of servers can speed up image backups (actually: VM backups) and restores enormously.
  Cloud backup is gaining momentum fast.
One of its biggest advantages is that the data is stored off-site immediately. The major downside is that you have no control over what happens to your off-site data.
For non-sensitive data it is arguably the best way to make off-site backups, though keeping your own off-site copies on physical media is still recommended.
For sensitive data additional encryption before backup is recommended (do make sure the decryption keys are safely stored somewhere else, off-site).
Another disadvantage is the enormous amount of time required for the initial backup (after that, incremental backups take far less time).
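To get a feeling for the time the initial cloud backup may take, a back-of-the-envelope calculation helps. The sketch below assumes decimal gigabytes and a sustained upload speed in Mbit/s; the function name and the sample figures are made up for illustration, and real-world throughput will usually be lower.

```python
def upload_hours(size_gb: float, uplink_mbit_per_s: float) -> float:
    """Estimate the hours needed to upload size_gb at a sustained speed."""
    bits = size_gb * 1e9 * 8                    # decimal gigabytes to bits
    seconds = bits / (uplink_mbit_per_s * 1e6)  # Mbit/s to bit/s
    return seconds / 3600
```

For example, upload_hours(500, 20) comes to roughly 55.6 hours: more than two full days of uninterrupted uploading for 500 GB over a 20 Mbit/s uplink.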

Summarizing and over-simplifying:
  • if you don't want to set up and maintain a complex backup schedule
  • but you don't want to lose a single file of your data
  • and your total amount of data is only a couple of gigabytes
  • and you don't care about prying eyes
nothing beats file history combined with cloud synchronization (e.g. make your local Dropbox or OneDrive or any other cloud provider folder the target folder for file history); but you still need to store some backups off-site on read-only media to protect your data against malware infections.

A final sidenote: even if you can restore your files, you still need the right software to access restored data, e.g. I still need to keep a copy of the WordPerfect installation disks (and the serial numbers) to make sure I can read my old documents.


page last modified: 2022-10-23