My Personal Data Backup Strategy

considering update frequency, importance, expected history duration, etc.

Background.

My backup hard drive broke last week, and I lost some pretty good memories of my early software & hardware projects. It made me realize how much the safety of backup data matters. The proper backup configuration for a file or folder depends on several factors:

  • Update Frequency.
    • Is it updated weekly, monthly, or yearly?
    • An ongoing git project can be updated daily.
  • Importance.
    • Can I tolerate losing the data?
  • Expected History Duration.
    • Do I care about previous versions of the files, and how long do I want to keep them?
    • I may want to look up files that were deleted a long time ago, such as experimental data.
  • Frequency to Retrieve.
    • How often do I need to retrieve the data?
    • With some incremental backup strategies, retrieval gets slower and slower as the chain of incremental images grows.

Currently I use the Macrium Reflect backup software, which is feature-rich and stable. The business version costs $75 per machine, plus a $15 annual fee.

Backup Plan

Since a hard drive typically lasts only 3 to 5 years, it's essential to back up data on at least two hard drives and replace them on a rolling schedule. I bought a 4TB WD hard drive in Aug 2021, so I should buy the next one in Aug 2023, then another in Aug 2025, and so on. Each new hard drive should have at least 4TB more capacity than the existing data volume, so that it can hold the incremental data of the next 4 years. Price: taking the WD My Book desktop hard drive as an example, 8TB costs $200, 12TB costs $300, and 16TB costs $380.
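The rollover arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the purchase day is arbitrarily fixed to the 1st of the month, and the candidate sizes are just the three My Book capacities quoted above.

```python
from datetime import date
from typing import List, Optional

HEADROOM_TB = 4                   # from the text: at least 4TB above existing data
PURCHASE_INTERVAL_YEARS = 2       # from the text: Aug 2021, Aug 2023, Aug 2025
AVAILABLE_SIZES_TB = (8, 12, 16)  # capacities quoted in the text


def purchase_dates(first: date, count: int) -> List[date]:
    """Return the first `count` purchase dates, one every two years."""
    return [
        first.replace(year=first.year + PURCHASE_INTERVAL_YEARS * i)
        for i in range(count)
    ]


def smallest_suitable_drive(existing_data_tb: float) -> Optional[int]:
    """Smallest listed capacity that leaves the required headroom, or None."""
    needed = existing_data_tb + HEADROOM_TB
    for size in sorted(AVAILABLE_SIZES_TB):
        if size >= needed:
            return size
    return None  # none of the listed sizes is large enough


if __name__ == "__main__":
    # Day of month is an assumption; the text only gives month and year.
    for d in purchase_dates(date(2021, 8, 1), 3):
        print(d.strftime("%b %Y"))
    print(smallest_suitable_drive(3.0))
```

With 3TB of existing data the requirement is 7TB, so the 8TB model would be enough; with 5TB it would already take the 12TB one.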

Assuming two hard drives are always in service at the same time, each of them should hold the same data but be backed up independently. To minimize the probability of permanent data loss, I keep one of them in my office and the other at home. All hard drives share one password that is never used for any other account (anyway, if someone manages to hack one of the hard drives, they already have all the data).

Also note that the two hard drives should have interleaved backup times, to minimize the amount of data lost when an accident really happens. For example, if the backup plan is weekly, one drive could back up on Friday and the other on Tuesday. The data on the two hard drives is then never exactly the same, but that doesn't matter in the long term.
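To see what interleaving buys, here is a minimal sketch that computes the longest gap between consecutive backups in a repeating week. The function name is hypothetical, and weekdays are numbered as in Python's `date.weekday()` (Monday is 0), which is a choice made for this example only.

```python
def worst_case_gap_days(backup_weekdays, cycle_days=7):
    """Longest stretch (in days) between consecutive backups in a repeating cycle."""
    days = sorted({d % cycle_days for d in backup_weekdays})
    if not days:
        raise ValueError("at least one backup day is required")
    gaps = []
    for i, day in enumerate(days):
        nxt = days[(i + 1) % len(days)]   # wraps around to the next week
        gap = (nxt - day) % cycle_days
        gaps.append(gap if gap else cycle_days)  # a single backup day spans the full cycle
    return max(gaps)


TUESDAY, FRIDAY = 1, 4  # Monday = 0

print(worst_case_gap_days([FRIDAY]))           # 7: one drive, weekly
print(worst_case_gap_days([TUESDAY, FRIDAY]))  # 4: two drives, interleaved
```

With a single weekly backup up to 7 days of work can be lost; with the Friday and Tuesday split the worst case drops to 4 days (Friday to Tuesday), as long as at least one drive survives.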

A retired hard drive (after its 4-year rollover cycle) should be either stored securely or fully reformatted and reused for unimportant purposes (like storing movies).

1. Backup plan of ThinkPad X1 Extreme 2021.9

1) GitHub + Nextcloud

This backup includes the folders of my GitHub projects as well as the synced folder of my Nextcloud server. All my recent research projects are in the GitHub folder (the experimental data is ignored by git, but it still needs a proper backup on the hard drive). My course materials and notes are in the Nextcloud synced folder. This synced folder discards data after 1 year to save space on my server, so it's essential that I can still find that data in the backup. A weekly backup is comfortable for me: in the worst case I would have to redo the last week of work (or about half a week under the interleaved backup strategy).

Plan:

  • backup weekly
    • full backup every 16 weeks
    • incremental backup every week
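The weekly plan can be modeled with a small sketch. It assumes a generic incremental scheme in which week 0 is a full backup and every incremental image depends on all images back to the most recent full one; the function names are hypothetical, and this is not a description of how Macrium Reflect stores or consolidates its images.

```python
FULL_EVERY_WEEKS = 16  # from the plan: full backup every 16 weeks


def backup_kind(week: int) -> str:
    """Kind of backup taken in a given week, counting the first full backup as week 0."""
    return "full" if week % FULL_EVERY_WEEKS == 0 else "incremental"


def images_needed_to_restore(week: int) -> int:
    """Images read to restore that week's state: the last full plus every incremental since."""
    return week % FULL_EVERY_WEEKS + 1


for week in (0, 1, 15, 16, 17):
    print(week, backup_kind(week), images_needed_to_restore(week))
```

Under this model a restore never has to read more than 16 images, which is the point of restarting the chain with a full backup every 16 weeks.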

2) Whole Disk

This backup plan covers the two 1TB SSDs in my computer. It can be used to rescue the operating system if my computer breaks unexpectedly. It can also be used to create virtual machines, so that I don't have to wait for a new laptop before I continue working.

Plan:

  • backup every 12 weeks
    • full backup only once
    • incremental backup every time

Since I seldom use a computer for more than 4 years, the whole-disk backup should accumulate only about 16 to 17 incremental images (roughly four per year), which should not hurt performance much. When a new computer replaces the old one, do a full backup of the new machine after installing the frequently used software!
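As a quick check of the image count, the sketch below counts the incremental images produced over a machine's lifetime. It assumes 52-week years and a full backup at week 0; the function name is hypothetical.

```python
def incremental_images(years: float, interval_weeks: int = 12, weeks_per_year: int = 52) -> int:
    """Number of incremental images taken after the initial full backup at week 0."""
    total_weeks = int(years * weeks_per_year)
    # Incrementals fall on weeks interval_weeks, 2 * interval_weeks, ... up to total_weeks.
    return total_weeks // interval_weeks


print(incremental_images(4))  # 17
```

Under these assumptions a strict 12-week interval gives 17 incrementals over 4 years, slightly above the rough four-per-year estimate of 16, and still a short chain.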

Yue Wu
2nd Year Ph.D.

My research interests concentrate on computer systems, including quantum control systems, wireless systems and embedded systems.