List:       bacula-users
Subject:    Re: [Bacula-users] feedback requested on new backup strategy
From:       Phil Stracchino <phils () caerllewys ! net>
Date:       2021-11-08 20:19:21
Message-ID: 42213edc-fe57-6e6b-5f82-a6b1b4b3ab59 () caerllewys ! net

On 11/5/21 13:43, mark.bergman@pennmedicine.upenn.edu wrote:

> We're using the GPFS filesystem, and doing filesystem snapshots every
> 15 minutes, with a limited set retained for at least 2 months. The
> snapshots allow for almost instant restores of recent data and comparison
> between different versions of files, without system administrator
> intervention.
> 
> Because of snapshots, I'm planning to eliminate all nightly incremental
> & differential backups to tape. Tape backups would be only for
> archival/disaster-recovery purposes and for compliance with grant and
> data management requirements.

I'm not sure this is wise.  Remember that if you lose an array, you lose 
all of its snapshots too.  Or would you consider that a 
disaster-recovery scenario?


> The new strategy would be to do a full backup every 2 months, kept for
> 5 months. One backup would be kept for at least 2 years, the others would
> be rotated (media reused). For example:
> 
> 	January 2021		keep until January 2023
> 	March 2021 	 	keep until August 2021
> 	May 2021 		keep until October 2021
> 	July 2021 		keep until December 2021
> 	September 2021		re-use March 2021 media, keep until February 2022
> 	November 2021		re-use May 2021 media, keep until April 2022
> 	January 2022		keep until January 2024

This is going to be complex.  I think it's DOABLE, but you will need a 
complex set of Pools and Schedules because of the way you're setting up 
multiple retention times for backups of the same jobs at the same levels.

What you might need to do is run all of your Full backups to one Pool 
that has five months retention, then every six months run a Copy job 
that archives the most recent set of Full backups to a second Pool with 
two years retention.  This is probably the method that will result in 
the least tearing out of hair.
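
To sanity-check the retention arithmetic before committing it to Pools and
Schedules, here is a rough Python sketch of that two-pool idea.  It is only
an illustration, not Bacula configuration: the function names are made up,
and I have assumed that the January Fulls are the ones copied to the
long-retention Pool, as in your example schedule.  Adjust archive_months if
you archive on a different cadence.

```python
from datetime import date

def add_months(d, months):
    """Return the first day of the month `months` after d's month."""
    idx = d.year * 12 + (d.month - 1) + months
    return date(idx // 12, idx % 12 + 1, 1)

def retention_plan(start, runs, archive_months=(1,)):
    """List (run, full_expiry, archive_expiry) for bimonthly Full backups.

    Every Full goes to the short-retention Pool (5 months).  Runs whose
    month is in archive_months (an assumption: January only) also get a
    copy in the long-retention Pool (2 years); others get None.
    """
    plan = []
    for i in range(runs):
        run = add_months(start, 2 * i)
        full_expiry = add_months(run, 5)
        archive_expiry = add_months(run, 24) if run.month in archive_months else None
        plan.append((run, full_expiry, archive_expiry))
    return plan

if __name__ == "__main__":
    for run, full_exp, arch_exp in retention_plan(date(2021, 1, 1), 7):
        archive = arch_exp.strftime("%B %Y") if arch_exp else "-"
        print(f"{run:%B %Y}: full until {full_exp:%B %Y}, archive copy until {archive}")
```

Note that under this scheme the January Full also sits in the five-month
Pool until it expires there; the two-year copy is what survives.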


> All tape backups would be done from a snapshot, so that no files within
> the source of the backup change during the process. A "run before job"
> script would dump coherent copies of databases, then create a filesystem
> snapshot dedicated to the backup. That snapshot would be removed when
> the backup is complete.
> 
> We've got about 700 top-level directories for user accounts and research
> projects. We'll probably run an individual backup job for each group of
> directories alphabetically (A*, B*, etc), so that the 400TB will be spread
> (unevenly) across about 45 Bacula jobs.


This MOSTLY seems sound, with the proviso that I am not familiar with 
the details of GPFS.  But I've implemented similar schemes on top of 
Solaris' ZFS.

The scheme of backing up snapshots is sound and a good plan.  It 
entirely sidesteps the problem of files being changed while they are 
being backed up.  Does GPFS offer you a way to create incremental 
snapshots containing the changes since a stipulated previous snapshot? 
That might be a way to get viable intermediate incremental or 
differential backups.



-- 
   Phil Stracchino
   Babylon Communications
   phils@caerllewys.net
   phil@co.ordinate.org
   Landline: +1.603.293.8485
   Mobile:   +1.603.998.6958

