--Proposal for Alternate Session Purge Method--

kdmoyers
Posts: 35
Joined: Wed Jul 14, 2004 2:39 pm
Location: New York State, USA

Post by kdmoyers »

Presently, AIS Backup purges sessions eldest first. This is the simplest and most generally useful method.

This is a quasi-formal proposal for an additional optional method.

First, a definition: for any positive integer x, FindFirstOne(x), or ffo(x), is the position of the rightmost one in the binary representation of the number, counting from 1 on the right. For example:

x     ffo(x)
---   ------
1     1
2     2
3     1
4     3
5     1
8     4
40    4
63    1
64    7
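
For anyone who wants to try this, here is a minimal Python sketch of ffo as defined above. The bit trick is just one convenient way to compute it, and the function name simply follows the abbreviation used in this post; nothing here is taken from AISBackup itself.

```python
def ffo(x: int) -> int:
    """Position of the rightmost 1 bit of x, counting from 1 on the right."""
    if x <= 0:
        raise ValueError("ffo is only defined for positive integers")
    # x & -x keeps only the lowest set bit; bit_length() gives its 1-based position.
    return (x & -x).bit_length()


# Reproduce the rows of the table above.
for x in (1, 2, 3, 4, 5, 8, 40, 63, 64):
    print(x, ffo(x))
```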

Now, assume that every time a backup runs, a global generation number is incremented. The first time the backup runs, the generation number is 1, the second time, it's 2, etc. Call this number g. It's an internal number the user never sees, kept for the lifetime of the backup job.

Now, we organize all our sessions in date order, and index them. The youngest session is in slot 1, the eldest session is in slot m, where m is the number of sessions we are keeping.

Now we are ready to decide which session slot to purge. It's ffo(g). Pretty simple to compute, but what is the resulting behavior?

The behavior is this: every other day, yesterday's session is purged. Every fourth day, the day before yesterday's session is purged. Every eighth day, the session from the day before that is purged, etc.

The effect, when you examine the ages of the sessions, is that they tend to follow powers of two (with a small moving offset). For example, the ages might be:

slot  age
----  ----
1     1
2     2
3     4
4     8
5     16
6     32
7     64
8     128
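
The rule can be checked with a rough simulation. The sketch below rests on two assumptions of mine that the description leaves open: the purge happens just before each new backup is added, and nothing is purged until m sessions exist. It also purges the eldest slot whenever ffo(g) exceeds m.

```python
def ffo(x: int) -> int:
    """Position of the rightmost 1 bit of x, counting from 1 on the right."""
    return (x & -x).bit_length()


def simulate(m: int, runs: int) -> list[int]:
    """Generation numbers of the kept sessions, youngest first, after `runs` backups."""
    sessions: list[int] = []  # index 0 is slot 1 (the youngest session)
    for g in range(1, runs + 1):
        if len(sessions) == m:
            # Assumption: purge before adding today's backup, and fall back
            # to the eldest slot when ffo(g) is larger than m.
            slot = min(ffo(g), m)
            del sessions[slot - 1]
        sessions.insert(0, g)
    return sessions


runs = 1000
kept = simulate(m=8, runs=runs)
# Age in days, counting the backup just taken as age 0.
print([runs - g for g in kept])
```

Running it for different values of m and runs shows how the gaps between kept sessions widen toward the old end while the session count stays fixed at m.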

What advantages does this method have? It covers a much greater span of time for a given number of sessions. For example, with five sessions the simple method covers five days, while the proposed method covers about sixteen days. If I need to cover 60 days, the simple method requires 60 sessions; the proposed method requires only 7.

This, I assert, is a powerful advantage, worthy of some small discomfort.

The proposed method also acknowledges that, generally, more recent days are more likely to be needed. Distant sessions are less likely to be needed.

A useful extension of this method is to combine it with the simple sequential method by adding a small constant to the slot-to-purge. For example, if we purge ffo(g)+3, the slot ages might look like this:

slot  age
----  ----
1     1
2     2
3     3
4     4
5     5
6     7
7     10
8     18

This gets us sequential days in the recent past, followed by powers-of-two days to get a longer maximum age. This maximum age is often called "horizon".

Objection one: using this method, it is not always obvious exactly which dates are available in the backup, since the emergent behavior of the system is somewhat complex to work out in your head. You have to trust that the pattern is working. When a mistake happens and you need to restore a file, you look into the program to see what dates are available, and pick the closest date before the mistake happened. This is exactly what you would do using the Grandfather/Father/Son or Sequential methods anyway. The difference here is that, for a given number of sessions, you are more likely to find a covering session.

Objection two: it is somewhat upsetting to find that the day before yesterday's backup has already been purged! The "combined" modification above takes care of this.

Objection three: what if ffo(g) returns a number greater than your eldest slot? Just purge the eldest slot.

Objection four: couldn't you just run multiple simultaneous simple backups, each at different intervals? Yes, but this has significant disadvantages: it's more complex to administer, more complex to restore from, and most importantly it duplicates files unnecessarily. For installations with dozens of gigabytes to back up, the wasted storage can be tremendous.

Final note one: it might help user trust if a small chart showing the available backup session ages is displayed. The user can readily see the decreasing frequency behavior, and confirm his backup horizon.

Final note two: this option might be presented to the user as "how many sessions to follow powers of two purge method?" For example, if the user selects 8 sessions to keep, and 3 to follow powers of two purge method, then the purge expression would be ffo(g)+5 (because 8 - 3 = 5). If the user selects 0 to follow the powers of two method, then the expression is simply 8, and we are back to the simple sequential method.
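
To make the arithmetic concrete, here is a small Python sketch of the purge expression with the offset and the "just purge the eldest slot" fallback from objection three folded in. The function and parameter names (slot_to_purge, keep, pow2) are made up for illustration.

```python
def ffo(x: int) -> int:
    """Position of the rightmost 1 bit of x, counting from 1 on the right."""
    return (x & -x).bit_length()


def slot_to_purge(g: int, keep: int, pow2: int) -> int:
    """Slot (1 = youngest) to purge at generation g.

    keep: total number of sessions retained.
    pow2: how many of those sessions follow the powers-of-two rule.
    """
    if not 0 <= pow2 <= keep:
        raise ValueError("pow2 must be between 0 and keep")
    offset = keep - pow2  # e.g. 8 - 3 = 5, giving ffo(g) + 5
    # Never point past the eldest slot.
    return min(ffo(g) + offset, keep)


for g in range(1, 9):
    print(g, slot_to_purge(g, keep=8, pow2=3))
```

With pow2 = 0 the offset equals keep, so the result is always the eldest slot, which is the simple sequential method.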

Final note three: what to call it? "powers of two method", "Towers of Hanoi method" (after the classic puzzle game), "extended horizon method" are all possibilities.

I hope I have made a convincing case for the implementation of an optional alternate session purge method for AIS Backup. The method is simple to compute and offers much greater horizon, at the cost of some simplicity.

Thank you,
Kirby Moyers
:D
Barry
Site Admin
Posts: 1529
Joined: Tue Aug 20, 2002 3:16 pm

Another idea

Post by Barry »

I have a proposal of my own I’d like to run past all of you:

There may be a requirement to base retained backups on business events, which are usually calendar based, e.g. week-end processing, month-end processing, etc. There may also be processing done mid-month, on or the next working day after a specified date.

Here are a few definitions:

Daily: all backups except those designated weekly or monthly.
Weekly: One backup per week except those designated monthly (that fall on the same day).
Monthly: One backup per month.

The Weekly backup is to be done at a week end which is on or the next backup after one chosen day of the week, e.g. Friday.

The Monthly backup is defined as either on or the next backup after the specified date or last day of the week in the month. Month end could in fact be set as a dated mid-month but not a mid-month day of the week.

Given the above definitions there are a possible 6 daily backups and 1 weekly backup per week, rather than 7 daily backups.

Here is an example of a possible backup pruning script (this will be on a form):

Keep 10 daily backup sessions.
Keep 4 weekly backup sessions taken on or after FRIDAY (just in case no backup is made on FRIDAY).
Keep 24 monthly backups taken on or after the last day of the month.
Keep 12 backup sessions taken on or immediately after the 3rd day of the month.

The retained sessions may be 0, e.g. keep 0 weekly backup sessions, but bear in mind that there will now be 7 daily backup sessions per week.

If only "Keep 10 daily backup sessions" is used, this is exactly the way AISBackup works now.
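
As a rough illustration of the definitions above, here is a Python sketch of how the retained sessions might be chosen. It is only one possible reading: it assumes at most one backup per day, treats the first backup on or after each Friday (or month end) as the weekly (or monthly) one, and leaves out the dated sessions. All function names are hypothetical and nothing here reflects how AISBackup actually implements it.

```python
import calendar
from datetime import date, timedelta


def week_anchor(d: date, weekday: int) -> date:
    """Most recent chosen weekday on or before d."""
    return d - timedelta(days=(d.weekday() - weekday) % 7)


def month_anchor(d: date) -> date:
    """Most recent month end on or before d."""
    if d.day == calendar.monthrange(d.year, d.month)[1]:
        return d
    return d.replace(day=1) - timedelta(days=1)


def first_per_anchor(backups, anchor_of):
    """First backup on or after each anchor date (backups sorted ascending)."""
    first = {}
    for d in backups:
        first.setdefault(anchor_of(d), d)
    return set(first.values())


def last_n(items, n):
    """The n most recent items of an ascending list (empty when n is 0)."""
    return items[max(0, len(items) - n):]


def sessions_to_keep(backups, daily=10, weekly=4, monthly=24,
                     weekday=calendar.FRIDAY):
    backups = sorted(set(backups))
    # A count of 0 disables that category, so its backups count as daily.
    monthly_all = first_per_anchor(backups, month_anchor) if monthly else set()
    weekly_all = (first_per_anchor(backups, lambda d: week_anchor(d, weekday))
                  - monthly_all) if weekly else set()
    daily_all = [d for d in backups
                 if d not in monthly_all and d not in weekly_all]
    return (set(last_n(daily_all, daily))
            | set(last_n(sorted(weekly_all), weekly))
            | set(last_n(sorted(monthly_all), monthly)))


# One backup every day from 1 January to 31 March 2024.
backups = [date(2024, 1, 1) + timedelta(days=i) for i in range(91)]
for d in sorted(sessions_to_keep(backups)):
    print(d)
```

Because the weekly and monthly picks are "first backup on or after the anchor", a missed Friday simply moves that week's session to the next backup taken.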

What do you think?

Also the manual session removal (undo backups) will be updated to allow mid-session removal.

Barry
kdmoyers
Posts: 35
Joined: Wed Jul 14, 2004 2:39 pm
Location: New York State, USA

Post by kdmoyers »

Barry,

Superb. Anything like this will be excellent. Sounds super flexible.

Please consider the case of two backups in one day. This is sometimes really handy for just-before-update, just-after-update situations.

Sounds like you have already considered the missed-day problem. (Arcserve has kittens when you miss a day.)

-Kirby
You must be the change you wish to see in the world
Barry
Site Admin
Posts: 1529
Joined: Tue Aug 20, 2002 3:16 pm

Session retention: Addendum

Post by Barry »

Yes, I almost forgot the ‘today’s’ backups. This could be handled like this:

Keep (x) backups per day, delete excess backups (today / next day AISBackup is run).

This slightly complicates the daily backups as they must count as 1 per day (even if more are retained).

Barry
kdmoyers
Posts: 35
Joined: Wed Jul 14, 2004 2:39 pm
Location: New York State, USA

Post by kdmoyers »

Barry- I really appreciate the continual improvements. I'm shouting "AIS" from the mountaintops over here, hope you're doing well. -Kirby
You must be the change you wish to see in the world
Hughg
Posts: 94
Joined: Sat Feb 01, 2003 11:25 pm

Post by Hughg »

Thanks for drawing my attention to this thread, Barry. I agree entirely with Kirby on both counts -- I think your pruning idea is superb, and I, too, can't praise AISBackup highly enough. It's already saved my life several times.

I love the formal elegance of Kirby's idea, but I think your suggestion is better because of the flexibility that Kirby admires in it. The only comment I'd make is that it took me a while to work out what was meant by the operational definitions of weekly and monthly backups. Then it dawned on me that there might have been some missing commas. Does this still capture what you mean?

"The Weekly backup is to be done at a week end which is on, or the next backup after, one chosen day of the week, e.g. Friday."

and

"The Monthly backup is defined as either on, or the next backup after, the specified date or last day of the week in the month. Month end could in fact be set as a dated mid-month but not a mid-month day of the week."

I'd find what you're suggesting extremely useful. Go for it!

Cheers
Hugh
Barry
Site Admin
Posts: 1529
Joined: Tue Aug 20, 2002 3:16 pm

Auto-prune update - version 1

Post by Barry »

Version 2.0 beta has been uploaded; it contains the first Advanced Session Pruning option.

You can set how many of each day’s, daily, weekly, monthly and up to 5 dated backups are retained. ‘Today’s’ backups may be pruned today or on the next backup day.

This option has already freed up gigabytes of redundant backups; I, for one, am glad you kept pushing for this upgrade.

There are more right-click options on the Session page of the main form. Disable Auto-Prune should be used to keep important backups that would otherwise be deleted by the auto-prune option.

I hope the set-up form is easy to understand. The Sessions form displays which sessions will be pruned before you commit the changes. The files are pruned after the next backup session.

The manual prune option (Undo backups) has been updated to allow any sessions to be manually removed.

You may also like the option of assigning more than one drive (by unique volume name) to USB / FireWire backups to make the management of on-site off-site backups much easier.

Barry