--Proposal for Alternate Session Purge Method--
Posted: Tue Nov 09, 2004 4:13 pm
Presently, AIS Backup purges sessions eldest first. This is the simplest and most generally useful method.
This is a quasi-formal proposal for an additional optional method.
First, a few definitions: for any integer x, FindFirstOne(x), or ffo(x), is the position of the rightmost one in the binary representation of the number, counting from 1 on the right. For example:
x ffo(x)
--- ------
1 1
2 2
3 1
4 3
5 1
8 4
40 4
63 1
64 7
Now, assume that every time a backup runs, a global generation number is incremented. The first time the backup runs, the generation number is 1, the second time, it's 2, etc. Call this number g. It's an internal number the user never sees, kept for the lifetime of the backup job.
Now, we organize all our sessions in date order, and index them. The youngest session is in slot 1, the eldest session is in slot m, where m is the number of sessions we are keeping.
Now we are ready to decide which session slot to purge. It's ffo(g). Pretty simple to compute, but what is the resulting behavior?
The behavior is this: every other day, yesterdays session is purged. Every fourth day, the day before yesterdays session is purged. Every eighth day, the day before that is purged, etc.
The effect is, when you examine the ages of the sessions, that they tend to follow powers of two, (with a small moving offset). For example, the ages might be:
slot age
---- ----
1 1
2 2
3 4
4 8
5 16
6 32
7 64
8 128
What advantages does this method have? It covers a much greater span of time for a given number of sessions. For example: with five sessions, the simple method covers five days. The proposed method covers sixteen days. If I need to cover 60 days, the simple method requires 60 sessions!! the proposed method requires only 7 sessions.
This, I assert, is a powerful advantage, worthy of some small discomfort.
The proposed method also acknowledges that, generally, more recent days are more likely to be needed. Distant sessions are less likely to be needed.
A useful extension of this method is to combine it with the simple sequential method by adding a small constant to the slot-to-purge. For example, if we purge ffo(g)+3, the slot ages might look like this:
slot age
---- ----
1 1
2 2
3 3
4 4
5 5
6 7
7 10
8 18
This gets us sequential days in the recent past, followed by powers-of-two days to get a longer maximum age. This maximum age is often called "horizon".
Objection one: Using this method, it is not always obvious exactly which dates are available in the back up, since the emergent behavior of the system is somewhat complex to work out in your head. You have to trust that the pattern is working. When a mistake happens and you need to restore a file, you look into the program to see what dates are available, and pick the closest date before the mistake happened. This is exactly what you would do using the Grandfather/Father/Son or Sequential methods anyway. The difference here is that, for a given number of sessions, you are more likely to find a covering session.
Objection two: it is somewhat upsetting to find that the day before yesterday's backup has already been purged! The "combined" modification above takes care of this.
Objection three: what if ffo(g) returns a number greater than your eldest slot? Just purge the eldest slot.
Objection four: couldn't you just run multiple simultaneous simple backups, each at different intervals? Yes, but this has significant disadvantages: it's more complex to administer, more complex to restore from, and most importantly it duplicates files unnecessarily. For installations with dozens of gigabytes to back up, the wasted storage can be tremendous.
Final note one: it might help user trust if a small chart showing the available backup session ages is displayed. The user can readily see the decreasing frequency behavior, and confirm his backup horizon.
Final note two: this option might be presented to the user as "how many sessions to follow powers of two purge method?" For example, if the user selects 8 sessions to keep, and 3 to follow powers of two purge method, then the purge expression would be ffo(g)+5 (because 8 - 3 = 5). If the user selects 0 to follow the powers of two method, then the expression is simply 8, and we are back to the simple sequential method.
Final note three: what to call it? "powers of two method", "Towers of Hanoi method" (after the classic puzzle game), "extended horizon method" are all possibilities.
I hope I have made a convincing case for the implementation of an optional alternate session purge method for AIS Backup. The method is simple to compute and offers much greater horizon, at the cost of some simplicity.
Thank you,
Kirby Moyers
This is a quasi-formal proposal for an additional optional method.
First, a few definitions: for any integer x, FindFirstOne(x), or ffo(x), is the position of the rightmost one in the binary representation of the number, counting from 1 on the right. For example:
x ffo(x)
--- ------
1 1
2 2
3 1
4 3
5 1
8 4
40 4
63 1
64 7
Now, assume that every time a backup runs, a global generation number is incremented. The first time the backup runs, the generation number is 1, the second time, it's 2, etc. Call this number g. It's an internal number the user never sees, kept for the lifetime of the backup job.
Now, we organize all our sessions in date order, and index them. The youngest session is in slot 1, the eldest session is in slot m, where m is the number of sessions we are keeping.
Now we are ready to decide which session slot to purge. It's ffo(g). Pretty simple to compute, but what is the resulting behavior?
The behavior is this: every other day, yesterdays session is purged. Every fourth day, the day before yesterdays session is purged. Every eighth day, the day before that is purged, etc.
The effect is, when you examine the ages of the sessions, that they tend to follow powers of two, (with a small moving offset). For example, the ages might be:
slot age
---- ----
1 1
2 2
3 4
4 8
5 16
6 32
7 64
8 128
What advantages does this method have? It covers a much greater span of time for a given number of sessions. For example: with five sessions, the simple method covers five days. The proposed method covers sixteen days. If I need to cover 60 days, the simple method requires 60 sessions!! the proposed method requires only 7 sessions.
This, I assert, is a powerful advantage, worthy of some small discomfort.
The proposed method also acknowledges that, generally, more recent days are more likely to be needed. Distant sessions are less likely to be needed.
A useful extension of this method is to combine it with the simple sequential method by adding a small constant to the slot-to-purge. For example, if we purge ffo(g)+3, the slot ages might look like this:
slot age
---- ----
1 1
2 2
3 3
4 4
5 5
6 7
7 10
8 18
This gets us sequential days in the recent past, followed by powers-of-two days to get a longer maximum age. This maximum age is often called "horizon".
Objection one: Using this method, it is not always obvious exactly which dates are available in the back up, since the emergent behavior of the system is somewhat complex to work out in your head. You have to trust that the pattern is working. When a mistake happens and you need to restore a file, you look into the program to see what dates are available, and pick the closest date before the mistake happened. This is exactly what you would do using the Grandfather/Father/Son or Sequential methods anyway. The difference here is that, for a given number of sessions, you are more likely to find a covering session.
Objection two: it is somewhat upsetting to find that the day before yesterday's backup has already been purged! The "combined" modification above takes care of this.
Objection three: what if ffo(g) returns a number greater than your eldest slot? Just purge the eldest slot.
Objection four: couldn't you just run multiple simultaneous simple backups, each at different intervals? Yes, but this has significant disadvantages: it's more complex to administer, more complex to restore from, and most importantly it duplicates files unnecessarily. For installations with dozens of gigabytes to back up, the wasted storage can be tremendous.
Final note one: it might help user trust if a small chart showing the available backup session ages is displayed. The user can readily see the decreasing frequency behavior, and confirm his backup horizon.
Final note two: this option might be presented to the user as "how many sessions to follow powers of two purge method?" For example, if the user selects 8 sessions to keep, and 3 to follow powers of two purge method, then the purge expression would be ffo(g)+5 (because 8 - 3 = 5). If the user selects 0 to follow the powers of two method, then the expression is simply 8, and we are back to the simple sequential method.
Final note three: what to call it? "powers of two method", "Towers of Hanoi method" (after the classic puzzle game), "extended horizon method" are all possibilities.
I hope I have made a convincing case for the implementation of an optional alternate session purge method for AIS Backup. The method is simple to compute and offers much greater horizon, at the cost of some simplicity.
Thank you,
Kirby Moyers