Kbase P26615: Why you should NOT truncate the bi of the target database when using an ai disaster recovery plan?
Author: Progress Software Corporation - Progress
Access: Public
Published: 16/07/2008
Status: Verified
GOAL:
Why you should NOT truncate the bi of the target database when using an ai disaster recovery plan?
GOAL:
Is it safe to truncate the bi of the hotspare database after roll forward?
GOAL:
How to manage the size of the bi file on the target (hotspare) database?
GOAL:
Can you truncate the bi file of a hotspare database when active transactions are zero?
GOAL:
Is it supported to truncate the bi file of a hot standby database when there are zero active transactions?
GOAL:
Why is the bi file larger on the target hotspare database than on the original source database when using ai roll forward?
GOAL:
Why does rolling forward after-image (ai) notes on the hotspare database cause the bi file to grow larger than on the live database?
FACT(s) (Environment):
All Supported Operating Systems
Progress/OpenEdge Versions
FIX:
Truncating the bi file of the target (hotspare) database when there are 0 active transactions at the completion of the last ai file rolled forward against it, as seen in the database log file message:
At the end of the .ai file, 0 transactions were still active. (1636)
is not a supported method of managing bi growth.
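For reference, a minimal sketch of where this message appears, assuming a target database named hotspare and an archived ai file copied across from the source (database names and paths are illustrative only):

   # Apply the next archived ai file to the target (hotspare) database
   rfutil hotspare -C roll forward -a /aiarchive/livedb.a1
   # The target's log file then reports how many transactions were still
   # active at the end of that ai file
   grep "(1636)" hotspare.lg

Seeing "0 transactions were still active" at this point is what tempts administrators to truncate the bi file, and that is precisely the practice this article advises against.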
The reason is that this usage was not anticipated when the multiple ai extent feature was designed, back in the first Version 7 release. While it would be a useful capability, it was never considered in the implementation and there was no requirement that it work correctly.
The fact that it has since been found to /appear/ to work is not enough to depend on it as a standard practice, although in an emergency it can probably be relied on now and then, knowing that if it fails, rebaselining is an alternative that does work. It is also easy to make a mistake and unintentionally truncate a bi log that does have in-flight transactions.
It is "not supported" not because it is undesirable, but because no one thought of it: it is not intended behavior, it has not been tested or documented, and there are no regression tests to ensure that it continues to work this way in future releases. The truncate bi function does not check for this scenario, nor does it warn the user of the consequences when there are in-flight transactions.
Stopping the live database, truncating its bi, taking a probkup and restoring it as the target database is the way forward (a sketch of this rebaseline procedure follows the explanation below), and here is why:
When the database is accessed, our crash recovery model presumes that ANY outstanding transaction (without a commit) will be aborted. We can't tell what the intention of the connection will be.
Why do we say that ANY connection (including a truncate bi) to the hot standby (target) will put the database through crash recovery?
This is because we always go through the three phases of crash recovery when the database is opened, EXCEPT in the case of rolling forward a busy ai extent. For all other connections, all transactions that have not been committed are presumed to be aborted. You therefore not only risk losing data when you prematurely connect to the target database (transactions may span ai extents, for example), but subsequent roll forwards will also fail due to timestamp mismatches between the master block and the .bi file.
The design and use of the hot standby is a model of continually redoing (the physical REDO phase of crash recovery) the database until it needs to reflect the original database as of the time of a crash or disaster. Only at that point do we presume the user will put the database through the last two phases of crash recovery when re-establishing the live database, namely "physical undo" and "logical undo", which are only performed when outstanding transactions are found.
The difference between the crash recovery physical redo phase and the ai physical redo phase is that the crash recovery physical redo phase only reads the last two clusters, whereas the ai physical redo phase processes every cluster.
The reason the bi on the target sometimes grows much larger than the bi on the source database is that, during the redo phase of crash recovery, a cluster timestamp may be updated. If this occurs, all subsequent timestamps are updated and those clusters all need to re-age before they are used. This causes additional clusters to be allocated and the bi file to grow during roll forward. The original timestamp needs to be updated because a note was detected in the cluster whose update counter indicated it needed to be applied. This appears to happen frequently with notes that cover the extending of the database. For a more detailed discussion from a database internals perspective, please refer to Progress Solution P3622.
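To make the recommended rebaseline path concrete, here is a minimal sketch using standard OpenEdge utilities. The database names, backup paths and the offline backup are assumptions for illustration; after-image administration on the source (re-enabling extents, marking them empty, archiving) is deliberately omitted and should follow your normal site procedure:

   # 1. Stop the live (source) database cleanly
   proshut livedb -by
   # 2. Truncate the source bi file (this applies any outstanding bi notes
   #    and brings the database to a consistent state before truncating)
   proutil livedb -C truncate bi
   # 3. Take a full backup of the source to serve as the new baseline
   probkup livedb /backups/livedb.bak
   # 4. Restore that backup as the new target (hotspare) database
   prorest hotspare /backups/livedb.bak
   # 5. Resume rolling forward the ai files generated after that backup
   rfutil hotspare -C roll forward -a /aiarchive/livedb.a1

From this point the target starts from a fresh baseline, and its bi file grows only as a result of the roll forward behaviour described above.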
Each ai recovery environment is different. A lot depends on database activity, the number of transactions per second, and long-running transactions, among other factors. A cluster will be reused when:
a) It has aged appropriately
b) There are no open transactions in the cluster - transactions without an end-note.
The following may be tuned to minimize bi growth on the target database:
1.) In Progress versions prior to 9.1E02, make sure that -G is at the default of 60 seconds; do not go any lower than this, otherwise there is a high risk of losing cached data. From 9.1E02 onwards, the default is 0, facilitated by changes to fdatasync(). In any Progress version, take care when increasing the -G value above the default (a parameter sketch follows this list). A particular case had this value at 180 seconds, with ai files switched at 5 minute intervals and averaging 40MB each. Roll forward was performed in half-hour batches of 6 ai files, and the roll forward of each ai file took under 3 minutes; because of the 3 minute cluster ageing on the target side, the site experienced a 150% difference in bi file size between the source and target databases.
2.) Ensure that "Delay of Before-Image Flush" (-Mf) is at the default of 3 seconds on the source database. A particular site had this parameter set to 120 and experienced a 40% difference in bi growth on the target database. After dropping it back down to the default, the bi file sizes were within an acceptable range of each other.
3.) The bi cluster size on both databases must be the same. If the bi cluster size is changed on the source, then re-baseline the target database from a new backup.
4.) Consider increasing the ai switch interval to allow more transactions to complete within each ai file. Alternatively, consider an ai switch strategy based on 'bytes' rather than time by using fixed ai extents that switch only when filled.
5.) Roll forward the ai notes in 'real time' or in smaller batches to allow cluster ageing to take effect.
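As a rough sketch of how points 1) through 5) above might look in practice (all parameter values, extent sizes, database names and paths below are illustrative assumptions, not recommendations):

   # Start the source broker with -G and -Mf at their pre-9.1E02 defaults
   proserve livedb -G 60 -Mf 3
   # Keep the bi cluster size identical on source and target; it is set
   # offline with truncate bi (size in KB), and changing it on the source
   # means re-baselining the target from a new backup
   proutil livedb -C truncate bi -bi 4096
   # Fixed ai extents defined in the structure (.st) file switch on 'bytes'
   # rather than on time, i.e. only when they fill:
   #   a /ai/livedb.a1 f 524288
   #   a /ai/livedb.a2 f 524288
   # Roll forward archived ai files on the target in near real time or in
   # smaller batches, so that bi clusters have time to re-age between passes
   rfutil hotspare -C roll forward -a /aiarchive/livedb.a1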