Kbase P124628: Fathom Replication: source database crashes and does not restart due to error 3773 5350 on ai file
Author |
  Progress Software Corporation - Progress |
Access |
  Public |
Published |
  5/4/2011 |
Status: Verified
SYMPTOM(s):
Source replication database goes down with errors 10601 3779 3773 5350 on an after-image file
SYSTEM ERROR: Attempted to exceed maximum size on file dbname.an (10601)
Can't extend ai extent dbname.an (3779)
Can't switch to after-image extent dbname.an+1 it is full. (3773)
Database Server shutting down as a result of after-image extent switch failure. (5350)
Failed to switch to next after-image extent. (3784)
probkup fails with error 3775
Can't switch to after-image extent it is full. (3775)
rfutil -C aimage empty reports no FULL ai extents
There are no Full extents (3687)
database fails to restart with errors 3773 5350
FACT(s) (Environment):
file-name in errors refers to an after-image file dbname.an
The Target replication agent has shut down previously
-aistall is not in use on the source database
All Supported Operating Systems
OpenEdge Replication 10.x
Fathom Replication 3.0A
CAUSE:
These error messages occur when:
- the extent is the only ai extent.
- an attempt is made to extend a variable-length after-image extent that cannot be extended, because it has reached the file-size or user limits or the disk has run out of space, and the next ai extent in the ai sequence is not a free "EMPTY" ai extent. In other words, the remaining ai extents are "LOCKED" or "FULL" and therefore not available.
- the current BUSY fixed-length ai extent needs to switch to the next ai extent, whose status is "LOCKED" or "FULL".
Under the OpenEdge Replication model, when the replication agent of the target database terminates and/or the target database server goes down, the replication server (RPLS) on the source database also terminates once the connect-timeout has expired. At this stage, the source database is still running and so is after-imaging. The ai files continue to fill during this time, recording the database activity as ai transaction notes. As each extent switches to the next ai extent, its status changes from "BUSY" to "LOCKED" under the replication model. The "LOCKED" status can only change once the target database (and therefore the replication agent, RPLA) has been restarted and "dsrutil source -C restart server" has been run against the source database, so that the RPLS can connect to the RPLA and resume applying the ai notes at block level from where it last left off.
In other words:
Whenever a "FULL" ai file has not been applied to the target database, it stays in the "LOCKED" status until it has been fully applied to the target database. Once it has been applied, its status changes to "FULL", at which point it can be made available again with "RFUTIL source -C aimage empty". This is how the model works. There is no way to change the LOCKED status to anything else while Fathom Replication is enabled. It is therefore imperative to monitor after-image extent availability and to take proactive measures whenever the RPLS and RPLA have lost connection.
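The monitoring described above could be sketched as a small shell helper that counts ai extents per state. This is only an illustration: the function names count_ai_status and ai_headroom_ok are hypothetical, and the sketch assumes that "rfutil source -C aimage list" reports each extent with a line of the form "Status:  <State>", which should be checked against the output of the release in use.

```shell
# Sketch: summarise ai extent states from "rfutil <db> -C aimage list" output
# supplied on stdin.
# ASSUMPTION: every extent is reported with a line of the form "Status:  <State>".

# Print how many extents are in the state given as $1 (case-insensitive).
count_ai_status() {
    awk -v want="$1" '
        tolower($1) == "status:" && tolower($2) == tolower(want) { n++ }
        END { print n + 0 }
    '
}

# Return 0 if at least $1 EMPTY extents are listed on stdin, 1 otherwise.
ai_headroom_ok() {
    [ "$(count_ai_status empty)" -ge "$1" ]
}
```

A cron job might then run something like: rfutil source -C aimage list | ai_headroom_ok 2 || echo "WARNING: fewer than 2 EMPTY ai extents". The threshold of 2 is arbitrary and would depend on how quickly the extents fill.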
FIX:
There is no need to disable replication on the source database. There are other possibilities to recover from this scenario, depending on the current status of the ai extents and the exact conditions of the replicated environment at the time. These essentially involve making existing or new ai extents available so that the source database can continue operations while the RPLS reconnects with the RPLA. The target database can then be synchronised with the source database and ai notes continue to be applied, eventually bringing the target in line with the source.
It is worth stopping ai switch batch/cron jobs during this recovery operation.
The current status of the ai extents can be queried with "RFUTIL source -C aimage list"
Without the -aistall startup parameter on the source database, the source database would have shut down first, so this is the start-point of the methods outlined below.
Scenario A.) IF there are any FULL ai extents:
- archive these off with OS copy utilities, then manually mark them as empty: "rfutil source -C aimage empty"
- then restart the target and source databases.
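As a rough illustration of the archive step in Scenario A, the paths of the FULL extents could be pulled out of the "aimage list" output before copying them. The function name ai_paths_with_status and the archive directory in the comment are hypothetical, and the sketch assumes the listing contains "Extent:", "Status:" and "Path:" lines for each extent.

```shell
# Sketch: print the file paths of ai extents in a given state, reading
# "rfutil <db> -C aimage list" output on stdin.
# ASSUMPTION: each extent starts with an "Extent:" line and carries
# "Status:" and "Path:" lines; verify this against your release.
ai_paths_with_status() {
    awk -v want="$1" '
        # Print the path of the extent collected so far if its state matches.
        function emit_extent() {
            if (path != "" && tolower(status) == tolower(want)) print path
            status = ""; path = ""
        }
        $1 == "Extent:" { emit_extent() }
        $1 == "Status:" { status = $2 }
        $1 == "Path:"   { path = $0; sub(/^[ \t]*Path:[ \t]*/, "", path) }
        END { emit_extent() }
    '
}

# Example use (not run here): copy each FULL extent to a hypothetical
# archive directory before marking it empty with rfutil.
#   rfutil source -C aimage list | ai_paths_with_status Full |
#   while IFS= read -r ext; do cp -p "$ext" /archive/ai/ || break; done
```

Only after every copy has been verified would the extents be marked empty as described in the step above.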
NOTE: if -aistall had been in place in the source database startup parameters, it would only have been necessary to restart the Replication Server process (RPLS) with:
"dsrutil source -C restart server", as the source database would still have been running, but with no updates allowed until ai extents became available. Once these are made available, the -aistall stall lifts immediately and normal processing resumes.
Scenario B.) IF there are still available EMPTY variable ai extents, but no diskspace available:
- shut the source database down with "proshut source -by" if -aistall is in use; otherwise the source database will already be down.
- move the current "BUSY" ai extent, together with the EMPTY ai extents that had no disk space left to grow into, to another disk
- run: "prostrct list source source.st"
- edit source.st to reflect the new absolute file location of the moved ai files
- run: "prostrct repair source source.st"
- run: "prostrct list source source.st" again and check the resulting source.st output to ensure that the Control Area of the source database records the new location of the moved ai files
- start the source database; start the target database.
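The edit to source.st in the steps above can be done by hand; for a large number of extents, a sketch such as the following might help. The function name relocate_ai_lines and the directory names are hypothetical, and it assumes that ai extents appear in the structure file as lines whose first token is "a" followed by an absolute path without spaces. Review the result before running "prostrct repair".

```shell
# Sketch: rewrite the directory of after-image extent lines in a structure
# file read from stdin, writing the edited file to stdout.
# ASSUMPTION: ai extents are lines of the form "a <absolute-path> [f <size>]"
# and the paths contain no spaces or backslashes.
#   $1 = old directory, $2 = new directory (both without a trailing slash)
relocate_ai_lines() {
    awk -v old="$1" -v new="$2" '
        # Only touch "a" lines whose path starts with the old directory.
        $1 == "a" && index($2, old "/") == 1 {
            $2 = new substr($2, length(old) + 1)
        }
        { print }
    '
}

# Example use (not run here), with hypothetical directories:
#   relocate_ai_lines /disk1/ai /disk3/ai < source.st > source.st.new
```

Because the match is on a directory prefix, extents that stay on the original disk must live in a different directory, otherwise their lines would be rewritten as well.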
Scenario C.) IF there were no "FULL" or "EMPTY" ai extents, in other words all ai extents were marked "LOCKED" (except of course the current BUSY ai file):
IMPORTANT NOTE: This option is only available in Progress 9.1E or OpenEdge 10.0B and later (with the exception of OpenEdge 10.1B, 10.1B01 and 10.1B02; please refer to Progress Solution P71887, "Unable to switch to new ai extent after adding a new ai extent to the database").
- shut the source database down with "proshut source -by" if -aistall is in use; otherwise the source database will already be down.
- add more ai extents by running: "prostrct add source addai.st" where addai.st defines where the new ai files will be placed. These new ai extents can be added anywhere there is disk space available.
- run: "prostrct reorder ai source" to ensure that the EMPTY ai extents immediately follow the current BUSY ai extent. This is an offline utility.
- start the source database; start the target database.
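For the "prostrct add" step, the addai.st file only needs one line per new ai extent. A minimal sketch for generating it follows, assuming variable-length extents and the usual structure-file line form "a <path>"; the function name make_addai_st and the directory are hypothetical.

```shell
# Sketch: print structure-file lines for new variable-length ai extents.
# ASSUMPTION: a line of the form "a <directory>" defines one variable-length
# after-image extent in that directory.
#   $1 = directory with free disk space, $2 = number of extents to add
make_addai_st() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'a %s\n' "$dir"
        i=$((i + 1))
    done
}

# Example use (not run here), with a hypothetical directory:
#   make_addai_st /disk3/ai 4 > addai.st
```

Fixed-length extents would need a type and size added to each line, which this sketch does not attempt.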
Scenario D.) Manually roll forward the LOCKED ai extents onto the target, then restart the source and target; the LOCKED files will then be cleared down very quickly.
IMPORTANT NOTE: This option is only valid if the current BUSY extent still has space to write to, in other words, the source database can be started and still has a small amount of ai file space to write to. Regardless, the following is also a very good technique for bringing the target in line with the source when replication has been down for some time and there are a lot of ai notes to synchronise.
[TARGET]
i.) The Agent needs to be in Pre-Transition state, so verify the status of the RPLA with either:
$ dsrutil target -C monitor
A. Replication agent status
" State: Pre Transition "
or parse the target.lg file for message:
RPLA 5: A TCP/IP failure has occurred. The Agent's will enter PRE-TRANSITION, waiting for connection from the Replication Server. (11699)
if the agent is NOT in Pre-Transition state, force this state as follows:
$ dsrutil target -C triggertransition agent
NOTE: The target database needs to still be ONLINE. You cannot trigger transition if the Replication Agent is still connected to the replication server.
ii.) Find out how far the target is behind the source, then roll forward ai files not already applied to the target database.
[source]:
get the current state of each source.an file
$ rfutil source -C aimage list
only ai files with Status = LOCKED or BUSY are relevant
[target]:
find the last ai file that was being applied
$ dsrutil target -C RECOVERY agent > recagent.out
The following information from 'recagent.out' is relevant to this Example:
Replication local agent information:
Last Block: Incomplete
ID of the last TX begin: 1613
ID of the last TX end: 1642
Time of last TX end: Thu Oct 04 18:16:08 YYYY
After Image File Number: 6
Completly Applied to Target: No
[target]:
roll forward ai notes, starting with the ai extent listed in the example above eg:
$ dsrutil target -C ApplyExtent source.a6
the target.lg file will show messages similar to the following upon successful completion:
RPLA 5: Application of Source database AI Extent source.a6 has begun.
RPLA 5: Retry transaction point located at dbkey 0 note type 13 updctr 0. (6806)
RPLA 5: Retry point located at dbkey 662272 note type 25 updctr 6. (6807)
RPLA 5: Source database AI Extent source.a6 has been applied to this database.
NOTE: Existing recovery notes would normally go on to transition the target database at this stage:
$ dsrutil target -C transition agent
In this case, it is NOT the intention to transition the database, merely to continue where replication left off.
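To decide which extent to hand to ApplyExtent first, the relevant two fields of 'recagent.out' can be extracted with a short helper. This is a sketch only: the function name last_ai_applied is hypothetical, and it assumes the field labels appear as in the example output above (matching on "Applied to Target:" so that the spelling of the preceding word does not matter).

```shell
# Sketch: report the last ai file number seen by the agent and whether it
# was completely applied, reading "dsrutil <target> -C RECOVERY agent"
# output on stdin. Prints e.g. "6 No"; prints nothing if no number is found.
# ASSUMPTION: the labels match the example output shown in this article.
last_ai_applied() {
    awk -F': *' '
        /After Image File Number:/ && num == "" {
            num = $2; gsub(/[ \t\r]+$/, "", num)
        }
        /Applied to Target:/ && applied == "" {
            applied = $2; gsub(/[ \t\r]+$/, "", applied)
        }
        END { if (num != "") print num, applied }
    '
}

# Example use (not run here):
#   last_ai_applied < recagent.out
```

With the example output above, the helper reports file number 6 as not completely applied, which is why the roll forward in this article starts with source.a6.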
iii) After all the LOCKED ai extents have been successfully applied, start the source database, then start the target database. The LOCKED files will be cleared down very quickly once the two databases are synchronised.
After applying the method appropriate to the current scenario, and once the target database is synchronised with the source, the ai notes are processed against the target database while activity continues on the source database. As soon as each "LOCKED" ai extent has finished being processed, it is marked "FULL", and it becomes available again once it has been marked EMPTY with "rfutil source -C aimage empty". The progress of this activity can be monitored with "DSRUTIL target -C monitor", Option A: Replication Agent. The key factor in this scenario is the availability of ai files during times when replication has ended and normal processing continues against the source database.