Kbase P21490: Killing a hung batch process leaves the database in a hung state
Author: Progress Software Corporation - Progress
Access: Public
Published: 11/11/2008
Status: Verified
SYMPTOM(s):
Database is down
Batch process is killed
Zombie _mprosrv process
Several _mprosrv processes listed by ps -ef
Cannot access database with Promon
Proshut db-name -by -F hangs
FACT(s) (Environment):
UNIX
Progress/OpenEdge Product Family
CAUSE:
Killing the batch process left the database broker and server processes hanging, so the database cannot be accessed through shared-memory connections.
FIX:
If possible, back up the database with an OS copy. Shut down any other databases on the system properly, disable any scripts that start these databases, and reboot the server to clear the memory. When the server comes back up, access the problem database with:
pro db-name
This will force crash recovery, which can be confirmed in the db.lg file:
Begin Physical Redo Phase at 0 . (5326)
Physical Redo Phase Completed at blk 0 off 191 upd 0. (7161)
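The log check above can be scripted. The following is a minimal sketch, assuming the two message numbers shown in the excerpt, (5326) and (7161), are enough to recognize a completed redo phase. The function name check_redo and the log path passed to it are hypothetical.

```shell
# Hypothetical helper: report whether a database log shows that
# crash recovery (the physical redo phase) started and completed.
# Usage: check_redo /path/to/db-name.lg
check_redo() {
    log_file="$1"
    # Message numbers (5326) and (7161) come from the log excerpt above.
    # -F treats the pattern as a fixed string, -q suppresses output.
    if grep -Fq '(5326)' "$log_file" && grep -Fq '(7161)' "$log_file"; then
        echo "crash recovery completed"
    else
        echo "crash recovery not confirmed"
        return 1
    fi
}
```

The function only inspects the log text. It does not start or stop the database, so it is safe to run at any point.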
Now run proserve. This should start the database in multi-user mode. The other databases can now be started as well.
If the database fails to start and reports an error message, check the Solutions for a possible resolution. If the database still cannot be started, restore the last good backup and roll forward the AI files (if after-imaging is in use).
As a last resort before restoring from backup, issue a kill -8 on the broker process (_mprosrv) for the problem database. To identify the broker process, run:
ps -ef | grep db-name
Both the database broker and the spawned server processes run as _mprosrv, but the broker does not have the -m1 parameter that the spawned servers have:
root 16688 1 0 08:49:46 - 0:00 /progsv73e/dlc/bin/_mprosrv <<broker
root 21636 13698 0 09:05:41 - 0:00 /progsv73e/dlc/bin/_mprosrv -m1 <<spawned server
A "Zombie" process might look like this:
root 28692 1 0 08:58:01 - 0:01 /progsv73e/dlc/bin/_mprosrv -m1
Notice that the only PID of its own is 28692 and the parent PID column shows 1, as for the broker, but the -m1 indicates it is a spawned server (child process). It should list a second PID for its parent, as the spawned server above does (13698), but it is in a "zombie" state.
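The rule described above can be applied to ps -ef output mechanically. This is a minimal sketch, assuming the column order shown in the listings (UID, PID, PPID, ...) and relying only on the two signals this article uses: the presence of -m1 and a parent PID of 1. The function name classify_mprosrv and the labels it prints are hypothetical; "orphaned-server" corresponds to what the article calls a "zombie".

```shell
# Hypothetical helper: classify _mprosrv lines read from stdin.
# Assumes ps -ef column order: UID PID PPID ... CMD
classify_mprosrv() {
    awk '/_mprosrv/ {
        pid = $2; ppid = $3
        spawned = 0
        # Look for the -m1 argument anywhere after the PPID column.
        for (i = 4; i <= NF; i++) if ($i == "-m1") spawned = 1
        if (!spawned)        kind = "broker"            # no -m1
        else if (ppid == 1)  kind = "orphaned-server"   # -m1 but parent is PID 1
        else                 kind = "spawned-server"    # -m1 with a real parent
        print pid, kind
    }'
}
```

A typical invocation would be ps -ef | grep db-name | classify_mprosrv. Review the result by hand before sending any signal, since the classification is only as reliable as the ps output it is given.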
The kill -8 on the broker process _mprosrv will likely produce several errors such as:
Error writing msg, socket=<n> errno=10054 usernum=<n> disconnected. (796)
Then run pro db-name and proserve as mentioned above.