Kbase P23968: Investigating Broker disappeared, updating .lk file. (4194)
Author: Progress Software Corporation - Progress
Access: Public
Published: 8/13/2010
Status: Verified
SYMPTOM(s):
Broker disappeared, updating <file-name>.lk file. (4194)
Database is brought down with 4194 error message
Database crashes with 4194
Database re-starts after crash
.bi file NOT growing beyond 2GB
FACT(s) (Environment):
All Supported Operating Systems
Progress/OpenEdge Product Family
OpenEdge Replication
CAUSE:
As per the description of this message:
"The watchdog noticed that the broker process is no longer active on the system. The watchdog is updating the .lk file to prevent database corruption and will shutdown the database"
This Solution aims to help find the reason for the watchdog's actions.
FIX:
Investigative measures:
1.) Cron Jobs:
o Are there kill commands in the scripts in use?
o Could it be that the cron job is disconnecting the WDOG user?
o If more than one Progress version is running on the same machine, how do the cron jobs differentiate between a userid on a database running Progress 9.1D and another on Progress 9.1E, for example?
o Is the cron job first checking for active transactions against the users that it disconnects?
o If there is more than one Progress installation on the machine, are all the environment variables pointing to the correct version?
Review the cron jobs, with particular reference to the above and points 2, 3 and 4 below.
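As a starting point for the cron job review, the following is a minimal sketch that flags lines in a script which could terminate a process or disconnect a user. The pattern list and the function name find_kill_lines are assumptions made for illustration; extend the pattern to cover whatever commands the site's own scripts use.

```python
import re

# Assumed pattern list: generic kill commands plus "proshut ... -C disconnect".
# Adjust to match the commands actually used in the site's scripts.
KILL_PATTERN = re.compile(
    r"\b(kill|pkill|killall)\b|\bproshut\b.*-C\s+disconnect",
    re.IGNORECASE,
)


def find_kill_lines(script_text):
    """Return (line_number, line) pairs that contain a kill-style command."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and shell comments.
        if not stripped or stripped.startswith("#"):
            continue
        if KILL_PATTERN.search(stripped):
            hits.append((number, stripped))
    return hits
```

Each flagged line still has to be reviewed by hand to confirm which process or user number it targets, and against which Progress installation.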
2.) Could the broker have been terminated manually?
For example:
o Could this broker have been terminated by a user on the server?
o When disconnecting a PROCESS, could they actually be disconnecting the BROKER?
o When a process is disconnected, does it still hold locks in shared memory?
3.) Are the 4194 errors following a "time pattern" in the log file?
o Is there an error at the corresponding time in the system logs (Dr Watson or system logs)?
Check the system log file on the affected server for the times at which the 4194 errors occurred.
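To look for a time pattern, the 4194 entries in the database log file can be grouped by hour of day. The sketch below assumes that each log line carries its own HH:MM:SS timestamp before the message text; the exact log layout differs between Progress versions, so check this assumption against the actual .lg file. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumption: the first HH:MM:SS on a line is that entry's timestamp.
TIME_RE = re.compile(r"(\d{2}):\d{2}:\d{2}")


def count_4194_by_hour(log_lines):
    """Count log lines containing message (4194), keyed by hour of day."""
    counts = Counter()
    for line in log_lines:
        if "(4194)" not in line:
            continue
        match = TIME_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

A count concentrated in a single hour would suggest a scheduled job, which can then be compared with the cron schedule and the system logs for that hour.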
4.) Are the permissions on the executables, and specifically on the .lk file when it is created, correct?
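One way to record the permissions for comparison is a small helper such as the one below. It is only a sketch: the function name is hypothetical, and the paths to inspect (the executables and the database's .lk file) have to be supplied by the administrator.

```python
import os
import stat


def describe_permissions(path):
    """Return the mode string and numeric owner/group of a file."""
    info = os.stat(path)
    return {
        "mode": stat.filemode(info.st_mode),  # e.g. "-rw-r--r--"
        "uid": info.st_uid,
        "gid": info.st_gid,
    }
```

Comparing the result for the .lk file with the account that runs the broker and watchdog shows whether those processes are able to update the file.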
5.) Could it be that the user who starts the database logs off, particularly if ProControl or Progress Explorer is in use? In this case _mprosrv needs to be started as a service or set up with a (restricted) Administrator account.
7.) Could there be such a high number of "ungraceful" terminations on the system that the broker is forced to shut down?
For example:
o HANGUPS (ungraceful terminations) + dead users + cron job running to disconnect users > TIME LIMIT.
8.) Could the filesystem be running out of disk space around the time this is happening (because of temporary files, for example)? Or could the kernel settings for proc-per-user, subproc-peruser, sem-mni and sem-msl be too low?
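Free disk space can be sampled at intervals around the time of the failures. The following sketch uses Python's standard shutil.disk_usage; the 10% threshold and the function name are assumptions chosen for illustration, not Progress recommendations. It does not check the kernel parameters, which have to be reviewed with the operating system's own tools.

```python
import shutil


def free_space_report(path, min_free_fraction=0.10):
    """Report free space on the filesystem that holds 'path'."""
    usage = shutil.disk_usage(path)
    # Guard against a filesystem that reports a total size of zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "path": path,
        "free_bytes": usage.free,
        "free_fraction": free_fraction,
        "low": free_fraction < min_free_fraction,
    }
```

Running it against the database directory and the temporary file directory from a scheduled job, and logging the result, gives a record to compare with the times of the 4194 messages.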
9.) Is a terminal emulation package called "FACETERM" in use?
If so it needs to be configured to NOT send a kill -9 to a process.
10.) Are there any error messages pointing to shared memory or buffer latches?
For example in the database log file evidence of:
User <num> died with <num> buffers locked. (2523)
User <num> died holding <num> shared memory locks. (2522)
System Error: redundant lwake user <n> latch <x>
Begin ABNORMAL shutdown code (2249)
It is important to determine what is causing the user process to die and to resolve that problem. It may be worth running the clients as "remote" clients, even if they log in directly to the machine where the database resides, so that the client accesses shared memory through the server process rather than directly. The server process then manages the shared memory and buffer latches on behalf of the client. While investigations into what is causing the user process to terminate are underway, the server will still be alive to clean up any remaining latches if the client dies.
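To see whether a dying user process preceded the 4194, the messages listed above can be pulled out of the database log file in order. This is a minimal sketch; the function name is hypothetical, and it assumes the message number appears in parentheses somewhere on the line, as it does in the examples above.

```python
import re

# Message numbers taken from the examples in this Solution.
WATCHED = {"2522", "2523", "2249", "4194"}
MSG_RE = re.compile(r"\((\d+)\)")


def find_related_messages(log_lines):
    """Return (line_number, tag, text) for each watched message, in log order."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        codes = [code for code in MSG_RE.findall(line) if code in WATCHED]
        if codes:
            hits.append((number, codes[0], line.strip()))
        elif "redundant lwake" in line:
            # This system error is shown above without a message number.
            hits.append((number, "lwake", line.strip()))
    return hits
```

If 2522 or 2523 entries appear shortly before the abnormal shutdown, the user number in those lines identifies the client process to investigate first.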