Consultor Eletrônico



Kbase P92449: Client process disconnects from the database but doesn't terminate.
Author   Progress Software Corporation - Progress
Access   Public
Published   4/27/2010
Status: Verified

SYMPTOM(s):

Client process disconnects from the database but doesn't terminate.

Process is still running in the UNIX process list.

_progres process hangs

FACT(s) (Environment):

User process is disconnected from the database.
UNIX
Progress 9.x
OpenEdge 10.x

CAUSE:

The exact cause is unknown at the time of this writing.

FIX:

Additional signal logging was implemented in 9.1D09 to help development determine what state a hung process is in. This logging shows which signals are being sent to a process and how the process is reacting to them. If you are running 9.1D09 or later, this logging will assist in resolving the issue.

If you experience a hung process, please log a support call with Progress Technical Support prior to disconnecting or killing the process.
If you must disconnect this process immediately, please try to obtain some basic information regarding the process:

1. Gather information regarding latches via Promon:

A process should only hold a latch for a very short period of time (milliseconds), so if a user is holding it for longer than that then there is a problem.

To determine whether a process is holding a database latch, do the following:

Run "promon -NL dbname | tee promon.txt" -> Type "R&D" -> Type "debghb" -> Choose "6" (hidden menu) -> Choose "11" (Latch Counts).

Note that there are two pages of latches to check; press <Enter> to view the second page.

Check the "Owner" column (which is the UserID) to see whether any user is holding a latch. If one is, choose "U" to update the latch screen again and confirm whether the user has held the latch for longer than a millisecond (you may simply have caught a user holding a latch at that instant).

If a user is indeed holding a latch for a long period of time (more than a millisecond), you should log a call with Progress. However, you may be in a hurry to make the database usable again, so first gather some basic data to provide when you log the call with Progress Technical Support. Note that you WILL need to shut down and restart your database, and you have only one opportunity to gather the information below, so please take the time:


2. Get the Process ID of the user (owner) holding the latch, via Promon:

Run "promon -NL dbname | tee -a promon.txt" -> Choose "1" (User Control) -> Choose "2" (Match User Number) -> Type in the UserID.

You will see a column titled PID. Note the PID value for use in steps 3 and 4 below, then exit Promon.

3. Check a few times using "ps -ef | grep PID | tee -a promon.txt" and see whether that process is still using CPU resources.
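
The repeated check in step 3 can be scripted. The following is only a sketch, not part of the original procedure: the function name sample_cpu and its COUNT and INTERVAL arguments are illustrative assumptions. It uses "ps -p" instead of "ps -ef | grep PID" so that only the exact process is listed, and it assumes a ps that accepts the POSIX -p and -o options.

```shell
# Hypothetical helper: sample one process several times so that changes
# in the %CPU and TIME columns show whether it is still consuming CPU.
# Usage: sample_cpu PID [COUNT] [INTERVAL_SECONDS]
sample_cpu() {
    pid=$1
    count=${2:-5}      # number of samples (assumed default)
    interval=${3:-2}   # seconds between samples (assumed default)
    i=1
    while [ "$i" -le "$count" ]; do
        date
        # -p selects exactly this PID; -o picks the columns to print
        ps -p "$pid" -o pid,pcpu,time,args || echo "process $pid not found"
        i=$((i + 1))
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, "sample_cpu PID 5 2 | tee -a promon.txt" appends five samples taken two seconds apart to the same file used in the earlier steps. If the TIME value does not change between samples, the process is not accumulating CPU time.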


4. Try to obtain a stacktrace from the hung process. The commands available will depend upon what operating system you are using:

kill -16 PID >> pid.txt (most UNIX platforms)
or
pstack PID >> pid.txt (on Solaris)
or
procstack PID >> pid.txt (on AIX)

Once you have determined that a stacktrace can be produced, please run the command several times so that we can see whether the process state is changing.

NOTE: Because a latch is held, killing that user (with a signal other than -16) will crash the database. It should not cause corruption, but all users will be disconnected and the database will shut down.

If kill -16 isn't generating a stacktrace or is hanging, and you do not have pstack or procstack available, you can try to generate a core file by using "kill -8 PID".
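
The repeated stacktrace capture in step 4 could be wrapped in a small script. This is a sketch under stated assumptions, not a Progress-supplied tool: the function names stack_cmd and capture_stacks are hypothetical, and the order in which the tools are tried (pstack, then procstack, then kill -16) is an assumption for illustration. Signal numbers differ between platforms, so confirm what signal 16 means on your system (for example with "kill -l") before relying on the fallback.

```shell
# Hypothetical helper: print the stack-trace command to use on this host.
# The preference order below is an assumption, not a documented rule.
stack_cmd() {
    if command -v pstack >/dev/null 2>&1; then
        echo "pstack"
    elif command -v procstack >/dev/null 2>&1; then
        echo "procstack"
    else
        echo "kill -16"
    fi
}

# Hypothetical helper: run the chosen command COUNT times against PID,
# printing a timestamped header before each sample.
# Usage: capture_stacks PID [COUNT] [INTERVAL_SECONDS]
capture_stacks() {
    pid=$1
    count=${2:-3}      # number of samples (assumed default)
    interval=${3:-5}   # seconds between samples (assumed default)
    cmd=$(stack_cmd)
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i: $cmd $pid at $(date) ==="
        # $cmd is left unquoted on purpose so "kill -16" splits into
        # the command and its signal argument
        $cmd "$pid"
        i=$((i + 1))
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, "capture_stacks PID 3 5 >> pid.txt" records three samples five seconds apart, which makes it easier to compare successive traces and see whether the process state is changing.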


5. Please find out the last thing the user was doing. It may be that they turned off their machine or exited their telnet session without first exiting the Progress session.

6. Retain the database log file covering the period from the database startup prior to the issue, up to and including the database shutdown.