Consultor Eletrônico

Status: Verified

SYMPTOM(s):

AppServer is not releasing sockets

Error 8109 appears in the broker log after starting a server thread

Server exec error : Too many open files (errno:24) (8109)

syslog.log contains entries matching those found in the broker's log

OS logfiles report the same error: Java: Number of open files: 4096

The lsof command reports that the AppServer broker process holds around 13000 file descriptors for named pipes (FIFOs) after less than 3 days
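To watch the leak develop, the FIFO descriptors held by the broker can be counted periodically with lsof. The sketch below assumes lsof is installed and that the broker's process ID is known; the function name count_fifos is hypothetical. It uses lsof's field output (-Ft), in which the type of each open file is printed on its own line prefixed with "t".

```shell
#!/bin/sh
# count_fifos PID: print how many FIFO (named pipe) descriptors process PID holds.
# Hypothetical helper; assumes lsof is on the PATH.
count_fifos() {
    # -p limits the listing to one process; -Ft prints machine-readable
    # field lines, where the file type appears as "t<TYPE>" (e.g. "tFIFO").
    # grep -c prints the number of matching lines (0 if nothing matches).
    lsof -p "$1" -Ft 2>/dev/null | grep -c '^tFIFO$'
}

# Usage (the PID is a placeholder): run periodically and compare the
# result with the per-process limit (maxfiles).
# count_fifos 12345
```

Sampling this value over a few hours after agents are trimmed or stopped should show whether the count keeps rising.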

FACT(s) (Environment):

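maxfiles (per-process limit) is 4096
nfile is unlimited
Errno 24 (EMFILE, "Too many open files") indicates that the process has run out of file descriptors, that is, it has reached its per-process limit (maxfiles) rather than a system-wide limit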
The client connects to the AppServer, remains connected for a long period of time during which many requests are made
Disconnect is performed at the end of the client session.
autoTrimTimeout is set to 1800
Direct connect and state-reset are used
Raising the ulimit does not resolve the problem; it only delays the crash
The problem is observed when starting the agents
The problem is not observed with versions prior to OpenEdge 10.1C
The current Broker and Server log files are overwritten, instead of being appended to, when the Broker that died is restarted.
Broker and Server ubroker.properties configuration is set to append:
brkrLogAppend=1
srvrLogAppend=1
OpenEdge 10.1C
OpenEdge 10.2x
ia64 (Itanium)
PA-RISC 64-bit
IBM Power (64-bit)
ia64 (Intel Itanium 64-bit)
x86_64 (AMD64/EMT64)
Sun Solaris SPARC 64-bit
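Why raising the ulimit only delays the crash can be shown with simple arithmetic. Taking the lsof figure from the symptoms (about 13000 FIFO descriptors in less than 3 days, so at least about 180 per hour) and assuming, for illustration only, a constant leak rate, the time until a limit is reached grows linearly with that limit. The helper name hours_to_limit and the doubled limit of 8192 are hypothetical.

```shell
#!/bin/sh
# hours_to_limit LIMIT IN_USE LEAK_PER_HOUR
# Hypothetical helper: whole hours until IN_USE descriptors grow to LIMIT,
# assuming a constant leak of LEAK_PER_HOUR descriptors per hour.
hours_to_limit() {
    echo $(( ($1 - $2) / $3 ))
}

rate=$(( 13000 / 72 ))          # ~180 FIFOs per hour (13000 in 72 hours)
hours_to_limit 4096 0 "$rate"   # maxfiles=4096: at most about 22 hours
hours_to_limit 8192 0 "$rate"   # doubled limit (illustrative): at most about 45 hours
```

Because the observed rate is a lower bound, these are upper bounds on the time to failure. Doubling the limit roughly doubles that time but never removes the failure; only releasing the descriptors does.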

CAUSE:

Bug# OE00196672

This problem is caused by the way the AppServer broker manages its connected agents:
- The broker maintains a thread for each agent.
- When an agent is shut down (either manually via asbman/PE/OEE or auto-trim), the thread terminates and the associated object goes out of scope, making the object eligible for garbage collection.
- When the object is garbage collected, its resources are also released. The FIFO connections to the agent are closed as part of this process.

Unfortunately, the exact timing of the release of these connections depends on the JVM. In releases prior to v10.1C (i.e. JDK 1.4 and earlier), this happened relatively quickly, so we got the desired behavior. However, starting in v10.1C, we deployed with JDK 1.5, which uses a different garbage collector by default. JDK 1.4 (and earlier) defaults to the "-client" VM, which employs a serial garbage collector; in contrast, JDK 1.5 (and newer) defaults to the "-server" VM on server-class machines, which uses a parallel garbage collector. With the parallel collector, the discarded agent objects, and with them their FIFO connections, are not reclaimed as promptly as in prior versions, so FIFO file descriptors accumulate until the per-process limit is reached (errno 24).

So this problem essentially started in OE 10.1C and is found primarily on 64-bit OpenEdge releases, where the "-client" switch is not available for the 64-bit JVM.

FIX:

Upgrade to the OpenEdge 10.2B02 Service Pack, where this issue has been addressed by closing the FIFO connections explicitly instead of leaving that to the Java garbage collector.
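The principle behind the fix, releasing each FIFO descriptor explicitly as soon as its owner is finished with it instead of waiting for a garbage collector, can be illustrated with a small shell analog. This is not the broker's code (which is Java); the temporary FIFO, its name, and descriptor number 3 are arbitrary, and opening a FIFO read-write with <> is assumed to work as it does on Linux and most UNIX systems (POSIX leaves it unspecified).

```shell
#!/bin/sh
# Analog of the fix: close the pipe deterministically instead of relying on GC.
dir=$(mktemp -d)
fifo="$dir/agent.fifo"     # hypothetical name for an agent's named pipe
mkfifo "$fifo"

exec 3<>"$fifo"            # open the FIFO read-write on fd 3 (does not block)
# ... exchange data with the "agent" through fd 3 ...
exec 3>&-                  # explicit close: the descriptor is released now

# Verify that fd 3 is no longer open: redirecting to it must fail.
if { true >&3; } 2>/dev/null; then
    echo "fd 3 still open"
else
    echo "fd 3 closed"
fi

rm -f "$fifo"
rmdir "$dir"
```

The descriptor count drops at the moment of the close, independent of when (or whether) any collector runs.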

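Until the upgrade is possible, a workaround for all supported environments is to set autoTrimTimeout to 0 in ubroker.properties, then restart the AppServers. Setting autoTrimTimeout to 0 disables automatic trimming, so agents are no longer shut down automatically; agents stopped manually are still affected.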
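Applying this workaround means changing one property in the broker's section of ubroker.properties (typically $DLC/properties/ubroker.properties). The sketch below is a hypothetical helper, set_broker_prop; it assumes a broker section such as [UBroker.AS.asbroker1] (use your own broker name) and only rewrites a line that already exists in that section. If the broker inherits autoTrimTimeout from a parent section such as [UBroker.AS], the property must be changed or added there instead.

```shell
#!/bin/sh
# set_broker_prop FILE SECTION KEY VALUE
# Hypothetical helper: rewrite "KEY=..." to "KEY=VALUE", but only inside [SECTION].
# Indentation is preserved; other sections are left untouched.
# The original file is kept as FILE.bak.
set_broker_prop() {
    file=$1 section=$2 key=$3 value=$4
    cp "$file" "$file.bak" || return 1
    awk -v sec="[$section]" -v key="$key" -v val="$value" '
        /^\[/ { insec = ($0 == sec) }          # track which section we are in
        insec && $0 ~ ("^[ \t]*" key "=") {    # KEY line inside the target section
            match($0, /^[ \t]*/)               # measure the leading indentation
            $0 = substr($0, 1, RLENGTH) key "=" val
        }
        { print }
    ' "$file.bak" > "$file"
}

# Usage (broker name is an example):
# set_broker_prop "$DLC/properties/ubroker.properties" UBroker.AS.asbroker1 autoTrimTimeout 0
```

The AppServer still has to be restarted for the new value to take effect.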

For supported 32-bit environments, the Java virtual machine can also be switched to '-client'. For example, on HP-UX:

Edit $DLC/bin/java_env and, in the "HP-UX" section, set the variable VMTYPE to "client" instead of "server", "hotspot", or "classic".

The result should look like this (extract from java_env):

"HP-UX") # HP UNIX 11.00, No jdk but jre
    THREADS_FLAG=native_threads
    JDKHOME=as01/oe/hpux64/v101C00/dlc/jre
    if [ ! -f $JDKHOME/bin/javac ]
    then
        JDKHOME=$env_jdkhome
    fi
    case "$PLATID" in
    34) # HPUX-32bit
        ARCH=PA_RISC2.0
        VMTYPE=client    # was "server", "hotspot" or "classic"; client VM uses the serial collector
        SHLIB_PATH=$SHLIB_PATH:$JREHOME/lib/$ARCH/$THREADS_FLAG:$JREHOME/lib/$ARCH/$VMTYPE:$JREHOME/lib/$ARCH
        JVM=${JREHOME}/bin/$JVMEXE
        ;;
    ...
    esac
    export SHLIB_PATH
    export LD_LIBRARY_PATH
    export JVM
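To check which VM a JVM actually runs, the banner printed by java -version can be inspected: HotSpot-based JVMs name either a "Client VM" or a "Server VM" there. The parser below, vm_type, is a hypothetical helper that reads that banner on standard input; the usage line assumes $JVM points at the java executable configured in java_env. Requesting -client explicitly is a quick way to see whether the JVM offers a client VM at all, which, as noted above, 64-bit JVMs do not.

```shell
#!/bin/sh
# vm_type: read "java -version" output on stdin and print "client", "server",
# or "unknown". Hypothetical helper; assumes a HotSpot-style banner line such as
#   Java HotSpot(TM) 64-Bit Server VM (build ..., mixed mode)
vm_type() {
    banner=$(cat)
    case "$banner" in
        *"Client VM"*) echo client ;;
        *"Server VM"*) echo server ;;
        *)             echo unknown ;;
    esac
}

# Usage: java -version writes to stderr, hence 2>&1.
# A JVM without a client VM is expected to report "server" even with -client.
# "$JVM" -client -version 2>&1 | vm_type
```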