Kbase P147903: Large performance decrease occurs after applying Sun Solaris patches
Author   Progress Software Corporation - Progress
Access   Public
Published   2/23/2010
Status: Verified

SYMPTOM(s):

Large performance decrease occurs after applying Sun Solaris patches

System CPU is being substantially utilized

Progress processes consume much more System CPU than would normally occur, sometimes over 50%

Progress user response times are extremely slow

All CPUs in the machine occasionally spike to 100% System CPU

When all CPUs spike to 100% System CPU, there are no system calls being made

Lots of cross calls (xcalls) are occurring

Machine performance degrades as more users log on and more processes start up.

FACT(s) (Environment):

Machine has a large amount of memory, e.g. 64 GB
Machine has many CPUs, e.g. 16 CPUs
A large number of Progress processes are being run on the machine, including Progress clients, AppServers, WebSpeed agents, Sonic Adapters etc.
Solaris 10
OpenEdge 10.x

CAUSE:

A change in one of Sun's patches has caused more aggressive memory management to occur. The exact Sun patch that caused the behavior has not been identified due to the number of patches that were applied at the same time.

Sun bug number 6642475 exists for this issue.

The performance problem is caused by Sun's aggressive page coalescing, which is turned on by default. Although this feature was already enabled by default before the performance problem appeared, the patch changed how the algorithm searches for large contiguous pages.

To explain in more detail: when a process such as _progres is spawned, it creates its process heap, which it uses as its own work area (not shared with other processes). It allocates this memory in 8 KB pages until the heap crosses a 4 MB boundary. At that point, the operating system tries to coalesce (consolidate) the 8 KB pages of memory into a single 4 MB page.
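To give a sense of scale, the arithmetic behind one coalescing attempt can be sketched as follows. The variable names are illustrative only; the page sizes are the ones given above.

```shell
# Number of 8 KB base pages that make up one 4 MB large page.
base_page_kb=8
large_page_kb=$((4 * 1024))
pages_per_large_page=$((large_page_kb / base_page_kb))
echo "$pages_per_large_page"   # prints 512
```

Each coalescing attempt therefore has to bring together 512 small pages into one contiguous region, for every process whose heap crosses a 4 MB boundary.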

The reason for doing this coalescing is to improve performance. Instead of having 8kb pages scattered over many different system boards in the machine, it tries to coalesce them onto the same system board as the CPU being used by the process. This makes the path from the CPU to memory much shorter.

The problem is that coalescing memory consumes an increasing amount of system CPU as the number of processes on the machine increases.

Many CPUs go to 100% System CPU simultaneously because the _progres process performs mmap to acquire memory. This operation can be quite expensive on a machine with many CPUs, because each mmap done by a process results in cross calls (xcalls) to all CPUs on which the process has run. In effect, it locks all of those processors for a short period of time. These mmaps will not show up as system calls, so although System CPU has gone to 100%, no system calls will be seen.

FIX:

Disable aggressive page coalescing.

This can be done either online or offline.

Offline method (takes effect after a reboot):

Add the following entry to the /etc/system file:
set pg_contig_disable=1
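
After editing the file, it may be worth confirming that the entry is present and not commented out before rebooting. The following is a minimal sketch; has_pg_contig_disable is a hypothetical helper name, not a Solaris or Progress utility, and it assumes the entry is written exactly as shown above, starting in the first column.

```shell
# Hypothetical helper: succeeds if the given file contains an active
# "set pg_contig_disable=1" entry at the start of a line.
# Comment lines in /etc/system start with '*', so they do not match.
has_pg_contig_disable() {
    grep '^set pg_contig_disable=1' "$1" > /dev/null
}
```

Example usage: has_pg_contig_disable /etc/system && echo "entry present"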

Online method NOT RECOMMENDED - please discuss with Sun before using this method:
This method is not supported or recommended by Sun as it updates the running kernel directly. A mistake can crash the system or cause other issues.

echo "pg_contig_disable/W 1" | mdb -kw

Only processes started AFTER setting pg_contig_disable will use the new setting.