Consultor Eletrônico



Kbase P2028: Sporadic and unusual database failures occur when executing heavy workloads.
Author   Progress Software Corporation - Progress
Access   Public
Published   10/16/2008
Status: Unverified

FACT(s) (Environment):

IBM AIX POWER 5.1
Caldera Open UNIX/SCO UnixWare

SYMPTOM(s):

Sporadic and unusual database failures occur when executing heavy workloads.

Could not locate find descriptor for <filename> (219)

SYSTEM ERROR: Memory violation. (49)

SYSTEM ERROR: Database block has incorrect recid: . (355)

SYSTEM ERROR: Block <num> use count underflow . (3629)

Other error messages regarding invalid buffer use counts, or processes waiting forever for database buffers

Using 8-way or higher symmetric multiprocessor based systems

Often with high user counts and with heavy workloads.

IBM POWER4 processors, or Intel Xeon, Pentium 4, and P6 processors running UnixWare or Open UNIX.

CAUSE:

Progress suspects that the failures may be related to the modes of operation of the memory caches on these high-end systems which use newer types of processors and advanced memory-subsystem designs.
The failures present a wide variety of symptoms consistent with a hardware malfunction, such as a memory or disk controller problem. Symptoms, among others, may include the errors above.

FIX:

Progress Software has become aware that in certain limited circumstances when running the Progress RDBMS on some new 8-way or higher symmetric multiprocessor based systems with very high user counts, sporadic and unusual database failures may occur when executing heavy workloads. Progress has informed the hardware vendors of the problem and is following up closely with them to identify corrective actions.

- For IBM AIX 5.1 on the POWER4-based pSeries 690 systems (known as "Regatta"), a patch (9.1C15) is available.

- For the Intel Pentium 4, Xeon, and P6 based systems, Progress is investigating the matter with the hardware and operating system vendors.

WORKAROUND:
In the interim, a workaround is to use the undocumented startup parameter -mux 0.
This parameter has the effect of changing one of the locking algorithms Progress uses for regulating access to the lock table and the database buffer cache to an alternate algorithm. This alternate algorithm may provide slightly lower performance.
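Where startup parameters are kept in a parameter file, the workaround amounts to making sure the file carries a single -mux 0 entry. The following is a minimal sketch of that idea, not a Progress-supplied tool. It assumes one parameter per line in the file, and the function name ensure_mux_zero is hypothetical. Whether -mux belongs on the broker or the client command line is not stated in this article, so confirm placement with Progress support before using it.

```python
def ensure_mux_zero(pf_text: str) -> str:
    """Return parameter-file text that ends with exactly one '-mux 0' entry.

    Assumption: each startup parameter sits on its own line.
    """
    kept = []
    for line in pf_text.splitlines():
        tokens = line.split()
        # Drop any existing -mux entry so only one value remains.
        if tokens and tokens[0] == "-mux":
            continue
        kept.append(line)
    # Append the workaround setting described in this article.
    kept.append("-mux 0")
    return "\n".join(kept) + "\n"
```

The function works on text rather than on a file path, so the result can be reviewed before anything is written back to disk.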

Note:
Due to the broad range of symptoms, the exact cause of the failures is extremely difficult to diagnose. If you are experiencing these symptoms, you should examine the operating system log files for hardware related messages and take corrective action as necessary.
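Alongside the operating system log review, it can help to count how often the numbered messages from the SYMPTOM(s) list appear in a database log. The sketch below assumes that each logged message ends with its number in parentheses, as in the messages quoted in this article. The function name count_symptom_messages is hypothetical, and the actual log layout may differ between versions.

```python
import re
from collections import Counter

# Message numbers taken from the SYMPTOM(s) list in this article.
SYMPTOM_NUMBERS = {219, 49, 355, 3629}

# Assumption: a message ends with its number in parentheses, e.g. "... (49)".
_MSG_NUMBER = re.compile(r"\((\d+)\)\s*$")


def count_symptom_messages(lines):
    """Count lines whose trailing message number is one of the listed symptoms."""
    counts = Counter()
    for line in lines:
        match = _MSG_NUMBER.search(line)
        if match:
            number = int(match.group(1))
            if number in SYMPTOM_NUMBERS:
                counts[number] += 1
    return counts
```

Any iterable of strings works as input, so an open log file can be passed directly. A nonzero count only indicates that the symptoms are present; it does not identify the underlying cause.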