Kbase P101215: Poor performance on high powered UNIX machines - bottlenecking on the TXE
Author: Progress Software Corporation - Progress
Access: Public
Published: 9/24/2009
Status: Verified
SYMPTOM(s):
Processes taking a very long time to complete.
Reports that normally would take a couple of minutes appear to be hanging.
User processes look like they are hanging.
Processes continuing to get CPU but are not completing their jobs.
Users pressing Ctrl-C to back out their transactions and restart their jobs, in an effort to see whether the problem is specific to their session.
Users backing out transactions can cause even worse performance for other users in the database.
Powerful UNIX machines that do not appear to be working very hard when the performance problem occurs.
Backing out transactions can cause the performance problem to become worse.
Very high TXE Commit Waits in promon -> R&D -> debghb -> 6 -> TXE Activity
Database hang
FACT(s) (Environment):
UNIX
Progress 9.1E
OpenEdge 10.0x
OpenEdge 10.1A
OpenEdge 10.1B
OpenEdge 10.1B01 Service Pack
OpenEdge 10.1B02 Service Pack
CAUSE:
Bug# OE00115461
CAUSE:
The TXE algorithm did not scale well with the types and volume of transactions being processed. As a result, TXE Commit locks were being starved.
FIX:
Fixed in 10.0B03 and later.
In cases where a fix is not currently available, increasing -spin or setting -napmax 80 in the database startup parameters may help to reduce latch timeouts and shorten the period of poor performance.
The TXE algorithm was changed to ensure that TXE Commit Locks are not starved. The new algorithm allows up to 10,000 TXE Share and TXE Update Lock Requests to be processed ahead of queued TXE Commit Lock Requests. The goal is to make this algorithm self-tuning in the future. The current fix uses the new algorithm and also provides a way to tune it manually via promon, based on where you see the bottleneck.
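The bounded-skip idea described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Progress implementation: the function name `grant_order`, the one-letter request codes, and the rule that a commit request always has to wait while other requests keep arriving are all hypothetical. It models only positive limits and says nothing about what a value of 0 does in the real product.

```python
from collections import deque


def grant_order(arrivals, skip_limit):
    """Toy model of a bounded skip limit (hypothetical, for illustration only).

    arrivals   -- iterable of "S" (share), "U" (update) or "C" (commit)
                  lock requests, in arrival order.
    skip_limit -- maximum number of S/U requests that may be granted
                  ahead of a waiting commit request (must be >= 1 here).

    Assumption: a commit request always has to wait when it arrives.
    Returns a list of (kind, arrival_index) in the order granted.
    """
    if skip_limit < 1:
        raise ValueError("this toy model only covers positive skip limits")

    waiting_commits = deque()  # arrival indexes of queued commit requests
    skipped = 0                # S/U grants since the oldest commit began waiting
    granted = []

    for index, kind in enumerate(arrivals):
        if kind == "C":
            waiting_commits.append(index)
            continue
        # Once the limit is reached, the oldest waiting commit goes first.
        if waiting_commits and skipped >= skip_limit:
            granted.append(("C", waiting_commits.popleft()))
            skipped = 0
        granted.append((kind, index))
        if waiting_commits:
            skipped += 1

    # No more S/U traffic: grant the remaining commits in arrival order.
    while waiting_commits:
        granted.append(("C", waiting_commits.popleft()))
    return granted


if __name__ == "__main__":
    for limit in (3, 10000):
        order = grant_order("SCSSSSUS", limit)
        print(limit, " ".join(f"{kind}{index}" for kind, index in order))
    # 3 S0 S2 S3 S4 C1 S5 U6 S7
    # 10000 S0 S2 S3 S4 S5 U6 S7 C1
```

In this model a small limit lets the commit request (C1) through after three skips, while a limit of 10000 makes it wait until the share and update traffic stops. The model shows why a cap keeps commit requests from waiting indefinitely under a continuous stream of share and update requests.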
The following is text from a Readme file written by development regarding this new algorithm.
--------------------------------------------------------------------------------------------------------------------
A new parameter was introduced to tune the TXE locking behavior. We do not usually recommend that customers change its value, but if they want to, they can tune the parameter in the following way:
promon dbname
Enter the following commands:
R&D
debghb
6
10
3
The window is called "Adjust TXE Options"; the option to tune this parameter is option 3: "TXE commit lock skip limit". The default value is 10000. This value can be changed under the following circumstances:
1. Setting it to 0 (zero) restores the old TXE locking algorithm.
2. Increasing the value usually gives better performance. However, if the customer notices very long queues of COMMIT or UPDATE locks (in the promon "Activity: TXE Lock Activity" window) and system performance also slows down, then this value should be decreased.
3. The new algorithm gives Record Delete operations lower priority. If the customer wants to increase the priority of this kind of operation, the value should be decreased or even set to 0.
Note that this parameter sets the maximum number of SHARE/UPDATE locks that can skip the queued COMMIT locks. If it has to be changed, change it by a fairly large amount. For example, changing it from 10000 to 10001 should not make any difference.
There is another change in the promon "Activity: TXE Lock Activity" window, where a new row called "Upgrades" was added. It counts the total number of lock upgrade requests (from SHARE to UPDATE), their rate (per second), and the percentage of upgrades relative to SHARE requests. This window provides more debug information for PROGRESS developers regarding this TXE bottleneck issue.
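As a rough illustration of the arithmetic behind the three "Upgrades" figures, the sketch below derives a per-second rate and a percentage from raw counters. The function name `upgrade_stats`, its parameters, and the sample numbers are hypothetical; promon's actual sampling and rounding are not described in the Readme and are not modeled here.

```python
def upgrade_stats(share_requests, upgrades, interval_seconds):
    """Return (upgrades per second, upgrades as a percentage of SHARE requests).

    All inputs are hypothetical counters collected over one sampling
    interval; this is not promon's own calculation.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rate_per_second = upgrades / interval_seconds
    # Avoid dividing by zero when no SHARE requests were seen.
    percent_of_share = 100.0 * upgrades / share_requests if share_requests else 0.0
    return rate_per_second, percent_of_share


if __name__ == "__main__":
    # Made-up sample: 20000 SHARE requests and 500 upgrades in 10 seconds.
    rate, percent = upgrade_stats(20000, 500, 10.0)
    print(f"upgrades/sec: {rate:.1f}  upgrades vs SHARE: {percent:.1f}%")
    # upgrades/sec: 50.0  upgrades vs SHARE: 2.5%
```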