Kbase 15359: Performance and proutil tabanalys - When to Dump and Reload
Author |
  Progress Software Corporation - Progress |
Access |
  Public |
Published |
  10/05/1998 |
|
Version 7 Case Study: Should we Dump and Reload?
The following contains statistics from a production database and
some useful information from development on how to interpret
tabanalys output.
Database Version: 7.3A05
PROGRESS Platform: HP 9000 K-400
Database Environment: Approx. 2GB using 20 100MB extents on
5 500MB drives. The database was converted
to V7 using proutil conv67.
Dump/Reload Time: 30 Hours
Production Environment: 24X7
Performance has been degrading on this database, and we need to
determine whether the degradation is due to fragmentation. An
analysis was done using proutil tabanalys.
The database has 1025 tables and 9938646 records. The
fragmentation index is either 0.0 or 1.0 for the tables in this
database. The scatter index ranges between 4 and 9, with an
average of 5.3. There is no freshly loaded database against which
to compare these values. Since a Dump and Reload is not easily managed on a
large database, how do we determine whether it will improve the
database performance?
According to the Sys Admin Guide, pages 15-34 and 15-35, the
fragmentation index should be between 1.5 and 2.0, and the
scatter index should be determined by using a freshly loaded
database. The scatter index may vary between databases.
Here is the response from development.
The only time the fragmentation index should be 0 is for tables with
no records. Otherwise, the minimum value for the fragmentation index
should be 1.0. Of the handful of customer databases which I have
analyzed, the fragmentation index has been low. Our space allocation
algorithm is biased towards minimizing the record fragmentation, so it
is not surprising to me that the tabanalys does not report record
fragmentation.
The scatter index is a measure of how contiguous the records are for
a given table. The value is computed by summing the distance between
records in a table, where distance is calculated as the difference
between the RECID values for the two records. The scatter index is
the sum of the actual distance found between records divided by the
minimum distance possible given the size of the record. What is
actually printed in the tabanalys is the log10 of the above value plus
1. Since our space allocation algorithm does nothing to attempt to
minimize this distance between records for the same table, it is not
surprising that this value is large. So 5.4 means on average that
the distance between records for a given table is 25000 times greater
than if the records for the table were all contiguous in the database.
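The computation described above can be sketched in a few lines of
Python. This is a hypothetical helper, not the actual proutil
implementation, and the RECID values used are made up; it also shows
how the "25000 times" figure follows from a printed value of 5.4.

```python
import math

def scatter_index(recids, min_distance):
    """Sketch of the scatter index as described above (hypothetical
    helper, not the actual proutil code).

    recids       -- RECID values for the records of one table
    min_distance -- minimum possible RECID spacing between two records
                    of this size, i.e. when packed contiguously
    """
    recids = sorted(recids)
    # Sum of actual distances between consecutive records
    actual = sum(b - a for a, b in zip(recids, recids[1:]))
    # Minimum possible total distance for the same number of records
    minimum = min_distance * (len(recids) - 1)
    # tabanalys prints the log10 of the ratio, plus 1
    return math.log10(actual / minimum) + 1

# Perfectly contiguous records score exactly 1.0
print(scatter_index([0, 2, 4, 6], 2))   # 1.0

# A printed value of 5.4 corresponds to a distance ratio of about 25000
print(round(10 ** (5.4 - 1)))           # 25119
```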
The aspect of performance which is impacted by scatter is sequential
access.
Example query:
FOR EACH customer WHERE cust-num > 100 and cust-num < 200:
Assume that 5 customer records fit into each database block. The
minimum possible number of block reads to fetch the 100 customers is
20. If instead each block read contains only one customer record
(because the balance of the space in the block is occupied by records
from other tables), then 100 block reads will be required to read the
100 customer records. That is, the query will take 5 times as long to
complete. Dumping and reloading a database tends to cause records for
the same table to be clustered together in blocks and for those blocks
to be contiguous on disk. This greatly improves sequential access.
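The arithmetic in that example is easy to check; the figures below are
the ones used in the text:

```python
records = 100      # customer records returned by the query
per_block = 5      # customer records that fit in one database block

best_case_reads = records // per_block   # fully clustered: 20 block reads
worst_case_reads = records               # one customer per block: 100 reads
slowdown = worst_case_reads / best_case_reads

print(best_case_reads, worst_case_reads, slowdown)   # 20 100 5.0
```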
When reloading a database, the least scatter will result by loading
the data for the table with the smallest average record size first
and the table with the largest average record size last.
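That load-order rule amounts to a sort on average record size. The
table names and sizes below are made up for illustration, standing in
for figures a tabanalys run would report:

```python
# Hypothetical (table, average record size in bytes) pairs -- not
# figures from the database discussed in this article
tables = [("invoice", 310), ("order-line", 58), ("customer", 120)]

# Load smallest average record size first, largest last
load_order = sorted(tables, key=lambda t: t[1])
print([name for name, size in load_order])
# ['order-line', 'customer', 'invoice']
```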
There is no simple way to predict how much a dump and reload will
reduce record scatter, although it should usually end up lower than 5.4.
Furthermore, there is no analytical way to predict how much if any a
lower scatter factor will improve performance. The only way to know
is from experience, that is, dump and reload and see what the scatter
factor is. Assume the scatter index is 1.8. Restart the system and
see whether performance improves. Assume at some later date users
begin to complain about performance -- run the tabanalys report and
see what the scatter factor is -- assume it is 3.5. Then you know
that when the scatter factor reaches 3.0, it is time to do a dump
and reload.
If you have the space, you can restore a backup and run some
representative query. Measure how long the query takes to complete.
Dump and reload and again measure how long the query takes to
complete. If the query takes much less time and it is truly
representative of the application, then you will have some idea of
the performance improvement you might expect.
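That before/after measurement can be framed as a simple timing
comparison. This is a minimal sketch: run_query is assumed to be any
callable that executes the representative query against one copy of
the database, and the commented names are placeholders.

```python
import time

def time_query(run_query):
    # Wall-clock one execution of the representative query
    start = time.perf_counter()
    run_query()
    return time.perf_counter() - start

# Usage (the callables below are hypothetical stand-ins):
# before = time_query(query_against_restored_backup)
# after  = time_query(query_against_reloaded_copy)
# speedup = before / after
```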
LJF
Dec-27-95
Progress Software Technical Support Note # 15359