Consultor Eletrônico

Status: Verified

GOAL:

Space utilization within database blocks

GOAL:

How to understand space utilization and database blocks.

GOAL:

How is space handled in blocks

GOAL:

How space is allocated for records

FACT(s) (Environment):

Progress 9.x
OpenEdge 10.x
All Supported Operating Systems

FIX:

The purpose of this solution to chronicle some differences that are encountered when converting a database from one block size to another . When doing this kind of conversion, the size of the database will in many cases appear to increase, and this is mainly due to the way we fill blocks. There is a difference in the way we fill data and index blocks, but this has not changed with the advent of 8k blocks, however it does affect the way the data is distributed within the database. Because the major changes occur in 8k blocks, the solution includes 2 and 4k blocks when referring to 1k blocks.
1. DATA BLOCKS
With 1k, 2k, and 4k blocks we allow up to 32 records, and this is regardless of the record size. We also allow a growth factor of 75 bytes in each block in an effort to cut down on record fragmentation. In 8k blocks the number of records allowed per block changes to 64, while the growth factor remains the same. 8k blocksize benefits databases with large records, or small records that are expected to grow considerably in size. We will reserve slots for 64 records, and if the records are small, the remaining space will not be reallocated. Proutil dbanalys is a good tool to guage the average size of records when choosing a blocksize. Small records in large blocks is an inefficient use of space and is not encouraged. If the records are expected to grow significantly, however, then we could justify a larger block size. The following screen is a sample of some dbanalys output:
DATABASE SUMMARY
NAME Records Indexes Combined
Bytes Tot% Bytes Tot% Bytes Tot%
agedar 559 0.2 475 0.2 1034 0.4
customer 5401 2.2 1444 0.6 6845 2.8
item 3938 1.6 1382 0.6 5320 2.2

1a. Block Acquisition
Progress allows the growth factor mainly because adding blocks to the database is expensive both disk space and performance wise, and this is magnified in 8k blocks. When we extend the database, we search the free chain for empty blocks. When we have used up all the free blocks, we then have to get more from the os. The first time we acquire new blocks, we take 16, then 32 the second time, and then we increment upwards by 16 blocks until we reach 128. This is the highest number of blocks we will take from the os at one time. The performance hit is significant with 1k blocks, and the bigger the block size, the bigger the performance hit as well as the reduction in available disk space.
2. INDEX BLOCKS
Index blocks are handled much differently than data blocks. Each index block contains entries for exactly one index, and there can be as many entries as will fit in the block with no space left in the block for growth. The most notable difference here is how we fill the block. Data blocks have 32 slots for 1k,2k, and 4k blocks while 8k blocks have 64 slots, and these slots are filled sequentially with records leaving the aforementioned growth factor. Index blocks fill with as much entry data for exactly one index as will fit in the block. We fill it from top to bottom with no space left for growth. If the index entry does not fill the block, then the empty space is left and not reused. Here again we see that an index is going to occupy an entire block regardless of its size, therefore a small index in an 8k block is not an efficient. use of space. If the index entries do not fit in the block, we will take another empty block. An important thing to note here is that if the entries for an index occupy one or more 1k blocks, it will then occupy the same number of blocks regardless of the block size. For example, if an index occupies nine 1k blocks (9216 bytes) and the database is converted to 8k blocks, the index will then occupy two 8k blocks (16,384 bytes). Even though the second 8k block is not close to full, the index still takes the entire block, and this is reflected in the database byte count as well. When this behavior occurs for all indexes in the database, the number of bytes for the database can appear to grow considerably (double in some cases)even though the actual amount of data is not changing. The percent utilization column in the proutil ixanalys and dbanalys detail how many blocks and how much of an individual index block are occupied by index entries. Deletions of index entries can cause holes of empty space in the index blocks at which time an idxbuild must be performed to regain usage of that space. The following is an example of proutil's dbanalys:

INDEX BLOCK SUMMARY (3925)
_Field Index Fields Levels Blocks Bytes %Util Factor
_Fld-Nm 3 2 2 13 8217 63.5 2.0
_Fld-Pos 4 2 2 7 5357 76.8 1.8
_File/Fld 2 2 2 16 7961 50.0 2.2
3. FRAGMENTATION
Occasionally deletion of records and addition of new ones causes gaps on the disk where the data is stored. To measure these gaps, or fragmentation, we look at scatter index in proutil's tabanalys . Here is an example:
RECORD BLOCK SUMMARY (6129)
---Record Size--- ---Fragments--- Scatter----
Table Records Bytes Min Max Mean Count Factor Factor
agedar 26 559 19 29 21 26 1.0 2.4
customer 33 5401 147 184 163 33 1.0 1.1
-----------------------------------------------------------
Totals: 2097 204K 6 1153 99 2174 1.0 2.7
The optimum value for the scatter index varies from database to database, however a personal benchmark can be taken from a freshly dumped and loaded database as that is the best the database will ever be in regards to record fragmentation. Purging large amounts of data is a common cause of fragmentation, and this can be resolved by a dump and load after a considerable data purge. Index blocks on the other hand can become under utilized over time, and again the optimal degree of index block utilization can be contingent upon the type of database access being performed. Applications that retrieve large amounts of data perform better with more densely packed index blocks as we have to access fewer blocks to retrieve
a record. This greatly cuts down on disk i/o and improves performance as we retri.eve more data with every read from or write to disk. A loosely populated index block may perform better in an update intensive environment as we have more room to insert new data without having to take another block from the free chain. Proutil's ixanalys utility shows the important factors for analyzing index block utilization. Although the following screen appears in dbanalys, it is essential when evaluating the need for rebuilding indexes. Here is an example:
INDEX BLOCK SUMMARY (3925)
Index Fields Levels Blocks Bytes %Util Factor
order
cust-order 19 2 1 1 227 22.8 1.0
order-date 20 1 1 1 101 10.1 1.0
order-num 21 1 1 1 188 18.9 1.0

The percent utilization field shows how much of the index block is being used, and the aforementioned factors determine what are optimum numbers for a particular database. For applications that primarily retrieve data, 85% utilization or higher is optimal. The factor field in ixanalys is based on both the utilization and size of the index. Again, these numbers are not always useful for databases that have continuous insertions and deletions as utilization will vary. A factor range of 1-2 means the
index is well utilized and no rebuild is necessary. 2-2.5 means the index is less than 50% utilized and a rebuild should be considered. 2.5-3 means utilization is less than 25% and the index should be rebuilt. The idxbuild is the only way we to regain the gaps created by additions and deletions. Also as personal benchmark, we can run the ixanalys after an index rebuild to determine the maximum attainable density for an index. .