Consultor Eletrônico



Kbase 17004: Space utilization within database blocks
Autor   Progress Software Corporation - Progress
Acesso   Público
Publicação   5/10/1998
Space utilization within database blocks

BLOCK MANAGEMENT

The purpose of this knowledgebase is to chronicle some
differences that are encountered when converting a database
from one block size to another . When doing this kind of
conversion, the size of the database will in many cases
appear to increase, and this is mainly due to the way we
fill blocks. There is a difference in the way we fill data
and index blocks, but this has not changed with the advent
of 8k blocks, however it does affect the way the data is
distributed within the database. Because the major changes
occur in 8k blocks, the knowledgebase includes 2 and 4k
blocks when referring to 1k blocks.


1. DATA BLOCKS

With 1k, 2k, and 4k blocks we allow up to 32 records, and
this is regardless of the record size. We also allow a
growth factor of 75 bytes in each block in an effort to cut
down on record fragmentation. In 8k blocks the number
of records allowed per block changes to 64, while the
growth factor remains the same. 8k blocksize benefits
databases with large records, or small records that are
expected to grow considerably in size. We will reserve
slots for 64 records, and if the records are small, the
remaining space will not be reallocated. Proutil dbanalys
is a good tool to guage the average size of records when
choosing a blocksize. Small records in large blocks is an
inefficient use of space and is not encouraged. If the
records are expected to grow significantly, however, then
we could justify a larger block size. The following screen
is a sample of some dbanalys output:

DATABASE SUMMARY
NAME Records Indexes Combined
Bytes Tot% Bytes Tot% Bytes Tot%
agedar 559 0.2 475 0.2 1034 0.4
customer 5401 2.2 1444 0.6 6845 2.8
item 3938 1.6 1382 0.6 5320 2.2


1a. Block Acquisition

Progress allows the growth factor mainly because adding
blocks to the database is expensive both disk space and
performance wise, and this is magnified in 8k blocks. When
we extend the database, we search the free chain for empty
blocks. When we have used up all the free blocks, we then
have to get more from the os. The first time we acquire new
blocks, we take 16, then 32 the second time, and then we
increment upwards by 16 blocks until we reach 128. This is
the highest number of blocks we will take from the os at
one time. The performance hit is significant with 1k
blocks, and the bigger the block size, the bigger the
performance hit as well as the reduction in available disk
space.

2. INDEX BLOCKS

Index blocks are handled much differently than data blocks.
Each index block contains entries for exactly one index,
and there can be as many entries as will fit in the block
with no space left in the block for growth. The most
notable difference here is how we fill the block. Data
blocks have 32 slots for 1k,2k, and 4k blocks while 8k
blocks have 64 slots, and these slots are filled
sequentially with records leaving the aforementioned growth
factor. Index blocks fill with as much entry data for
exactly one index as will fit in the block. We fill it from
top to bottom with no space left for growth. If the index
entry does not fill the block, then the empty space is left
and not reused. Here again we see that an index is going to
occupy an entire block regardless of its size, therefore a
small index in an 8k block is not an efficient use of
space. If the index entries do not fit in the block, we
will take another empty block. An important thing to note
here is that if the entries for an index occupy one or more
1k blocks, it will then occupy the same number of blocks
regardless of the block size. For example, if an index
occupies nine 1k blocks (9216 bytes) and the database is
converted to 8k blocks, the index will then occupy two 8k
blocks (16,384 bytes). Even though the second 8k block is
not close to full, the index still takes the entire block,
and this is reflected in the database byte count as well.
When this behavior occurs for all indexes in the database,
the number of bytes for the database can appear to grow
considerably (double in some cases)even though the actual
amount of data is not changing. The percent utilization
column in the proutil ixanalys and dbanalys detail how many
blocks and how much of an individual index block are
occupied by index entries. Deletions of index entries can
cause holes of empty space in the index blocks at which
time an idxbuild must be performed to regain usage of that
space. The following is an example of proutil's dbanalys:


INDEX BLOCK SUMMARY (3925)

_Field Index Fields Levels Blocks Bytes %Util Factor
_Fld-Nm 3 2 2 13 8217 63.5 2.0
_Fld-Pos 4 2 2 7 5357 76.8 1.8
_File/Fld 2 2 2 16 7961 50.0 2.2

3. FRAGMENTATION

Occasionally deletion of records and addition of new ones
causes gaps on the disk where the data is stored. To
measure these gaps, or fragmentation, we look at scatter
index in proutil's tabanalys . Here is an example:

RECORD BLOCK SUMMARY (6129)
---Record Size--- ---Fragments--- Scatter----
Table Records Bytes Min Max Mean Count Factor Factor
agedar 26 559 19 29 21 26 1.0 2.4
customer 33 5401 147 184 163 33 1.0 1.1
-----------------------------------------------------------
Totals: 2097 204K 6 1153 99 2174 1.0 2.7

The optimum value for the scatter index varies from
database to database, however a personal benchmark can be
taken from a freshly dumped and loaded database as that is
the best the database will ever be in regards to record
fragmentation. Purging large amounts of data is a common
cause of fragmentation, and this can be resolved by a dump
and load after a considerable data purge.
Index blocks on the other hand can become under utilized
over time, and again the optimal degree of index block
utilization can be contingent upon the type of database
access being performed. Applications that retrieve large
amounts of data perform better with more densely packed
index blocks as we have to access fewer blocks to retrieve
a record. This greatly cuts down on disk i/o and improves
performance as we retrieve more data with every read from
or write to disk. A loosely populated index block may
perform better in an update intensive environment as we
have more room to insert new data without having to take
another block from the free chain. Proutil's ixanalys
utility shows the important factors for analyzing index
block utilization.Although the following screen appears in dbanalys,
it is essential when evaluating the need for rebuilding indexes.
Here is an example:

INDEX BLOCK SUMMARY (3925)
Index Fields Levels Blocks Bytes %Util Factor
order
cust-order 19 2 1 1 227 22.8 1.0
order-date 20 1 1 1 101 10.1 1.0
order-num 21 1 1 1 188 18.9 1.0


The percent utilization field shows how much of the index
block is being used, and the aforementioned factors
determine what are optimum numbers for a particular
database. For applications that primarily retrieve data,
85% utilization or higher is optimal. The factor field in
ixanalys is based on both the utilization and size of the
index. Again, these numbers are not always useful for
databases that have continuous insertions and deletions as
utilization will vary. A factor range of 1-2 means the
index is well utilized and no rebuild is necessary. 2-2.5
means the index is less than 50% utilized and a rebuild
should be considered. 2.5-3 means utilization is less than
25% and the index should be rebuilt. The idxbuild is the
only way we to regain the gaps created by additions and
deletions. Also as personal benchmark, we can run the
ixanalys after an index rebuild to determine the maximum
attainable density for an index.

Progress Software Technical Support Note # 17004