Kbase 18293: Should your database blocksize always be 8K?
Autor |
  Progress Software Corporation - Progress |
Acesso |
  Público |
Publicação |
  11/2/2007 |
|
Status: Verified
GOAL:
Should your database blocksize always be 8K?
GOAL:
What to consider when deciding on database block sizes?
GOAL:
What is the best database blocksize to use?
FACT(s) (Environment):
Progress 8.x
Progress 9.x
OpenEdge 10.x
All Supported Operating Systems
OpenEdge Database Category: Performance
FIX:
Beginning with Progress 8.2A, the ability to specify the database blocksize was introduced (1KB, 2KB, 4KB, 8KB). On most UNIX platforms an 8K blocksize is the default filesystem blocksize when creating a filesystem with the NEWFS command and not specifying the -b to be otherwise.
Before the introduction of this Progress feature, the database blocksize was limited to one Progress database blocksize specific to the supported platform. This blocksize frequently did not match the actual filesystem blocksize because these values were chosen many years ago before modern file systems evolved.
The advantage of matching the database blocksize with the filesystem blocksize, is that database I/O is better optimized. The best performance is obtained when the database blocksize is the same as, or a multiple of the filesystem blocksize. Having a mismatch of blocksize may cause unnecessary OS reads when a database block needs to be written to disk.
One should always use a database block size that matches the filesystem's page size or block size (depending on the type of filesystem). Using a smaller database block size gives bad performance. Using a larger database block size decreases reliability slightly. It introduces a small possibility that a database block could be partially written to disk if a complete system crash (say a power failure) occurs at the right moment. In most cases, database writes will
be atomic, but not always. Database writes will usually be performed as a single disk write by the OS even though two filesystem pages may be involved. But sometimes a database block could be split across a track boundary and the write not be atomic.
8k blocksize will /generally/ give better performance than 4k. There are exceptions, such as if the records per block is less than 32 and the average record size is very small and there are many of them.
The reliability problem is real. The probability of it happening is very low, but if it does, then whatever data block was partially written cannot be corrected by crash recovery. Crash recovery will fail and and the database becomes unusable.
When the database blocksize (say 1 KB) is smaller than the filesystem blocksize (say 8KB), the following happens:
On reads:
1.) Progres requests a 1 kb block from the OS through a read() system call.
2.) The OS finds the correct filesystem block and reads 8K into a kernel buffer then it copies 1K into the database buffer pool.
On writes, the things can really get bogged down:
1.) Progress requests a write of 1K through the write() system call
2.) The OS then has to read the appropriate 8K block from disk because it is no longer in the kernel buffers
3.) Copy in 1K from the write call
4.) Write the 8K kernel buffer
So I/O operations on writes can be doubled and in addition, you may have to wait for the disk to spin all the way around.
If the OS blocksize is 1K (check that it is), then if you have an 8K database blocksize, the OS will need to perform 8 physical I/O's for every database block read. However you will see a benefit regarding index blocks. With larger index blocks more keys can be stored in a single block, so less block reads need
to be performed to find specific keys. This can have a dramatic performance benefit on your system.
If you can change the filesystem blocksize to 8K, you will see a benefit all around (assuming you change the database blocksize to 8K as well).
To change your database blocksize to 8K do the following:
1) Dump and Load your database.
2) Create the database with PROSTRCT utility using -blocksize 8192.
3) procopy $DLC/empty8 into the newly created database.