Kbase 19611: Extent size optimization under Unix
Author: Progress Software Corporation - Progress
Access: Public
Published: 3/21/2000

Unix stores pointers to file data in structures called INodes.
Each INode holds a fixed number of pointers to data blocks. If
the file is small, the INode points directly to each block of
data in the file. Unix reads the INode and then goes to the
correct physical block on the disk to read the data.

        INode
  -----------------
  |   |   |   |   |
  -----------------
    |   |
    |   |
    |   |          --------------
    |   -----------| Data block |
    |              --------------
    |
    |              --------------
    ---------------| Data block |
                   --------------

If the file is larger, the INode contains pointers to other
INodes, which in turn contain pointers to the actual data.
(Strictly, most Unix file systems call these intermediate
structures indirect blocks; they are referred to as INodes
here.) This is first-level indirection: the file is accessed
by reading the first INode, which points to a second set of
INodes, which point to the data.

        INode
  -----------------
  |   |   |   |   |
  -----------------
    |   |
    |   |                INode
    |   |          -----------------
    |   -----------|   |   |   |   |
    |              -----------------
    |
    |                    INode
    |              -----------------
    ---------------|   |   |   |   |
                   -----------------
                     |   |
                     |   |
                     |   |          --------------
                     |   -----------| Data block |
                     |              --------------
                     |
                     |              --------------
                     ---------------| Data block |
                                    --------------

The same pattern continues as the file grows. If the file is
much larger still, the first INode contains pointers to other
INodes (as before), which point to further INodes, which point
to the data. This is second-level indirection. Third-level
indirection adds one more layer of INodes in the same way.
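
The levels described above can be sketched in a few lines of Python.
This is an illustration only, not the layout of any particular Unix.
It assumes 12 direct pointers in the first INode, 4-byte block
pointers, and intermediate INodes that each fill one disk block with
pointers. The function name and all default values are hypothetical.

```python
def indirection_level(offset, block_size=8192, n_direct=12, ptr_size=4):
    """Return the level of indirection needed to reach byte `offset`.

    0 = pointed to directly by the first INode, 1 = first-level,
    2 = second-level, 3 = third-level indirection.
    All layout parameters are assumed values for illustration.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")

    ptrs_per_block = block_size // ptr_size
    block_no = offset // block_size  # logical block index within the file

    # Blocks reachable straight from the first INode.
    if block_no < n_direct:
        return 0
    block_no -= n_direct

    # Each extra level multiplies the reachable blocks by ptrs_per_block.
    blocks_at_level = ptrs_per_block
    for level in (1, 2, 3):
        if block_no < blocks_at_level:
            return level
        block_no -= blocks_at_level
        blocks_at_level *= ptrs_per_block

    raise ValueError("offset is beyond what three levels can address")
```

With these assumed defaults, the first 12 blocks of a file need no
indirection, and each further level is only entered once the previous
one is full.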
The problem is that Unix has to walk the INode tree down to the
data every time a read or write is performed. The more levels
of indirection there are, the more disk I/O is needed to read
or write a block of the database. Unix can cache INodes, but it
may not be able to hold all of them, so what we would consider
one I/O may actually be two or three in the worst case. There
is also some CPU overhead in walking the INode tree or adding
a new block to it, because new INodes may have to be allocated
and, in the worst case, additional higher-level INodes as well.
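
The I/O cost described above can be expressed as a very rough model.
The sketch below assumes that the first INode is already in memory
(the file is open), that each level of indirection costs one extra
disk read unless it is cached, and that the data block itself is
never cached. The function name is hypothetical and the model ignores
read-ahead and other optimizations a real Unix may apply.

```python
def disk_reads_for_block(level, cached_levels=0):
    """Count physical reads for one data block at a given indirection level.

    `level` intermediate INodes must be visited before the data block;
    `cached_levels` of them are assumed to be in memory already.
    """
    if not 0 <= cached_levels <= level:
        raise ValueError("cached_levels must be between 0 and level")
    # Uncached intermediate INodes, plus the data block itself.
    return (level - cached_levels) + 1


# Worst case (nothing cached) for each level of indirection.
for level in range(4):
    print(level, disk_reads_for_block(level))
```

Under this model, first- and second-level indirection give the two
or three reads mentioned above when nothing is cached, and a fully
cached tree brings the cost back to a single read.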
On most Unix systems, the optimal extent size is between 300 and
500 MB. Anything larger results in third-level (triple)
indirection, which can hurt performance (see KBase #15911,
"Why extents optimize at 300-500meg under UNIX").
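
As a rough cross-check of where third-level indirection begins, the
largest file that still fits within second-level indirection can be
computed from the file system block size. The sketch below assumes
12 direct pointers and 4-byte block pointers; these values and the
function name are assumptions for illustration, and the real figures
depend on the file system in use.

```python
def max_size_without_triple(block_size, n_direct=12, ptr_size=4):
    """Largest file size in bytes reachable with at most second-level
    indirection, under the assumed layout parameters."""
    ptrs = block_size // ptr_size
    # Direct blocks + first-level blocks + second-level blocks.
    return (n_direct + ptrs + ptrs * ptrs) * block_size


for block_size in (1024, 2048, 4096, 8192):
    limit_mb = max_size_without_triple(block_size) / 2**20
    print(f"{block_size:5d}-byte blocks: about {limit_mb:.0f} MB")
```

Under these assumptions, a 2 KB block size gives a limit of a little
over 512 MB, which is in line with the range quoted above. Other
block sizes move the limit considerably, so the threshold is worth
checking against the actual file system.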