Consultor Eletrônico



Kbase P91250: Does loading small records first still affect fragmentation in the Type II Storage Area architecture
Autor   Progress Software Corporation - Progress
Acesso   Público
Publicação   17/04/2009
Status: Verified

GOAL:

Does loading small records first still affect fragmentation in the Type II Storage Area architecture ?

GOAL:

How is fragmentation of records improved in TypeII Storage Areas?

FACT(s) (Environment):

OpenEdge 10.x
All Supported Operating Systems
OpenEdge Database Category: Maintenance

FIX:

When optimizing a database, fragmentation and therefore performance is affected by the order of load and how data are being loaded. For Type I Storage Areas, this still holds true, table fragmentation (and therefore i/o performance when retrieving data) is assisted greatly by the order in which tables are loaded - it is perferabe to load the smallest records before large records in order to ensure minimum fragmentation and maximum contiguous compaction (low scatter).

Whether loading into a Type I or Type II Storage Area, there should still only ever be one load operation per Storage Area at any time so as not to introduce fragmentation during the load. However, time-line considerations sometimes outweigh this strict "rule" and particularly for data of application-associated tables or static tables that were initially carefully considered during the dump phase ito record sorting (aka: Logical dump). Additionally, the "fragmentation" in the Type II case, will always be at a higher granularity to Type I Storage Areas (see example below).

When loading into an Type II Storage Area, it doesn't really matter if you load the tables with large records first, because Blocks in a Cluster are: contiguous and belong to the same object. ?Database objects? can be in one or more clusters, but a cluster can only have ONE database object assigned to it. So, unlike Type I Storage Areas, if there's 'spare space' in the blocks in the cluster, this space will NOT be filled by smaller records from another database object. Again, advice is one load per Storage Area - otherwise loading more than one load per Storage Area, will introduce the "fragmentation" on disk at a scatter of clusters.

Consider the following example:

IF table A is a table with big records and tables B and C are tables with small records - and they all belong to the same Storage Area 1 (SA1):

IF these tables are loaded all at the same time .. the ?fragmentation? is at cluster level. So a full read of TABLE A, because it's spread over more than one cluster - we have to go to other clusters to read the full table content, there is still some need to reposition on the disk but it is dramatically reduced compared to what one might get if these tables had been loaded concurrently into a Type I Storage Area.

IF the tables are loaded all at once:
SA1 CL 0 Area root cluster
CL 1 TABLE A
CL 2 TABLE B
CL 3 TABLE C
CL 4 TABLE A
CL 5 TABLE B
CL 6 TABLE A
IF the tables are loaded one at a time - big records first:
SA1 CL 0 Area root cluster
CL 1 TABLE A
CL 2 TABLE A
CL 3 TABLE A
CL 4 TABLE B
CL 5 TABLE B
CL 6 TABLE C
IF the tables are loaded one at a time - small records first:
SA1 CL 0 Area root cluster
CL 1 TABLE C
CL 2 TABLE B
CL 3 TABLE B
CL 4 TABLE A
CL 5 TABLE A
CL 6 TABLE A
As with any optimization, you have to understand your data, your expansion of that data and the variability of that data. The example above is over simplified - you wouldn't for example want a large table with small records in the same storage area as a small table with large records because the "records per block" would not be the same.