Kbase P105982: Customized Incremental Dump and Load
Author: Progress Software Corporation - Progress
Access: Public
Published: 10/16/2008
Status: Unverified
GOAL:
Customized Incremental Dump and Load
GOAL:
Database Migration - Mission Impossible summarized
FIX:
A Progress Project Manager wrote an article in the February 2000 issue of "SmartNews" titled:
Database Migration - Achieving Mission Impossible
Here is a slightly modified version of that article, which was originally published with some figures.
****
The mission was to migrate an 18GB Progress version 8.2B database and application from SCO Unix 5.x to Progress version 9.0B running on HP-UX 11. The total shutdown time had to be less than 40 hours. Progress databases are incompatible between SCO and HP-UX 11, and a test of the standard dump and load strategy showed a requirement of over 50 hours. Effort was therefore applied to developing a migration strategy which could be completed in the shutdown window and did not disrupt use of the current system. This required:
. Analysis of the existing database's structure and content
. Review & testing of the most suitable migration techniques
. Identification of static and dynamic data by year & period
. Development of checklists and procedures to ensure total quality
. Confirmation of database integrity at multiple levels.
. Development, testing & optimization of customized data extraction and loading utilities.
An alternative to a standard dump and load was developed which can be described as an INCREMENTAL DUMP AND LOAD: the contents of the largest tables are fractured into smaller portions to permit a phased migration, with portions loaded as soon as they are dumped.
The technique is possible when:
. The database has tables which contain large portions of data that are static or become static after an event or status change, e.g. previous financial years, sales history, closed purchase orders and finalized invoices.
. Static data is significantly greater in volume than dynamic data, and new data is easily and reliably identified.
. System capacity is available to run the incremental dumping programs on the same system or another system without disruption to the live application.
. Static data can be dumped & loaded into the new system in advance of shutdown of the live system.
. Dumping of dynamic data can only be done with the live system shut down; this data is then loaded into the new system to complete the data transfer.
The tricky part is making sure that what appears to be static data remains static, and that the dynamic data can be reliably identified during the migration exercise. Database analysis techniques and repeated testing identified which tables contained static data:
. Use the proutil table analysis utility (proutil dbname -C tabanalys) to identify the largest tables and the data dictionary to identify the data stored in them. Load the report into a spreadsheet for easy manipulation and subsequent quality control tracking.
. Review the fields of these tables to identify means of discriminating between dynamic data and static data, e.g. Financial Year & Period and Finalized status fields.
. Identify parent-child (entity relationships) between tables. Joins with full index support will help dump performance greatly.
. Review indexes of tables to determine performance issues and whether the ordering of fields provides suitable uniqueness and break points for splitting data, e.g. by Year and Period for financial transactions. Consider adding indexes if there is a long-term benefit and space available.
. Develop test procedures to validate the application data relationships and performance overheads - avoid table scans at all costs. Beware: the application may not store data in the manner the data dictionary reports may suggest.
. Produce a summary log of the data in the table by use of break-by logic on the index fields (a sketch of such a summary procedure follows this list). The summary will then detail the ranges of data and help identify what might be static. Compare the dumped data against previous dumps by file comparisons and/or checksums.
. Maximize the number of tables dumped through joins where supported by indexed fields, provided performance does not degrade for the extra tables.
. Repeat testing and compare the record count summaries of data dumped to identify if data was truly static and not semi-static.
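For illustration, the break-by summary described above could be sketched in Progress 4GL roughly as follows. The table order-hist and its fin-year and fin-period fields are hypothetical stand-ins for whichever candidate table and index fields the analysis identifies:

    /* Hypothetical sketch: count records per financial year and period of a */
    /* candidate static table, using break-by logic on the index fields.     */
    DEFINE VARIABLE iCount AS INTEGER NO-UNDO.

    OUTPUT TO "order-hist.sum".
    FOR EACH order-hist NO-LOCK
        BREAK BY order-hist.fin-year BY order-hist.fin-period:
        iCount = iCount + 1.
        IF LAST-OF(order-hist.fin-period) THEN DO:
            /* one summary line per year/period range */
            PUT UNFORMATTED order-hist.fin-year " " order-hist.fin-period " " iCount SKIP.
            iCount = 0.
        END.
    END.
    OUTPUT CLOSE.

Comparing such summaries between repeated runs shows whether a given range has stopped changing and can therefore be treated as static.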
Some tips:
. Application knowledge is valuable for the analysis, e.g. can a purchase order that belongs to a previous financial year be updated or deleted, and if so, what status field values are involved?
. Choose a migration date that minimizes the amount of dynamic data, e.g. an early period in the current financial year & month.
. If there is any doubt as to static content of a table, then it must be considered dynamic.
A rental server was used for development, testing and refinement of the incremental dumping and loading strategy, providing these advantages:
. Increased the processing power available for dumping operations when used in parallel with the existing system.
. Permitted the incremental static dumps to be done in advance of the cutover weekend, thereby reducing the critical workload. Repeat runs allowed for testing and refinement of the process.
. Provided a full database backup system should it have been required.
. Permitted an index rebuild to be undertaken during a weekend prior to the migration. This assisted in reducing dump times on the existing server after shutdown. A full index rebuild had not been done for several years due to the excessive time required to complete on the existing server.
The results of repeated analysis revealed that 63% of the database records were static, and that just 6 tables made up 75% of the database. While some tables appeared to be static, after 2 reviews they were found to be dynamic, e.g. purchase order records.
Incremental dumping requires development of specialised dump programs for static and dynamic data. Dynamic data can only be dumped with the application shut down. Loading of the data can be handled by a bulk load mechanism extended to handle multiple data files per table.
To dump largely static tables, special Progress 4GL batch procedures were written that utilised the relevant fields and indexes to extract only the static data from the tables, producing ASCII dumps in separate files for ranges of the data. In general, common criteria for the static tables will emerge which, combined with the table name, can be used to form a data file naming convention. In this instance each static table involved a status field and the financial year & period fields. Each dump procedure generated statistical information to assist in the analysis of data, verification of integrity and calculation of the time required to execute.
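A minimal sketch of such a static dump procedure is shown below. The table ar-hist, its stat, fin-year and fin-period fields, and the selection values are assumptions made for illustration only, not the actual application schema:

    /* Hypothetical sketch: dump the closed (static) records of one financial */
    /* year, one ASCII dump file per period, logging record counts per file.  */
    DEFINE VARIABLE iYear   AS INTEGER NO-UNDO INITIAL 1999.
    DEFINE VARIABLE iPeriod AS INTEGER NO-UNDO.
    DEFINE VARIABLE iDumped AS INTEGER NO-UNDO.
    DEFINE STREAM sLog.

    OUTPUT STREAM sLog TO "ar-hist.log" APPEND.
    DO iPeriod = 1 TO 12:
        iDumped = 0.
        /* file naming convention: <table>_<year>_<period>.d */
        OUTPUT TO VALUE("ar-hist_" + STRING(iYear) + "_" + STRING(iPeriod) + ".d").
        FOR EACH ar-hist NO-LOCK
            WHERE ar-hist.fin-year   = iYear
              AND ar-hist.fin-period = iPeriod
              AND ar-hist.stat       = "closed":
            EXPORT ar-hist.
            iDumped = iDumped + 1.
        END.
        OUTPUT CLOSE.
        /* statistics for integrity checks and timing estimates */
        PUT STREAM sLog UNFORMATTED "ar-hist " iYear " " iPeriod " " iDumped SKIP.
    END.
    OUTPUT STREAM sLog CLOSE.

Dumping one period per file keeps each portion small enough to load into the new database as soon as it has been dumped.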
To minimise the number of dump procedures written, the dynamic portion of large static tables was dumped after application shutdown by re-running the static procedures with revised criteria to extract the dynamic data. For all large dynamic tables, procedure templates were used to permit rapid development of customised dumps. Generally, any table taking more than 1 hour to dump through the dictionary was dumped via a dedicated procedure.
All the small and numerous tables were dumped by a procedure that utilised a table list and the standard database dictionary dump programs. Tables not containing any data (i.e. empty) were eliminated from the table lists. To optimise dumping, the dump times of these small tables were first captured, then collated into 3 sorted lists, then dumped in 3 concurrent processes.
This resulted in the dumps finishing approximately at the same time, maintaining a consistent but controlled load on the system.
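A sketch of that table-list driver, in Progress 4GL, might look like the following. It assumes a plain text file holding one table name per line and uses the standard dictionary dump entry point prodict/dump_d.p with its (table name, target directory, code page) parameters; the list file name and directory are hypothetical:

    /* Hypothetical sketch: dump every table named in a list file through the */
    /* standard dictionary dump program. Empty tables are assumed to have     */
    /* been removed from the list beforehand.                                 */
    DEFINE VARIABLE cTable AS CHARACTER NO-UNDO.

    INPUT FROM "tablelist1.txt".
    REPEAT:
        IMPORT cTable.    /* end of file terminates the REPEAT block */
        /* writes <dump-name>.d for the table into the target directory */
        RUN prodict/dump_d.p (INPUT cTable, INPUT "/dump/small/", INPUT SESSION:CPSTREAM).
    END.
    INPUT CLOSE.

Splitting the table list into three files sorted by measured dump time and starting three such sessions gives the three concurrent, roughly balanced dump streams described above.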
All table data of the existing database was dumped into ASCII format files (either as incremental static data in separate files or as whole table data in one file); this permitted use of the bulk load utility supplied with Progress. The bulk load utility requires command line arguments and cannot deal with segmented table data without manual intervention. The number of dump files generated by the incremental dumps required a means of automating the bulk loading to ensure reliability and accuracy. A Unix script was developed which, combined with 4GL procedures and the bulk load facility, provided the control and automation required. The functionalities of the loader script and supporting 4GL code are:
. Provide dedicated loading directories for dump files to be deposited in.
. Determine the corresponding database table name automatically by examination of the dump file name.
. Ensure each dump file is loaded only once.
. Generate the required descriptor file for the bulk load automatically.
. Trap error codes and produce warnings.
. Provide a log file to trace activity.
The loader script was developed to support dump files generated from all three types of dump procedures, i.e. static data dumps, whole table dumps and dictionary dumps. The convention used for dump file names was the basis for determining the database table name to receive the data. This standard was based on use of the "_" (underscore) character to prefix any filename characters not related to the table name, which assumes that no table names in the database contain the "_" character. Where the underscore was not used in the filename of the dump produced, the filename usually represented the table dump name as recorded in the metaschema. All dump procedures must use the same convention for incremental dumps to permit use of a single loading mechanism.
A special 4GL procedure was written to validate the database table name derived from the dump filename, allowing the loader script to be self table-selecting. It is important to note that the Progress dictionary dump procedures use a dump name limited to 8 characters, hence interrogation of the metaschema is required to properly validate all dump files.
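A minimal sketch of that validation step, assuming the naming convention described above, might look like this; the procedure interface is illustrative only:

    /* Hypothetical sketch: derive and validate the target table name from a  */
    /* dump file name. Everything after the first "_" is ignored, and the     */
    /* result is checked against both _File-Name and the 8-character          */
    /* _Dump-Name held in the metaschema.                                     */
    DEFINE INPUT  PARAMETER pcDumpFile AS CHARACTER NO-UNDO. /* e.g. ar-hist_1999_1.d */
    DEFINE OUTPUT PARAMETER pcTable    AS CHARACTER NO-UNDO. /* ? when not resolvable */

    DEFINE VARIABLE cBase AS CHARACTER NO-UNDO.

    /* strip any directory path and the file extension */
    cBase = ENTRY(NUM-ENTRIES(pcDumpFile, "/"), pcDumpFile, "/").
    IF R-INDEX(cBase, ".") > 0 THEN
        cBase = SUBSTRING(cBase, 1, R-INDEX(cBase, ".") - 1).
    /* keep only the portion before the first underscore */
    IF INDEX(cBase, "_") > 0 THEN
        cBase = SUBSTRING(cBase, 1, INDEX(cBase, "_") - 1).

    FIND FIRST _File NO-LOCK
         WHERE _File._File-Name = cBase OR _File._Dump-Name = cBase NO-ERROR.
    pcTable = (IF AVAILABLE _File THEN _File._File-Name ELSE ?).

The loader script can then use the returned name to build the bulk load descriptor entry for the file and to confirm that the file has not already been loaded.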
To make the migration succeed and to benefit from it, here is a list of tips for controlling and implementing quality:
. The output of the proutil table analysis is perfect for comparing the before and after record counts per table and for reviewing the total records dumped by the procedures and loaded by the bulk loader. Load the data into a spreadsheet to help with this, and include dump times so that you can compare between runs to refine performance.
. Check whether the customized dumps involving joins extracted all records from a table or left orphan records behind; the table analysis will reveal this when compared to the dump procedure record counts. Correct all data anomalies prior to migration to ensure all records will match.
. Find the optimal number of records per area. Database analysis can be used to compare the new database structure with the old, and identify improvements between test loads to further optimise the loading process. It is advisable to keep the first table analysis produced on the migrated database so that fragmentation can be reviewed as the database grows.
. Have a Migration Check List based on test runs, with indicative times for each task. Remember you are racing against the clock and you need to plan for recovery time in case something goes wrong. The incremental dumping strategy relies on doing as much in advance as possible, and on parallel execution of tasks during the shutdown. Automation of processes must be done with proper error control and reporting for you to remain in control.
. Test your strategy by doing at least one test run. Optimise the process and then plan the actual migration. Ensure all elements are included in the plan, don't assume anything, and be conservative by under-promising and then over-delivering. Track your performance to predict your finishing time.
. The acid test is the application running on the database: identify the best summary reports that will indicate data integrity, preferably consolidating data from multiple tables and over a long period of time. Plan for the legacy system to run these reports prior to shutdown, and allow users to run their own before and after tests.
The execution of this strategy took exactly 30 hours from shutdown of the current system to start-up of the new system, excluding the dumping of static data done in advance, the index rebuilds and the table analysis. This left 10 hours of the shutdown window for the customer to test the application and prepare prior to live usage.
Without the use of this incremental dump and load strategy this migration would have required up to 4 days of application shutdown.