Kbase 18950: Support for Compaq NT Clustering Failover Version 8.3A
Author   Progress Software Corporation - Progress
Access   Public
Published   25/01/2005
Status: Unverified

GOAL:

How to set up and configure an NT Cluster system to run Progress database servers and allow failover of those servers, providing continuous client access to your databases.

FACT(s) (Environment):

Progress 8.3A
Windows NT 32 Intel/Windows 2000

FIX:

This solution includes some examples used to test the functionality. These include scripts to start and shut down ProService as well as examples to start and stop specific database servers. There is also some sample client code to illustrate the considerations that must be made if one of the cluster nodes fails.

NT Clusters


NT Clusters provide a mechanism for server application providers to create/configure their server applications to run seamlessly from the client's perspective.

An NT Cluster consists of two identical Windows NT server systems connected to a shared SCSI storage device. The cluster's shared disk storage can be either disks on the same SCSI bus or disks connected through a smart SCSI controller. The Compaq NT Clusters documentation describes all the possible and supported configurations.

Within the scope of the NT Cluster, there are a few tools to manage the cluster. The primary tool is the Cluster Administrator, through which you define the failover resources that the Failover Manager manages.

Failover is the movement of resources from a primary node to a secondary node, either because the primary node crashes or because manual intervention forces resource reallocation from one node to the other.

Cluster Administrator


The Cluster Administrator is a GUI tool that allows you to define Failover Groups and Failover Objects that fail over from one node to the other. Failover Groups are user-defined collections of Failover Objects. Failover Objects are the individual resources determined necessary for a server product to run.

Examples of Failover Objects include disks, network protocols, and scripts. Application server providers can also provide DLLs to make more direct calls to their products from the Failover Manager.

Examples of these are the MS SQL Server, ORACLE Database Server, Lotus Notes, and Web servers.

Failover Manager


The Failover Manager is an image that runs on each node in the cluster. It handles the reading of the Failover Group information and performs the startup and shutdown of the various Failover Objects in the specific order in which they were defined.

The Failover Manager determines when one node has either crashed or has shut down in some manner, and then performs actions based on the objects that were running on the failed node. The Failover Manager also allows for manual intervention to fail over specific Failover Groups as if the primary node had failed.

Failover Groups


Failover Groups are created by the cluster system manager using the Cluster Administrator interface. The Failover Group must contain at least one Failover Object and a cluster node must be chosen as the primary node. Within each Failover Group, the Cluster Administrator requires that there be at least one shared disk Failover Object.

After the minimum requirements are met, any number of other Failover Objects can be placed into or removed from the Failover Group. When you remove Failover Objects from a Failover Group, the Cluster Administrator ensures that at least one object remains in the group.

Failover Groups can be deleted, in which case the Failover Objects that were in the group are returned to the general pool of Failover Objects. A Failover Group contains all the dependencies to allow the group to function as a unit on a node.

Failover Objects


Failover Objects are defined by the cluster manager using the Cluster Administrator interface. Failover Objects are created and placed into a pool of available objects. When the Failover Group is created, an object is placed into the group and removed from the pool of objects.

A Failover Object cannot be placed into two Failover Groups; if the same resource is needed in two groups, two identical objects must be created. Failover Objects have startup and shutdown attributes. When a Failover Group is brought online, the startup option is run. Conversely, when the Failover Manager determines that the group must be failed over, the shutdown option is run.

The two primary Failover Objects are Shared Disks and TCP/IP Alias. These objects give the cluster the ability to make the cluster nodes transparent to the client(s).

The shared disk objects are created automatically when the shared disks are given a name by the Cluster Administrator interface. When a disk object is placed into a Failover Group, it is brought online to the primary node by the Failover Manager. Due to the nature of SCSI, only one node can directly access or own a shared disk at any one time.

The TCP/IP alias objects are defined by the cluster manager using IP addresses that are not currently assigned to any other node. The cluster TCP/IP alias allows clients to access the nodes in the cluster without the clients needing to know which nodes they are connected to.

A TCP/IP alias object is associated with only one node at a time and migrates from the primary to the secondary node when the Failover Manager fails over a Failover Group. In this scenario, the client simply reconnects to the same name, but is now attached to a different node.
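
For example, a client started from a command line might connect through the cluster alias rather than a physical node name. The alias, database, and service names below are hypothetical; only the standard Progress client connection parameters (-db, -H, -S, -N) are assumed:

    REM Hypothetical client startup using the cluster TCP/IP alias.
    REM Because -H names the alias rather than a physical node, the same
    REM command works both before and after a failover.
    prowin32 -db sports -H clusteralias -S sportssv -N TCP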

There can be more than one TCP/IP alias name per cluster. This happens, for example, when each node acts as a server to a different database and each database server is associated with its own single alias IP address.

Management of the TCP/IP alias name dependencies is left to the cluster system manager. It is recommended that a TCP/IP alias object be present for each Failover Group that has objects that require TCP/IP services.

For Progress, the only other Failover Object used is the script object. This object allows the Failover Manager to either run a script or run a specific image as defined by the cluster system manager.

Scripts or images that are to be run can either be placed onto a shared storage device or onto the system disk of each node. When using the system disk, the drive letter for the system disk must be the same on both nodes or an environment variable must be used on both nodes to mask the drive letter.
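
For example, the same variable could be defined as a system environment variable (Control Panel | System) on both nodes and used in the script path given to the Failover Manager. The variable name and paths below are hypothetical:

    On node A (system disk C:):                 PROSCRIPTS=C:\cluster\scripts
    On node B (system disk D:):                 PROSCRIPTS=D:\cluster\scripts
    Script object entry (same on both nodes):   %PROSCRIPTS%\startdb.bat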

When a script object's script or image resides on shared storage, the shared disk object must be placed before the script object in the Failover Group order. This ensures the disk is online before the Failover Manager attempts to run the script object.

Progress example:

For Progress, a Failover Group might consist of a shared disk object, a TCP/IP alias object, and a script object that starts a database server. Thus, the database server has both a disk and a TCP/IP address available before it starts. The script object can either start the database server directly or run a written script that starts the database server.
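
A minimal sketch of such scripts follows, assuming the script object starts the server directly rather than through ProControl. The drive letter, database path, alias, and service name are hypothetical; the standard Progress proserve and proshut utilities are assumed to be available on both nodes:

    REM startdb.bat -- hypothetical script run by the Failover Manager after
    REM the shared disk (here S:) and the cluster TCP/IP alias are online.
    call proserve S:\db\sports -H clusteralias -S sportssv -N TCP

    REM stopdb.bat -- hypothetical shutdown counterpart run when the group
    REM is taken offline or failed over.
    call proshut S:\db\sports -by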

Configuring the NT Cluster System


The configuration and setup of the actual NT Cluster is described in the NT Clusters documentation. The order in which you install the NT Cluster software and your Progress software is very important.

If you install Progress after you install your NT Cluster software, you must reboot the node to allow the NT Cluster software to read the Registry and system environment variables so it knows about the PROGRESS software.

If you install Progress before you install the NT Cluster Software, there are no problems. Each node in the cluster must have PROGRESS installed onto its system disk. It is strongly recommended you install identical versions and patches of PROGRESS on both nodes to ensure the same operating environment.

One other configuration issue to be aware of is that the system disk drive letter must be the same for both nodes in your cluster. (Usually this letter is either C: or D:, depending on how you partition your node's system disk.) Using the same system disk drive letter lets you create and run scripts from your NT Cluster Failover Manager. The script location is stored in the Failover Manager and must refer to the same physical location on both systems.

After the two systems have the NT Cluster software running, use the Windows NT Disk Administrator to assign a drive letter to each shared disk. Then, use the Cluster Administrator to assign a name to your disks and to create your disk Failover Object(s).

Use the Cluster Administrator to create your TCP/IP alias Failover Object(s) also. Once they are created, these objects appear in the Cluster Administrator display and you can begin to create your Failover Groups for Progress.

PROGRESS Database server considerations and configuration


On the server side, think of the cluster as one node, although from a management viewpoint you still have two nodes onto which you must install, configure, and update the product.

The original setup is the most time-consuming, because you must install, configure, and maintain the Progress environment, in addition to updating and maintaining the TCP/IP SERVICES file on each node. However, from the user or application viewpoint, there is only one host and one service.

For TCP/IP, the service name you associate with your database server is used cluster-wide. Therefore, you must edit the TCP/IP SERVICES file on both nodes and add the same service name and socket number on each node. You must use the same names and numbers because, in a failover, the secondary node starts the database server and your clients must be able to reconnect to the same service name and socket number.
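
For example, an identical entry would be added to %SystemRoot%\system32\drivers\etc\SERVICES on both nodes (the service name and socket number below are hypothetical):

    sportssv    6000/tcp    # Progress database server for the sports database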

After Progress is installed on both nodes, you must determine which cluster shared disk(s) is to hold your database. Next, use the Cluster Administrator to create a new Progress Failover Group. Place the shared disk object(s) into the group.

Doing this places the disk(s) online on the node that you chose to be the primary node. From the primary node, use the Progress Database Administration tools to create or copy a database onto the shared disk(s).

Next, use ProControl to configure the database server information, using a cluster TCP/IP alias for the Host (-H) parameter name and the cluster-wide service name from the SERVICES file for the Service (-S) parameter name.
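
Using the hypothetical names from the examples above, the server entry on both nodes would specify:

    Host    (-H):  clusteralias   (the cluster TCP/IP alias, not a physical node name)
    Service (-S):  sportssv       (the cluster-wide entry from the SERVICES file)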

You must duplicate this effort on both nodes, so it is recommended that you have a ProControl screen visible on both nodes to be sure the duplication is exact.

Within ProControl, you can choose to have the database server started automatically or not. If you have only one cluster-shared disk, it is recommended that you allow ProControl to start the database servers automatically when it starts ProService. However, if you have multiple cluster shared disks and you have chosen to have each node act as the primary node for specific database servers, this approach is not recommended.

At this time, ProControl knows only three commands: (Start), (Stop), and (Status). Therefore, when you have database servers running on both nodes and one node fails, its database servers fail over to the secondary node, and you have to (Stop) and (Start) ProControl to trigger the automatic database server startup. Doing so disrupts current users of the secondary node.

In the single cluster shared disk model, ProControl should not already be running on the non-primary node. So, in a failover, you start ProControl with automatic database server startup set, and your database servers are started for you.

Continue creating Failover Groups and adding to or modifying ProControl for each database server on each shared disk. When this is done, use the Cluster Administrator to create Script Failover Objects to be placed in your Progress Failover Group(s). These script objects can be either actual scripts or the direct command line interface into ProControl. In either case, you must use the ProControl command line interface.

ProControl Command Line Interface


In order to start servers for Progress databases on an NT node, ProControl runs the Progress image PCCMD.EXE. This image controls starting and stopping ProService and the various database servers.

If you cannot start ProService by using PCCMD.EXE, try configuring the Windows NT Cluster Administrator to start ProService as an ordinary NT service (using the name by which it is known in the Control Panel).
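
A minimal sketch of this fallback, assuming ProService is registered as an NT service (the service name shown is an assumption; use the exact name displayed in the Control Panel Services applet):

    REM Hypothetical script object commands to bring ProService up or down.
    REM Replace "ProService" with the exact service name from the Control Panel.
    net start "ProService"
    net stop  "ProService"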