Consultor Eletrônico

Status: Verified

GOAL:

How to change the connection timeout for the replication server to agent connection

GOAL:

How to stop the replication server process from shutting down so quickly

GOAL:

Where to set the timeout of the replication server process

GOAL:

Is there anything that can be done to make OE replication better able to deal with short network "hiccups"?

FACT(s) (Environment):

OpenEdge Replication
All Supported Operating Systems
Progress 9.x
OpenEdge 10.x

FIX:

The [control-agent.name] section of the server properties file contains a connect-timeout parameter that can be adjusted. It specifies the number of seconds the replication server will keep trying to connect to the configured target database agents. The minimum (and default) value is 120 seconds and the maximum is 86,400 seconds (24 hours). The parameter also specifies how many seconds the Replication agent waits for a connection from the OpenEdge Replication server before the agent shuts itself down.

[control-agent.agent1]
name=agent1
database=target
host=localhost
port=4501
connect-timeout=120
replication-method=async
critical=0
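
As a rough sketch, the range stated above can be checked before the properties file is deployed. The example assumes the file can be read as plain INI-style text by Python's configparser; the function name check_connect_timeouts and the SAMPLE text are illustrative only and are not part of OpenEdge Replication.

```python
import configparser

# Limits for connect-timeout as stated in this article (seconds).
MIN_TIMEOUT = 120
MAX_TIMEOUT = 86400


def check_connect_timeouts(properties_text):
    """Return {section: problem} for each control-agent section with a bad value."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(properties_text)
    problems = {}
    for section in parser.sections():
        if not section.startswith("control-agent."):
            continue
        raw = parser.get(section, "connect-timeout", fallback=None)
        if raw is None:
            continue  # not set, so the default applies
        try:
            value = int(raw)
        except ValueError:
            problems[section] = f"not an integer: {raw!r}"
            continue
        if not MIN_TIMEOUT <= value <= MAX_TIMEOUT:
            problems[section] = f"{value} is outside {MIN_TIMEOUT}..{MAX_TIMEOUT}"
    return problems


# Hypothetical file content, copied from the example section above.
SAMPLE = """
[control-agent.agent1]
name=agent1
database=target
host=localhost
port=4501
connect-timeout=120
replication-method=async
critical=0
"""

if __name__ == "__main__":
    print(check_connect_timeouts(SAMPLE))  # an empty dict means no problems found
```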

Note: Ensure that the connect-timeout setting does not exceed the time it would take for the pica buffer to fill; otherwise the database will hang while it waits for data to be replicated.

While the replication server process is running, it uses a buffer (often referred to as the pica buffer) to store pointers to AI blocks within the AI files. If this buffer becomes full, the database must wait until an AI block has been replicated and its pointer freed before it allows further writes. If the buffer fills because the network is down while the replication server is still running, the server cannot replicate any AI blocks to the target database, so no pointer can be freed and the database freezes.
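
The relationship between connect-timeout and the buffer fill time can be illustrated with rough arithmetic. This is only a sketch: the buffer capacity (in pointer entries) and the AI block write rate are hypothetical inputs that would have to be measured on the actual system, and the function names are invented for the example.

```python
MIN_TIMEOUT = 120     # minimum connect-timeout per this article (seconds)
MAX_TIMEOUT = 86400   # maximum connect-timeout per this article (seconds)


def seconds_to_fill(buffer_entries, ai_blocks_per_second):
    """Estimate how long it takes to exhaust the buffer when nothing is replicated."""
    if ai_blocks_per_second <= 0:
        raise ValueError("write rate must be positive")
    return buffer_entries / ai_blocks_per_second


def timeout_is_safe(connect_timeout, buffer_entries, ai_blocks_per_second):
    """True when the timeout expires before the buffer could fill."""
    if not MIN_TIMEOUT <= connect_timeout <= MAX_TIMEOUT:
        raise ValueError("connect-timeout out of range")
    return connect_timeout < seconds_to_fill(buffer_entries, ai_blocks_per_second)


if __name__ == "__main__":
    # Hypothetical figures: 60,000 pointer entries, 100 AI blocks written per second.
    print(seconds_to_fill(60_000, 100))        # 600.0
    print(timeout_is_safe(120, 60_000, 100))   # True
    print(timeout_is_safe(900, 60_000, 100))   # False
```

With these assumed figures the buffer would fill in about ten minutes, so a connect-timeout of 900 seconds would be too long. If the estimated fill time is below the 120-second minimum, no connect-timeout value can satisfy the rule.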

If the replication server is shut down, the buffer is not used; instead the data accumulates in the AI files and is processed later, once the replication server and agents have been started and are connected.

It therefore makes sense that connect-timeout should not exceed the time it takes for the pica buffer to fill. That way, the replication server process shuts down automatically before the buffer is full, and data builds up in the AI files instead.

To monitor the replication server and detect when it is not running, run the following dsrutil command, which returns a status value:

dsrutil <dbname> -C status -detail

Refer to the OpenEdge Replication Users Guide for the valid return codes; there are many of them.
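
A monitoring job could wrap the command above along the following lines. This is a minimal sketch, not a supported tool: it assumes dsrutil is on the PATH and that the status value is the first integer printed on standard output, which should be verified against real output before use. The helper names are invented for the example, and the meaning of each value must still be taken from the guide.

```python
import re
import subprocess


def parse_status(output):
    """Return the first integer found in the command output, or None if there is none."""
    match = re.search(r"-?\d+", output)
    return int(match.group()) if match else None


def replication_status(dbname, dsrutil="dsrutil"):
    """Run 'dsrutil <dbname> -C status -detail' and return the parsed status value."""
    result = subprocess.run(
        [dsrutil, dbname, "-C", "status", "-detail"],
        capture_output=True,
        text=True,
        check=False,  # interpret the printed value rather than the exit code
    )
    return parse_status(result.stdout)
```

Calling replication_status("source"), where "source" is a placeholder database name, returns the parsed value or None when no number was printed. A scheduler can then compare the value with the codes listed in the guide and raise an alert when the server is not running.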