Consultor Eletrônico



Kbase P62064: TCP/IP failure is not being detected via Fathom Replication on Linux
Author   Progress Software Corporation - Progress
Access   Public
Published   02/10/2007
Status: Verified

FACT(s) (Environment):

Linux Intel
Fathom Replication

SYMPTOM(s):

Fathom Replication server does not detect that the target machine has rebooted

Connect-timeout is not working properly with Fathom Replication on Linux

The connect-timeout between replication server and agent is detected only once the target machine is back online after a reboot

The connect-timeout between replication server and agent is detected only once the target machine is back online after a TCP/IP failure

TCP/IP failure is not being detected via Fathom Replication on Linux

lsof -i output shows that the socket is still established and appears to be alive from the TCP/IP point of view

CAUSE:

This behavior is due to the default Linux Intel TCP/IP settings, specifically the tcp_retries2 kernel parameter.

The tcp_retries2 value tells the kernel how many times to retry on a live TCP connection before killing it (that is, before declaring the connection really "dead"). RFC 1122 specifies that this limit should correspond to at least 100 seconds, a figure that is normally far too short.

The variable takes an integer value and defaults to 15. This value corresponds to 13-30 minutes depending on the retransmission timeout (RTO), which the kernel derives from the measured round-trip time of the connection, so the timeout may vary.
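As a rough illustration of where a figure in that range comes from, the sketch below adds up the successive retransmission timeouts for a given tcp_retries2 value. It assumes the RTO starts at 200 ms, doubles after each unanswered retransmission and is capped at 120 seconds; these bounds are assumptions about common Linux kernel defaults, not values taken from this article, and the result is only a lower bound because the real RTO depends on the connection. The function name estimate_timeout is hypothetical.

```shell
#!/bin/sh
# estimate_timeout N: lower-bound estimate, in seconds, of how long the kernel
# keeps retransmitting before giving up when tcp_retries2 = N.
# Assumptions (not from the article): minimum RTO 0.2 s, maximum RTO 120 s,
# and the RTO doubling after every unanswered retransmission.
estimate_timeout() {
    awk -v n="$1" 'BEGIN {
        rto = 0.2; total = 0
        for (i = 0; i <= n; i++) {             # first timeout plus n retransmissions
            total += (rto < 120 ? rto : 120)   # each wait is capped at the maximum RTO
            rto *= 2                           # exponential backoff
        }
        printf "%.1f\n", total
    }'
}

estimate_timeout 15   # prints 924.6 (about 15 minutes)
estimate_timeout 2    # prints 1.4
```

Under these assumptions the default of 15 gives roughly 15 minutes, which sits at the low end of the 13-30 minute range, while a value of 2 gives up after little more than a second on a fast link.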

FIX:

Modify the tcp_retries2 parameter as follows:

- To change it immediately:
echo 2 > /proc/sys/net/ipv4/tcp_retries2
and then restart the network (/etc/init.d/network restart)

- To make it permanent (so that it survives a server reboot):
edit /etc/sysctl.conf
and add or modify the line
net.ipv4.tcp_retries2 = 2
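
After making either change it can be useful to confirm what is actually in effect. The following is a minimal sketch that prints the value configured in /etc/sysctl.conf and the value the running kernel is using. The helper name conf_value is hypothetical, and the sketch assumes the simple "key = value" layout shown above, with comment lines starting with # or ;.

```shell
#!/bin/sh
# conf_value KEY [FILE]: print the last value assigned to KEY in a
# sysctl-style file. Hypothetical helper; FILE defaults to /etc/sysctl.conf.
conf_value() {
    file="${2:-/etc/sysctl.conf}"
    [ -r "$file" ] || return 0                 # nothing to report if the file is unreadable
    awk -F= -v key="$1" '
        /^[ \t]*[#;]/ { next }                 # skip comment lines
        {
            k = $1; v = $2
            gsub(/[ \t]/, "", k); gsub(/[ \t]/, "", v)
            if (k == key) val = v              # a later assignment overrides an earlier one
        }
        END { if (val != "") print val }
    ' "$file"
}

echo "configured for boot: $(conf_value net.ipv4.tcp_retries2)"

# Value the running kernel is using right now.
if [ -r /proc/sys/net/ipv4/tcp_retries2 ]; then
    echo "running kernel:      $(cat /proc/sys/net/ipv4/tcp_retries2)"
fi
```

If the two values differ, running sysctl -p as root loads /etc/sysctl.conf without waiting for a reboot.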

The value 2 might be too low (or too aggressive) for applications that have "slow" responses. Baseline experimentation may be necessary to find the ideal value for each environment. Other parameters which may need considering are:


/etc/sysctl.conf file:

net.ipv4.tcp_retries2 = 2
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 3
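
To get a feel for what the keepalive values above mean in practice, the sketch below computes the worst-case time before an idle connection to a dead peer is dropped: the idle time before the first probe plus one interval per unanswered probe. The function name keepalive_detect_seconds is hypothetical, and the calculation only applies to sockets on which the application has enabled keepalive; whether Fathom Replication does so is not stated in this article.

```shell
#!/bin/sh
# keepalive_detect_seconds TIME INTVL PROBES: worst-case seconds before an
# idle connection to an unreachable peer is declared dead by TCP keepalive.
# Only relevant for sockets that have keepalive enabled (assumption: the
# application sets SO_KEEPALIVE; this article does not say whether it does).
keepalive_detect_seconds() {
    # $1 = tcp_keepalive_time, $2 = tcp_keepalive_intvl, $3 = tcp_keepalive_probes
    echo $(( $1 + $2 * $3 ))
}

keepalive_detect_seconds 300 30 3     # values from the listing above: prints 390
keepalive_detect_seconds 7200 75 9    # common Linux defaults: prints 7875
```

With the listed values an idle dead peer would be noticed after about six and a half minutes, compared with more than two hours under the common defaults. Connections with unacknowledged data in flight are governed by tcp_retries2 instead.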