This document is a bit terse at this point, but it contains some important information for those building large, high-throughput internet sites using Enhydra Director. During load testing we discovered that a great many connections are left in the TIME_WAIT state after closure. This document describes how to greatly reduce the impact of these lingering connections.
The TCP protocol used by Enhydra Director has in its design a "cooling off period" for connections that have recently been closed. This is the infamous "TIME_WAIT" state. It is demanded by the TCP specification in order to be "sure" that stray data from an old, defunct connection on the same ports does not find its way onto the new connection. The specification, RFC 793 "Transmission Control Protocol", sets this period at twice the maximum segment lifetime (2 x MSL), which works out to four minutes.
The problem is that the four minute wait is far too conservative for high performance, transaction-oriented server implementations. Suppose we have a server that is in steady state, receiving and processing 200 requests per second. The current implementation of Enhydra Director opens a new connection to the back-end server for each transaction, much like Apache JServ. This means that 200 connections per second, on average, are going into the "TIME_WAIT" state when they finish. After four minutes, older TIME_WAIT connections begin to disappear as expected. This means that there will, on average, be about four minutes' worth of TIME_WAIT connections at any given time. Doing the math, we get 200 * 4 * 60 = 48000 connections!
The problem here is that the maximum number of connections that any one server can have in any state is 65535. This is because the TCP specification (RFC 793) defines the "port" that identifies the local endpoint of a connection as a 16 bit unsigned integer, and the largest such number is 65535. Worse, ports 0-1024 are usually reserved for well-known services and are not available as dynamically assigned (or "ephemeral") ports. In the best of all cases, we have 64511 ports available. With a 4 minute TIME_WAIT, we can compute 64511 ports / (4 minutes * 60 seconds per minute) = 268 connections per second. This means that no matter how fast your machine is, connections will start failing at an average load of 268 connections per second. Actually, the failures will usually start a little before that threshold is reached. The situation here is similar to having a brand new Ferrari on a crowded freeway at rush hour. You might be able to do 180 MPH, but you're still going no faster than 20.
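If you want to check this arithmetic for your own setup, the calculation is simply the number of available ephemeral ports divided by the TIME_WAIT interval in seconds. A minimal sketch in POSIX shell (the 64511 and 240 figures are the port count and four-minute TIME_WAIT discussed above):

#!/bin/sh
# Maximum sustainable connection rate = ephemeral ports / TIME_WAIT seconds
PORTS=64511           # ephemeral ports available
TIME_WAIT_SECS=240    # 4 minutes, per RFC 793
expr $PORTS / $TIME_WAIT_SECS    # prints 268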
The current solution to this problem is to reduce the amount of time old connections spend in TIME_WAIT. Some purists will sound off that doing so is a violation of RFC 793. Technically this is true, but I argue that on modern networks, this violation is a very venial sin, compared to the performance improvements it gives. Here's why.
The purpose of TIME_WAIT is to prevent lingering stale duplicates from old connections from finding their way into an existing connection, causing data corruption. Back in the days of 9600 baud X.25 networks, where a packet could very well circulate around for many tens of seconds among remote routers, four minutes was a good conservative number. The Web had not been invented yet, and if it had, no one would have accused a PDP/11 of being capable of sustaining 200 or more transactions per second anyhow.
On a typical Enhydra high throughput server, all of the back-end EnhydraDirector connections will be taking place on a fast 100Base-T or Gig ethernet subnet. In such a situation, a packet is either going to make it to its destination or die within milliseconds if not microseconds, unless severe routing misconfigurations are present.
On the Internet side of the server, connections are coming from all different places. It is quite possible that stray packets could linger for quite some time. For a single client-server pair, the four minute TIME_WAIT is not going to be a problem, and may be reasonable if the connection is being reestablished for each transaction. However, a web server has many different clients. HTTP requests from a single client are in most cases piggybacked onto one TCP session using HTTP "Keep-Alive" semantics, and most of the new connections are coming from different clients, which prevents the 'stray segment' problem anyhow.
The bottom line is, for a stray segment to make it into a new connection, the following conditions must ALL be met.
1. The local port of the new connection MUST match the local port of the old connection.
2. The local IP address of the new connection MUST match the local IP address of the old connection.
3. The remote port of the new connection MUST match the remote port of the old connection.
4. The remote IP address of the new connection MUST match the remote IP address of the old connection.
5. The TCP sequence number in the old segment MUST be such that the new connection will think it is valid. This is highly unlikely, even in the already unlikely event that the other four conditions have been met.
Given these conditions, it should be very safe to set a TIME_WAIT interval that is quite low. We recommend that you choose a TIME_WAIT value that keeps the average number of outstanding TIME_WAIT connections below 30000. That is, TIME_WAIT_SECS * CONN_PER_SEC < 30000. A value of 60 seconds allows a theoretical maximum of about 1075 connections per second (64511 ports / 60 seconds), and up to 500 connections per second without going over 30000 TIME_WAIT lingerers. If you need more, lower TIME_WAIT further. I have seen thirty seconds quoted as a good value, and have heard of one vendor suggesting one second. I doubt you will have problems if you choose either of these values for a typical Enhydra Director server setup.
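Working backwards from your expected load, the same arithmetic gives the largest TIME_WAIT interval that stays under the 30000-lingerer ceiling. A minimal sketch in POSIX shell (the 500 connections per second is an assumed figure; substitute your own measured average):

#!/bin/sh
CONN_PER_SEC=500      # assumed average connection rate for your site
MAX_LINGERERS=30000   # recommended ceiling on outstanding TIME_WAIT connections
expr $MAX_LINGERERS / $CONN_PER_SEC    # prints 60, the largest safe TIME_WAIT in seconds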
On RedHat 6.1, run the following command:
/sbin/sysctl net.ipv4.ip_local_port_range
If you haven't already changed this value, it is 1024 to 4999. This is the range of ephemeral ports available by default, and as you can see it is too low for a high performance server. Use the command
/sbin/sysctl -w net.ipv4.ip_local_port_range="1024 65535"

to change the range to 1024-65535. By default the TIME_WAIT interval on RedHat Linux 6.1 is one minute. This allows up to about 1075 connections per second, and 500 per second without exceeding 30000 TIME_WAITs. If you need more, you have to change a kernel header file and recompile your Linux kernel. Assuming /usr/src/linux is a symlink to your current kernel source, the file to change is:
/usr/src/linux/include/net/tcp.h

Change the line:
#define TCP_TIMEWAIT_LEN (60*HZ)

to:
#define TCP_TIMEWAIT_LEN (TIMEWAITSECS * HZ)

where:
TIMEWAITSECS is the number of seconds for connections to remain in the TIME_WAIT state.
Example to change the TIME_WAIT period to 15 seconds:
#define TCP_TIMEWAIT_LEN (15 * HZ)
Once you have saved 'tcp.h', do a COMPLETE rebuild and install of the Linux kernel following the usual Linux kernel rebuild procedure.
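Also note that the port range change made earlier with 'sysctl -w' does not survive a reboot. One way to make it stick is to add the setting to /etc/sysctl.conf, which the RedHat init scripts apply at boot (a sketch; confirm that your RedHat 6.1 installation actually processes this file):

# /etc/sysctl.conf -- applied at boot with 'sysctl -p'
net.ipv4.ip_local_port_range = 1024 65535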
On Solaris, the TIME_WAIT interval is a tuneable parameter and can be changed using 'ndd'. The value is specified in milliseconds, so to specify a value of 30 seconds, use 30000 milliseconds. The command is:
Solaris 2.6 and before:
ndd -set /dev/tcp tcp_close_wait_interval 30000

Solaris 2.7 and later:
ndd -set /dev/tcp tcp_time_wait_interval 30000
The change takes effect immediately on all new connections. Old connections will still wait for the old interval until they expire. You should put the above command into a system startup file so that it is run each time the system is rebooted.
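A minimal example of such a startup file (a sketch; the name and run level '/etc/rc2.d/S99tcptune' are assumptions, so adjust them to your local conventions):

#!/bin/sh
# /etc/rc2.d/S99tcptune -- shorten the TIME_WAIT interval at boot
# Solaris 2.7 and later parameter name shown; on 2.6 and earlier use tcp_close_wait_interval.
/usr/sbin/ndd -set /dev/tcp tcp_time_wait_interval 30000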
I haven't noticed the TIME_WAIT issue as a problem during our load testing on NT, but it is possible you'll run into it. The way to set the TIME_WAIT interval on WinNT is to edit the following registry value under HKEY_LOCAL_MACHINE:
SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpTimedWaitDelay
If this value does not exist you can create it as a DWORD. The numeric value is the number of seconds to wait and may be set to any value between 30 and 240. If not set, WinNT defaults to 240 seconds for TIME_WAIT. I've also heard that Microsoft put undocumented hooks into WinSock to allow IIS to get around the TIME_WAIT problem, but I have not personally substantiated this.
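If you prefer not to edit the registry by hand, the value can also be imported from a .reg file (a sketch; 0x1e is 30 seconds, and the value name and location are exactly those given above):

REGEDIT4

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"TcpTimedWaitDelay"=dword:0000001e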
This information was obtained from the Compaq Tru64 kernel tuning FAQ. This is a very useful page that describes numerous tuneable parameters for the Tru64 OS.
To change the TIME_WAIT and Port Range at runtime: (as root of course)
The easy way: Run 'Kernel Tuner'. From root's CDE Desktop...
CDE Menu Bar --> Manager Icon --> System_Admin --> Monitoring/Tuning --> Kernel Tuner --> 'inet' subsystem.
For the TIME_WAIT period, change the 'tcp_msl' attribute to the desired number of seconds, say 30.
For port range, change 'ipport_userreserved_min' to 1024, and 'ipport_userreserved' to 64511. This will allow all ports from 1024 to 65534 to be used.
You should be able to make these changes 'permanent' meaning that they will take effect automatically even if the machine is rebooted.
The hard way #1: (At runtime manually every time you reboot)
Arrange for the following script to be run from, or installed as, an '/etc/rc3.d' script at boot time.
#!/bin/sh
# Change TIME_WAIT period to 30 seconds. Factory default is 60.
/sbin/sysconfig -r inet tcp_msl=30
# Change Port Range to 1024-65534 (Total 64511 available ephemeral ports)
/sbin/sysconfig -r inet ipport_userreserved_min=1024
/sbin/sysconfig -r inet ipport_userreserved=64511
The hard way #2: (Permanent using the sysconfigdb command)
To permanently set these values every time you boot, you use the /sbin/sysconfigdb command to read a stanza file into the bootup config. To change the TIME_WAIT, create a file called 'msl.txt' that contains:
inet:
        tcp_msl = 30
Then run '/sbin/sysconfigdb -a -f msl.txt' to register the change. Finally, reboot.
To change the ephemeral port range, create 'port.txt' as follows:
inet:
        ipport_userreserved_min = 1024
        ipport_userreserved = 64511
Then run '/sbin/sysconfigdb -a -f port.txt' to register the change. Finally, reboot.
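Whichever method you use, you can confirm what the running kernel is actually using with sysconfig's query mode (standard Tru64 commands, but double-check the attribute names on your release):

/sbin/sysconfig -q inet tcp_msl
/sbin/sysconfig -q inet ipport_userreserved_min
/sbin/sysconfig -q inet ipport_userreserved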