| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
| <HTML> |
| <HEAD> |
| <TITLE>Connections in FIN_WAIT_2 and Apache</TITLE> |
| <LINK REV="made" HREF="mailto:marc@apache.org"> |
| |
| </HEAD> |
| |
| <!-- Background white, links blue (unvisited), navy (visited), red (active) --> |
| <BODY |
| BGCOLOR="#FFFFFF" |
| TEXT="#000000" |
| LINK="#0000FF" |
| VLINK="#000080" |
| ALINK="#FF0000" |
| > |
| <!--#include virtual="header.html" --> |
| |
| <H1 ALIGN="CENTER">Connections in the FIN_WAIT_2 state and Apache</H1> |
| <OL> |
| <LI><H2>What is the FIN_WAIT_2 state?</H2> |
| Starting with the Apache 1.2 betas, people are reporting many more |
| connections in the FIN_WAIT_2 state (as reported by |
| <code>netstat</code>) than they saw using older versions. When the |
| server closes a TCP connection, it sends a packet with the FIN bit |
| set to the client, which then responds with a packet with the ACK bit |
| set. The client then sends a packet with the FIN bit set to the |
| server, which responds with an ACK and the connection is closed. The |
| state that the connection is in during the period between when the |
| server gets the ACK from the client and the server gets the FIN from |
| the client is known as FIN_WAIT_2. See the <A |
| HREF="ftp://ds.internic.net/rfc/rfc793.txt">TCP RFC</A> for the |
| technical details of the state transitions.<P> |
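The close sequence described above can be sketched as a small state table.
This is only an illustrative model of the side that closes first (the server,
in this scenario); the event names <CODE>close</CODE>, <CODE>recv_ack</CODE>
and <CODE>recv_fin</CODE> are made up for the example, and simultaneous close
is left out.<P>

```python
# Simplified model of the TCP states seen by the side that closes first.
# Only the transitions mentioned in the text are modeled.
TRANSITIONS = {
    ("ESTABLISHED", "close"): "FIN_WAIT_1",    # server sends its FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",  # client ACKs that FIN
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",   # client sends its FIN, server ACKs
}


def step(state, event):
    """Return the next state, or the same state if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="ESTABLISHED"):
    """Feed a sequence of events through the table and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

A connection whose client never sends its own FIN stops at
<CODE>FIN_WAIT_2</CODE>: <CODE>run(["close", "recv_ack"])</CODE> never
advances any further without a <CODE>recv_fin</CODE> event.<P>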
| |
| The FIN_WAIT_2 state is somewhat unusual in that there is no timeout |
| defined in the standard for it. This means that on many operating |
| systems, a connection in the FIN_WAIT_2 state will stay around until |
| the system is rebooted. If the system does not have a timeout and |
| too many FIN_WAIT_2 connections build up, it can fill up the space |
| allocated for storing information about the connections and crash |
| the kernel. The connections in FIN_WAIT_2 do not tie up an httpd |
| process.<P> |
| |
| <LI><H2>But why does it happen?</H2> |
| |
| There are several reasons for it happening, and not all of them are |
| fully understood by the Apache team yet. What is known follows.<P> |
| |
| <H3>Buggy clients and persistent connections</H3> |
| |
| Several clients have a bug which pops up when dealing with |
| <A HREF="../keepalive.html">persistent connections</A> (aka keepalives). |
| When the connection is idle and the server closes the connection |
| (based on the <A HREF="../mod/core.html#keepalivetimeout"> |
| KeepAliveTimeout</A>), the buggy client does not send a FIN |
| and ACK back to the server. This means that the |
| connection stays in the FIN_WAIT_2 state until one of the following |
| happens:<P> |
| <UL> |
| <LI>The client opens a new connection to the same or a different |
| site, which causes it to fully close the older connection on |
| that socket. |
| <LI>The user exits the client, which on some (most?) clients |
| causes the OS to fully shut down the connection. |
| <LI>The FIN_WAIT_2 times out, on servers that have a timeout |
| for this state. |
| </UL><P> |
| If you are lucky, this means that the buggy client will fully close the |
| connection and release the resources on your server. However, there |
| are some cases where the socket is never fully closed, such as a dialup |
| client disconnecting from their provider before closing the client. |
| In addition, a client might sit idle for days without making another |
| connection, and thus may hold its end of the socket open for days |
| even though it has no further use for it. |
| <STRONG>This is a bug in the browser or in its operating system's |
| TCP implementation.</STRONG> <P> |
| |
| The clients on which this problem has been verified to exist:<P> |
| <UL> |
| <LI>Mozilla/3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386) |
| <LI>Mozilla/2.02 (X11; I; FreeBSD 2.1.5-RELEASE i386) |
| <LI>Mozilla/3.01Gold (X11; I; SunOS 5.5 sun4m) |
| <LI>MSIE 3.01 on the Macintosh |
| <LI>MSIE 3.01 on Windows 95 |
| </UL><P> |
| |
| This does not appear to be a problem on: |
| <UL> |
| <LI>Mozilla/3.01 (Win95; I) |
| </UL> |
| <P> |
| |
| It is expected that many other clients have the same problem. What a |
| client <STRONG>should do</STRONG> is periodically check its open |
| socket(s) to see if they have been closed by the server, and close its |
| side of the connection if the server has closed. This check need only |
| occur once every few seconds, and the closure may even be reported by an |
| OS signal on some systems (e.g., Win95 and NT clients have this capability, |
| but they seem to be ignoring it).<P> |
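A minimal sketch of such a client-side check, assuming an idle connection
on which no further response data is expected. The function names
<CODE>server_has_closed</CODE> and <CODE>close_if_server_closed</CODE> are
hypothetical and are not taken from any real browser.<P>

```python
import select
import socket


def server_has_closed(sock):
    """Return True if the peer has closed its side of an idle connection.

    Polls without blocking and peeks, so any pending data is left unread.
    """
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False          # nothing pending: the connection is still open
    try:
        data = sock.recv(1, socket.MSG_PEEK)
    except ConnectionResetError:
        return True           # the peer reset the connection
    return data == b""        # an empty read means the peer sent a FIN


def close_if_server_closed(sock):
    """Close our side once the server has closed; report whether we did."""
    if server_has_closed(sock):
        sock.close()
        return True
    return False
```

A client would call <CODE>close_if_server_closed()</CODE> on each idle
persistent connection every few seconds. Closing promptly is what lets the
server's connection leave FIN_WAIT_2.<P>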
| |
| Apache <STRONG>cannot</STRONG> avoid these FIN_WAIT_2 states unless it |
| disables persistent connections for the buggy clients, just |
| like we recommend doing for Navigator 2.x clients due to other bugs. |
| However, non-persistent connections increase the total number of |
| connections needed per client and slow retrieval of an image-laden |
| web page. Since non-persistent connections have their own resource |
| costs and a short waiting period after each closure, a busy server |
| may need persistence in order to best serve its clients.<P> |
| |
| As far as we know, the client-caused FIN_WAIT_2 problem is present for |
| all servers that support persistent connections, including Apache 1.1.x |
| and 1.2.<P> |
| |
| <H3>Something in Apache may be broken</H3> |
| |
| While the above bug is a problem, it is not the whole problem. |
| Some users have observed no FIN_WAIT_2 problems with Apache 1.1.x, |
| but with 1.2b enough connections build up in the FIN_WAIT_2 state to |
| crash their server. We have not yet identified why this would occur |
| and welcome additional test input.<P> |
| |
| One possible (and most likely) source for additional FIN_WAIT_2 states |
| is a function called <CODE>lingering_close()</CODE> which was added |
| between 1.1 and 1.2. This function is necessary for the proper |
| handling of persistent connections and any request which includes |
| content in the message body (e.g., PUTs and POSTs). |
| What it does is read any data sent by the client for |
| a certain time after the server closes the connection. The exact |
| reasons for doing this are somewhat complicated, but involve what |
| happens if the client is making a request at the same time the |
| server sends a response and closes the connection. Without lingering, |
| the client might be forced to reset its TCP input buffer before it |
| has a chance to read the server's response, and thus understand why |
| the connection has closed. |
| See the <A HREF="#appendix">appendix</A> for more details.<P> |
| |
| We have not yet tracked down the exact reason why |
| <CODE>lingering_close()</CODE> causes problems. Its code has been |
| thoroughly reviewed and extensively updated in 1.2b6. It is possible |
| that there is some problem in the BSD TCP stack which is causing the |
| observed problems. It is also possible that we fixed it in 1.2b6. |
| Unfortunately, we have not been able to replicate the problem on our |
| test servers.<P> |
| |
| <LI><H2>What can I do about it?</H2> |
| |
| There are several possible workarounds to the problem, some of |
| which work better than others.<P> |
| |
| <H3>Add a timeout for FIN_WAIT_2</H3> |
| |
| The obvious workaround is to simply have a timeout for the FIN_WAIT_2 state. |
| This is not specified by the RFC, and could be claimed to be a |
| violation of the RFC, but it is widely recognized as being necessary. |
| The following systems are known to have a timeout: |
| <P> |
| <UL> |
| <LI><A HREF="http://www.freebsd.org/">FreeBSD</A> versions starting at 2.0 or possibly earlier. |
| <LI><A HREF="http://www.netbsd.org/">NetBSD</A> version 1.2(?) |
| <LI><A HREF="http://www.openbsd.org/">OpenBSD</A> all versions(?) |
| <LI><A HREF="http://www.bsdi.com/">BSD/OS</A> 2.1, with the |
| <A HREF="ftp://ftp.bsdi.com/bsdi/patches/patches-2.1/K210-027"> |
| K210-027</A> patch installed. |
| <LI><A HREF="http://www.sun.com/">Solaris</A> as of around version |
| 2.2. The timeout can be tuned by using <CODE>ndd</CODE> to |
| modify <CODE>tcp_fin_wait_2_flush_interval</CODE>, but the |
| default should be appropriate for most servers and improper |
| tuning can have negative impacts. |
| <LI><A HREF="http://www.sco.com/">SCO TCP/IP Release 1.2.1</A> |
| can be modified to have a timeout by following |
| <A HREF="http://www.sco.com/cgi-bin/waisgate?WAISdocID=2242622956+0+0+0&WAISaction=retrieve"> SCO's instructions</A>. |
| <LI><A HREF="http://www.linux.org/">Linux</A> 2.0.x and |
| earlier(?) |
| <LI><A HREF="http://www.hp.com/">HP-UX</A> 10.x defaults to |
| terminating connections in the FIN_WAIT_2 state after the |
| normal TCP keepalive timeouts. This does not |
| refer to the persistent connection or HTTP keepalive |
| timeouts, but to the <CODE>SO_KEEPALIVE</CODE> socket option |
| which is enabled by Apache. This parameter can be adjusted |
| by using <CODE>nettune</CODE> to modify parameters such as |
| <CODE>tcp_keepstart</CODE> and <CODE>tcp_keepstop</CODE>. |
| In later revisions, there is an explicit timer for |
| connections in FIN_WAIT_2 that can be modified; contact HP |
| support for details. |
| <LI><A HREF="http://www.sgi.com/">SGI IRIX</A> can be patched to |
| support a timeout. For IRIX 5.3, 6.2, and 6.3, |
| use patches 1654, 1703 and 1778 respectively. If you |
| have trouble locating these patches, please contact your |
| SGI support channel for help. |
| <LI><A HREF="http://www.ncr.com/">NCR's MP RAS Unix</A> 2.xx and |
| 3.xx both have FIN_WAIT_2 timeouts. In 2.xx it is non-tunable |
| at 600 seconds, while in 3.xx it defaults to 600 seconds and |
| is calculated based on the tunable "max keep alive probes" |
| (default of 8) multiplied by the "keep alive interval" (default |
| 75 seconds). |
| <LI><A HREF="http://www.sequent.com">Sequent's ptx/TCP/IP for |
| DYNIX/ptx</A> has had a FIN_WAIT_2 timeout since around |
| release 4.1 in mid-1994. |
| </UL> |
| <P> |
| The following systems are known to not have a timeout: |
| <P> |
| <UL> |
| <LI><A HREF="http://www.sun.com/">SunOS 4.x</A> does not and |
| almost certainly never will have one because it is at the |
| very end of its development cycle for Sun. If you have kernel |
| source, it should be easy to patch. |
| </UL> |
| <P> |
| There is a |
| <A HREF="http://www.apache.org/dist/contrib/patches/1.2/fin_wait_2.patch"> |
| patch available</A> for adding a timeout to the FIN_WAIT_2 state; it |
| was originally intended for BSD/OS, but should be adaptable to most |
| systems using BSD networking code. You need kernel source code to be |
| able to use it. If you do adapt it to work for any other systems, |
| please drop me a note at <A HREF="mailto:marc@apache.org">marc@apache.org</A>. |
| <P> |
| <H3>Compile without using <CODE>lingering_close()</CODE></H3> |
| |
| It is possible to compile Apache 1.2 without using the |
| <CODE>lingering_close()</CODE> function. The connection-closing code |
| then behaves much as it did in 1.1. If you do |
| this, be aware that it can cause problems with PUTs, POSTs and |
| persistent connections, especially if the client uses pipelining. |
| That said, it is no worse than on 1.1, and we understand that keeping your |
| server running is quite important.<P> |
| |
| To compile without the <CODE>lingering_close()</CODE> function, add |
| <CODE>-DNO_LINGCLOSE</CODE> to the end of the |
| <CODE>EXTRA_CFLAGS</CODE> line in your <CODE>Configuration</CODE> file, |
| rerun <CODE>Configure</CODE> and rebuild the server. |
| <P> |
| <H3>Use <CODE>SO_LINGER</CODE> as an alternative to |
| <CODE>lingering_close()</CODE></H3> |
| |
| On most systems, there is an option called <CODE>SO_LINGER</CODE> that |
| can be set with <CODE>setsockopt(2)</CODE>. It does something very |
| similar to <CODE>lingering_close()</CODE>, except that it is broken |
| on many systems and there causes far more problems than |
| <CODE>lingering_close()</CODE>. On some systems it may work |
| better, so it may be worth a try if you have no other alternatives. <P> |
| |
| To try it, add <CODE>-DUSE_SO_LINGER -DNO_LINGCLOSE</CODE> to the end of the |
| <CODE>EXTRA_CFLAGS</CODE> line in your <CODE>Configuration</CODE> |
| file, rerun <CODE>Configure</CODE> and rebuild the server. <P> |
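For readers unfamiliar with the option, here is a rough sketch of how
<CODE>SO_LINGER</CODE> is set through <CODE>setsockopt(2)</CODE>. It is a
Python illustration, not Apache's own code, and it assumes a system where
<CODE>struct linger</CODE> is two C ints (true on most Unix systems). The
helper names and the 30-second value are arbitrary.<P>

```python
import socket
import struct

# Assumed layout: struct linger { int l_onoff; int l_linger; }
LINGER_FORMAT = "ii"


def enable_so_linger(sock, seconds):
    """Ask the kernel to linger on close() for up to `seconds` seconds."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(LINGER_FORMAT, 1, seconds))


def get_so_linger(sock):
    """Return (enabled, seconds) as currently set on the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FORMAT))
    onoff, seconds = struct.unpack(LINGER_FORMAT, raw)
    return bool(onoff), seconds
```

Whether the kernel then honors the setting correctly is exactly the
system-dependent behavior discussed above.<P>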
| |
| <STRONG>NOTE:</STRONG> Attempting to use <CODE>SO_LINGER</CODE> and |
| <CODE>lingering_close()</CODE> at the same time is very likely to do |
| very bad things, so don't.<P> |
| |
| <H3>Increase the amount of memory used for storing connection state</H3> |
| <DL> |
| <DT>BSD based networking code: |
| <DD>BSD stores network data, such as connection states, |
| in something called an mbuf. When you get so many connections |
| that the kernel does not have enough mbufs to put them all in, your |
| kernel will likely crash. You can reduce the effects of the problem |
| by increasing the number of mbufs that are available; this will not |
| prevent the problem, it will just make the server go longer before |
| crashing.<P> |
| |
| The exact way to increase them may depend on your OS; look |
| for some reference to the number of "mbufs" or "mbuf clusters". On |
| many systems, this can be done by adding the line |
| <CODE>NMBCLUSTERS="n"</CODE> to your kernel config file, where |
| <CODE>n</CODE> is the number of mbuf clusters you want, and then |
| rebuilding your kernel.<P> |
| </DL> |
| |
| <H3>Disable KeepAlive</H3> |
| <P>If you are unable to do any of the above then you should, as a last |
| resort, disable KeepAlive. Edit your <CODE>httpd.conf</CODE> and change |
| <CODE>KeepAlive On</CODE> to <CODE>KeepAlive Off</CODE>. |
| |
| <LI><H2>Feedback</H2> |
| |
| If you have any information to add to this page, please contact me at |
| <A HREF="mailto:marc@apache.org">marc@apache.org</A>.<P> |
| |
| <LI><H2><A NAME="appendix">Appendix</A></H2> |
| <P> |
| Below is a message from Roy Fielding, one of the authors of HTTP/1.1. |
| |
| <H3>Why the lingering close functionality is necessary with HTTP</H3> |
| |
| The need for a server to linger on a socket after a close is noted a couple |
| times in the HTTP specs, but not explained. This explanation is based on |
| discussions between myself, Henrik Frystyk, Robert S. Thau, Dave Raggett, |
| and John C. Mallery in the hallways of MIT while I was at W3C.<P> |
| |
| If a server closes the input side of the connection while the client |
| is sending data (or is planning to send data), then the server's TCP |
| stack will signal an RST (reset) back to the client. Upon |
| receipt of the RST, the client will flush its own incoming TCP buffer |
| back to the un-ACKed packet indicated by the RST packet argument. |
| If the server has sent a message, usually an error response, to the |
| client just before the close, and the client receives the RST packet |
| before its application code has read the error message from its incoming |
| TCP buffer and before the server has received the ACK sent by the client |
| upon receipt of that buffer, then the RST will flush the error message |
| before the client application has a chance to see it. The result is |
| that the client is left thinking that the connection failed for no |
| apparent reason.<P> |
| |
| There are two conditions under which this is likely to occur: |
| <OL> |
| <LI>sending POST or PUT data without proper authorization |
| <LI>sending multiple requests before each response (pipelining) |
| and one of the middle requests resulting in an error or |
| other break-the-connection result. |
| </OL> |
| <P> |
| The solution in all cases is to send the response, close only the |
| write half of the connection (what shutdown is supposed to do), and |
| continue reading on the socket until it is either closed by the |
| client (signifying it has finally read the response) or a timeout occurs. |
| That is what the kernel is supposed to do if SO_LINGER is set. |
| Unfortunately, SO_LINGER has no effect on some systems; on some other |
| systems, it does not have its own timeout and thus the TCP memory |
| segments just pile-up until the next reboot (planned or not).<P> |
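The approach described above (send the response, close only the write
half, keep reading until the client closes or a timeout occurs) can be
sketched roughly as follows. This is a Python illustration of the
technique, not Apache's actual <CODE>lingering_close()</CODE>; the parameter
names <CODE>idle_timeout</CODE> and <CODE>max_linger</CODE> and their
default values are assumptions made for the example.<P>

```python
import select
import socket
import time


def lingering_close(sock, idle_timeout=2.0, max_linger=30.0):
    """Half-close, then drain client data until EOF or a timeout.

    Returns "peer_closed" if the client closed first, "timeout" if the
    client went quiet or the overall limit passed, "error" on a reset.
    """
    result = "timeout"
    try:
        sock.shutdown(socket.SHUT_WR)       # send our FIN; reading stays open
        deadline = time.monotonic() + max_linger
        while time.monotonic() < deadline:
            readable, _, _ = select.select([sock], [], [], idle_timeout)
            if not readable:
                break                       # client sent nothing: give up
            if sock.recv(512) == b"":
                result = "peer_closed"      # client closed its side
                break
            # otherwise discard the data and keep reading
    except OSError:
        result = "error"                    # connection was already reset
    finally:
        sock.close()
    return result
```

Because the server keeps reading instead of closing outright, data that
the client is still sending does not trigger an RST, so the response
already sent remains readable on the client side.<P>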
| |
| Please note that simply removing the linger code will not solve the |
| problem -- it only moves it to a different and much harder one to detect. |
| </OL> |
| <!--#include virtual="footer.html" --> |
| </BODY> |
| </HTML> |