Idle but half-dead TCP connections will eventually time out. It can take hours, though, on some systems.[1] Early thinking was that if you have a Telnet session open, TCP itself should not time out. The Telnet server might have a timeout, but it wasn't TCP's job to decide how long you get for lunch.
Today most servers have shorter timeouts, mostly as a defense against denial of service attacks. But it's often the HTTP server, not the TCP level, that times out first.
I find that usually, long-lived idle TCP connections get killed by stateful firewalls, of which there are often several along any given path through the Internet.
E.g., my home router has a connection timeout of 24 hours.
Quite a number of stateful firewalls just silently drop the connection without sending a RST, though, meaning the TCP connection idles forever unless the endpoints employ TCP- or application-level keep-alives or timeouts.
> The TCP user timeout controls how long transmitted data may remain
> unacknowledged before a connection is forcefully closed.
As I understand it, this only applies if there is data outstanding. In the puzzler, there was no data outstanding. You're right that if there had been, the side with data outstanding would eventually notice the problem and terminate the connection. The default timeout on most systems I've seen is 5-8 minutes.
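On Linux, that per-connection user timeout can be set from userspace with the TCP_USER_TIMEOUT socket option (value in milliseconds); a minimal sketch, Linux-only:

```python
import socket

# Abort the connection if transmitted data stays unacknowledged for
# more than 30 seconds. Note this only kicks in when data is in
# flight -- a fully idle connection (as in the puzzler) is unaffected.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 30_000)
```

Reading the option back with getsockopt() returns the value you set, which is a quick sanity check that the kernel accepted it.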
By contrast, the previous article you linked was about KeepAlive, which will always eventually detect this condition, but by default usually not for at least two hours.
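For completeness, those keepalive knobs can be tightened per-socket on Linux; a sketch assuming the Linux-specific TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT options:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Enable keepalives, then override the system defaults: first probe
# after 60 s of idle time (the default is 7200 s, i.e. the two hours
# mentioned above), then one probe every 10 s, giving up -- and
# killing the connection -- after 5 unanswered probes.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
```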
I'm battling this problem at the moment in a server I wrote (although I'm using a C++ framework above the socket interface, which adds to the complexity). The problem only occurs on Linux, so it wasn't detected for a while, and only with one client (running an embedded RTOS). My server ends up stuck in CLOSE_WAIT and therefore wastes a responder thread; eventually this reaches the thread limit and the server stops responding completely. It's a really difficult one to debug, as it takes about 3 minutes to cause the problem to occur. It's easy enough to see what is going on at the TCP level, but resolving it is harder, because the various software layers add further abstractions on top. It's one thing to see the TCP messages, quite another to understand them at a higher code level. The CLOSE_WAIT state does appear to time out eventually, but not for a very long time -- too long in this case.
If you're stuck in CLOSE_WAIT, it's a bug in your software: you've received a FIN and need to close the socket if you're done with it.
The socket should be marked ready for reading, but when you try to read you'll get zero bytes back. Something in your framework may not realize that -- truss/strace the process and I'd guess you'll see a zero-byte read followed by no close(); alternatively, you may not be polling the socket for read availability at all?
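The zero-byte read can be demonstrated in a few lines; a sketch using a Unix-domain socketpair for brevity (the recv() semantics on a FIN'd TCP socket are the same at the API level):

```python
import socket

# A zero-length read is how the peer's close/FIN shows up: the socket
# polls as readable, but recv() hands back an empty buffer.
a, b = socket.socketpair()
b.close()               # peer closes its end
data = a.recv(4096)     # readable immediately, returns b''
assert data == b''      # EOF: we must now close our end too --
a.close()               # a real TCP socket would otherwise sit in CLOSE_WAIT
```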
Some things would change if you intended for the socket to be half closed, but I don't think you do?
Depends if TCP keepalives are enabled, but if the connection goes through a NAT gateway, that will certainly have a tracking state timeout. Usually it is on the order of at least hours though, sometimes days.
SO_RCVTIMEO and SO_SNDTIMEO are for setting a timeout on blocking socket operations. They don't tell you anything about whether the other end is still there or not.
Setting those options isn't much different from setting a timeout in your poll() call.
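A sketch of the point: Python's settimeout() gives the same observable behavior as SO_RCVTIMEO from the caller's side, and a timed-out recv() tells you only that nothing arrived within the window -- the silent peer here is perfectly healthy:

```python
import socket

a, b = socket.socketpair()
a.settimeout(0.1)       # behaves like SO_RCVTIMEO for the caller
timed_out = False
try:
    a.recv(4096)        # peer b is alive but silent: this blocks...
except socket.timeout:  # ...until the 100 ms receive timeout fires
    timed_out = True
a.close()
b.close()
```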
[1] https://blogs.technet.microsoft.com/nettracer/2010/06/03/thi...