Idle but half-dead TCP connections will eventually time out. It can take hours, though, on some systems.[1] Early thinking was that if you have a Telnet session open, TCP itself should not time out. The Telnet server might have a timeout, but it wasn't TCP's job to decide how long you get for lunch.
Today most servers have shorter timeouts, mostly as a defense against denial of service attacks. But it's often the HTTP server, not the TCP level, that times out first.
I find that usually, long-lived idle TCP connections get killed by stateful firewalls, of which there are often several along any given path through the Internet.
E.g., my home router has a connection timeout of 24 hours.
Quite a number of stateful firewalls just silently drop the connection without sending a RST, though, meaning the TCP connection idles forever unless the endpoints employ TCP- or application-level keep-alives or timeouts.
> The TCP user timeout controls how long transmitted data may remain
> unacknowledged before a connection is forcefully closed.
As I understand it, this only applies if there is data outstanding. In the puzzler, there was no data outstanding. You're right that if there had been, the side with data outstanding would eventually notice the problem and terminate the connection. The default timeout on most systems I've seen is 5-8 minutes.
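On Linux, that per-connection user timeout can be set from userspace with the TCP_USER_TIMEOUT socket option (value in milliseconds); a minimal sketch, Linux-only:

```python
import socket

# Abort the connection if transmitted data stays unacknowledged for
# more than 30 seconds. Note this only kicks in when data is in
# flight -- a fully idle connection (as in the puzzler) is unaffected.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 30_000)
```

Reading the option back with getsockopt() returns the value you set, which is a quick sanity check that the kernel accepted it.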
By contrast, the previous article you linked was about KeepAlive, which will always eventually detect this condition, but by default usually not for at least two hours.
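For completeness, those keepalive knobs can be tightened per-socket on Linux; a sketch assuming the Linux-specific TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT options:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Enable keepalives, then override the system defaults: first probe
# after 60 s of idle time (the default is 7200 s, i.e. the two hours
# mentioned above), then one probe every 10 s, giving up -- and
# killing the connection -- after 5 unanswered probes.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
```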
I'm battling this problem at the moment in a server I wrote (although I'm using a C++ framework above the socket interface, which adds to the complexity). The problem only occurs on Linux, so it wasn't detected for a while, and only with one client (running an embedded RTOS). My server ends up stuck in CLOSE_WAIT and therefore wastes a responder thread; eventually this reaches the thread limit and the server stops responding completely. It's a really difficult one to debug, as it takes about 3 minutes to cause the problem to occur. It's easy enough to see what is going on at the TCP level, but resolving it is harder, because the various software layers add further abstractions on top. It's one thing to see the TCP messages, quite another to understand them at a higher code level. The CLOSE_WAIT state does appear to time out eventually, but not for a very long time -- too long in this case.
If you're stuck in CLOSE_WAIT, it's a bug in your software: you've received a FIN and need to close the socket if you're done with it.
The socket should be marked ready for reading, but when you try to read you'll get zero bytes back. Something in your framework may not realize that -- truss/strace the process and I'd guess you'll see a zero-byte read followed by no close(); alternatively, you may not be polling the socket for read availability at all?
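The zero-byte read can be demonstrated in a few lines; a sketch using a Unix-domain socketpair for brevity (the recv() semantics on a FIN'd TCP socket are the same at the API level):

```python
import socket

# A zero-length read is how the peer's close/FIN shows up: the socket
# polls as readable, but recv() hands back an empty buffer.
a, b = socket.socketpair()
b.close()               # peer closes its end
data = a.recv(4096)     # readable immediately, returns b''
assert data == b''      # EOF: we must now close our end too --
a.close()               # a real TCP socket would otherwise sit in CLOSE_WAIT
```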
Some things would change if you intended for the socket to be half closed, but I don't think you do?
Depends if TCP keepalives are enabled, but if the connection goes through a NAT gateway, that will certainly have a tracking state timeout. Usually it is on the order of at least hours though, sometimes days.
SO_RCVTIMEO and SO_SNDTIMEO are for setting a timeout on blocking socket operations. They don't tell you anything about whether the other end is still there or not.
Setting those options isn't much different from setting a timeout in your poll() call.
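A sketch of the point: Python's settimeout() gives the same observable behavior as SO_RCVTIMEO from the caller's side, and a timed-out recv() tells you only that nothing arrived within the window -- the silent peer here is perfectly healthy:

```python
import socket

a, b = socket.socketpair()
a.settimeout(0.1)       # behaves like SO_RCVTIMEO for the caller
timed_out = False
try:
    a.recv(4096)        # peer b is alive but silent: this blocks...
except socket.timeout:  # ...until the 100 ms receive timeout fires
    timed_out = True
a.close()
b.close()
```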
[1] https://blogs.technet.microsoft.com/nettracer/2010/06/03/thi...