Date:      Fri, 1 Feb 2013 15:21:15 -0600
From:      Kevin Day <kevin@your.org>
To:        freebsd-net@freebsd.org
Subject:   Syncookies break with Windows 8
Message-ID:  <CA61E725-8370-4ED2-BBA7-F6FAFF93A553@your.org>

We've got a large cluster of HTTP servers, each server
handling >10,000 req/sec. Occasionally, and during periods of heavy
load, we'd get complaints from some users that downloads were working
but going EXTREMELY slowly. After a whole lot of debugging, we narrowed
it down to being only Windows 8 clients experiencing this problem. It
turns out that FreeBSD's implementation of syncookies is likely
violating RFC1323.

When syncookies kicks in, either because the syncache limit is reached
or net.inet.tcp.syncookies_only is set, some shortcuts are taken with
regard to TCP connections. Unlike some other syncookies implementations
which (ab)use timestamps to store options, the FreeBSD implementation of
syncookies discards TCP options such as window scaling. In itself this
isn't a bad thing, but it becomes a bad thing because we then lie and
pretend that we are supporting window scaling.

According to RFC1323, if you want to use TCP window scaling, the client
says so on the initial SYN. If the server is also willing to use
scaling, it says so on the SYN/ACK. If both parties included a scaling
option on their respective SYN, you assume window scaling is working and
proceed to use it. If one or both parties don't have a scaling option,
you don't scale at all. The problem here is that with syncookies, we
don't save the wscale parameter from the client's SYN, but offer to use
window scaling anyway on our SYN/ACK, so the client thinks we
successfully negotiated window scaling even though we haven't.
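
To make the negotiation rule concrete, here is a minimal sketch in
Python. The function name and the use of None for "option absent" are
my own choices for illustration, not anything from the FreeBSD source.

```python
def negotiated_shifts(client_wscale, server_wscale):
    """Return (shift for the client's windows, shift for the server's windows).

    Each argument is the shift count carried in that side's SYN, or None
    if the window scale option was absent. Scaling is only in effect
    when BOTH SYNs carried the option; otherwise neither side scales.
    """
    if client_wscale is None or server_wscale is None:
        return (0, 0)
    return (client_wscale, server_wscale)


# Both sides offered the option, so both shifts apply.
print(negotiated_shifts(4, 5))      # (4, 5)
# Only the client offered it, so nobody scales.
print(negotiated_shifts(4, None))   # (0, 0)
```

The syncookie problem is that the two ends evaluate this rule with
different inputs: the client sees both options, while the server has
already forgotten the client's.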


This is how a normal window scaled connection happens:

client > server: Flags [S], win 65535, options [mss 1460,nop,wscale 4,nop,nop,sackOK], length 0
(client is connecting, offering a window of 64K, but if scaling is
negotiated wants to scale future window sizes by 4 bits)

server > client: Flags [S.], win 65535, options [mss 1460,nop,wscale 5,sackOK,eol], length 0
(server is ACKing the client's SYN, also offering an unscaled window of
64K, but wanting to shift by 5 going forward)

The server and client both offered window scaling, so they're now using
it from this point on. All window sizes sent/received are shifted by the
appropriate number of bits.


When syncookies kicks in on the server, and the client is anything BUT
Windows 8, this happens:

client > server: Flags [S], win 65535, options [mss 1460,nop,wscale 4,nop,nop,sackOK], length 0
(syncookies cause the options to get lost: the client sent the
"wscale 4" parameter, but we immediately forgot it)

server > client: Flags [S.], win 65535, options [mss 1460,nop,wscale 5,sackOK,eol], length 0
(server is ACKing the client's SYN, also offering an unscaled window of
64K, but wanting to shift by 5 going forward)

The server sent a wscale back on its SYN/ACK, so the client thinks
window scaling is now in effect. But it's not: the server didn't
remember the client's wscale option, so it's not scaling any of the
window sizes that are coming in from the client. This doesn't
actually hurt much. The client thinks it's telling us it has a 1MB
window open, but we're only hearing that it has a 64K window, so
that's all we ever use. It's "failing safe" here, and nothing actually
breaks.


Now throw Windows 8 into the mix. Windows 8's TCP auto tuning is much
more aggressive than previous versions of Windows. I honestly can't tell
if this is a bug or intentional design, but Windows will sometimes,
intermittently, advertise a much much larger wscale option than it
actually needs. This is a mild example of what happens:

client > server: Flags [S], win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
(client is connecting, offering an unscaled window of 8192 bytes, but
wants to negotiate window scaling of 8 bits if the server will accept
it)

server > client: Flags [S.], win 65535, options [mss 1460,nop,wscale 5,sackOK,eol], length 0
(server is ACKing the client's SYN, also offering an unscaled window of
64K, but wanting to shift by 5 going forward)

We're at the same point here as in the above example: the client now
believes we've successfully negotiated window scaling, but on the server
side we're treating all window sizes coming from the client as being
shifted by 0. So the client sends its first ACK:

client > server: Flags [.], seq 1, ack 1, win 256, length 0

The client believes we're still scaling everything it says by 8 bits,
but it only wants to give us a 64K window, so it's saying 256 here
(256<<8 = 65536). We don't remember that we agreed to shift everything
by 8, so we treat that as just 256. The connection now proceeds, but we
think we can only send 256 bytes at a time. It is extremely slow.

I have seen Windows 8 attempt to use wscale parameters of 8 all the way
up to 10. While I've only caught a few cases of this happening in the
wild, when it's using 10 we end up thinking we only have a 64 byte
window and things get really silly really fast.
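
The arithmetic behind those numbers can be checked with a short Python
sketch. The function name is made up for illustration; the only
assumption is the one described above, namely that the client applies
its own shift while the server applies none.

```python
def window_views(win_field, client_wscale):
    """Return (bytes the client means, bytes the server assumes)."""
    client_means = win_field << client_wscale  # client applies its own shift
    server_assumes = win_field                 # server forgot it and shifts by 0
    return client_means, server_assumes


for wscale in (8, 9, 10):
    # Window field a client would send to advertise a 64K window.
    field = 65536 >> wscale
    means, assumes = window_views(field, wscale)
    print(f"wscale {wscale}: field={field}, "
          f"client means {means}, server assumes {assumes}")

# wscale 8: field=256, client means 65536, server assumes 256
# wscale 9: field=128, client means 65536, server assumes 128
# wscale 10: field=64, client means 65536, server assumes 64
```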


I've been talking with someone on Microsoft's side of things about why
Windows is choosing to do this. But my own view is that if syncookies
are being used in their current state (we lose the client's wscale
option), we can't advertise wscale on the SYN/ACK. My reading of
RFC1323 is that putting a wscale option in our SYN/ACK means we agreed
to use the wscale from the client's SYN, so I don't think what we're
doing now is correct. If syncookies are being used, we should advertise
MIN(sb_max, TCP_MAXWIN) with no scaling and stay within the RFC.
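
As a rough sketch of that proposal (Python, not kernel code): the
function name and the sb_max value in the usage lines are hypothetical,
and the non-syncookie path is heavily simplified. TCP_MAXWIN is 65535,
the largest window that fits in the header without scaling.

```python
TCP_MAXWIN = 65535  # largest window expressible without scaling


def synack_window_options(using_syncookies, sb_max, our_wscale):
    """Return (window to advertise, wscale option or None) for the SYN/ACK.

    Simplified illustration of the proposed behavior, not the actual
    FreeBSD logic.
    """
    # The window field in a SYN or SYN/ACK is never scaled, so it is
    # capped at TCP_MAXWIN on both paths.
    window = min(sb_max, TCP_MAXWIN)
    if using_syncookies:
        # We won't remember the client's wscale, so don't offer ours.
        return window, None
    return window, our_wscale


# Hypothetical sb_max of 256K, server wanting a shift of 5.
print(synack_window_options(True, 262144, 5))    # (65535, None)
print(synack_window_options(False, 262144, 5))   # (65535, 5)
```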

This doesn't affect Linux because it uses timestamp options to stuff the
client's wscale, so it gets re-learned on the ACK. OpenBSD and OS X
don't have syncookies. NetBSD seems to have the same problem if its new
syncookie implementation gets turned on.
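
For anyone unfamiliar with the timestamp trick, the general idea looks
something like the Python sketch below. The bit layout (low 4 bits of
the timestamp value) and the function names are assumptions made up for
illustration, not the actual Linux encoding, and the approach can
presumably only work when the client sent a timestamp option to begin
with.

```python
WSCALE_BITS = 4                        # hypothetical layout: low 4 bits hold the shift
WSCALE_MASK = (1 << WSCALE_BITS) - 1   # 0xF, enough for shifts 0..14


def encode_tsval(now_ticks, client_wscale):
    """Build the SYN/ACK timestamp value with the client's wscale hidden in it."""
    ts = (now_ticks & ~WSCALE_MASK) | (client_wscale & WSCALE_MASK)
    return ts & 0xFFFFFFFF             # timestamp values are 32 bits wide


def decode_wscale(tsecr):
    """Recover the client's wscale from the timestamp echoed back in its ACK."""
    return tsecr & WSCALE_MASK


tsval = encode_tsval(0x12345678, 10)
print(hex(tsval))             # 0x1234567a
print(decode_wscale(tsval))   # 10
```

The server keeps no state between the SYN/ACK and the ACK; the client
echoes the value back, which is what lets the option be re-learned.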

Any thoughts? Was there a reason why we're forcing the use of wscale on
syncookie connections?

-- Kevin



