Date:      Wed, 05 Sep 2001 02:20:10 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Mike Silbersack <silby@silby.com>
Cc:        "Vladimir A. Jakovenko" <vovik@lucky.net>, freebsd-net@freebsd.org, freebsd-hackers@freebsd.org
Subject:   Re: SO_REUSEPORT on unicast UDP sockets
Message-ID:  <3B95EE4A.EF204095@mindspring.com>
References:  <20010904231049.E7815-100000@achilles.silby.com>

Mike Silbersack wrote:
> > Similarly, there are a number of bugs in the TCP sockets as
> > well; specifically, there's a problem with all sockets being
> > treated as being in the same collision domain, when doing
> > automatic port assignment.  This limits you to 65535 outbound
> > TCP connections, even though you have multiple IP aliases on
> > an interface (theoretically, you should get 64k connections
> > per IP address, if you bind _not_ to INADDR_ANY, but instead
> > to a specific address, but the hash is broken).
> 
> I like this problem's evil sibling: client side TIME_WAITs.  If
> you build them up, you just sit there unable to allocate outgoing
> ports until they time out.

If you fix or work around the source IP address problem, and
patch/tune the kernel for enough outbound sockets, you can
go to 250,000 outbound connections very easily.  I used a
couple of 1GB memory systems in this configuration to get my
1M (actually, closer to 2M) inbound server connections...
obviously, a server doesn't have the port limitation, when
it comes to accepting connections.

The client TIME_WAIT problem is more an issue for port reuse;
for a 2MSL timer in the standard 60 second range, this will
basically limit you to 65535/60, or ~1000 outbound connections
a second per IP address, as a sustained rate, with a total
outstanding count of 65535 * IP_address_count.
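
As a rough check on that arithmetic, here is a small Python sketch.
The function names and the default of 65535 usable ports per address
are illustrative assumptions; the ephemeral port range on a real
system is usually smaller.

```python
def sustained_connects_per_second(ports_per_ip=65535, two_msl_seconds=60):
    """Upper bound on new outbound connections per second from one
    local IP address, if every closed connection holds its port in
    TIME_WAIT for the full 2MSL period."""
    return ports_per_ip // two_msl_seconds


def max_outstanding(ip_address_count, ports_per_ip=65535):
    """Total outbound connections that can exist at once when each
    local IP address gets its own port space."""
    return ports_per_ip * ip_address_count


print(sustained_connects_per_second())  # 1092
print(max_outstanding(4))               # 262140
```

Halving the 2MSL timer doubles the sustained rate, which is why the
timer setting ends up being the tuning knob on the client side.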

Unless you set SO_REUSEPORT/SO_REUSEADDR.
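
For what it's worth, a minimal sketch of what that looks like from
user space, in Python.  The helper name make_outbound_socket and the
default local address are made up for illustration; the point is only
that the options are set before the bind, and that the bind is to a
specific local address rather than INADDR_ANY.

```python
import socket


def make_outbound_socket(local_ip="127.0.0.1"):
    """Create a TCP socket bound to a specific local address with the
    reuse options set.  Port 0 lets the kernel pick the local port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # SO_REUSEPORT is not available everywhere, so treat it as optional.
    if hasattr(socket, "SO_REUSEPORT"):
        try:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        except OSError:
            pass
    s.bind((local_ip, 0))
    return s
```

The caller would then connect() the returned socket to the remote
end as usual, passing one of the interface's alias addresses as
local_ip.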

So for the client side, you are pretty much limited by the
bug (or your fix), and whatever you set the 2MSL timer down
to, as a sustained rate top end.

For most real world uses, apart from test equipment, which
will usually just use raw sockets directly, and fake the
SYN/ACK for the SYN, and then accept the ACK without an RST,
you never really get up into this number of client connections
on a single box.


> Maybe NetBSD or OpenBSD handle these situations better, I'll have
> to check later.

I doubt it.  Until I did testing on 4.3, no one had really
run over 32,766 open sockets in a production server, since at
that point, the ucred reference count overflowed, which would
result in some strange and very hard to identify crashes, when
closing those connections.

Alfred fixed this in -current, but it wasn't done consciously
to address a known problem, it was done "just in case" (Alfred
finds problems like that, and fixes them without necessarily
being aware of it... 8-)).  It hadn't been MFC'ed back to 4.3
until I identified an actual problem, and the root cause.

NetBSD and OpenBSD have some hacks on the server side of the
scaling problem (e.g. they have each implemented a SYN cache),
which is OK as far as it goes, but is really inferior to the
SYN cache and rate halving algorithm code (also against FreeBSD)
out of the Pittsburgh Supercomputing Center.

I've done a preliminary port of the PSC code to 4.x, actually,
though I would need to strip out a number of local changes.

One interesting thing about the SYN cache code is that it could
use the tcptmpl allocation until it saw the ACK (or even the
first data, as was suggested by some of the researchers at that
startup in India, a while back, though that's very aggressive).

Mostly, you aren't going to see the hashing on both source and
destination IPs and ports -- what you'd see in an L2/L3 switch,
if you were building one -- which would let you reuse the local
pair, so long as it was associated with a different remote pair.

That's probably the real long term fix, if there is one.
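
To make the 4-tuple idea concrete, here is a toy sketch in Python.
It is not kernel code, and the class name ConnTable and the example
addresses are invented for illustration; it only shows that keying
the lookup on both the local and the remote pair lets one local
address/port be shared by connections to different remote ends.

```python
class ConnTable:
    """Toy connection table keyed on the full 4-tuple
    (local addr, local port, remote addr, remote port)."""

    def __init__(self):
        self._conns = {}

    def add(self, laddr, lport, raddr, rport):
        """Register a connection; refuse only an exact 4-tuple match."""
        key = (laddr, lport, raddr, rport)
        if key in self._conns:
            return False
        self._conns[key] = "ESTABLISHED"
        return True


table = ConnTable()
print(table.add("10.0.0.1", 40000, "192.0.2.1", 80))  # True
print(table.add("10.0.0.1", 40000, "192.0.2.2", 80))  # True, new remote pair
print(table.add("10.0.0.1", 40000, "192.0.2.1", 80))  # False, exact duplicate
```

With a table keyed only on the local pair, the second add would
have been refused, which is the limitation described above.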

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3B95EE4A.EF204095>