Date:      Wed, 12 Mar 1997 19:42:03 -0600
From:      Chris Csanady <ccsanady@nyx.pr.mcs.net>
To:        hackers@FreeBSD.ORG
Subject:   Re: Solaris TPC-C benchmarks (with Oracle) 
Message-ID:  <199703130142.TAA11137@nyx.pr.mcs.net>
In-Reply-To: Your message of Wed, 12 Mar 1997 17:49:48 -0500. <199703122249.RAA21336@jenolan.caipgeneral> 


[stuff about benchmarks deleted]

>
>As I've discussed with David Greenman, you are not going to come
>close to touching Solaris on such things until:
>
>	1) The BSD timer code is fixed; walking the entire TCP socket
>	   list 7 times per second, and at least once per connect/bind,
>	   is a serious bottleneck.
>
>	2) Select() is pared down to its absolute minimum.  The best
>	   scheme I've seen is to make the kernel implementation and
>	   all the back ends do SVR4 poll(), and implement select()
>	   (so you can still provide that interface) in terms of
>	   poll().  It is the only way to kill this bottleneck that I
>	   know of.
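
On (1), the usual fix is to stop scanning altogether: give each
connection its own timer via the kernel's timeout()/untimeout()
interface, so idle connections cost nothing.  A rough sketch -- the
tcp_callout record and every function name here are invented for
illustration, not actual FreeBSD code:

struct tcpcb;				/* opaque for this sketch */

/* The historic BSD kernel callout interface. */
extern void timeout(void (*fn)(void *), void *arg, int ticks);
extern void untimeout(void (*fn)(void *), void *arg);

/* One record per connection/timer pair. */
struct tcp_callout {
	struct tcpcb	*tc_tp;		/* owning connection */
	int		 tc_which;	/* which timer: REXMT, 2MSL, ... */
};

extern void tcp_timer_dispatch(struct tcpcb *, int);	/* hypothetical */

static void
tcp_timer_fire(void *arg)
{
	struct tcp_callout *tc = arg;

	tcp_timer_dispatch(tc->tc_tp, tc->tc_which);
}

/*
 * Arm/disarm a single connection's timer.  Only connections with a
 * timer actually pending are ever visited -- no more walking the
 * whole PCB list several times a second.
 */
void
tcp_timer_arm(struct tcp_callout *tc, int ticks)
{
	timeout(tcp_timer_fire, tc, ticks);
}

void
tcp_timer_disarm(struct tcp_callout *tc)
{
	untimeout(tcp_timer_fire, tc);
}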

Perhaps someone should look into integrating NetBSD's poll() implementation?
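
To make (2) concrete, here is the select()-on-top-of-poll() mapping,
sketched in userland; a kernel version would do the same translation
at the syscall boundary.  select_via_poll() is my name for it, and the
ready-count semantics are simplified (select() counts per-set bits,
poll() counts descriptors):

#include <poll.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/time.h>

int
select_via_poll(int nfds, fd_set *rd, fd_set *wr, fd_set *ex,
    struct timeval *tv)
{
	struct pollfd *pfds;
	int i, n, ready, timo;

	pfds = malloc(nfds * sizeof(*pfds));
	if (pfds == NULL)
		return (-1);

	/* One pollfd per descriptor with any bit set in the fd_sets. */
	for (i = 0, n = 0; i < nfds; i++) {
		short ev = 0;

		if (rd != NULL && FD_ISSET(i, rd))
			ev |= POLLIN;
		if (wr != NULL && FD_ISSET(i, wr))
			ev |= POLLOUT;
		if (ex != NULL && FD_ISSET(i, ex))
			ev |= POLLPRI;
		if (ev != 0) {
			pfds[n].fd = i;
			pfds[n].events = ev;
			n++;
		}
	}

	/* select()'s timeval becomes poll()'s millisecond timeout. */
	timo = (tv == NULL) ? -1 : tv->tv_sec * 1000 + tv->tv_usec / 1000;
	ready = poll(pfds, n, timo);

	/* Translate revents back into the three bit masks. */
	if (rd != NULL)
		FD_ZERO(rd);
	if (wr != NULL)
		FD_ZERO(wr);
	if (ex != NULL)
		FD_ZERO(ex);
	for (i = 0; ready > 0 && i < n; i++) {
		if (rd != NULL && (pfds[i].revents & (POLLIN|POLLHUP)))
			FD_SET(pfds[i].fd, rd);
		if (wr != NULL && (pfds[i].revents & POLLOUT))
			FD_SET(pfds[i].fd, wr);
		if (ex != NULL && (pfds[i].revents & POLLPRI))
			FD_SET(pfds[i].fd, ex);
	}
	free(pfds);
	return (ready);
}

The win is that only the descriptors of interest get touched, instead
of three bitmaps scanned up to nfds bits in each direction.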

>	3) Something intelligent is done with TIME_WAIT'ers.  Anything
>	   4.4-derived falls apart at around 1200 or so TIME_WAIT
>	   connections.
>
>	4) Improve the header prediction code; see Jacobson's one-liner
>	   comparisons for how it should be done.
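
On (3), one obvious approach is to stop treating TIME_WAIT PCBs like
live connections: strip them down and append them to a FIFO stamped
with their 2MSL expiry, so the queue is sorted by construction and the
timer only ever looks at the head.  A sketch, all names invented:

#define TW_2MSL_TICKS	120		/* 60s at the 2/sec slow timer */

extern int tcp_now;			/* current time, in those ticks */

struct tw_pcb {
	struct tw_pcb	*tw_next;	/* FIFO linkage */
	int		 tw_expire;	/* tick at which 2MSL elapses */
	/* ...just enough state to answer a late segment... */
};

static struct tw_pcb *tw_head, *tw_tail;

extern void tw_free(struct tw_pcb *);	/* hypothetical release */

/* Entering TIME_WAIT: append at the tail. */
void
tw_enter(struct tw_pcb *tw)
{
	tw->tw_expire = tcp_now + TW_2MSL_TICKS;
	tw->tw_next = NULL;
	if (tw_tail != NULL)
		tw_tail->tw_next = tw;
	else
		tw_head = tw;
	tw_tail = tw;
}

/* Timer tick: reap the expired prefix -- O(expired), not O(all). */
void
tw_reap(void)
{
	struct tw_pcb *tw;

	while ((tw = tw_head) != NULL && tw->tw_expire - tcp_now <= 0) {
		tw_head = tw->tw_next;
		if (tw_head == NULL)
			tw_tail = NULL;
		tw_free(tw);
	}
}

With that, a few thousand TIME_WAIT entries cost almost nothing
between expiries.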

For those who would actually like to look at (4), here is the URL:
"http://www.noao.edu/~rstevens/vanj.93sep07.txt"  It is a description of
how our BSD net code should look. :)  As soon as I am able to, I'm going
to start working on implementing what's talked about there.
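
For flavor, the heart of header prediction is a single compound test
that catches an in-order segment on a quiescent ESTABLISHED
connection.  This is a simplified paraphrase of the 4.4BSD test in
tcp_input() (stand-in types, timestamp handling omitted -- check the
real source):

#include <sys/types.h>

#define TH_FIN	0x01
#define TH_SYN	0x02
#define TH_RST	0x04
#define TH_ACK	0x10
#define TH_URG	0x20
#define TCPS_ESTABLISHED 4

/* Stand-in for the handful of tcpcb fields the test reads. */
struct tcpcb {
	int	t_state;
	u_long	rcv_nxt, snd_wnd, snd_nxt, snd_max;
};

int
tcp_predict(struct tcpcb *tp, int tiflags, u_long ti_seq, u_long tiwin)
{
	/*
	 * Established connection, a plain ACK (no SYN/FIN/RST/URG),
	 * exactly the next expected sequence number, an unchanged
	 * window, and no retransmission in progress: take the fast
	 * path and skip the general input machinery entirely.
	 */
	return (tp->t_state == TCPS_ESTABLISHED &&
	    (tiflags & (TH_SYN|TH_FIN|TH_RST|TH_URG|TH_ACK)) == TH_ACK &&
	    ti_seq == tp->rcv_nxt &&
	    tiwin != 0 && tiwin == tp->snd_wnd &&
	    tp->snd_nxt == tp->snd_max);
}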

For starters, I'd like to get rid of the use of mbuf chains.  This is mostly
a simple, if time-consuming, task.  (I think.)  It will save a bunch of copying
around the net code, as well as simplifying things.  The only part I'm not
really sure about is how to do memory management with the new "pbufs."  I
looked at the Linux code, and they call their generic kmalloc() to allocate a
buffer the size of the packet.  This would be easier, but I don't like it. :)
In Van Jacobson's slides from his talk, he mentions that routines call the
output driver to get packet buffers (pbufs), not a generic allocator.

The two things I want to know are what exactly our pbufs should look
like, and how they will be allocated.

The idea I had in my head was that the driver would initially
allocate a block of memory, divide it into segments just larger than the
interface's maximum MTU, call them pbufs, link them all together into a free
list, and hand them out when requested.  Is this OK?  Also, where exactly
should the code go?
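
Roughly, I am picturing something like the following: a per-driver pool
carved out of one allocation.  All names here are invented, and a real
driver would also want DMA-able memory and spl protection around the
free list:

#include <stdlib.h>

struct pbuf {
	struct pbuf	*pb_next;	/* free-list linkage */
	int		 pb_len;	/* bytes of valid data */
};
#define PBUF_DATA(pb)	((char *)((pb) + 1))	/* packet bytes follow */

struct pbuf_pool {
	struct pbuf	*pp_free;	/* head of the free list */
	char		*pp_base;	/* the one big allocation */
};

/* Carve "count" slots, each holding up to "bufsize" data bytes. */
int
pbuf_pool_init(struct pbuf_pool *pp, int count, int bufsize)
{
	size_t slot;
	int i;

	/* Round each slot up so every pbuf header stays aligned. */
	slot = (sizeof(struct pbuf) + bufsize + 7) & ~(size_t)7;
	pp->pp_base = malloc((size_t)count * slot);
	if (pp->pp_base == NULL)
		return (-1);
	pp->pp_free = NULL;
	for (i = 0; i < count; i++) {
		struct pbuf *pb = (struct pbuf *)(pp->pp_base + i * slot);

		pb->pb_next = pp->pp_free;	/* push onto free list */
		pp->pp_free = pb;
	}
	return (0);
}

/* Allocation is popping the head: O(1), no generic allocator. */
struct pbuf *
pbuf_alloc(struct pbuf_pool *pp)
{
	struct pbuf *pb = pp->pp_free;

	if (pb != NULL) {
		pp->pp_free = pb->pb_next;
		pb->pb_len = 0;
	}
	return (pb);
}

/* Freeing pushes it back; the pool never shrinks or moves. */
void
pbuf_free(struct pbuf_pool *pp, struct pbuf *pb)
{
	pb->pb_next = pp->pp_free;
	pp->pp_free = pb;
}

For Ethernet, pbuf_pool_init(&pool, 256, 1536) at attach time would
pre-carve 256 buffers comfortably larger than a 1500-byte MTU, and the
protocol code would then ask the driver (pbuf_alloc()) rather than a
general-purpose allocator.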

If someone could help me out on this a bit, I think I could get it done.  And
after I'm more familiar with all of the code, I'd like to attack some of the
other problems.

>
>	5) All the bulk is pulled out of soreceive() and sosend().  I
>	   am seriously considering attaching tcp_sendmsg() directly to
>	   the socket read/write operations when a connection hits
>	   TCP_ESTABLISHED, to avoid all of the slow conditionalized
>	   code.

In the new implementation, soreceive() and sosend() go away. :)
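
The swap David describes would be little more than retargeting a pair
of function pointers at state-transition time.  Something like this,
with purely illustrative names:

/* Cut-down socket with per-socket I/O hooks (illustration only). */
struct socket {
	int	 so_state;
	int	(*so_send)(struct socket *, const void *, int);
	int	(*so_recv)(struct socket *, void *, int);
};

extern int tcp_sendmsg(struct socket *, const void *, int);
extern int tcp_recvmsg(struct socket *, void *, int);

/*
 * On the transition to TCP_ESTABLISHED, point the socket's read/write
 * operations straight at the lean TCP routines, so the generic
 * sosend()/soreceive() conditionals never run on the fast path.
 */
void
tcp_established_hook(struct socket *so)
{
	so->so_send = tcp_sendmsg;
	so->so_recv = tcp_recvmsg;
}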

>There are more, those are just off the top of my head.  Solaris's tcp
>scales nicely (until a certain extremely high rate of incoming
>connections gets hit) on SMP as well (because writers are in the hash
>tables often enough that readers begin going to sleep on the hash rw
>lock).

The new architecture also seems as if it would scale nicely on SMP.  This
is one of the reasons I'm interested in doing it.

For an idea of the other changes that Van Jacobson made, take a look at 
"ftp://ftp.ee.lbl.gov/talks/vj-nws93-1.ps.Z"

Is there anyone else interested in this as well?

--Chris Csanady

>I think TPC costs like $1200.00, but all of the networking-end things
>it does test can be tuned just as effectively using WebStone.  The
>networking, you will find, is the first-order problem; once you have
>that planed down, then worry about making sure the vm/vfs code can
>keep up with the transaction rates the net code can actually deliver
>and sustain.
>
>---------------------------------------------////
>Yow! 11.26 MB/s remote host TCP bandwidth & ////
>199 usec remote TCP latency over 100Mb/s   ////
>ethernet.  Beat that!                     ////
>-----------------------------------------////__________  o
>David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><