Date:      Sun, 30 Nov 2014 21:59:18 +0300
From:      Gleb Smirnoff <glebius@FreeBSD.org>
To:        Alfred Perlstein <alfred@freebsd.org>
Cc:        "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>, Adrian Chadd <adrian@freebsd.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>
Subject:   Re: svn commit: r275326 - in head: sys/dev/cxgbe/tom sys/kern sys/netinet sys/sys usr.bin/bluetooth/btsockstat usr.bin/netstat usr.bin/systat
Message-ID:  <20141130185918.GB47144@FreeBSD.org>
In-Reply-To: <547B5DB1.2030003@freebsd.org>
References:  <201411301252.sAUCqYXm055601@svn.freebsd.org> <CAJ-VmoniCHqV7FO97RWa%2BJxhWmSfy09LPY3gZW%2BQ9wwJfL74JA@mail.gmail.com> <547B5DB1.2030003@freebsd.org>

  Alfred,

On Sun, Nov 30, 2014 at 10:10:57AM -0800, Alfred Perlstein wrote:
A> Splitting this into the mbuf layer adds a huge level of complexity where 
A> again, there are already completion paths in the socket layer to do 
A> this.  I am completely confused as to why this couldn't just be done 
A> with the socket callback system already in place.  Very open to being 
A> educated on this!

As I said in September, I can't see how the socket buffer upcall system
could be used here. It does the opposite: it wakes up something in the
kernel when data arrives on a socket. So I am also very open to an
explanation of how I could apply it here.

A> The concept of "not filled mbufs" in a socket buffer seems absolutely 
A> wrong at a glance, I'm sure with some better explanation this would all 
A> make sense, but really am still not convinced this is at all the right 
A> way to go on this.

I'll put a longer explanation at the end of this email.

A> Does any other OS do this for any reason?  Or is this just a short 
A> sighted hack for an experiment in sendfile?

Well, we also have plans to put TLS into the kernel, and that would use
them as well. :) There might be more consumers.

Note that sf_bufs were initially "just a short sighted hack" for
sendfile(2), and are now used in several places in the kernel.

A> I am really trying very hard to rationalize this change, so I will ask, 
A> is there something about keeping TCP windows open that you are hoping to 
A> accomplish that you can not otherwise do without sb_ccc and sb_acc?  If 
A> not then why is all this stuff being stuffed into mbufs as opposed to 
A> using callbacks?  It really seems wrong, my thoughts are "this is like 
A> kse for mbufs" something done with good intentions, but is complex and 
A> will have to be ripped out later.  Am I wrong here?

No, this has nothing to do with keeping TCP windows open.

Here is the longer explanation: the new sendfile(2) is going to be
non-blocking on disk. That means the syscall returns to the
application immediately, without completing the I/O. The application
can carry on with its work. It can write(2) to the socket as well, or
run another sendfile(2) on the same socket. Now you can probably see
the problem. If the non-blocking sendfile doesn't put a placeholder
for its data into the socket buffer, then the data in the socket
buffer is going to end up interleaved in an unpredictable order.
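
To illustrate the ordering problem, here is a toy userland model in C.
It is only a sketch and not the code from r275326: every toy_* name is
made up for this example, and the real socket buffer holds mbuf chains,
not an array of strings. toy_used() and toy_avail() loosely mirror the
sbused()/sbavail() split, that is, bytes claimed in the buffer versus
bytes that can actually be sent right now.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAXCHUNKS 8

/* One queued chunk; ready == 0 marks a placeholder whose data is not read yet. */
struct toy_chunk {
	const char *data;	/* NULL until the (simulated) disk I/O completes */
	size_t len;		/* bytes reserved in the stream */
	int ready;
};

/* Hypothetical stand-in for a socket buffer: chunks in stream order. */
struct toy_sockbuf {
	struct toy_chunk chunks[TOY_MAXCHUNKS];
	size_t nchunks;
};

/* Append a chunk and return its index, or -1 if the buffer is full.
 * Passing data == NULL reserves a placeholder of len bytes. */
static int
toy_append(struct toy_sockbuf *sb, const char *data, size_t len)
{
	struct toy_chunk *c;

	if (sb->nchunks == TOY_MAXCHUNKS)
		return (-1);
	c = &sb->chunks[sb->nchunks];
	c->data = data;
	c->len = len;
	c->ready = (data != NULL);
	return ((int)sb->nchunks++);
}

/* Simulated I/O completion: fill in a previously reserved placeholder. */
static void
toy_complete(struct toy_sockbuf *sb, int idx, const char *data)
{
	sb->chunks[idx].data = data;
	sb->chunks[idx].ready = 1;
}

/* Bytes claimed in the buffer, ready or not. */
static size_t
toy_used(const struct toy_sockbuf *sb)
{
	size_t i, n = 0;

	for (i = 0; i < sb->nchunks; i++)
		n += sb->chunks[i].len;
	return (n);
}

/* Bytes that may be sent now: everything before the first placeholder. */
static size_t
toy_avail(const struct toy_sockbuf *sb)
{
	size_t i, n = 0;

	for (i = 0; i < sb->nchunks && sb->chunks[i].ready; i++)
		n += sb->chunks[i].len;
	return (n);
}

/* Copy the sendable prefix into out as a NUL-terminated string and
 * return the number of bytes copied. outlen must be at least 1. */
static size_t
toy_drain(const struct toy_sockbuf *sb, char *out, size_t outlen)
{
	size_t i, off = 0;

	for (i = 0; i < sb->nchunks && sb->chunks[i].ready; i++) {
		if (off + sb->chunks[i].len >= outlen)
			break;
		memcpy(out + off, sb->chunks[i].data, sb->chunks[i].len);
		off += sb->chunks[i].len;
	}
	out[off] = '\0';
	return (off);
}
```

Suppose an application appends a header with a write, reserves a
placeholder for a file chunk, and then appends a trailer with another
write. In this model toy_avail() covers only the header until
toy_complete() fills the placeholder, so the trailer can never be sent
ahead of the file data, while toy_used() counts all three chunks from
the start.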

Please note that I am not moving anything into the mbuf layer, as you
claim. None of mbuf.h, kern_mbuf.c, or uipc_mbuf.c is modified.
This is a new feature kept internal to the socket buffer code.

Yes, the sweep changing sb_cc to sbavail() and sbused() was
large. That is the price of socket buffers being exposed to the
stack, and of the stack reaching directly into the structure. If the
socket code had been developed to be more self-contained decades ago,
no sweep would have been needed. As I already noted in the commit
message, my opinion is that socket buffers need to be made protocol
dependent and more opaque. The SCTP code taught me that. As for
TCP/UDP, right now our socket buffer structure supports both
SOCK_STREAM and SOCK_DGRAM, but this is achieved through code
complication, and I see no good reason to keep it so generic. The
original BSD soreceive() function was hell before it was split into
soreceive_stream() and soreceive_dgram(). Splitting the sockbuf into
two different types would finish that job.

-- 
Totus tuus, Glebius.


