Date:      Mon, 19 Jun 2000 14:34:37 -0600
From:      "Kenneth D. Merry" <ken@kdm.org>
To:        Alfred Perlstein <bright@wintelcom.net>
Cc:        Jonathan Lemon <jlemon@flugsvamp.com>, arch@FreeBSD.ORG
Subject:   Re: kblob discussion.
Message-ID:  <20000619143437.A80133@panzer.kdm.org>
In-Reply-To: <20000619130911.I26801@fw.wintelcom.net>; from bright@wintelcom.net on Mon, Jun 19, 2000 at 01:09:11PM -0700
References:  <20000619111309.E26801@fw.wintelcom.net> <20000619140029.D37084@prism.flugsvamp.com> <20000619133931.A79532@panzer.kdm.org> <20000619130911.I26801@fw.wintelcom.net>

On Mon, Jun 19, 2000 at 13:09:11 -0700, Alfred Perlstein wrote:
> * Kenneth D. Merry <ken@kdm.org> [000619 12:40] wrote:
> > I would like to see something more generic as well, especially something
> > that could work for both sending and receiving.
> 
> As would I.
> 
> > Another thing that I would like to see is the ability to get notification
> > of when the I/O is done, like async I/O.
> 
> That can be easily put into kblob at the expense of some performance,
> however it would require a side allocation per send which negates
> the whole purpose of it as a low overhead extremely fast way to send
> data.

This gets into the generic API versus specific API issue.

> > I think Jonathan's API is a little more generic than kblob, although I
> > share some of Jonathan's reservations about using that much kva.
> 
> I haven't seen Jonathan's API.

I think he posted the gist of it to committers; I'm sure he'll repost it to
-arch.

> > I'd almost like the ability to allocate the buffers in userland, and then
> > map them into the kernel and revoke the userland mapping, so the user
> > process can't get to it while you're doing a send.
> 
> So use sendfile. :)

That only works for files. :)

> > You could do similar things on receive.  A receive side API could work well
> > with something like RDMA.  (Here's a URL:
> > ftp://ftpeng.cisco.com/pub/rdma/draft-csapuntz-tcprdma-00.txt
> > )
> 
> You've got to be kidding if you think I'm going to draft an RFC to add
> bits to the TCP header.

That wasn't the point.  There's already a draft spec, and therefore no need
to write another, unless the draft isn't going to work.

(I'm not saying that RDMA is perfect; I haven't studied it enough to be
able to give it a conclusive thumbs up.  The idea is a good one, though.)

Anyway, what I'm getting at is you could use something like RDMA to DMA
data from the NIC directly into user-supplied buffers.

Another possibility for a buffer-type receive interface is to receive it
into generic kernel memory, and then pass those buffers out in a nice
"package".  That would be similar in effect to the page flipping code in
the zero copy code I posted, except that there wouldn't be a user page to
throw away, and there would be no size and alignment constraints.
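
As a rough illustration of what such a "package" might look like to a
consumer, here is a small C sketch.  Everything in it is an assumption made
for the example: struct rx_package, RX_MAXSEG and the rx_package_*()
functions are hypothetical names, and ordinary user memory stands in for
the kernel buffers.  The point is only that the segments can have arbitrary
lengths and alignment, and that the whole package can be handed on with
writev(2) without first being flattened into one buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define RX_MAXSEG 8		/* hypothetical per-package segment limit */

/* Hypothetical descriptor for a package of received buffers. */
struct rx_package {
	int nseg;			/* segments in use */
	struct iovec seg[RX_MAXSEG];	/* base and length of each segment */
};

/* Append one received buffer; no size or alignment is required. */
static int
rx_package_add(struct rx_package *p, void *data, size_t len)
{
	assert(p != NULL && data != NULL);
	if (p->nseg >= RX_MAXSEG)
		return (-1);
	p->seg[p->nseg].iov_base = data;
	p->seg[p->nseg].iov_len = len;
	p->nseg++;
	return (0);
}

/* Total number of payload bytes carried by the package. */
static size_t
rx_package_len(const struct rx_package *p)
{
	size_t total = 0;
	int i;

	for (i = 0; i < p->nseg; i++)
		total += p->seg[i].iov_len;
	return (total);
}

/* Pass every segment to fd in one gather write, without flattening. */
static ssize_t
rx_package_forward(int fd, const struct rx_package *p)
{
	return (writev(fd, p->seg, p->nseg));
}
```

A caller would zero the structure, add each buffer as it arrives, and then
either walk seg[] directly or forward the whole package in one call.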

> > One problem is that allocating buffers in userland would work well for
> > sends, but not as well for receives.  So maybe both should be allowed?
> 
> Er..
> 
> What's the point of a many to one mapping for a receive buffer?

I'm not talking about a one to many mapping.  Again, I'd rather see a
generic API, and kblob is a rather specific API.

> The only way to do this would be to allow the kernel's mbuf map to
> be map'd in by a user process, which doesn't make much sense.

Actually, the zero copy receive code that Drew Gallatin wrote (and I posted
on Friday) page flips kernel pages into userland, and recycles the userland
pages.

> > Anyway, those are just some ideas, I think there's some room for
> > discussion here.
> 
> I'd rather not, I've already discussed it a lot and many people
> want this functionality.  The people that want more flexibility
> don't realize that the interface is easily extendable (or don't
> want to extend it themselves), and the people that don't like it
> on principle don't see the need for changes to get real world
> performance increases.

I'd be more in favor of doing it right the first time, rather than
continually revising and extending the interface.

It won't be trivial to do, but I think it should be possible to get great
performance gains without a limited API that would have to be expanded
later.

> > > I think that any notion of committing this should be held off
> > > until we have a chance to come to some kind of consensus on the
> > > issue.
> > 
> > Another point about the timeframe for deciding and committing both this and
> > the accept filters -- a lot of people are away at Usenix, and last week a
> > number of key developers were at the SMP gathering.  (Some are going from
> > one to the other, and therefore will have been away for more than a week.)
> 
> I was there, many people liked the ideas I'm presenting.
> 
> > I think we should hold off on this stuff until people have a chance to get
> > back from Usenix and comment on it.
> > 
> > I'll be heading to Usenix tomorrow, and if anyone would like to talk about
> > this stuff in person, just let me know.
> 
> I'll be there as well.  Have some code handy that implements this
> and we'll discuss it, if not just grabbing a beer would be fun. :)

Believe it or not, I've got code that behaves in more or less the manner
I'm describing.  That's part of the reason I'm interested in this subject
at all.

It is effectively a zero copy receive and send system with async I/O
semantics.  The problem is that it was implemented for a custom hardware
and software environment, and the interface is expedient, not elegant.
Anyway, for those reasons, it will never be released.  The parts of that
work that are generally useful are included in the code I posted on Friday.

But yeah, I'd like to get together and yak about this stuff.

Ken
-- 
Kenneth Merry
ken@kdm.org





