Date:      Fri, 30 May 2003 02:53:26 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Igor Sysoev <is@rambler-co.ru>
Cc:        arch@freebsd.org
Subject:   Re: sendfile(2) SF_NOPUSH flag proposal
Message-ID:  <3ED72A16.9CACD4C5@mindspring.com>
References:  <Pine.BSF.4.21.0305301302540.58337-100000@is>

Igor Sysoev wrote:
> On Fri, 30 May 2003, Terry Lambert wrote:
> > Or you could just fix sendfile.  8-).
> 
> I'm going to fix it as Matthew Dillon suggested if no one else is going to
> do it in the near future.

I'm pretty sure it will deadlock on boundary conditions, but
Matt has confidence it won't; I looked at the code he pointed
to in -stable and -current, and I'm not so sure I agree, but
I'm willing to be wrong.  If it fixes the problem for you,
and doesn't deadlock, then more power to you.  I would ask
that you test with file sizes in 1 byte increments, up to
32769 bytes, with headers of 0 bytes and 300 bytes for your
test cases, so that the boundary that I'm worried about ends
up getting exercised.
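
A rough sketch of a driver for that test matrix follows.  The
names (sf_run_matrix, sf_case_fn, sf_case_stub) are made up for
illustration, and starting at 1 byte rather than 0 is an
assumption.  A real callback would create a file of the given
size, send it with sendfile(2) and a header of the given size,
and verify what arrives; that part is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical callback: run one test case, return 0 on success. */
typedef int (*sf_case_fn)(size_t file_size, size_t hdr_size, void *ctx);

/* Upper bound on file size, taken from the message. */
enum { SF_MAX_FILE_SIZE = 32769 };

/* Header sizes taken from the message: 0 bytes and 300 bytes. */
static const size_t sf_hdr_sizes[] = { 0, 300 };

/*
 * Call run_case for every (file size, header size) pair.
 * Returns the number of failing cases; if total is non-NULL it
 * receives the number of cases that were run.
 */
static size_t
sf_run_matrix(sf_case_fn run_case, void *ctx, size_t *total)
{
	size_t failures = 0, count = 0;
	size_t nhdr = sizeof(sf_hdr_sizes) / sizeof(sf_hdr_sizes[0]);

	for (size_t h = 0; h < nhdr; h++) {
		/* Assumption: sizes start at 1 byte, in 1 byte steps. */
		for (size_t size = 1; size <= SF_MAX_FILE_SIZE; size++) {
			count++;
			if (run_case(size, sf_hdr_sizes[h], ctx) != 0)
				failures++;
		}
	}
	if (total != NULL)
		*total = count;
	return failures;
}

/* Placeholder case that always passes; replace with a real test. */
static int
sf_case_stub(size_t file_size, size_t hdr_size, void *ctx)
{
	(void)file_size;
	(void)hdr_size;
	(void)ctx;
	return 0;
}
```

Calling sf_run_matrix(sf_case_stub, NULL, &total) walks all
2 * 32769 combinations, so every size on either side of a page
or buffer boundary gets hit with and without a header.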


> > > By the way what's about kqueue(2) ?  Are you not confused that NetBSD
> > > does not support EVFILT_AIO and OpenBSD does not support EVFILT_AIO and
> > > EVFILT_TIMER ?  Does this mean that FreeBSD should not introduce any
> > > new kqueue filters or flags ?
> >
> > These are incredibly trivial to support.  I estimate the work
> > at an hour each, including writing a unit test.  It took me
> > about an hour to write the SystemV IPC Message Queue KNOTE()
> > code for FreeBSD.
> 
> Nevertheless there's no support for EVFILT_AIO and EVFILT_TIMER.
> By the way I do not think that EVFILT_AIO is a trivial thing.
> Actually it requires at least the working AIO environment in the kernel.

This is really a tangent again; however, I would point out that
aio can be implemented in the context of scheduler activations
and a spawned AIO kernel thread per request (the alternative is
to implement it entirely in user space, and then implement a
loopback "send" mechanism for the KNOTE()'s).  So implementing
aio is probably a 20 hour task (1/2 a man-week).  More work,
but still all doable in a week's time or less.

In general, most of the things you are pointing at, including
the sendfile problem, don't take a lot of thinking to fix, only
the grunt-work to actually crank out the code.


> Now we have more portable kqueue() that exists in FreeBSD, NetBSD, and OpenBSD
> (I do not know about Darwin and MacOS X) with the same prototype and
> some unsupported filters. And we have much less portable sendfile() that
> exists in the most modern unices but with the different prototypes and
> functionality.

This illustrates my thesis that interfaces with the same names
tend to converge over time.  Another example is select(), which
Linux initially implemented as updating the timeout struct with
the time which had elapsed; this was divergent, and broke a
lot of code, until they relented and fixed it to de facto standard
behaviour.  I'm confident the same thing will eventually happen
with kqueue/kevent.

The main issue with Linux adoption of kqueue/kevent is that they
claim it's level triggered instead of edge triggered: they
want events, they don't want conditions raised.

To a small extent, they are right.  But this is trivially
correctable, and needs to be corrected anyway, for EVFILT_PROC
to support a larger number of PIDs.  Right now, the PID is
OR'ed in with the event, and so is limited to 20 bits.  Another
parameter, a void * (in which the PID value can be cast and
recovered) would be enough to provide additional context.  With
this context, it's possible to arrange a contract between the
user kn_data that was passed in and the filter routine, in
order to copy out arbitrary data, making the event edge rather
than level triggered.

With this single modification, you fix both the 20 bit PID limit
problem and the Linux objection to the adoption of the kevent
interface.
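
To make the 20 bit limit concrete, here is a small user-space
illustration.  All names and constants (DEMO_*, demo_*) are
invented for the example and are not the kernel's own; the
second half sketches the "extra void * parameter" idea under
the assumption that the PID is simply cast into the pointer.

```c
#include <stdint.h>

/* Illustrative values only, not copied from any kernel header. */
#define DEMO_PID_BITS	20u
#define DEMO_PID_MASK	((UINT32_C(1) << DEMO_PID_BITS) - 1)	/* low 20 bits */
#define DEMO_EVENT_FORK	UINT32_C(0x40000000)

/* Current scheme: the PID is OR'ed into the same word as the event. */
static uint32_t
demo_pack_hint(uint32_t event, uint32_t pid)
{
	return event | pid;
}

/* Recover the PID: only the low 20 bits are available for it. */
static uint32_t
demo_hint_pid(uint32_t hint)
{
	return hint & DEMO_PID_MASK;
}

/* Recover the event: everything above the PID bits. */
static uint32_t
demo_hint_event(uint32_t hint)
{
	return hint & ~DEMO_PID_MASK;
}

/* Sketch of the alternative: the event plus a separate context pointer. */
struct demo_hint_ex {
	uint32_t event;
	void *ctx;	/* here: the PID cast into a pointer */
};

static struct demo_hint_ex
demo_make_hint_ex(uint32_t event, uint32_t pid)
{
	struct demo_hint_ex h;

	h.event = event;
	h.ctx = (void *)(uintptr_t)pid;
	return h;
}

/* Cast the context back to get the full-width PID. */
static uint32_t
demo_hint_ex_pid(const struct demo_hint_ex *h)
{
	return (uint32_t)(uintptr_t)h->ctx;
}
```

With the packed form, a PID of (1 << 20) + 5 comes back as 5 and
its top bit leaks into the event bits; with the separate context
it round-trips intact, and the same pointer could just as well
refer to a larger structure to copy out.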

In other words, you increase convergence.  It's natural over
time for visible source bases to converge.

> > It doesn't "read" it, per se: it creates a mapping, and it
> > faults the pages; when they are in core, then they can be
> > sent.
> 
> So what do these lines in /sys/kern/uipc_syscalls.c:sendfile():
> 
> if (!pg->valid || !vm_page_is_valid(pg, pgoff, xfsize)) {
>          ....
>          error = VOP_READ(vp, &auio, IO_VMIO | ((MAXBSIZE / bsize) << 16),
>                           p->p_ucred);
>          ....
> }

That's easy: they mean you aren't looking at version 1.147 of
the file; you're looking at RELENG_4 (version 1.65.2.17, or
earlier), and not -CURRENT.  You are 82 HEAD revisions
behind the state of the art.

-- Terry