Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 21 Sep 2006 14:26:50 +0200
From:      Andre Oppermann <andre@freebsd.org>
To:        Gleb Smirnoff <glebius@FreeBSD.org>, Andre Oppermann <andre@freebsd.org>, freebsd-current@freebsd.org, freebsd-net@freebsd.org, alc@freebsd.org,  tegge@freebsd.org,  gallatin@cs.duke.edu
Subject:   Re: Much improved sendfile(2) kernel implementation
Message-ID:  <4512850A.5000107@freebsd.org>
In-Reply-To: <20060921114431.GF27667@FreeBSD.org>
References:  <4511B9B1.2000903@freebsd.org> <20060921114431.GF27667@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Gleb Smirnoff wrote:
>   Andre,
> 
> On Wed, Sep 20, 2006 at 11:59:13PM +0200, Andre Oppermann wrote:
> A> I have rewritten kern_sendfile() to work in two loops, the inner which turns
> A> as many pages into mbufs as it can up to the free send socket buffer space.
> A> The outer loop then drops the whole mbuf chain into the send socket buffer,
> A> calls tcp_output() on it and then waits until 50% of the socket buffer are
> A> free again to repeat the cycle.  This way tcp_output() gets the full amount
> A> of data to work with and can issue up to 64K sends for TSO to chop up in the
> A> network adapter without using any CPU cycles.  Thus it gets very efficient
> A> especially with the readahead the VM and I/O system do.
> A> 
> A> The patch is available here:
> A>  http://people.freebsd.org/~andre/sendfile-20060920.diff
> 
> I see that mbuf allocations are done in different way depending on the SS_NBIO
> flag on socket. This isn't true for the current code, so your patch isn't
> only opmization, but is also changing the semantics of the syscall. This should
> be avoided in single change/commit.

IMO this goes hand in hand.

> The non-blocking IO definition is related to the blocking on the socket buffer.
> It isn't related to blocking on memory, so this change is incorrect:
> 
> -			m_header = m_uiotombuf(hdr_uio, M_DONTWAIT, 0, 0);
> -			if (m_header == NULL)
> +			m = m_uiotombuf(hdr_uio, (nbio ? M_NOWAIT : M_WAITOK),
> +			    0, 0);
> +			if (m == NULL) {
> +				error = nbio ? EAGAIN : ENOBUFS;
> 
> There should be unconditional M_NOWAIT. Oops, the M_DONTWAIT in the current
> code is incorrect. It is present since rev. 1.171. If the m_uiotombuf() fails
> the current code returns from syscall without error! Before rev. 1.171, there
> wasn't m_uiotombuf(), the mbuf header was allocated below, with correct wait
> argument.
> 
> The wait argument for m_uiotombuf() should be changed to M_WAITOK, but in
> a separate commit.
> 
> And this one:
> 
> +			m0 = m_get((nbio ? M_NOWAIT : M_WAITOK), MT_DATA);
> +			if (m0 == NULL) {
> +				error = (nbio ? EAGAIN : ENOBUFS);
> +				sf_buf_mext((void *)sf_buf_kva(sf), sf);
> +				break;
> +			}
> 
> This one should be M_WAITOK always. It is M_TRYWAIT (equal to M_WAITOK) in
> the current code.

The reason why I changed the mbuf allocations with SS_NBIO is the rationale
of sendfile() and the performance evaluation that was done by alc@ students.
sendfile() has two flags which control its blocking behavior.  Non blocking
socket (SS_NBIO) and SF_NODISKIO.  The latter is necessary because file
reads or writes are normally not considered to be blocking.  The most optimal
sendfile() is usage is with a single process doing accept(), parsing and
then sendfile that should never ever block on anything.  This way the main
process then can use kqueue for all the socket stuff and it can transfer
all sends that require disk I/O to a child process or thread to provide a
context for the read.  Meanwhile the main process is free to accept further
connections and to continue serving existing connections.  Having sendfile()
block in mbuf allocation for the header, on sfbufs or anything else is not
desirable and must be avoided.  I know I'm extending the traditional definition
of SS_NBIO a bit but it's fully in line with the semantics and desired
operational behavior of sendfile().  The paper by alc@'s students clearly
identifies this as the main property of a sendfile implementation besides
its zero copy nature.

> Is there RELENG_6 version of your patch. If there is one, we can test it
> quite extensively.

This patch does not fully apply to RELENG_6 as there are some small changes
and a bit locking differences.  It shouldn't be too hard to adapt it.  You
won't get TSO speedups though and we can't MFC TSO because it changes the
ABI+API for network drivers.

-- 
Andre



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4512850A.5000107>