Date:      Thu, 29 Nov 2007 09:53:48 +0100
From:      Andre Oppermann <andre@freebsd.org>
To:        Robert Watson <rwatson@FreeBSD.org>
Cc:        Kip Macy <kip.macy@gmail.com>, Perforce Change Reviews <perforce@freebsd.org>, Kip Macy <kmacy@freebsd.org>
Subject:   Re: PERFORCE change 129544 for review
Message-ID:  <474E7E1C.3030907@freebsd.org>
In-Reply-To: <20071129075148.X7555@fledge.watson.org>
References:  <200711260527.lAQ5RNSw090238@repoman.freebsd.org> <20071126115044.J65286@fledge.watson.org> <b1fa29170711282115i57ffe985qbebc7c5f4a663154@mail.gmail.com> <20071129075148.X7555@fledge.watson.org>

Robert Watson wrote:
> 
> On Wed, 28 Nov 2007, Kip Macy wrote:
> 
>> I agree that making it toe specific is somewhat misleading. I actually 
>> think I'll be able to fix my code so that it can cope with data being 
>> added to the end of a cluster or mbuf that has already been 
>> transmitted. If so, I won't need to pull this along when I bring TOE 
>> support into CVS.
>>
>> Thanks for the feedback.
> 
> I'm not as familiar with the transmit side of the socket buffer side -- 
> at least not anymore -- but on the receive side we make certain strong 
> guarantees about not replacing existing mbufs and clusters, especially 
> at the head of the socket buffer queue.  I think the requirement for 
> that in 7/8 may have changed because of the rewritten soreceive() code, 
> but it used to be that soreceive() expected the value of sb_mb never to 
> go from one non-NULL value to another non-NULL value as long as the 
> sb_sx lock (or its predecessor) was held, even though sb_mtx had been 
> released.  This was so that the mbuf could be left in the socket buffer 
> during copyout() and related receive activities, so that if there was a 
> short read, error, etc, mbufs weren't being re-inserted at the head of 
> the queue.  That type of invariant has historically been undocumented, 
> but it could be that similar invariants exist in the compaction code for 
> transmit and can be documented, enforced, and possibly even relied upon. 
> :-)

On the TX side we don't append data *into* existing mbufs, so that
ongoing DMA transfers are never disturbed.  Appends happen at the tail
of the mbuf chain (m_next), which means a series of small writes will
consume one mbuf each.
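
To make the distinction concrete, here is a minimal userspace sketch
(mini_mbuf and chain_append are made-up stand-ins, not the real
sbappendstream() path): every small write gets a fresh buffer linked
in via m_next, and the payload of buffers already on the chain is
never written to again, so a DMA engine reading those buffers sees
stable contents.

/*
 * Simplified illustration only -- not the actual FreeBSD socket
 * buffer code.  Appending a *new* buffer to the chain is safe while
 * earlier buffers are under DMA; writing into their trailing free
 * space would not be.
 */
#include <stdlib.h>
#include <string.h>

struct mini_mbuf {
        struct mini_mbuf *m_next;       /* next buffer in the chain */
        size_t            m_len;        /* valid bytes in m_data */
        char              m_data[2048]; /* payload area */
};

/* Append one small write as a fresh buffer at the tail of the chain. */
static struct mini_mbuf *
chain_append(struct mini_mbuf **head, const void *buf, size_t len)
{
        struct mini_mbuf *m, **tailp;

        if (len > sizeof(m->m_data))
                return (NULL);          /* keep the example simple */
        if ((m = calloc(1, sizeof(*m))) == NULL)
                return (NULL);
        memcpy(m->m_data, buf, len);
        m->m_len = len;

        /* Walk to the tail; existing buffers are read, never modified. */
        for (tailp = head; *tailp != NULL; tailp = &(*tailp)->m_next)
                ;
        *tailp = m;
        return (m);
}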

On the RX side we compress the mbufs at the tail of the socket buffer
(sbcompress) to prevent mbuf exhaustion attacks by external senders.
I've written a special version of soreceive_stream() that pulls as
many mbufs from the head of the socket buffer as the user has supplied
iovec space for.  Those mbufs are removed from the queue, the lock is
dropped, and the copyout is performed on the whole chain in one go.
This gave significant speedups at high receive rates, but at the
expense of a fatal race condition: when the copyout failed and the
socket went away, the attempt to prepend the unconsumed mbufs back
onto the socket buffer would crash horribly.  I haven't studied the
new socket locks and locking model in detail yet; perhaps this can now
be implemented in a safe way.
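
Roughly, the idea was the following (a simplified userspace sketch;
stream_buf, sb_detach_for_read and the pthread mutex are invented
stand-ins for struct sockbuf, its locking and the copyout step, not
the real soreceive_stream code):

/*
 * Simplified illustration only.  Under the lock, detach as many
 * buffers from the head of the queue as fit into the space the
 * caller has available; the copy then runs without the lock held.
 */
#include <pthread.h>
#include <stddef.h>

struct mini_mbuf {                      /* same toy struct as above */
        struct mini_mbuf *m_next;
        size_t            m_len;
        char              m_data[2048];
};

struct stream_buf {
        pthread_mutex_t   sb_lock;
        struct mini_mbuf *sb_mb;        /* head of the receive queue */
};

/* Detach up to 'space' bytes worth of buffers from the head. */
static struct mini_mbuf *
sb_detach_for_read(struct stream_buf *sb, size_t space)
{
        struct mini_mbuf *head, **tailp;
        size_t            taken = 0;

        pthread_mutex_lock(&sb->sb_lock);
        head = sb->sb_mb;
        tailp = &sb->sb_mb;
        while (*tailp != NULL && taken + (*tailp)->m_len <= space) {
                taken += (*tailp)->m_len;
                tailp = &(*tailp)->m_next;
        }
        if (taken == 0) {               /* nothing fits, take nothing */
                pthread_mutex_unlock(&sb->sb_lock);
                return (NULL);
        }
        sb->sb_mb = *tailp;     /* queue now starts after the detached chain */
        *tailp = NULL;          /* terminate the detached chain */
        pthread_mutex_unlock(&sb->sb_lock);
        return (head);
}

/*
 * The copyout of the detached chain then happens in one go without
 * the lock.  If it fails part way through, the leftover buffers have
 * to be prepended to the queue again -- and that is exactly the step
 * that crashed when the socket had been torn down in the meantime.
 */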

-- 
Andre



