Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 11 Apr 2016 13:10:05 +0200
From:      Marko Zec <zec@fer.hr>
To:        "Bjoern A. Zeeb" <bz@freebsd.org>
Cc:        John Baldwin <jhb@freebsd.org>, <src-committers@freebsd.org>, <svn-src-all@freebsd.org>, <svn-src-head@freebsd.org>
Subject:   Re: svn commit: r297742 - head/sys/netinet
Message-ID:  <20160411131005.0f9feac3@x23>
In-Reply-To: <alpine.BSF.2.00.1604091824530.83904@ai.fobar.qr>
References:  <201604091205.u39C5Oks041429@repo.freebsd.org> <5630207.6cr5Ycyqbh@ralph.baldwin.cx> <alpine.BSF.2.00.1604091824530.83904@ai.fobar.qr>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 9 Apr 2016 18:31:24 +0000
"Bjoern A. Zeeb" <bz@freebsd.org> wrote:

> On Sat, 9 Apr 2016, John Baldwin wrote:
> 
> > On Saturday, April 09, 2016 12:05:24 PM Bjoern A. Zeeb wrote:  
> >> Author: bz
> >> Date: Sat Apr  9 12:05:23 2016
> >> New Revision: 297742
> >> URL: https://svnweb.freebsd.org/changeset/base/297742
> >>
> >> Log:
> >>   Mfp: r296310,r296343
> >>
> >>   It looks like as with the safety belt of DELAY() fastened (*) we
> >> can completely tear down and free all memory for TCP (after
> >> r281599).
> >>
> >>   (*) in theory a few ticks should be good enough to make sure the
> >> timers are all really gone. Could we use a better matric here and
> >> check a tcbcb count as an optimization?  
> >
> > In theory, no amount of DELAY() is ever enough to close a
> > theoretical race window.  In practice you might get lucky, but you
> > might also panic and  
> 
> Yes I do understand.  Thus saying a better metric should do the right
> thing.  I am aware of the consequences of removing type stability.
> Should there be reports I'll make sure that DELAY becomes something
> more proper sooner than later.
> 
> 
> > trash user data.  In the rest of the tree, we tend to prefer
> > marking items as NOFREE instead of this approach putting a priority
> > on stability and reliability over memory efficiency.
> >
> > For all of the zones that you removed NOFREE from, do you know why
> > that was added in the first place (e.g. which stale pointers to
> > pcbs could be referenced after free)?  Did you verify that those
> > conditions have been fixed?  
> 
> I did check.  I did check a few years ago (and I think you had
> reviewed that; maybe it was trouble).  And the TCP bits here were
> the last ones that were problematic back then.  With the changes from
> r281599 this should no longer be a problem.
> 
> As for the others, a few years ago Andre already removed the NOFREE
> and we unconditionally made him back the change out, which was a
> mistake as otherwise some of these zones would have been "clean" for
> years.  Others have had KASSERTs ensuring that on VNET stack they were
> actually empty.

Just for the record, another proposal which could have solved this
cleanly years ago but was swiftly rejected short of proper discussion,
was to de-virtualize memory zones used throughout the network stack.

The original thrust behind my decision to virtualize the zones was to
catch any inter-vnet leaks during the initial VNETization efforts.
Today, roughly 10 years after the vnet / vimage changes hit the tree,
during which period no inter-vnet zone leaks have been identified, I
see no reason to keep (most of) the zones separated in multiple
instances.

I've heard arguments that maintaining separate zones may be good for
parallelization due to separation of pcbinfo locks, but such claims
have never been supported by benchmark runs or real-life anecdotal
evidence.  Another argument which was brought up to reject my
proposal, that separate zones provide or could provide inherent
per-vnet resource limiting was also and still is extremely thin,
since the number of items in most networking zones is limited to
maxsockets, which is a shared / global OS limit.

OTOH, it's obvious that maintaining separate zones is not only wasteful
on memory, but likely can contribute to increased D-cache trashing with
traffic spread among multiple vnets.

And finally, as John pointed out, the DELAY(tick) commited here and
deemed a "safety belt" is highly dubious, since sleeping for a single
tick provides no real guarantees against potential races, but rather
works as an umbrella to mask them in some (un)lucky cases.

Nevertheless, I'm glad Bjoern took time to push the vnet framework
further to the point where it could be finally enabled by default, for
which I hope was the main objective behind this sweep, since I find it
absurd that Linux has had vnet-like capability available by default for
years, while we not only pioneered the concept but had it merged in the
main tree years before Linux folks even started their efforts.

Marko



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160411131005.0f9feac3>