Date:      Tue, 9 Mar 2010 00:53:34 -0800
From:      Doug Hardie <bc979@lafn.org>
To:        Robert Watson <rwatson@freebsd.org>
Cc:        stable@freebsd.org, current@freebsd.org
Subject:   Re: Survey results very helpful, thanks! (was: Re: net.inet.tcp.timer_race: does anyone have a non-zero value?)
Message-ID:  <80C9B3BA-C498-419B-BD5E-6C2111F24F64@lafn.org>
In-Reply-To: <alpine.BSF.2.00.1003082020560.96747@fledge.watson.org>
References:  <alpine.BSF.2.00.1003071141050.9729@fledge.watson.org> <alpine.BSF.2.00.1003081450310.23881@fledge.watson.org> <FF1D92A1-89BD-457E-9A6C-089D20E4D175@lafn.org> <alpine.BSF.2.00.1003082020560.96747@fledge.watson.org>


On 8 March 2010, at 12:33, Robert Watson wrote:

>
> On Mon, 8 Mar 2010, Doug Hardie wrote:
>
>> I run a number of 4 core systems with em interfaces.  These are
>> production systems that are unmanned and located a long way from me.
>> Under unusual conditions it can take up to 6 hours to get there.  I
>> have been waiting to switch to 8.0 because of the discussions on the
>> em device and now it sounds like I had better just skip 8.x and wait
>> for 9.  7.2 is working just fine.
>
> Not sure that any information in this survey thread should be relevant
> to that decision.  This race has existed since before FreeBSD, having
> appeared in the original BSD network stack, and is just as present in
> FreeBSD 7.x as 8.x or 9.x.  When I learned about the race during the
> early 7.x development cycle, I added a counter/statistic to measure how
> much it happened in practice, but was not able to exercise it in my
> testing, and so left the counter in to appear in 7.0 and later so that
> we could perform this survey as core counts/etc increase.
>
> The two likely outcomes were "it is never exercised" and "it is
> exercised but only very infrequently", neither really justifying the
> quite complex change to correct it given requirements at the time.
> On-going development work on the virtual network stack is what justifies
> correcting the bug at this point, moving from detecting and handling the
> race to preventing it from occurring as an invariant.  The motivation
> here, BTW, is that we'd like to eliminate the type-stable storage
> requirement for connection state (which ensures that memory once used
> for a connection block is only ever used for connection blocks in the
> future), allowing memory to be fully freed when a virtual network stack
> is destroyed.  Using type-stable storage helped address this bug, but
> was primarily present to reduce the overhead of monitoring using
> netstat(1).  We'll now need to use a slightly more expensive solution
> (true reference counts) in that context, although in practice it will
> almost certainly be an unmeasurable cost.
>
> Which is to say that while there might be something in the em/altq/...
> thread to reasonably lead you to avoid 8.0, nothing in the TCP timer
> race thread should do so, since it affects 7.2 just as much as 8.0.
> Even if you do see a non-zero counter, that's not a matter for
> operational concern, just useful from the perspective of a network stack
> developer in understanding timing and behaviors in the stack.  :-)
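
To make the "true reference counts" idea in the quoted explanation a bit
more concrete, here is a minimal sketch in Python. It is not the FreeBSD
implementation: ConnBlock, Timer and close_then_fire are hypothetical
names invented for illustration, and the sketch is single-threaded, so it
ignores the locking or atomic operations a kernel would need. It only
shows the general pattern: a pending timer holds its own reference, so
the connection state cannot be freed underneath it even if the
connection is closed first.

```python
# Illustrative sketch only: hypothetical names, not FreeBSD kernel code.
# Single-threaded; a real kernel would need atomics or locks here.


class ConnBlock:
    """Stand-in for per-connection state (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.refs = 1        # the creator (e.g. the socket) holds one reference
        self.freed = False

    def hold(self):
        """Take an extra reference, e.g. when a timer is armed."""
        assert not self.freed, "cannot reference a freed block"
        self.refs += 1
        return self

    def release(self):
        """Drop one reference; return True if this freed the block."""
        assert self.refs > 0, "reference count underflow"
        self.refs -= 1
        if self.refs == 0:
            # In a kernel, this is the point where memory could be
            # returned to the allocator.
            self.freed = True
        return self.freed


class Timer:
    """A pending timer that keeps its connection block alive until it runs."""

    def __init__(self, conn):
        self.conn = conn.hold()
        self.fired = False

    def fire(self):
        # The reference taken in __init__ guarantees the block is valid.
        assert not self.conn.freed
        self.fired = True
        return self.conn.release()


def close_then_fire():
    """Close the connection while a timer is pending, then run the timer."""
    conn = ConnBlock("example")
    timer = Timer(conn)
    freed_on_close = conn.release()  # connection torn down first
    freed_on_fire = timer.fire()     # late timer drops the last reference
    return freed_on_close, freed_on_fire
```

In this sketch the block survives the close and is freed only when the
late timer drops the last reference. With type-stable storage, by
contrast, a late timer would at worst touch memory that is still a
connection block, which is why that scheme tolerated the race rather
than preventing it.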


Thanks for the complete explanation.  I don't believe the ALTQ issue
will affect me.  I am not currently using it and do not expect to in the
near future.  In addition, there was a posting that a fix for at least
part of that will be added in a week or so.  Given all that, it appears
it's time to start the planning/testing process for 8.




