Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 19 May 2016 06:03:54 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Ian Lepore <ian@freebsd.org>
Cc:        "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>,  Nathan Whitehorn <nwhitehorn@freebsd.org>,  Justin Hibbits <chmeeedalf@gmail.com>, Scott Long <scottl@freebsd.org>,  src-committers@freebsd.org, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r300154 - head/sys/net
Message-ID:  <20160519043606.G1061@besplex.bde.org>
In-Reply-To: <1463594263.1180.298.camel@freebsd.org>
References:  <201605181545.u4IFjCKD030751@repo.freebsd.org>  <20160518105033.1eae7432@zhabar.knownspace> <759d085c-a485-c2ed-5d70-26eb4d27cdc2@freebsd.org> <1463592737.1180.294.camel@freebsd.org> <1EED3540-DCCF-40D2-91BA-CE0AA54C5D08@lists.zabbadoz.net> <1463594263.1180.298.camel@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 18 May 2016, Ian Lepore wrote:

> On Wed, 2016-05-18 at 17:35 +0000, Bjoern A. Zeeb wrote:
>>> On 18 May 2016, at 17:32 , Ian Lepore <ian@freebsd.org> wrote:
>>>
>>> On Wed, 2016-05-18 at 10:14 -0700, Nathan Whitehorn wrote:
>>> ...
>>> It may be more complicated than that, though.  armv6 can do 64-bit
>>> atomics even tho it's 32-bit.  armv4, also 32-bit, can do 64-bit
>>> atomics in the kernel but not in userland.
>>>
>>> Maybe machine/atomic.h needs a #define that says whether 64-bit ops
>>> are
>>> available in the current compilation unit.  (And likewise for other
>>> bit
>>> sizes if we have arches that have other limitations.)
>>
>> Question because I didn=A2t follow the details, but how was this solved
>> for the COUNTERS framework?

Using special code that pessimizes old machines on 32-bit arches especially=
=2E

For example, incrementing a 32-bit network counter used to take 1 inlined
counter++ statement
     (as little as 1 instruction on i386, but it is a read-modify-write
     instruction and thus no faster than separate instructions, and if
     the counter is shared this is almost as slow as a locked instruction
     in some cases).
This now takes an if_inc_counter() function call which takes 33 instruction=
s
altogether on i386 with certain nonstandard not very optimal CFLAGS, and 9
instructions on amd64.
     (COUNTER64() code is inlined, and the function call is a separate
     pessimization.  It costs about half of the 9 instructions on amd64
     and its instructions are relatively heavyweight.)
This is when i386 has cmpxchgb.  The single cmpxchg8b instruction is
heavier weight than a 32-bit memory increment and using it takes lots
of control logic.

> iirc, each platform implements counters its own way, probably the wrong
> way on all of them except x86.

I think other arches just use compatiblity code which uses critical
sections.  This is not so bad.  It might be faster than using cmpxchg8b
depending on how fast critical_enter() and critical_exit() are.
Unfortunately, they are not very fast.  They are functions too, and
on i386 critical_enter() takes 20+ instructions.  critical_exit() takes
more, and debugging is broken and caused a panic when I tried to
trace through critical_exit().  That is so slow that hard-disabling
interrupts is probably faster.

Network drivers were mostly written under the assumption that they are
running on a UP system and incrementing a counter is inline and fast,
so they increment counters without worrying about the overhead for
this.  33 instructions for 2 if_inc_counter()s per packet is about a
1% pessimization for bge on my slow hardware.

> For some crazy reason the docs for COUNTERS say that it does not use
> atomics.  I have no idea why the docs for an API are dictating
> implementation, but I suspect it's because atomics are more expensive
> on x86 than other alternatives.  So the arm code slavishly avoids using
> atomics in COUNTERS even though doing so would be more effecient than
> the current copied-from-x86 code.

Other places just hard-code use of PCPU although that is also more
complicated and uglier than counter++.  Counters in PCPU only exist
because full atomics are probably slower on all arches and much slower
on most arches.  Most 64-bit PCPU accesses on 32-bit arches are broken
since they are not atomic even for 1 CPU.  COUNTER64() is more careful
to a fault.  arm PCPU_INC() seems to be broken even for 32-bit accesses.
In the non-SMP case, it just does pcpu->pc_ ## member++ and in the SMP
case it does the same with a register pointer instead of a global.

I wrote some alternative x86 implementations that are at least 20%
faster than the cmpxchg8b method, but the best method is clearly to
use only 32-bit low-level counters and add up the counters in a daemon.
The daemon shouldn't run very often.  There aren't many counters except
i/o byte counters that want to wrap in 32 bits more than once per hour.
Even 100 Gbps ethernet can only do 150 Mpps so it takes at least 30
seconds to wrap.

Bruce
From owner-svn-src-all@freebsd.org  Wed May 18 20:05:45 2016
Return-Path: <owner-svn-src-all@freebsd.org>
Delivered-To: svn-src-all@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 61B1AB41BD9;
 Wed, 18 May 2016 20:05:45 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42])
 by mx1.freebsd.org (Postfix) with ESMTP id EC29A111D;
 Wed, 18 May 2016 20:05:44 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au
 (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id A01833C57DF;
 Thu, 19 May 2016 06:05:37 +1000 (AEST)
Date: Thu, 19 May 2016 06:05:36 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
cc: yBryan Drewery <bdrewery@freebsd.org>, Gleb Smirnoff <glebius@freebsd.org>,
 src-committers@freebsd.org, svn-src-all@freebsd.org, 
 svn-src-releng@freebsd.org
Subject: Re: svn commit: r300088 - in releng/9.3: . sys/conf sys/dev/kbd
In-Reply-To: <20160518124147.V7042@besplex.bde.org>
Message-ID: <20160519060448.B1270@besplex.bde.org>
References: <201605172228.u4HMSbhj012124@repo.freebsd.org>
 <14a8d29d-bc14-3f96-57a4-81f1b6dfdd82@FreeBSD.org>
 <20160517230710.GB1015@FreeBSD.org>
 <38ca6091-5607-5796-9f6e-7f2d6c117707@FreeBSD.org>
 <20160518124147.V7042@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.1 cv=TuMb/2jh c=1 sm=1 tr=0
 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10
 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8
 a=YdjKeqWGE6BAugsnKCQA:9 a=CjuIK1q_8ugA:10 a=IjZwj45LgO3ly-622nXo:22
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
 user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all/>;
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 18 May 2016 20:05:45 -0000

On Wed, 18 May 2016, Bruce Evans wrote:

> On Tue, 17 May 2016, Bryan Drewery wrote:
>
>> On 5/17/16 4:07 PM, Gleb Smirnoff wrote:
>>> On Tue, May 17, 2016 at 03:59:26PM -0700, Bryan Drewery wrote:
>>> B> > Author: glebius
>>> B> > Date: Tue May 17 22:28:36 2016
>>> B> > New Revision: 300088
>>> B> > URL: https://svnweb.freebsd.org/changeset/base/300088
>>> B> >
>>> B> > Log:
>>> B> >   - Use unsigned version of min() when handling arguments of SETFKEY 
>>> ioctl.
>>> B> >   - Validate that user supplied control message length in sendmsg(2)
>>> B> >     is not negative.
>>> B>
>>> B> The sendmsg(2) change is not included here (9.3) nor in the advisory 
>>> but
>>> B> is in the commit log.  Was it intended to be changed in 9.3?
>>> 
>>> That was my failure to mention SA-16:19 in commit message for 9.3. It 
>>> doesn't
>>> apply to 9.x.
>>> 
>>> B> Plus the only consumer I see is sendit() which seems to be protected
>>> B> already from negative values when not using COMPAT_43:
>>> B>
>>> B> >                  if (mp->msg_controllen < sizeof(struct cmsghdr)
>>> B> >  #ifdef COMPAT_OLDSOCK
>>> B> >                      && mp->msg_flags != MSG_COMPAT
>>> B> >  #endif
>>> B> >                  ) {
>>> B> >                          error = EINVAL;
>>> B> >                          goto bad;
>>> B> >                  }
>>> B> >                  error = sockargs(&control, mp->msg_control,
>>> B> >                      mp->msg_controllen, MT_CONTROL);
>>> 
>>> No, it isn't protected. In the comparison (mp->msg_controllen < 
>>> sizeof(struct cmsghdr))
>>> both values are unsigned. Later in sockargs() it is treated as signed.
>
> But it is protected (except on exotic unsupported arches).  The above is
> a complete bounds check for mp->msg_controllen, written in an obfuscated
> way using an unsigned type botch/hack.  Negative values are normally
> promoted to large unsigned values, so they fail this check and sockargs()
> is never called with them.
>
> On exotic arches, the analysis is more complicated and the hack doesn't
> work.  ...

Oops.  I read this sort of backwards.  The unsign botches are somewhat
larger, and more like the one in kbd.c:
- this only checks the lower bound, so if the operands were negative then
   they would pass instead of fail the check after bogus unsign extension
- the operands are actually both unsigned, since msg_controllen already
   has unsigned poisoning.  It was poisoned even in FreeBSD-1.  It was
   u_int then.  It is still u_int in 4.4BSD-Lite*.  Now it is socklen_t,
   which is uint32_t.  POSIX has messes for this.  At least in the 2001
   version, it doesn't seem to require the poisoning, but recommends that
   applications not use values above 2**31-1 [since these values require
   the poisoning to represent, and are not very useful except for opening
   security holes like here].
- so after passing the lower bound check, msg_controllen may have a large
   unsigned 32-bit value.  Undefined behaviour occurs when we pass this
   to sockargs() which doesn't have unsign poisoning.  Except on unsupported
   exotic arches, the behaviour is to overflow to a negative value.
- sockargs() then checks the overflowed value.  This is robust enough if
   the overflow gives any value at all, but still bogus.

Correct code would do bounds checks before calling sockargs (of the
form val >= min && val <= INT_MAX), but there are several callers and
it is convenient to check only in sockargs().  For that, sockargs must
take a parameter with the same type as msg_controllen, although old
unsign botches force this to be unsigned (precisely socklen_t).  It is
too late to change socklen_t back to int.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160519043606.G1061>