Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 2 May 2017 21:25:40 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Bruce Evans <brde@optusnet.com.au>, Alan Somers <asomers@freebsd.org>,  Gleb Smirnoff <glebius@freebsd.org>,  "src-committers@freebsd.org" <src-committers@freebsd.org>,  "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>,  "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>
Subject:   Re: svn commit: r317061 - in head: libexec/rpc.rstatd sys/amd64/amd64 sys/amd64/include sys/arm/arm sys/arm/include sys/arm64/include sys/cddl/contrib/opensolaris/uts/common/fs/zfs sys/compat/linprocfs...
Message-ID:  <20170502203703.I1176@besplex.bde.org>
In-Reply-To: <20170502095527.GB1622@kib.kiev.ua>
References:  <201704171734.v3HHYlf5022945@repo.freebsd.org> <CAOtMX2jdNj0du0ZuUKPr16iHK_YeNVzf-nDvwC-MuFM003VVAg@mail.gmail.com> <20170419130428.J956@besplex.bde.org> <20170430201324.GV1622@kib.kiev.ua> <20170501163725.U972@besplex.bde.org> <20170502095527.GB1622@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 2 May 2017, Konstantin Belousov wrote:

> On Mon, May 01, 2017 at 06:25:11PM +1000, Bruce Evans wrote:
>> XX Index: subr_counter.c
>> XX ===================================================================
>> XX --- subr_counter.c	(revision 317604)
>> XX +++ subr_counter.c	(working copy)
>> XX @@ -78,11 +78,15 @@
>> XX  sysctl_handle_counter_u64(SYSCTL_HANDLER_ARGS)
>> XX  {
>> XX  	uint64_t out;
>> XX +	uint32_t out32;
>> XX  	int error;
>> XX
>> XX  	out = counter_u64_fetch(*(counter_u64_t *)arg1);
>> XX
>> XX -	error = SYSCTL_OUT(req, &out, sizeof(uint64_t));
>> XX +	if (req->oldlen == 4 && (out32 = out) == out)
>> XX +		error = SYSCTL_OUT(req, &out32, sizeof(out32));
>> XX +	else
>> XX +		error = SYSCTL_OUT(req, &out, sizeof(out));
>> XX
>> XX  	if (error || !req->newptr)
>> XX  		return (error);
>>
>> This does the minimum necessary to fix the current problem.
>>
>> This works until the counters become too large for a u_int.  There
>> could be an option to get truncation with no error, but that would
>> require an API change, so applications should be required to do the
>> truncation for themself if that is what they want.
>
> Yes, this is approximately what I consider to be enough fix, but with
> some details which I do not like and which did not worked well on my
> test machine with enough memory to generate lot of increments.
>
> Below is what I consider to be good enough.  I will proceed with this
> version unless very strong objections appear.

I prefer my version.  It is the same as for sysctl_bufspace(), except
for simplifications and bug fixes that are easier because the types
are unsigned.

Your version is especially broken for the case of a lot of increments.

Clamping breaks many cases in systat -v which just needs unsigned counters
to calculate small differences.  Not many counters overflow faster than
the refresh interval in systat or top.  These cases used to appear to
work perfectly for most counters when the kernel did blind truncations.
Clamping breaks the API.  Not returing ENOMEM breaks the API.

In my version, this is handled as specified by the API by truncating and
returning ENOMEM.  Then:
- top sees the ENOMEM and mishandles it by aborting
- systat sees the ENOMEM (and also sees a short count, if it has the newer
   breakage of hard-coding 64-bit counters instead of u_int ones, and is
   run on a kernel with u_int counters) and mishandles it better by
   printing an error message but not aborting.  Thus it appears to work
   just as well as when the kernel did blind truncation, except for the
   error message.

Clamping would be difficult for the application to handle even if an
error is returned.  This would be like the situation for clamping for
strtoul(), but worse since it changes the API.  Many applications don't
understand how to handle errno for strtoul(), but they do do a bogus
check of the clamped value.  For sysctl(), clamping is not part of
the API now, so applications shouldn't check for the clamped values.
Ones that work with deltas like systat -v would see the counters stop
incrementing when the clamped value is reached, while truncated values
allow them to continue to appear to work perfectly.

> diff --git a/sys/vm/vm_meter.c b/sys/vm/vm_meter.c
> index 5f4cd46ab1e..93d5161de89 100644
> --- a/sys/vm/vm_meter.c
> +++ b/sys/vm/vm_meter.c
> @@ -266,8 +266,29 @@ static SYSCTL_NODE(_vm_stats, OID_AUTO, vm, CTLFLAG_RW, 0,
> 	"VM meter vm stats");
> SYSCTL_NODE(_vm_stats, OID_AUTO, misc, CTLFLAG_RW, 0, "VM meter misc stats");
>
> +static int
> +sysctl_handle_vmstat(SYSCTL_HANDLER_ARGS)
> +{
> +	uint64_t out;
> +#ifdef COMPAT_FREEBSD11
> +	uint32_t out1;
> +#endif

The extra wrapper and the ifdef limit the damage from the API change.
Otherwise, I don't like them.

I prefer my variable name out32 to the nondescript out1.  ('out' should
also be renamed out64, but I wanted to keep the diffs small.)

> +
> +	out = counter_u64_fetch(*(counter_u64_t *)arg1);
> +#ifdef COMPAT_FREEBSD11
> +	if (req->oldlen == sizeof(uint32_t)) {

sizeof(uint32_t) is an obfuscated spelling of '4'.  The size is already
hard-coded in '32'.  This depends on sizeof(char) being 1.  It is, and
won't change.  POSIX started specifying this in 2001, and the mere
existence of uint8_t probably requires 8-bit chars since u_char must
be the smallest type of an object.  This depends on the size of a u_int
being 32 bits/4 bytes in the old ABI.  This won't change, and is already
hard-coded as 32, as it must be because sizeof(u_int) may change.

> +		out1 = 0xffffffff;
> +		if (out < out1)
> +			out1 = out;

This does the clamping.  Clamping is probably better expressed as:

 		out1 = MIN(out, 0xffffffff)

> +		return (SYSCTL_OUT(req, &out1, sizeof(uint32_t)));

Here sizeof(uint32_t) is an obfuscated spelling of sizeof(out1).  I think
the sizeof() is justified in this case, to limit the hard-coding to of
sizes to 1 place.  Truncation by assigment to the destination variable also
helps limit this
    (sysctl_bufspace() could also be improved by truncation by assignment.
    Its integers are signed, so the result is implementation-defined, but
    we know that it is benign 2's complement and don't really care what it
    is provided it is reasonable.  This is better than laboriously checking
    the lower limit or neglecting to check it).
Here the limit 0xffffffff also depends on the size.

> +	}
> +#endif
> +	return (SYSCTL_OUT(req, &out, sizeof(uint64_t)));
> +}
> +
> #define	VM_STATS(parent, var, descr) \
> -    SYSCTL_COUNTER_U64(parent, OID_AUTO, var, CTLFLAG_RD, &vm_cnt.var, descr)
> +    SYSCTL_OID(parent, OID_AUTO, var, CTLTYPE_U64 | CTLFLAG_MPSAFE | \
> +    CTLFLAG_RD, &vm_cnt.var, 0, sysctl_handle_vmstat, "QU", descr);

I see a new problem, shared with my version.  The sysctl data says that this
sysctl is pure 64 bits, but it can return 32 bits.  The sysctl data just
can't describe the variation.  I guess this is not much a problem, since it
takes magic to read the sysctl data from the kernel and not many programs
except sysctl know how to do it.  There programs will see the pure 64
bits and only attempt to use 64 bit types.

> #define	VM_STATS_VM(var, descr)		VM_STATS(_vm_stats_vm, var, descr)
> #define	VM_STATS_SYS(var, descr)	VM_STATS(_vm_stats_sys, var, descr)

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170502203703.I1176>