Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 22 Mar 2018 15:51:42 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Gleb Smirnoff <glebius@freebsd.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r331328 - head/sys/kern
Message-ID:  <20180322144902.M1053@besplex.bde.org>
In-Reply-To: <201803212321.w2LNLWa1059145@repo.freebsd.org>
References:  <201803212321.w2LNLWa1059145@repo.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 21 Mar 2018, Gleb Smirnoff wrote:

> Log:
>  Fix sysctl types broken in r329612.

This is still broken.

> Modified: head/sys/kern/vfs_bio.c
> ==============================================================================
> --- head/sys/kern/vfs_bio.c	Wed Mar 21 23:17:26 2018	(r331327)
> +++ head/sys/kern/vfs_bio.c	Wed Mar 21 23:21:32 2018	(r331328)
> @@ -261,17 +261,17 @@ SYSCTL_PROC(_vfs, OID_AUTO, numdirtybuffers,
>     "Number of buffers that are dirty (has unwritten changes) at the moment");
> static int lodirtybuffers;
> SYSCTL_PROC(_vfs, OID_AUTO, lodirtybuffers,
> -    CTLTYPE_LONG|CTLFLAG_MPSAFE|CTLFLAG_RW, &lodirtybuffers,
> +    CTLTYPE_INT|CTLFLAG_MPSAFE|CTLFLAG_RW, &lodirtybuffers,
>     __offsetof(struct bufdomain, bd_lodirtybuffers), sysctl_bufdomain_int, "L",
>     "How many buffers we want to have free before bufdaemon can sleep");

Now the format letter "L" is even more inconsistent.  The letter at least
used to even more important.  E.g., old versions of i386 machdep.tsc_freq
got most things wrong, but reads appeared to work in most cases since
the mismached CTLTYPE_QUAD was ignored by sysctl(8) and the format specifier
"IU" was used to print the truncated value correctly.  Writes appeared to
work in most cases although short writes are documented to fail in all cases:
from sysctl(3):

      [EINVAL]           A non-null newp is given and its specified length in
                         newlen is too large or too small.

because detection of short writes has been broken for more than 20 years.

For machdep.tsc_freq, CTLTYPE_QUAD gave 64-bit writes, but the
sysctl_handle_int() in the PROC handler attempted to do a 32-bit and the
broken short write detection allowed it to succeed.

I think the same brokenness prevents detection of the current bug as a
short write error -- CTLTYPE_ULONG gives 64-bit writes on 32-bit arches
and sysctl_handle_int() in the PROC handler attempts to do a 32-bit short
writes.  Lower levels of sysctl know the size from the CTLTYPE but don't
check it properly.  Most PROC handlers don't check it at all, but hard-code
sysctl_handle_int/etc().  This is reasonable since they can hard-code the
type in this call just as easily as hard-coding it in the sysctl data
(which is another bug, but harder to fix -- the size should be decided
at runtime).  In practice, there are inconsistencies.

I use the following fix for short write detection:

XX Index: kern_sysctl.c
XX ===================================================================
XX RCS file: /home/ncvs/src/sys/kern/kern_sysctl.c,v
XX retrieving revision 1.157
XX diff -u -2 -r1.157 kern_sysctl.c
XX --- kern_sysctl.c	11 Jun 2004 02:20:37 -0000	1.157
XX +++ kern_sysctl.c	11 Jun 2004 07:58:23 -0000
XX @@ -957,6 +949,13 @@
XX  	if (!req->newptr)
XX  		return 0;
XX -	if (req->newlen - req->newidx < l)
XX +	if (req->newlen - req->newidx != l) {
XX +		if (req->newlen - req->newidx > l) {
XX +			printf(
XX +		"sysctl_new_kernel -- short write: %zu - %zu > %zu\n",
XX +			    req->newlen, req->newidx, l);
XX +			Debugger("sysctl_new_kernel: short write");
XX +		}
XX  		return (EINVAL);
XX +	}
XX  	bcopy((char *)req->newptr + req->newidx, p, l);
XX  	req->newidx += l;
XX @@ -1075,6 +1073,13 @@
XX  	if (!req->newptr)
XX  		return 0;
XX -	if (req->newlen - req->newidx < l)
XX +	if (req->newlen - req->newidx != l) {
XX +		if (req->newlen - req->newidx > l) {
XX +			printf(
XX +		"sysctl_new_user -- short write: %zu - %zu > %zu\n",
XX +			    req->newlen, req->newidx, l);
XX +			Debugger("sysctl_new_user: short write");
XX +		}
XX  		return (EINVAL);
XX +	}
XX  	error = copyin((char *)req->newptr + req->newidx, p, l);
XX  	req->newidx += l;

The debugging code is of course only needed to detect inconsistent sysctl
handlers and applications depend on the bug.  Not many do.  After fixing
sysctl handlers, most applications work since they get the size using
arcane sysctls like sysctl(8), or by reading the variable and assuming tha
it can be written back using the same size.

Not many applications probe for the size.  They should, so that the sizes
of at least scalar vairbales aren't part of an ABI and so can be changed
easily.  (See old mail for patches that add some probing to the kernel --
the kernel attempts to match the sizes.  It truncates the sizes iff the
value can be represented in the changed size and in some other cases like
counters where truncation was the historical misbehaviour.  Unfortunately,
this interferes with applications probing the size.)

The read size is easy to probe for, and it should be possible to probe
for a write size that works, but the bug breaks that and corrupts
kernel variables instead.  The corruption is largest for the
little-endian case.  For machdep.tsc_freq, the only serious corruption
was for changing a variable near the 4G boundary.  E.g., if the variable
is initially 0xffffffff, then attempting to change it to 0x100000000
actually changed it to 0, by changing only the low 32 bits.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20180322144902.M1053>