From owner-svn-src-user@freebsd.org Thu Oct 6 00:13:20 2016 Return-Path: Delivered-To: svn-src-user@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D82A0AF448A for ; Thu, 6 Oct 2016 00:13:20 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 82ACA9B6; Thu, 6 Oct 2016 00:13:19 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-163-102.carlnfd1.nsw.optusnet.com.au (c122-106-163-102.carlnfd1.nsw.optusnet.com.au [122.106.163.102]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 44B5FD600A7; Thu, 6 Oct 2016 11:13:11 +1100 (AEDT) Date: Thu, 6 Oct 2016 11:13:10 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: "Philip M. Gollucci" cc: Alan Cox , src-committers@freebsd.org, svn-src-user@freebsd.org Subject: Re: svn commit: r306713 - in user/alc/PQ_LAUNDRY: lib/libc/stdlib lib/msun/ld80 lib/msun/src sys/cam sys/vm In-Reply-To: Message-ID: <20161006101333.N1163@besplex.bde.org> References: <201610051803.u95I3Hq1040052@repo.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=CoZCCSMD c=1 sm=1 tr=0 a=IXAyHK3mFcy+1kvmsno0Fw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8 a=KO4GtmpiOqKhjvcpFJEA:9 a=cV0wJtvhprYiQA4Q:21 a=BLgT7fOEgMPPPvzW:21 a=CjuIK1q_8ugA:10 a=IjZwj45LgO3ly-622nXo:22 X-BeenThere: svn-src-user@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "SVN commit messages for the experimental " user" src tree" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Oct 2016 00:13:20 -0000 On Wed, 5 Oct 2016, Philip M. Gollucci wrote: > I know you(Alan) didn't write this, its from the MFH, but how would len != > expected. > > neither are initialized or passed in > both times its set its expected = len = foo Er, both are initialized. len is passed as an input/output parameter to sysctl(). sysctl() can return a short count. This is now detected by recording the expected value in 'expected' and mishandled. Previously, the error wasn't even detected. > On Wed, Oct 5, 2016 at 2:03 PM, Alan Cox wrote: > >> Modified: user/alc/PQ_LAUNDRY/lib/libc/stdlib/random.c >> ============================================================ >> ================== >> --- user/alc/PQ_LAUNDRY/lib/libc/stdlib/random.c Wed Oct 5 >> 17:32:06 2016 (r306712) >> +++ user/alc/PQ_LAUNDRY/lib/libc/stdlib/random.c Wed Oct 5 >> 18:03:17 2016 (r306713) >> @@ -270,16 +270,17 @@ void >> srandomdev(void) >> { >> int mib[2]; >> - size_t len; >> + size_t expected, len; >> >> if (rand_type == TYPE_0) >> - len = sizeof(state[0]); >> + expected = len = sizeof(state[0]); >> else >> - len = rand_deg * sizeof(state[0]); >> + expected = len = rand_deg * sizeof(state[0]); >> >> mib[0] = CTL_KERN; >> mib[1] = KERN_ARND; >> - sysctl(mib, 2, state, &len, NULL, 0); >> + if (sysctl(mib, 2, state, &len, NULL, 0) == -1 || len != expected) >> + abort(); >> >> if (rand_type != TYPE_0) { >> fptr = &state[rand_sep]; I don't like this set of changes. Originally, srandomdev() read from /dev/random. read() can also return a short count. This was detected and mishandled, but not as badly as now. The non-error of a short count was treated as an error, and both were handled by falling back to the weak method of using gettimeofday(). This was broken by switching to the sysctl and not even checking for errors from the sysctl. The sysctl can and does fail, mainly when a new userland is used with an old kernel. This commit restores some error checking. The implementation uses the style bugs of using the numbered sysctl KERN_ARND. This sysctl shouldn't exist by number, and isn't documented by number. It is too new to need a number or benefit from the careful documentation of old numbered sysctls. It is only documented by name (kern.arandom), only in a wrong place (random(4)), and a naive reader would expect from reading this man page that /dev/random is still the primary interface. Its behaviour of returning a short count might be expected from the device's behaviour, but is not really documented for the sysctl. It is only documented that the sysctl will not return random bytes unless the random device is seeded. The return value for this case is undocumented (is it 0 or -1? Source code seems to say that it is 0). Anyway, errors seems to be possible, so abort() is not acceptable handling. This was handled much better in arc4random() and is still handled better there despite recent regressions: - the sysctl is wrapped by a function that retries for short counts. However, the sysctl usage has even larger style bugs -- the sysctl is by number, and sysctl() is named __sysctl() - error handling was not missing. Perhaps the retry loop can spin forever with a short count of 0, but if the sysctl fails then there was a fallback first to reading /dev/random and then to the weak gettimeofday() method. Now there is no fallback, and the difference is just the retry loop. The first step in the removed fallback in arc4random() was not weaker like the log message says. It is to the primary method according to the man page. According to the man page, reading /dev/random blocks until the RNG is seeded for the first time. Blocking for the sysctl is undocumented. I think the sysctl returns a short count of 0, and we treat this as an error and abort(), so applications running early in the boot crash instead of having the traditional behaviour of hanging. It is unclear if short counts of nonzero can actually occur. I think they could easily occur in older implementations where /dev/random blocked for more cases than before the initial seeding. I think they should still be possible. Even if there is no need to block, it is useful to be able to kill a read() with a preposterously large count. If interrupts are allowed at all, then after one the return value must be either -1 or a short count. That is another reason why abort() in a library is bad error handling. It turns most signals into SIGABRT. Bruce