From: Konstantin Belousov
To: Mark Johnston
Cc: freebsd-stable@freebsd.org, Nick Frampton, John Baldwin, kib@FreeBSD.org
Date: Thu, 12 Mar 2015 12:40:23 +0200
Subject: Re: Suspected libkvm infinite loop
Message-ID: <20150312104023.GL2379@kib.kiev.ua>
In-Reply-To: <20150312043407.GA11120@raichu>

On Wed, Mar 11, 2015 at 09:34:07PM -0700, Mark Johnston wrote:
> On Thu, Mar 12, 2015 at 02:05:32PM +1000, Nick Frampton wrote:
> > On 12/03/15 00:38, John Baldwin wrote:
> > > > > It sounds like this issue might be the one fixed in r272566: if the
> > > > > KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
> > > > > sbuf error return value could bubble up and be treated as ERESTART,
> > > > > resulting in a loop.
> > > > >
> > > > > This can be confirmed with something like
> > > > >
> > > > >   dtrace -n 'syscall:::entry/pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p
> > > > >
> > > > > If the output consists solely of __sysctl, this bug is likely the
> > > > > culprit.
> > > >
> > > > Unfortunately, I accidentally killed fstat this morning before I could do any further debug.
> > > >
> > > > I ran truss -p on it yesterday and it was spinning solely on __sysctl.
> > > >
> > > > I'll try compiling with debug symbols in case it happens again. I haven't been able to reproduce the
> > > > problem in a reasonable time frame so it could be days or weeks before we see it happen again.
> > > The truss output is consistent with Mark's suggestion, so I would try
> > > his suggested fix of 272566.
> >
> > I patched the 10.1 kernel with r272566 and it appears to have fixed the issue. Is this patch likely
> > to be MFCed back to 10-stable?
>
> I can't see any reason it shouldn't be, and there was an MFC reminder in
> the commit log entry for that revision. I've cc'ed kib@, who might have a
> reason.

The mentioned commit depends on r271976; in fact, it depends on a series
of commits that includes r271486 and r271489. I did not merge r271976
with manual resolution of the conflicts, since that would mean the work
done for HEAD has to be redone for stable/10 to ensure that all cases
are covered. Later, when the mentioned series is merged, the work would
have to be redone once more.

Also note that r271489 is not trivially mergeable either; I just checked.
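
For readers following the thread, the failure mode Mark describes in the quoted text (an internal error value bubbling up and being treated as ERESTART) can be illustrated with a toy model. The sketch below is not FreeBSD code: the names copy_out, handler_buggy, handler_fixed and dispatch are hypothetical, and the premise that the helper's failure value is numerically the same as the restart code is an assumption made for the illustration, not a description of what r272566 actually changes.

```python
# Toy model only: every name here is hypothetical and does not mirror
# FreeBSD kernel source. It shows how a helper's raw "failed" status can be
# misread as a "restart the call" code by a dispatcher, producing a loop.

ERESTART = -1   # assumed sentinel meaning "restart the system call"
ENOMEM = 12     # ordinary error code that would be reported to the caller


def copy_out(data: bytes, buf_size: int) -> int:
    """Hypothetical buffer helper: 0 on success, -1 if data does not fit."""
    return 0 if len(data) <= buf_size else -1


def handler_buggy(data: bytes, buf_size: int) -> int:
    # Bug: the helper's raw -1 is returned unchanged and collides with ERESTART.
    return copy_out(data, buf_size)


def handler_fixed(data: bytes, buf_size: int) -> int:
    # Fix in this model: translate the helper's failure into a real error code.
    return ENOMEM if copy_out(data, buf_size) != 0 else 0


def dispatch(handler, data: bytes, buf_size: int, max_restarts: int = 1000):
    """Call handler, restarting for as long as it returns ERESTART.

    max_restarts exists only so that the demonstration terminates.
    Returns a tuple (final_status, number_of_restarts).
    """
    restarts = 0
    status = handler(data, buf_size)
    while status == ERESTART and restarts < max_restarts:
        restarts += 1
        status = handler(data, buf_size)
    return status, restarts


if __name__ == "__main__":
    payload = b"x" * 64
    print("buggy, small buffer:", dispatch(handler_buggy, payload, 16))
    print("fixed, small buffer:", dispatch(handler_fixed, payload, 16))
    print("fixed, large buffer:", dispatch(handler_fixed, payload, 128))
```

In this model the buggy handler with a too-small buffer is restarted until the artificial cap is hit, which corresponds to a process that appears to spin on a single system call, while the fixed handler returns an ordinary error on the first attempt.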