From owner-freebsd-current@freebsd.org Thu Dec 17 22:33:57 2015 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 975CAA4B3D2 for ; Thu, 17 Dec 2015 22:33:57 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 5C90E1D89 for ; Thu, 17 Dec 2015 22:33:57 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id tBHMXkNR096011; Thu, 17 Dec 2015 14:33:51 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <201512172233.tBHMXkNR096011@gw.catspoiler.org> Date: Thu, 17 Dec 2015 14:33:46 -0800 (PST) From: Don Lewis Subject: Re: fork_findpid() - Fatal trap 12: page fault while in kernel mode To: mjguzik@gmail.com cc: kostikbel@gmail.com, freebsd-current@freebsd.org In-Reply-To: <20151217195802.GA29200@dft-labs.eu> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Dec 2015 22:33:57 -0000 On 17 Dec, Mateusz Guzik wrote: > On Thu, Dec 17, 2015 at 11:48:08AM -0800, Don Lewis wrote: >> On 17 Dec, Konstantin Belousov wrote: >> > On Wed, Dec 16, 2015 at 11:08:02AM -0800, Don Lewis wrote: >> >> I used to have a patch the deferred linking the new process into >> >> proctree/allproc until it was fully formed. The motivation was to get >> >> rid of all of the PRS_NEW stuff scattered around the source. >> >> Unfortunately the patch bit-rotted and I'm pretty sure that I lost it. >> > >> > I had similar tought for a second as one of the possibilities to fix the >> > issue, but rejected it outright due to the way the pid allocator works. >> > The loop which faulted is the allocator, it depends on the new pid being >> > linked early to detect the duplicated alloc. >> > >> > What you wrote could be done, but this restructuring requires the separate >> > pid allocator, and probably it must repeat all quirks and subtle behaviour >> > of the current algorithm. But I do not object, PRS_NEW is a trouble >> > on its own. >> >> I don't think it requires any changes to the allocater. It should only >> be necessary to delay the call to fork_findpid() until we are ready to >> link the new proc into allproc. Basically, drop the locks at the >> beginning of do_fork(), then grab them again somewhere near the end >> (probably where we are currently mark the process as PRS_NORMAL) and >> move the call to fork_findpid(), the p2->p_pid assignment, and the list >> manipulation code to a location after that. >> >> It's probably not quite that simple though ... > > That would mean you would need to be able to deconstruct the process > because you cannot guarantee there are any pids left, which may or may > not be easily doable. It doesn't look like we handle that properly in the current code. I think fork_findpid() will loop forever. It shouldn't be possible if maxproc < pid_max / 3, or maybe pid_max / 2. It might be a good idea to enforce this. > The current method is going to bite us performance-wise anyway and an > allocater which does not require a walk over the tree is necessary in > the long run. Seems like a bitmap (or a bunch of bitmaps) is the way to > go here. I think that separate bitmaps for process, process group, and session ids would be needed. It would waste some space, but it's probably more efficent to use a byte array and store all the bits for the pid together. > Meanwhile one can add a special process permanently in PRS_NEW state and > poisoned pointers in debug kernels to help ensuring that all loops > handle the case. > > Not signing up for any of this work though.