From: Dmitry Mikulin
Date: Wed, 15 Feb 2012 09:54:44 -0800
To: Konstantin Belousov
Cc: freebsd-current Current, Marcel Moolenaar
Subject: Re: [ptrace] please review follow fork/exec changes
Message-ID: <4F3BF164.2020506@juniper.net>
In-Reply-To: <20120215174031.GB3283@deviant.kiev.zoral.com.ua>
List-Id: Discussions about the use of FreeBSD-current

On 02/15/2012 09:40 AM, Konstantin Belousov wrote:
> On Wed, Feb 15, 2012 at 09:22:10AM -0800, Dmitry Mikulin wrote:
>>
>> On 02/15/2012 08:32 AM, Konstantin Belousov wrote:
>>> On Mon, Feb 13, 2012 at 02:50:45PM -0800, Dmitry Mikulin wrote:
>>>>>>> It seems that now wait4(2) can be called from the real (non-debugger)
>>>>>>> parent first and result in the call to proc_reap(), isn't it? We would
>>>>>>> then just reparent the child back to the caller, still leaving the
>>>>>>> zombie and confusing debugger.
>>>>>> When either gdb or the real parent gets to proc_reap() the process
>>>>>> wouldn't get destroyed, it'll get caught by the following clause:
>>>>>>     if (p->p_oppid && (t = pfind(p->p_oppid)) != NULL) {
>>>>>>
>>>>>> and the real parent will get the child back into the children's list
>>>>>> while gdb will get it into the orphan list. The second time around when
>>>>>> proc_reap() is entered, p->p_oppid will be 0 and the process will get
>>>>>> really reaped. Does it make sense? And proc_reparent() attempts to keep
>>>>>> the orphan list clean and not share entries with the list of siblings.
>>>>> Right, this is what I figured. But I asked about some further implication
>>>>> of this change:
>>>>>
>>>>> if real parent spuriously calls wait4(2) on the child pid after the child
>>>>> exited, but before the debugger called the wait4(), then exactly the
>>>>> code you noted above will be run. This results in the child being fully
>>>>> returned to the original parent.
>>>>>
>>>>> Next, the wait4() call from debugger gets an error, and zombie will be
>>>>> kept around until parent calls wait4() for this pid once more.
>>>>>
>>>>> Am I missing something?
>>>> In this case the process will move from gdb's child list to gdb's orphan
>>>> list when the real parent does a wait4(). Next time around the wait loop
>>>> in gdb it'll be caught by the orphan's proc_reap().
>>> I do not see how the next debugger loop could find this process at all,
>>> since the first wait4() call reparented it to the original parent.
>> Not the debugger loop, the kern_wait() loop. The child gets re-parented to
>> the original parent but moves to the orphan list of the debugger process.
> Either the debugger loop which calls wait4/waitpid, or the kern_wait loop
> resulting from the debugger calling wait*.
>
> Could you please describe how the patched kernel moves the wait'ed
> zombie to the orphan list of the debugger?
> For me, it seems that there is another bug, the child appears both on
> the children list, and on the orphan list of the real parent.

The first attempt to reap the child will get into the

    if (p->p_oppid && (t = pfind(p->p_oppid)) != NULL) {

clause, which will re-parent it to the real parent. The child will not be
destroyed at this point. The following loop in proc_reparent() will make
sure that the child does not stay in both lists:

    LIST_FOREACH(p, &parent->p_orphans, p_orphan) {
        if (p == child) {
            LIST_REMOVE(child, p_orphan);
            break;
        }
    }

Since the child's parent is gdb and the child is still being traced, the
following will move it to gdb's orphan list:

    if (child->p_flag & P_TRACED)
        LIST_INSERT_HEAD(&child->p_pptr->p_orphans, child, p_orphan);

After this the real parent will get the exit status. The next pass through
the kern_wait() loop called from gdb will catch the child in gdb's orphan
list and will reap it for real this time, since p->p_oppid was set to 0 by
the previous attempt to reap it. Gdb gets the exit code, and the child is
destroyed.
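
To make the two-pass sequence above easier to follow, here is a small
userland model of it. This is only a sketch, not the kernel code: struct
toy_proc, toy_reparent(), toy_reap(), toy_wait() and toy_scenario() are
hypothetical names, each list is reduced to a single "which list am I on"
pointer, and the starting state (traced child parented to gdb and sitting on
the real parent's orphan list) is an assumption about what the patch sets up
at attach time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcome of one simplified wait pass. */
enum wait_result { WAIT_NOT_FOUND, WAIT_STATUS_ONLY, WAIT_REAPED };

/* Toy stand-in for struct proc; only the state discussed in this thread. */
struct toy_proc {
    struct toy_proc *pptr;      /* whose children list we are on (p_pptr) */
    struct toy_proc *orphan_of; /* whose orphan list we are on, or NULL */
    struct toy_proc *oparent;   /* stand-in for p_oppid; NULL means 0 */
    bool traced;                /* stand-in for P_TRACED */
    bool destroyed;             /* set once the zombie is really reaped */
};

/* Models the quoted proc_reparent() fragments. */
static void toy_reparent(struct toy_proc *child, struct toy_proc *parent)
{
    assert(child != NULL && parent != NULL);
    struct toy_proc *old = child->pptr;

    if (old == parent)
        return;
    /* Do not leave the child on both lists of the new parent. */
    if (child->orphan_of == parent)
        child->orphan_of = NULL;
    /* Still traced: the old parent (the debugger) keeps it as an orphan. */
    if (child->traced)
        child->orphan_of = old;
    child->pptr = parent;
}

/* Models proc_reap(): returns true only when the zombie is destroyed. */
static bool toy_reap(struct toy_proc *p)
{
    if (p->oparent != NULL) {
        struct toy_proc *t = p->oparent;

        p->oparent = NULL;      /* the second attempt will see 0 here */
        toy_reparent(p, t);
        return false;           /* handed back, not destroyed */
    }
    p->pptr = NULL;
    p->orphan_of = NULL;
    p->destroyed = true;
    return true;
}

/* Models one kern_wait() pass by waiter q over zombie p. */
static enum wait_result toy_wait(struct toy_proc *q, struct toy_proc *p)
{
    if (p->destroyed || (p->pptr != q && p->orphan_of != q))
        return WAIT_NOT_FOUND;
    return toy_reap(p) ? WAIT_REAPED : WAIT_STATUS_ONLY;
}

/* Real parent waits first, then gdb; returns 0 if the model behaves
 * as described in the message above. */
static int toy_scenario(void)
{
    struct toy_proc parent = { NULL, NULL, NULL, false, false };
    struct toy_proc gdb = { NULL, NULL, NULL, false, false };
    struct toy_proc child = { &gdb, &parent, &parent, true, false };

    if (toy_wait(&parent, &child) != WAIT_STATUS_ONLY)
        return 1;
    if (child.pptr != &parent || child.orphan_of != &gdb || child.destroyed)
        return 2;
    if (toy_wait(&gdb, &child) != WAIT_REAPED)
        return 3;
    return child.destroyed ? 0 : 4;
}
```

Calling toy_scenario() from a main() walks the same order of events: the
real parent's wait gets the status and moves the child to gdb's orphan
entry, and gdb's later wait finds it there and destroys it.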