From: John Baldwin
To: freebsd-hackers@freebsd.org
Cc: Divacky Roman , hackers@freebsd.org
Date: Thu, 10 Aug 2006 13:31:50 -0400
Subject: Re: SoC: help with LISTs and killing procs
Message-Id: <200608101331.51473.jhb@freebsd.org>
In-Reply-To: <20060810161705.GB19047@stud.fit.vutbr.cz>
References: <20060810151616.GA17109@stud.fit.vutbr.cz> <20060810154305.GA21483@lor.one-eyed-alien.net> <20060810161705.GB19047@stud.fit.vutbr.cz>

On Thursday 10 August 2006 12:17, Divacky Roman wrote:
> On Thu, Aug 10, 2006 at 10:43:05AM -0500, Brooks Davis wrote:
> > On Thu, Aug 10, 2006 at 05:35:43PM +0200, Divacky Roman wrote:
> > > On Thu, Aug 10, 2006 at 10:23:59AM -0500, Brooks Davis wrote:
> > > > On Thu, Aug 10, 2006 at 05:16:17PM +0200, Divacky Roman wrote:
> > > > > hi
> > > > >
> > > > > I am doing this:
> > > > >
> > > > > (pseudocode)
> > > > > LIST_FOREACH_SAFE(em, &td_em->shared->threads, threads, tmp_em) {
> > > > >
> > > > >     kill(em, SIGKILL);
> > > > > }
> > > > >
> > > > > kill(SIGKILL) calls exit() which calls my exit_hook()
> > > > >
> > > > > my exit_hook() does LIST_REMOVE(em, threads).
> > > > >
> > > > > the problem is that this is not synchronous so I am getting a panic by INVARIANTS
> > > > > that "Bad link elm prev->next != elm". This is because I list 1st item in the list
> > > > > I call kill on it, then process 2nd list, then scheduler preempts my code and calls
> > > > > exit() on the first proc which removes the first entry and bad things happen.
> > > > >
> > > > > I see this possible solutions:
> > > > >
> > > > > make this synchronous, it can be done by something like:
> > > > >
> > > > > ....
> > > > > kill(em, SIGKILL);
> > > > > wait_for_proc_to_vanish();
> > > > >
> > > > > pls. tell me what do you think about this solution and if its correct what is the wait_for_proc_to_vanish()
> > > > >
> > > > > maybe there's some better solution, pls tell me.
> > > >
> > > > It sounds like you need a lock protecting the list.  If you held it over
> > > > the whole loop you could signal all processes before the exit_hook could
> > > > remove any.
> > >
> > > I dont understand. I am protecting the lock by a rw_rlock();
> > >
> > > the exit_hook() then acquires rw_wlock(); when removing the entry.
> > > what exactly do you suggest me to do? I dont get it.
> >
> > This can't be the case.
> > If you're holding a read lock around the
> > loop (it must cover the entire loop), it should not be possible for the
> > exit_hook() to obtain a write lock while you are in the loop.  Just to
> > verify, is the lock for the list and not per element?
>
> oh.. I see whats going on.. in the exit_hook I am doing this:
>
>
> em = em_find(p->p_pid, EMUL_UNLOCKED); // this performs EMUL_RLOCK(&emul_lock);
> ...
> EMUL_RUNLOCK(&emul_lock);
>
> EMUL_WLOCK(&emul_lock);
> LIST_REMOVE(em, threads);
> SLIST_REMOVE(&emuldata_head, em, linux_emuldata, emuldatas);
> EMUL_WUNLOCK(&emul_lock);
>
> the EMUL_RUNLOCK() unlocks it so it doesnt wait. This should be turned into rw_try_upgrade()
> but I dont understand how ;(

You could make em_find() take an argument to specify if it should acquire a
WLOCK instead of an RLOCK.  Really, unless you have measured a lot of
contention on this lock, you should just make it a mtx, and only go back and
make it a rwlock if it really needs it.  Also, you currently can't do an
interlocked msleep() or cv_wait() with a rwlock, so you really need to use a
mutex anyway.

> anyway, I still dont understand how should I use the lock to achieve the synchronization.
>
> my code looks like:
>
> EMUL_RLOCK(&emul_lock);
> LIST_FOREACH_SAFE(em, &td_em->shared->threads, threads, tmp_em) {
> }
> EMUL_RUNLOCK(&emul_lock);
>
> what do you suggest? I need to process the list first and then let the exit_hook in the various processes run.

This is not safe anyway.  kill() is way too big of a function to call with a
lock held.  You also pass the wrong thread to it IIRC (you should always pass
curthread as the td argument to a syscall).
You probably need to use psignal, and you probably should be doing something
like this:

    EMUL_LOCK();
    LIST_FOREACH(td, &td_em->shared->threads, threads) {
        p = td->td_proc;
        PROC_LOCK(p);
        psignal(p, SIGKILL);
        PROC_UNLOCK(p);
    }
    while (THREADS_STILL_AROUND(&td->em))
        msleep(td_em, &emul_lock, PWAIT, "foo", 0);
    EMUL_UNLOCK();

Then in your exit_hook you should do this:

    em = em_find(p->p_pid, EMUL_UNLOCKED);
    LIST_REMOVE(...);
    SLIST_REMOVE(...);
    wakeup(em);
    EMUL_UNLOCK();

-- 
John Baldwin
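
For anyone who wants to experiment with the signal-then-sleep pattern outside
the kernel, here is a minimal userland sketch. It is not the kernel code and
makes several assumptions: pthreads stand in for the kernel primitives
(a pthread mutex for the mtx, pthread_cond_wait() for the interlocked
msleep(), pthread_cond_broadcast() for wakeup()), a kill_requested flag stands
in for psignal(p, SIGKILL), and struct member, struct group, member_main(),
kill_all_and_wait() and run_demo() are all hypothetical names. The sleeper and
the wakers share one condition variable, since a sleep is only ended by a
wakeup on the same channel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

#define MAX_MEMBERS 16

struct group;

/* Hypothetical stand-in for one list entry (one emulated process). */
struct member {
    LIST_ENTRY(member) link;
    struct group *grp;
    bool kill_requested;        /* stands in for a pending SIGKILL */
    pthread_t tid;
};

/* Hypothetical stand-in for the shared state and its single lock. */
struct group {
    pthread_mutex_t lock;       /* plays the role of emul_lock as a mutex */
    pthread_cond_t changed;     /* the one channel everybody sleeps on */
    LIST_HEAD(member_list, member) members;
};

/* Member thread: wait to be "killed", then run the exit hook. */
static void *member_main(void *arg)
{
    struct member *m = arg;
    struct group *g = m->grp;

    pthread_mutex_lock(&g->lock);
    while (!m->kill_requested)
        pthread_cond_wait(&g->changed, &g->lock);
    /* Exit hook: unlink under the lock, then wake the waiting killer. */
    LIST_REMOVE(m, link);
    pthread_cond_broadcast(&g->changed);
    pthread_mutex_unlock(&g->lock);
    return NULL;
}

/* Killer: mark every member with the lock held, then sleep until all left. */
static void kill_all_and_wait(struct group *g)
{
    struct member *m;

    pthread_mutex_lock(&g->lock);
    LIST_FOREACH(m, &g->members, link)
        m->kill_requested = true;       /* no member can unlink itself yet */
    pthread_cond_broadcast(&g->changed);
    /* pthread_cond_wait() drops the lock atomically while sleeping. */
    while (!LIST_EMPTY(&g->members))
        pthread_cond_wait(&g->changed, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

/* Start n members, kill them all, return 0 if the list ended up empty. */
int run_demo(int n)
{
    struct group g;
    struct member m[MAX_MEMBERS];
    int i, started = 0, ok;

    assert(n >= 0 && n <= MAX_MEMBERS);
    pthread_mutex_init(&g.lock, NULL);
    pthread_cond_init(&g.changed, NULL);
    LIST_INIT(&g.members);

    for (i = 0; i < n; i++) {
        m[i].grp = &g;
        m[i].kill_requested = false;
        if (pthread_create(&m[i].tid, NULL, member_main, &m[i]) != 0)
            break;
        pthread_mutex_lock(&g.lock);
        LIST_INSERT_HEAD(&g.members, &m[i], link);
        pthread_mutex_unlock(&g.lock);
        started++;
    }

    kill_all_and_wait(&g);
    for (i = 0; i < started; i++)
        pthread_join(m[i].tid, NULL);

    ok = (started == n) && LIST_EMPTY(&g.members);
    pthread_cond_destroy(&g.changed);
    pthread_mutex_destroy(&g.lock);
    return ok ? 0 : -1;
}
```

Calling run_demo(4) starts four member threads, marks all of them while the
lock is held, and returns 0 once every one of them has unlinked itself. The
list is never modified while the killer walks it, because each member needs
the same lock to run its exit hook.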