From owner-freebsd-arch@FreeBSD.ORG Tue Aug 23 13:11:10 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B90A5106564A; Tue, 23 Aug 2011 13:11:10 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 71A7C8FC0A; Tue, 23 Aug 2011 13:11:10 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id DD57D46B91; Tue, 23 Aug 2011 09:11:09 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 7BFA58A02F; Tue, 23 Aug 2011 09:11:09 -0400 (EDT) From: John Baldwin To: freebsd-arch@freebsd.org Date: Tue, 23 Aug 2011 09:11:08 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References: <4E53986B.5000804@FreeBSD.org> In-Reply-To: <4E53986B.5000804@FreeBSD.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201108230911.09021.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Tue, 23 Aug 2011 09:11:09 -0400 (EDT) Cc: Andriy Gapon Subject: Re: skipping locks, mutex_owned, usb X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Aug 2011 13:11:10 -0000 On Tuesday, August 23, 2011 8:09:15 am Andriy Gapon wrote: > > Yes, the subject sounds quite hairy, so please let me try to explain it. > First, let's consider one concrete function: > > static int > ukbd_poll(keyboard_t *kbd, int on) > { > struct ukbd_softc *sc = kbd->kb_data; > > if (!mtx_owned(&Giant)) { > /* XXX cludge */ > int retval; > mtx_lock(&Giant); > retval = ukbd_poll(kbd, on); > mtx_unlock(&Giant); > return (retval); > } > > if (on) { > sc->sc_flags |= UKBD_FLAG_POLLING; > sc->sc_poll_thread = curthread; > } else { > sc->sc_flags &= ~UKBD_FLAG_POLLING; > ukbd_start_timer(sc); /* start timer */ > } > return (0); > } > > This "XXX cludge" [sic] pattern is scattered around a few functions in the ukbd > code and perhaps other usb code: > func() > { > if (!mtx_owned(&Giant)) { > mtx_lock(&Giant); > func(); > mtx_unlock(&Giant); > return; > } > > // etc ... > } > > I have a few question directly and indirectly related to this pattern. > > I. [Why] do we need this pattern? > Can the code be re-written in a smarter (if not to say proper) way? Giant is recursive, it should just be always acquired. Also, this recursive call pattern is very non-obvious. A far more straightforward approach would be to just have: static int ukbd_poll(keyboard_t *kbd, int on) { struct ukbd_softc *sc = kbd->kb_data; mtx_lock(&Giant); if (on) { sc->sc_flags |= UKBD_FLAG_POLLING; sc->sc_poll_thread = curthread; } else { sc->sc_flags &= ~UKBD_FLAG_POLLING; ukbd_start_timer(sc); /* start timer */ } mtx_unlock(&Giant); return (0); } > II. ukbd_poll() in particular can be called from the kdb context (kdb_trap -> > db_trap -> db_command_loop -> etc). What would happen if the Giant is held by a > thread on some other CPU (which would be hard-stopped by kdp_trap)? Won't we get > a lockup here? > So shouldn't this code actually check for kdb_active? Probably, yes. > III. With my stop_scheduler_on_panic patch ukbd_poll() produces infinite chains > of 'infinite' recursion because mtx_owned() always returns false. This is because > I skip all lock/unlock/etc operations in the post-panic context. I think that > it's a good philosophical question: what operations like mtx_owned(), > mtx_recursed(), mtx_trylock() 'should' return when we actually act as if no locks > exist at all? Most code should not be abusing mtx_owned() in this fashion. For Giant you should just follow a simple pattern like above that lets it recurse. For your own locks you generally should use things like mtx_assert() to require all callers of a given routine to hold the lock rather than recursively acquiring the lock. Very few legitimate cases of mtx_owned() exist IMO. It is debatable if we should even have a mtx_owned() routine since we have mtx_assert(). Given this, I would leave mtx_owned() and mtx_trylock() as-is. > That question III actually brings another thought: perhaps the whole of idea of > skipping locks in a certain context is not a correct direction? > Perhaps, instead we should identify all the code that could be legitimately > executed in the after-panic and/or kdb contexts and make that could explicitly > aware of its execution context. That is, instead of trying to make _lock, > _unlock, _owned, _trylock, etc do the right thing auto-magically, we should try to > make the calling code check panicstr and kdb_active and do the right thing on that > level (which would include not invoking those lock-related operations or other > inappropriate operations). I believe that you'd end up having to maintain duplicate code for the two cases and that it would be a bigger headache as a result. -- John Baldwin