From owner-freebsd-current@FreeBSD.ORG Sat Jan 24 18:43:12 2009 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C7C531065670; Sat, 24 Jan 2009 18:43:12 +0000 (UTC) (envelope-from maksim.yevmenkin@gmail.com) Received: from rv-out-0506.google.com (rv-out-0506.google.com [209.85.198.233]) by mx1.freebsd.org (Postfix) with ESMTP id 8BAE18FC20; Sat, 24 Jan 2009 18:43:12 +0000 (UTC) (envelope-from maksim.yevmenkin@gmail.com) Received: by rv-out-0506.google.com with SMTP id b25so5351174rvf.43 for ; Sat, 24 Jan 2009 10:43:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=EFuadXZN4hrA8k7cZE3kZcrgmqW1Ibv3mRkyblI6IgU=; b=LsefckPb934DHhspJg6f4xRUoRYz/ZQjMUPGdPK4EA2h7Y7J2+xvAgO8ert4rYsbgp dtgMM82rR1anD6WaviBgqcJLMN11Ug1xbTjdly3oSHf1Tz1PPmwIxop8A5eyLnrsuBJG ZcAuO7iJXNdbbCcKJBcPTc1Z0b38BbTiw8hlQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=KxWL0GcuPkF0nP7mjPABByCYFd8bzrfOAjVXKZqG4v4UupYbKx5d8HTXNr0GCLU7sL pMHzmInT7fGQ6HUevMZ2I2c4cNoEODVdUVQt5M3CSXph01Nj3FwO+s7ttO3l1bQLjeve pi1z0NCaMDcoee7XvN/Pbx1fneOwaS0LdZcAY= MIME-Version: 1.0 Received: by 10.140.162.21 with SMTP id k21mr529570rve.256.1232822592133; Sat, 24 Jan 2009 10:43:12 -0800 (PST) In-Reply-To: <200901241055.20054.hselasky@c2i.net> References: <20090123154336.GJ60948@e.0x20.net> <200901240952.21670.hselasky@c2i.net> <200901241055.20054.hselasky@c2i.net> Date: Sat, 24 Jan 2009 10:43:12 -0800 Message-ID: From: Maksim Yevmenkin To: Hans Petter Selasky Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-current@freebsd.org, current@freebsd.org, Alfred Perlstein Subject: Re: panic: mutex Giant not owned at /usr/src/sys/kern/tty_ttydisc.c:1127 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 24 Jan 2009 18:43:13 -0000 On Sat, Jan 24, 2009 at 1:55 AM, Hans Petter Selasky wrote: > On Saturday 24 January 2009, Maksim Yevmenkin wrote: >> Hans Petter, >> >> i'm sorry, did i mention there is no sleeping in netgraph? :-) > > Can you elaborate this? Is netgraph run from fast interrupt context? So that > only spin locks are possible? Or is it run from a thread? i already told you that netgraph is essentially single-threaded. for all intents and purposes think that only _ONE_ thread services _ALL_ netgraph nodes. if something can not be done immediately, netgraph queues it and returns back to it later. _ANY_ delay/sleep would stall _ENTIRE_ netgraph. think poll/select/callback driven programming model. >> from what i can see you are _NOT_ using _SPIN_ mutexes (aka spin >> locks). you are using regular mutexes. let me quote locking(9) man >> page >> >> " >> Mutexes >> Basically (regular) mutexes will deschedule the thread if the mutex >> can not be acquired. A non-spin mutex can be considered to be equivalent >> to getting a write lock on an rw_lock (see below), >> " >> >> so, if thread can not get mutex it will be descheduled. this >> absolutely can not happen in netgraph! > > There are mutexes inside the taskqueue aswell. The problem will be the same > there if you don't use a so-called fast tasqueue. well, yes. technically, its not 100% correct, but it has to be like it because there is simply no way around this. if you read the blob at the top of the ng_ubt2.c you know that it uses regular sc_mbufq_mtx (which can sleep). let me quote the blob here again " Access to the outgoing queues and task flags is controlled by the sc_mbufq_mtx lock. It is an unavoidable evil. Again, sc_mbufq_mtx should really be a spin lock." the sc_mbufq_mtx is _NOT_ marked as SPIN mutex only because WITNESS has to know about spin locks. it is generally NOT a good idea to add a spin lock when someone feels like it. now back to taskqueue_enqueue(), yes, taskqueue_enqueue_fast() is more appropriate here. however, taskqueue_enqueue() was chosen because taskqueue_enqueue() does not do anything that could cause sleep in TQ_LOCK()/TQ_UNLOCK() (its just basically a lookup). from taskqueue man page "Enqueueing a task does not perform any memory allocation which makes it suitable for calling from an interrupt handler." USB_BUS_LOCK()/USB_BUS_UNLOCK() are different is this regard because those locks used when usb2 does things with hardware. also, like i said, there could be few usb devices sharing the same bus and thus the same USB_BUS_LOCK. once again the rule is: NO SLEEPING/STALLING IN NETGRAPH. it DOES NOT mean that you can not use mutexes. it only means that you have to be very careful of how the mutex is used. >> first of all, i do not think crashes are caused by detach(). in fact, >> detach() is clean. i've tested it and it worked for me. i tried to >> start/stop device while doing flood l2ping. i also tried to yank the >> device while doing flood l2ping. it works correctly. i think, the >> issue is related to stalled transfers. there is still something wrong >> with the code path that deals with stalled transfers. stalls do not >> happen on my test box, so i can not test it. also there is NO code >> duplication. asynchronous path is required to decouple netgraph from >> usb2. > > Only if netgraph is run from fast interrupt context. no, no, no. we are going back in circles. i would very much like to do things synchronously, but i can not. imo, i've given you plenty reasoning behind async design. please understand, asyn design has to stay. even if "sync" code appears to work, its still wrong. >> regular mutexes can sleep. we are not allowed to sleep in netgraph. >> therefor we must transition out of netgraph context before calling >> into any code that tries to grab regular mutex. the async design is >> there not because i want to make things complicated. its there because >> it is needed. > > I think there are two definitions of sleeping. > > 1) When a thread is waiting for a mutex it is not sleeping in the same way > like if it was to call "tsleep()". > > 2) When a thread is waiting for a wakeup it is surely sleeping, which can > happen inside sx_lock() and tsleep(). when i say "sleeping" i mean that the thread that is trying to get the lock is de-scheduled. that is, it is not running anymore. i'm not talking about [tm]sleep() etc. just the fact that thread is not running. oh, and to possibly answer your next question, freebsd's mutexes are ADAPTIVE, meaning that thread will will spin on a mutex for a little bit if it can not grab it right away BEFORE going to sleep. this means that the sc_mbufq_mtx mutex is very likely to be an equivalent of a spin lock because it is mosty used to protect queue while en/dequeuing mbuf etc. the same COULD apply for TQ_LOCK in taskqueue_enqeue(). thanks, max