From owner-freebsd-current@FreeBSD.ORG Thu Dec 1 16:59:13 2011 Return-Path: Delivered-To: freebsd-current@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C082106567B; Thu, 1 Dec 2011 16:59:13 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id C94DE8FC14; Thu, 1 Dec 2011 16:59:12 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA16172; Thu, 01 Dec 2011 18:59:11 +0200 (EET) (envelope-from avg@FreeBSD.org) Message-ID: <4ED7B25E.8070002@FreeBSD.org> Date: Thu, 01 Dec 2011 18:59:10 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:8.0) Gecko/20111109 Thunderbird/8.0 MIME-Version: 1.0 To: John Baldwin References: <20111113083215.GV50300@deviant.kiev.zoral.com.ua> <201111171638.03208.jhb@freebsd.org> <4EC6D544.5040803@FreeBSD.org> <201111211132.42119.jhb@freebsd.org> In-Reply-To: <201111211132.42119.jhb@freebsd.org> X-Enigmail-Version: undefined Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-current@FreeBSD.org Subject: Re: Stop scheduler on panic X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Dec 2011 16:59:13 -0000 [cc list trimmed] on 21/11/2011 18:32 John Baldwin said the following: > On Friday, November 18, 2011 4:59:32 pm Andriy Gapon wrote: >> on 17/11/2011 23:38 John Baldwin said the following: >>> On Thursday, November 17, 2011 4:35:07 pm John Baldwin wrote: >>>> Hmmm, you could also make critical_exit() not perform deferred preemptions >>>> if SCHEDULER_STOPPED? That would fix the recursion and still let the >>>> preemption "work" when resuming from the debugger? Just to clarify, probably you are actually suggesting to not perform deferred preemptions if kdb_active == TRUE. Because that's where we get the recursion (via kdb_switch). I think that if we get into the mi_switch in a state where !kdb_active && SCHEDULER_STOPPED(), then we probably should just - I don't know - panic again? [the following is preserved for context] >> Yes, that's a good solution, I think. I just didn't want to touch such a "low >> level" code, but I guess there is no rational reason for that. >> >>> Or you could let the debugger run in a permament critical section (though >>> perhaps that breaks too many other things like debugger routines that try >>> to use locks like the 'kill' command (which is useful!)). Effectively what you >>> are trying to do is having the debugger run in a critical section until the >>> debugger is exited. It would be cleanest to let it run that way explicitly >>> if possible since then you don't have to catch as many edge cases. >> >> I like this idea, but likely it would take some effort to get done. > > Yes, it would take some effort, so checking SCHEDULER_STOPPED in > critical_exit() is probably good for the short term. Would be nice to fix > it properly some day. > >> Related to this is something that I attempted to discuss before. I think that >> because the debugger acts on a frozen system image (the debugger is a sole actor >> and observer), the debugger should try to minimize its interaction with the >> debugged system. In this vein I think that the debugger should also bypass any >> locks just like with SCHEDULER_STOPPED. The debugger should also be careful to >> note a state of any subsystems that it uses (e.g. the keyboard) and return them >> to the initial state if it had to be changed. But that's a bit different story. >> And I really get beyond my knowledge zone when I try to think about things like >> handling 'call func_xxxx' in the debugger where func_xxxx may want to acquire >> some locks or noticeably change some state within a system. > > I think to some extent, I think if a user calls a function from the debugger > they better know what they are doing. However, I think it can also be useful > to have the debugger modify the system in some cases if it can safely do so > (e.g. the 'kill' command from DDB can be very useful, and IIRC, it is careful > to only use try locks and just fail if it can't acquire the needed locks). > >> But to continue about the locks... I have this idea to re-implement >> SCHEDULER_STOPPED as some more general check that could be abstractly denoted as >> LOCKING_POLICY_CHECK(context). Where the context could be defined by flags like >> normal, in-panic, in-debugger, etc. And the locking policies could be: normal, >> bypass, warn, panic, etc. >> >> However, I am not sure if this could be useful (and doable properly) in >> practice. I am just concerned with the interaction between the debugger and the >> locks. It still seems to me inconsistent that we are going with >> SCHEDULER_STOPPED for panic, but we are continuing to use "if (!kdb_active)" >> around some locks that could be problematic under kdb (e.g. in USB). In my >> opinion the amount of code shared between normal context and kdb context is >> about the same as amount of code shared between normal context and panic >> context. But I haven't really quantified those. > > I think you need to keep the 'kill' case in mind. In that case you don't want > to ignore locks, but the code is carefully written to use try locks instead (or > should be). > -- Andriy Gapon