FreeBSD Mail Archives

Date:      Thu, 17 Nov 2011 14:40:45 +0200
From:      Alexander Motin <mav@FreeBSD.org>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        freebsd-current@freebsd.org, Andriy Gapon <avg@freebsd.org>
Subject:   Re: Stop scheduler on panic
Message-ID:  <4EC500CD.6040305@FreeBSD.org>
In-Reply-To: <20111117111405.GT50300@deviant.kiev.zoral.com.ua>
References:  <20111113083215.GV50300@deviant.kiev.zoral.com.ua> <20111116202714.5ee4bd53@fabiankeil.de> <4EC43764.1020202@FreeBSD.org> <4EC4423A.3020904@FreeBSD.org> <20111117081533.GP50300@deviant.kiev.zoral.com.ua> <4EC4C89A.2040208@FreeBSD.org> <20111117090653.GR50300@deviant.kiev.zoral.com.ua> <4EC4D3DE.8080103@FreeBSD.org> <20111117111405.GT50300@deviant.kiev.zoral.com.ua>

This is a multi-part message in MIME format.
--------------090104080008010605000800
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

On 11/17/11 13:14, Kostik Belousov wrote:
> On Thu, Nov 17, 2011 at 11:29:02AM +0200, Alexander Motin wrote:
>> On 11/17/11 11:06, Kostik Belousov wrote:
>>> On Thu, Nov 17, 2011 at 10:40:58AM +0200, Alexander Motin wrote:
>>>> On 11/17/11 10:15, Kostik Belousov wrote:
>>>>> On Thu, Nov 17, 2011 at 01:07:38AM +0200, Alexander Motin wrote:
>>>>>> On 17.11.2011 00:21, Andriy Gapon wrote:
>>>>>>> on 16/11/2011 21:27 Fabian Keil said the following:
>>>>>>>> Kostik Belousov<kostikbel@gmail.com>  wrote:
>>>>>>>>
>>>>>>>>> I was tricked into finishing the work by Andrey Gapon, who developed
>>>>>>>>> the patch to reliably stop other processors on panic.  The patch
>>>>>>>>> greatly improves the chances of getting dump on panic on SMP host.
>>>>>>>>
>>>>>>>> I tested the patch trying to get a dump (from the debugger) for
>>>>>>>> kern/162036, which currently results in the double fault reported in:
>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-current/2011-September/027766.html
>>>>>>>>
>>>>>>>> It didn't help, but also didn't make anything worse.
>>>>>>>>
>>>>>>>> Fabian
>>>>>>>
>>>>>>> The mi_switch recursion looks very familiar to me:
>>>>>>> mi_switch() at mi_switch+0x270
>>>>>>> critical_exit() at critical_exit+0x9b
>>>>>>> spinlock_exit() at spinlock_exit+0x17
>>>>>>> mi_switch() at mi_switch+0x275
>>>>>>> critical_exit() at critical_exit+0x9b
>>>>>>> spinlock_exit() at spinlock_exit+0x17
>>>>>>> [several pages of the previous three lines skipped]
>>>>>>> mi_switch() at mi_switch+0x275
>>>>>>> critical_exit() at critical_exit+0x9b
>>>>>>> spinlock_exit() at spinlock_exit+0x17
>>>>>>> intr_even_schedule_thread() at intr_event_schedule_thread+0xbb
>>>>>>> ahci_end_transaction() at ahci_end_transaction+0x398
>>>>>>> ahci_ch_intr() at ahci_ch_intr+0x2b5
>>>>>>> ahcipoll() at ahcipoll+0x15
>>>>>>> xpt_polled_action() at xpt_polled_action+0xf7
>>>>>>>
>>>>>>> In fact I once discussed with jhb this recursion triggered from a different
>>>>>>> place.  To quote myself:
>>>>>>> <avg>    spinlock_exit ->  critical_exit ->  mi_switch ->  kdb_switch ->
>>>>>>> thread_unlock ->  spinlock_exit ->  critical_exit ->  mi_switch ->  ...
>>>>>>> <avg>    in the kdb context
>>>>>>> <avg>    this issue seems to be triggered by td_owepreempt being true at 
>>>>>>> the time
>>>>>>> kdb is entered
>>>>>>> <avg>    and there of course has to be an initial spinlock_exit call 
>>>>>>> somewhere
>>>>>>> <avg>    in my case it's because of usb keyboard
>>>>>>> <avg>    I wonder if it would make sense to clear td_owepreempt right 
>>>>>>> before
>>>>>>> calling kdb_switch in mi_switch
>>>>>>> <avg>    instead of in sched_switch()
>>>>>>> <avg>    clearing td_owepreempt seems like a scheduler-independent 
>>>>>>> operation to me
>>>>>>> <avg>    or is it better to just skip locking in usb when kdb_active is set
>>>>>>> <avg>    ?
>>>>>>>
>>>>>>> The workaround described above should work in this case.
>>>>>>> Another possibility is to pessimize mtx_unlock_spin() implementations to 
>>>>>>> check
>>>>>>> SCHEDULER_STOPPED() and to bypass any further actions in that case.  But 
>>>>>>> that
>>>>>>> would add unnecessary overhead to the sunny day code paths.
>>>>>>>
>>>>>>> Going further up the stack one can come up with the following proposals:
>>>>>>> - check SCHEDULER_STOPPED() swi_sched() and return early
>>>>>>> - do not call swi_sched() from xpt_done() if we somehow know that we are 
>>>>>>> in a
>>>>>>> polling mode
>>>>>>
>>>>>> There is no flag in CAM now to indicate polling mode, but if needed, it 
>>>>>> should not be difficult to add one and not call swi_sched().
>>>>>
>>>>> I have the following change for eons on my test boxes. Without it,
>>>>> I simply cannot get _any_ dump.
>>>>>
>>>>> diff --git a/sys/cam/cam_xpt.c b/sys/cam/cam_xpt.c
>>>>> index 10b89c7..a38e42f 100644
>>>>> --- a/sys/cam/cam_xpt.c
>>>>> +++ b/sys/cam/cam_xpt.c
>>>>> @@ -4230,7 +4230,7 @@ xpt_done(union ccb *done_ccb)
>>>>>  			TAILQ_INSERT_TAIL(&cam_simq, sim, links);
>>>>>  			mtx_unlock(&cam_simq_lock);
>>>>>  			sim->flags |= CAM_SIM_ON_DONEQ;
>>>>> -			if (first)
>>>>> +			if (first && panicstr == NULL)
>>>>>  				swi_sched(cambio_ih, 0);
>>>>>  		}
>>>>>  	}
>>>>
>>>> That should be OK for kernel dumping. I was thinking about CAM abusing
>>>> polling not only for dumping. But looking on cases where it does it now,
>>>> may be it is better to rewrite them instead.
>>>
>>> So, should I interpret your response as 'Reviewed by" ?
>>
>> It feels somehow dirty to me. I don't like these global variables. If
>> you consider it is fine, proceed, I see no much harm. But if not, I can
>> add polling flag to the CAM. Flip a coin for me. :)
> You promised to add the polling at summer' meeting in Kiev. Will you do
> it now ?

Sorry, I've probably forgot. The patch is attached.

-- 
Alexander Motin

--------------090104080008010605000800
Content-Type: text/plain;
 name="cam_poll.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="cam_poll.patch"

Index: cam_sim.h
===================================================================
--- cam_sim.h	(revision 227580)
+++ cam_sim.h	(working copy)
@@ -104,7 +104,8 @@
 	u_int32_t		flags;
 #define	CAM_SIM_REL_TIMEOUT_PENDING	0x01
 #define	CAM_SIM_MPSAFE			0x02
-#define CAM_SIM_ON_DONEQ		0x04
+#define	CAM_SIM_ON_DONEQ		0x04
+#define	CAM_SIM_POLLED			0x08
 	struct callout		callout;
 	struct cam_devq 	*devq;	/* Device Queue to use for this SIM */
 	int			refcount; /* References to the SIM. */
Index: cam_xpt.c
===================================================================
--- cam_xpt.c	(revision 227580)
+++ cam_xpt.c	(working copy)
@@ -2957,6 +2957,9 @@
 
 	mtx_assert(sim->mtx, MA_OWNED);
 
+	/* Don't use ISR for this SIM while polling. */
+	sim->flags |= CAM_SIM_POLLED;
+
 	/*
 	 * Steal an opening so that no other queued requests
 	 * can get it before us while we simulate interrupts.
@@ -2996,6 +2999,9 @@
 	} else {
 		start_ccb->ccb_h.status = CAM_RESRC_UNAVAIL;
 	}
+
+	/* Use CAM ISR for this SIM again. */
+	sim->flags &= ~CAM_SIM_POLLED;
 }
 
 /*
@@ -4224,7 +4230,7 @@
 		TAILQ_INSERT_TAIL(&sim->sim_doneq, &done_ccb->ccb_h,
 		    sim_links.tqe);
 		done_ccb->ccb_h.pinfo.index = CAM_DONEQ_INDEX;
-		if ((sim->flags & CAM_SIM_ON_DONEQ) == 0) {
+		if ((sim->flags & (CAM_SIM_ON_DONEQ | CAM_SIM_POLLED)) == 0) {
 			mtx_lock(&cam_simq_lock);
 			first = TAILQ_EMPTY(&cam_simq);
 			TAILQ_INSERT_TAIL(&cam_simq, sim, links);

--------------090104080008010605000800--

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4EC500CD.6040305>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation