Date:      Sun, 23 Jul 2017 18:44:46 -0700
From:      Mark Johnston <markj@FreeBSD.org>
To:        Eugene Grosbein <eugen@grosbein.net>
Cc:        FreeBSD Stable <freebsd-stable@FreeBSD.org>
Subject:   Re: stable/11 debugging kernel unable to produce crashdump again
Message-ID:  <20170724014445.GA20872@raichu>
In-Reply-To: <59746BD5.5010301@grosbein.net>
References:  <587928B3.2050607@grosbein.net> <20170113193726.GC77535@wkstn-mjohnston.west.isilon.com> <587A0E12.7070205@grosbein.net> <59746BD5.5010301@grosbein.net>

On Sun, Jul 23, 2017 at 04:26:45PM +0700, Eugene Grosbein wrote:
> On 14.01.2017 18:40, Eugene Grosbein wrote:
> > 
> >> I suspect that this is because we only stop the scheduler upon a panic
> >> if SMP is configured. Can you retest with the patch below applied?
> >>
> >> Index: sys/kern/kern_shutdown.c
> >> ===================================================================
> >> --- sys/kern/kern_shutdown.c	(revision 312082)
> >> +++ sys/kern/kern_shutdown.c	(working copy)
> >> @@ -713,6 +713,7 @@
> >>  		CPU_CLR(PCPU_GET(cpuid), &other_cpus);
> >>  		stop_cpus_hard(other_cpus);
> >>  	}
> >> +#endif
> >>  
> >>  	/*
> >>  	 * Ensure that the scheduler is stopped while panicking, even if panic
> >> @@ -719,7 +720,6 @@
> >>  	 * has been entered from kdb.
> >>  	 */
> >>  	td->td_stopsched = 1;
> >> -#endif
> >>  
> >>  	bootopt = RB_AUTOBOOT;
> >>  	newpanic = 0;
> >>
> >>
> > 
> > Indeed, my router is uniprocessor system and your patch really solves the problem.
> > Now kernel generates crashdump just fine in case of panic. Please commit the fix, thanks!
> 
> Sadly, this time 11.1-STABLE r321371 SMP hangs instead of doing crashdump:

Is this amd64 GENERIC, or something else?

> 
> - "call doadump" from DDB prompt works just fine;
> - "shutdown -r now" reboots the system without problems;
> - "sysctl debug.kdb.panic=1" triggers a panic just fine but system hangs just after showing uptime
> instead of continuing with crashdump generation; same if "real" panic occurs.
> 
> Same for debug.minidump set to 1 or 0. How do I debug this?

I'm not able to reproduce the problem in bhyve using r321401. Looking
at the code, the likely culprits are cngrab() or one of the
shutdown_post_sync event handlers. Since you're apparently able to see
the console output at the time of the panic, I guess it's probably the
latter. Could you rerun your test with the patch below applied? It
prints an "entering post_sync"/"leaving post_sync" pair of messages
around each handler call, each with the handler's address, which can be
resolved to a function name using kgdb. The last "entering" message
without a matching "leaving" message will show where we're getting
stuck.

Index: sys/sys/eventhandler.h
===================================================================
--- sys/sys/eventhandler.h	(revision 321401)
+++ sys/sys/eventhandler.h	(working copy)
@@ -85,7 +85,11 @@
 			_t = (struct eventhandler_entry_ ## name *)_ep;	\
 			CTR1(KTR_EVH, "eventhandler_invoke: executing %p", \
  			    (void *)_t->eh_func);			\
+			if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
+				printf("entering post_sync %p\n", (void *)_t->eh_func); \
 			_t->eh_func(_ep->ee_arg , ## __VA_ARGS__);	\
+			if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
+				printf("leaving post_sync %p\n", (void *)_t->eh_func); \
 			EHL_LOCK((list));				\
 		}							\
 	}								\
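
For anyone who wants to see the tracing idea outside the kernel, here is
a minimal userland sketch of the same pattern: walk a list of handlers
and, only for the list named "shutdown_post_sync", print the handler's
address before and after calling it. This is not kernel code. The names
handler_entry, invoke_traced, count_call and demo_post_sync are
hypothetical and exist only for this illustration; the real macro lives
in sys/sys/eventhandler.h and also takes the handler list lock.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef void (*handler_fn)(void *arg);

/* Hypothetical stand-in for one registered event handler. */
struct handler_entry {
	handler_fn	 func;
	void		*arg;
};

/*
 * Call every handler in order.  For the "shutdown_post_sync" list only,
 * bracket each call with entering/leaving messages.  If a handler never
 * returns, the last line printed is an "entering" with no "leaving".
 * Returns the number of handlers that returned.
 */
int
invoke_traced(const char *list_name, const struct handler_entry *h, int n)
{
	int traced = strcmp(list_name, "shutdown_post_sync") == 0;
	int completed = 0;

	for (int i = 0; i < n; i++) {
		void *addr = (void *)(uintptr_t)h[i].func;

		if (traced) {
			printf("entering post_sync %p\n", addr);
			fflush(stdout);	/* make sure it is visible before a hang */
		}
		h[i].func(h[i].arg);
		if (traced)
			printf("leaving post_sync %p\n", addr);
		completed++;
	}
	return (completed);
}

/* Trivial handler for the demo: bump the counter passed as its argument. */
void
count_call(void *arg)
{
	(*(int *)arg)++;
}

/* Run two well-behaved handlers; returns how many times they were called. */
int
demo_post_sync(void)
{
	int calls = 0;
	const struct handler_entry handlers[] = {
		{ count_call, &calls },
		{ count_call, &calls },
	};

	int done = invoke_traced("shutdown_post_sync", handlers, 2);
	assert(done == calls);
	return (calls);
}
```

Calling demo_post_sync() from a main() prints two entering/leaving
pairs. With the kernel patch, an address printed this way can typically
be resolved in kgdb with "info symbol <address>", assuming the kernel
debug symbols are loaded.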


