Date:      Sat, 25 May 2002 18:08:37 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Archie Cobbs <archie@dellroad.org>
Cc:        freebsd-net@FreeBSD.ORG, <freebsd-hackers@FreeBSD.ORG>
Subject:   Re: splimp() during panic?
Message-ID:  <20020525172252.B6594-100000@gamplex.bde.org>
In-Reply-To: <200205241807.g4OI7Ja70244@arch20m.dellroad.org>

On Fri, 24 May 2002, Archie Cobbs wrote:

> I'm trying to debug a mbuf corruption bug in the kernel. I've added
> an mbuf sanity check routine which calls panic() if anything is amiss
> with the mbuf free list, etc. This function runs at splimp() and if/when
> it calls panic() the cpl is still at splimp().
>
> My question is: does this guarantee that the mbuf free lists, etc. will
> not be modified between the time panic() is called and the time a core
> file is generated? For example, if an incoming packet causes a networking
> interrupt after panic() has been called but before the core file is
> written, will that interrupt be blocked when it calls splimp()?

No (apart from it being too late to block the interrupt after it has
occurred).  panic() should run entirely at the ipl that it is called
at, or higher, and it should not undo any other interrupt disables
(e.g. the CPU interrupt (un)mask or the ICU or APIC interrupt masks
on i386's), since unmasking might cause various problems including
corruption of your data structures.  However, panic() is too broken
to actually keep interrupts masked.  It does a sync() very early, and
sync() obviously cannot work with interrupts masked, since it wanders
off into normal disk i/o code that depends on disk interrupts being
enabled to work (actually it is the wait for i/o to complete after the
sync() that depends on disk interrupts working).  But sync() in panic()
usually does work in FreeBSD-[1-4].  The usual mechanism for clobbering
the interrupt masks so that it works is calling tsleep().  tsleep()
knows that it is in a panic, but still "helpfully" enables interrupts.
From the RELENG_4 version:

	if (cold || panicstr) {
		/*
		 * After a panic, or during autoconfiguration,
		 * just give interrupts a chance, then just return;
		 * don't run any other procs or panic below,
		 * in case this is the idle process and already asleep.
		 */
		splx(safepri);
		splx(s);
		return (0);
	}

You could try setting safepri to a priority that is actually safe (0xffff
on i386's).  There may be other ipl-clobbering mechanisms, though.
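To make the safepri point concrete, here is a user-space model of the quoted tsleep() panic path. It is not kernel code: cpl, safepri and panicstr mirror the kernel names, but NET_MASK and BIO_MASK are hypothetical bit values, splx()/splimp() are reduced to bit operations on a global mask where a set bit means "masked", and the model's default of safepri = 0 (nothing masked) is an assumption. tsleep_model() reports the mask in effect during the "give interrupts a chance" window.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space model, not kernel code.  A set bit in cpl means that
 * interrupt class is masked.  NET_MASK and BIO_MASK are hypothetical.
 */
typedef unsigned int intrmask_t;

#define NET_MASK 0x0001u            /* hypothetical: network interrupts */
#define BIO_MASK 0x0002u            /* hypothetical: disk interrupts */

static intrmask_t cpl;              /* current interrupt mask */
static intrmask_t safepri = 0;      /* model default: nothing masked */
static const char *panicstr;        /* non-NULL once panic() has started */

/* Set the mask to exactly s and return the old mask, like splx(). */
static intrmask_t splx(intrmask_t s)
{
    intrmask_t old = cpl;

    cpl = s;
    return (old);
}

/* Add network interrupts to the mask, like splimp(). */
static intrmask_t splimp(void)
{
    intrmask_t old = cpl;

    cpl |= NET_MASK;
    return (old);
}

/*
 * The panic branch of tsleep() as quoted above.  *window_mask receives
 * the mask that was in effect while interrupts were given "a chance".
 */
static int tsleep_model(intrmask_t *window_mask)
{
    intrmask_t s = cpl;

    assert(window_mask != NULL);
    if (panicstr != NULL) {
        splx(safepri);
        *window_mask = cpl;         /* pending interrupts would run here */
        splx(s);
        return (0);
    }
    return (-1);                    /* normal sleeping is not modeled */
}
```

With safepri left at 0, network interrupts are deliverable inside the window even though the caller was at splimp(), which is how the mbuf lists could change after panic() started. With safepri = 0xffff both classes stay masked; the catch, per the paragraph about sync() above, is that disk interrupts then stay masked too, so the wait for the panic-time sync() presumably cannot complete.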

sync() in panic() tends to not work in -current, since things are locked
by mutexes and there is no kludge like the above to unlock them.  The
usual failure is to panic recursively on hitting a non-recursive mutex
that is already held, usually the same one (in or near bremfree IIRC).
There is some chance of dump working for recursive panics, but data
structures may already have been clobbered.

panic() has two defenses against endless recursion: it turns off sync()
after the first entry to panic(), and it turns off dumping after the
first entry to doadump().  It has no defense against recursion in all
the EVENTHANDLER_INVOKE() shutdowns.  All the event handlers are
apparently supposed to have their own defenses :-(.
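The recursive-panic failure and the two defenses described in the last two paragraphs can be traced with a small user-space model. Only panicstr corresponds to a real kernel variable; panic_model(), sync_model(), doadump_model(), the model_mtx lock and the counters are hypothetical stand-ins, and unlike the real functions they return instead of rebooting. The mutex starts out held, as if the panicking code owned it, so the sync() path panics again on the non-recursive lock.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space model of panic() recursion.  Only panicstr mirrors a real
 * kernel name; everything else is hypothetical, and nothing reboots.
 */
static const char *panicstr;    /* set on first entry to panic() */
static int dumping;             /* set on first entry to doadump() */
static int panic_depth;         /* times panic_model() was entered */
static int syncs_attempted;
static int dumps_written;

/* A non-recursive lock, already held when the first panic happens. */
struct model_mtx {
    int held;
};
static struct model_mtx buf_mtx = { 1 };

static void panic_model(const char *msg);

static void model_mtx_lock(struct model_mtx *m)
{
    if (m->held)
        panic_model("lock already held");   /* hypothetical message */
    m->held = 1;
}

/* sync() wanders into code that needs the already-held lock. */
static void sync_model(void)
{
    syncs_attempted++;
    model_mtx_lock(&buf_mtx);
}

static void doadump_model(void)
{
    assert(panicstr != NULL);   /* only reached from panic */
    if (dumping)
        return;                 /* defense 2: dump at most once */
    dumping = 1;
    dumps_written++;
}

static void panic_model(const char *msg)
{
    int nosync = 0;

    panic_depth++;
    if (panicstr != NULL)
        nosync = 1;             /* defense 1: no sync() when recursing */
    else
        panicstr = msg;
    if (!nosync)
        sync_model();           /* may re-enter panic_model() */
    doadump_model();
    /* Shutdown event handlers and the reboot are not modeled. */
}
```

Tracing one call: the first entry sets panicstr and attempts a sync, which re-enters panic; the second entry skips the sync because panicstr is already set, and takes the dump. The dump therefore comes from the recursive panic, after the sync attempt may already have disturbed the structures you wanted to inspect.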

> If this is not a valid assumption, is there an easy way to 'freeze'
> the mbuf free lists long enough to generate the core file when an
> inconsistency is found (other than adding the obvious hack)?

Not if removing RB_SYNC is the obvious hack :-).  Removing everything
except the dump and the final EVENTHANDLER_INVOKE() in boot() should
help.  (One event handler shutdown is still needed to reboot the system,
but it is after the dump so you don't care if it corrupts your structures).
Maybe add code to splx() to check that the ipl is not lowered below its
value at the start of panic().
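The splx() check suggested in the last paragraph could look roughly like the following, again as a user-space model with hypothetical names (panic_cpl, splx_checked(), panic_enter_model(), splx_violations). "Lowering" the ipl is taken to mean clearing any mask bit that was set when panic() was entered. This version reports the violation and keeps those bits set; printing a backtrace or entering the debugger instead would be equally reasonable choices.

```c
#include <assert.h>
#include <stdio.h>

/* User-space model; a set bit in cpl means that interrupt is masked. */
typedef unsigned int intrmask_t;

static intrmask_t cpl;                  /* current interrupt mask */
static intrmask_t panic_cpl;            /* hypothetical: cpl at panic entry */
static const char *panicstr;            /* non-NULL once panicking */
static unsigned int splx_violations;    /* hypothetical counter */

/* Called first thing in panic(): remember the ipl it was called at. */
static void panic_enter_model(const char *msg)
{
    if (panicstr == NULL) {
        panicstr = msg;
        panic_cpl = cpl;
    }
}

/*
 * splx() with the check: while panicking, refuse to clear any mask
 * bit that was already set when panic() was entered.
 */
static intrmask_t splx_checked(intrmask_t s)
{
    intrmask_t old = cpl;

    if (panicstr != NULL && (s & panic_cpl) != panic_cpl) {
        splx_violations++;
        fprintf(stderr, "splx: 0x%x would unmask 0x%x held since panic\n",
            s, panic_cpl & ~s);
        s |= panic_cpl;         /* keep panic-time interrupts masked */
    }
    cpl = s;
    assert(panicstr == NULL || (cpl & panic_cpl) == panic_cpl);
    return (old);
}
```

A call like tsleep()'s splx(safepri) with safepri at 0 would then leave the panic-time mask in place, while splx() calls that only raise the mask pass through unchanged.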

Bruce

