Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 17 Jun 2004 22:27:16 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Simon Barner <barner@in.tum.de>
Cc:        current@freebsd.org
Subject:   Re: Bogus signal handler causes kernel panic (5.2.1-p8/i386)
Message-ID:  <20040617215851.V1012@gamplex.bde.org>
In-Reply-To: <20040617134101.V1345@gamplex.bde.org>
References:  <20040616105706.GC1140@zi025.glhnet.mhn.de> <20040617134101.V1345@gamplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 17 Jun 2004, Bruce Evans wrote:

> On Wed, 16 Jun 2004, Simon Barner wrote:
>
> > I tried the local denial of service attack described in [1], that was
> > reported for Linux 2.4 and 2.6 some days ago (see [2] for the original
> > thread in linux.kernel)  on my FreeBSD 5.2.1-p8 system.
> >
> > The result is a kernel panic (back trace attached).
> >
> > Since des@ told me in a private mail, that he could not reprocduce the
> > panic on -CURRENT, I'd like to ask how to proceed from here.
>
> I couldn't reproduce it either, but I think this is just accidental.
> It takes a particular combination of FPU exceptions and masks to cause
> the panic.

I can now reproduce it.  The case of it involving signal handlers takes
all of the following:
(a) doing FP calculations in a signal handler.  This is not normally
    useful, and is not supported in RELENG_4.
(b) generating an unmasked pending but not trapped on exception in (a).
(c) using an old CPU, or using CPU_DISABLE_SSE.  The bug doesn't affect
    the SSE (FXSR) case like I first thought.
(d) running 5.x.  The bug is not in RELENG_4 like I first thought.
(e) running 5.x unmodified.  I use 4.x signal handlers in my 5.x userland
    for backwards compatibility.  This unsupports doing FP in signal
    handlers as in (a) and happens to avoid going near the bug.

> Try the following quick fix.  It is for -current but should work for
> RELENG_5 and RELENG_4 too.  (Note that it changes fpurstor(), not
> fpusave().  The patch context is not large enough to be unambigous.)
> It might be incomplete.

Try the following not so quick fix.  It is for -current but should work
for RELENG_5 too.

%%%
Index: npx.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/isa/npx.c,v
retrieving revision 1.149
diff -u -2 -r1.149 npx.c
--- npx.c	6 Jun 2004 15:17:44 -0000	1.149
+++ npx.c	17 Jun 2004 11:28:13 -0000
@@ -873,4 +924,13 @@
 	struct thread *td;

+	/*
+	 * Discard pending exceptions in the !cpu_fxsr case so that unmasked
+	 * ones don't cause a panic on the next frstor.
+	 */
+#ifdef CPU_ENABLE_SSE
+	if (!cpu_fxsr)
+#endif
+		fnclex();
+
 	td = PCPU_GET(fpcurthread);
 	PCPU_SET(fpcurthread, NULL);
%%%

> I think RELENG_4 has the problem too in the (CPU_ENABLE_SSE && cpu_fxsr)
> case.  Then fpusave() doesn't reset the npx state, so there may be a
> pending exception from the previous process.  fpurstor() then traps if
> it happens to restore a state that has the exception unmasked.  There
> used to be no problem because the previous state was always put away
> using the fnsave instruction, and fnsave has the side effect of
> initializing a clean state, in particular a state that doesn't have
> any pending exceptions.  This has been broken in 2 ways:
> - in RELENG_4 and -current, fxsave is used instead of fnsave in the
>   (CPU_ENABLE_SSE && cpu_fxsr) case).  fxsave doesn't have the side
>   effect.
> - in -current, the previous state is sometimes dropped instead of
>   saved.  This is entirely in software, so it doesn't have the side
>   effect.

Actually, there is only a problem from dropping the state, only in the
!fxsr case.  There is no problem using fxsave+fxrstor because fxrstor
works right.

> > [1] http://linuxreviews.org/news/2004-06-11_kernel_crash/#toc1
> > [2] http://groups.google.de/groups?hl=de&lr=&ie=UTF-8&frame=right&th=f7580d647408b95b&seekm=26hGq-Zr-31%40gated-at.bofh.it#link1
>
> The bug is a little different in Linux.  Linux uses more synchronization
> instructions, perhaps unnecessarily.  Its version of dropping the state
> isn't entirely in software.  It has an fwait that was not preceded by an
> fnclex, so it paniced if there were any pending exceptions in the state
> being dropped.

The above fix makes the difference littler.  It puts the fnclex in
npxdrop() instead of in fpurstor() because the later is called more
often.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20040617215851.V1012>