Date:      Tue, 04 Apr 2006 09:42:05 -0700
From:      Sam Leffler <sam@errno.com>
To:        nielsen@memberwebs.com
Cc:        freebsd-net@freebsd.org
Subject:   Re: Panic (race condition?) in ipsec_process_done
Message-ID:  <4432A1DD.5030304@errno.com>
In-Reply-To: <20060403184402.9DA3EDCAC70@mail.npubs.com>
References:  <20060403184402.9DA3EDCAC70@mail.npubs.com>

Nate Nielsen wrote:
> I've been experiencing a panic in ipsec_process_done. Below is a
> backtrace and a patch which suppresses the issue. I don't profess to
> understand the IPSec code completely...
> 
> The panic occurs when performing IKE negotiations (racoon) with multiple
> systems at the same time. The panicking boxes are routers with slow
> CPUs, so negotiations take several seconds.
> 
> Immediately after boot and while IKE is going on the system panics.
> Needless to say after the reboot (after panic) IKE happens again and
> this results in the box rebooting over and over.
> 
> I'm guessing this is due to IPSec keys that are only halfway set up.
> 
> For me this issue only happens on production systems, so debugging is
> very difficult, but I've managed to get a kernel dump and backtrace.
> 
> The patch (below) is probably incomplete, but prevents the problem from
> happening for me.
> 
> 
> USING
>   - FreeBSD 6.0
>   - FAST_IPSEC
>   - Hardware encryption (hifn driver, aes algorithm)
>   - ipsec-tools 0.6.2
>   - Soekris net4826
> 
> 
> BACKTRACE
> 
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x70
> fault code              = supervisor read, page not present
> instruction pointer     = 0x20:0xc05ee61e
> stack pointer           = 0x28:0xc6e43ca4
> frame pointer           = 0x28:0xc6e43cb4
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 6 (crypto returns)
> trap number             = 12
> panic: page fault
> Uptime: 1m6s
> Dumping 109 MB (2 chunks)
>   chunk 0: 1MB (159 pages) ... ok
>   chunk 1: 109MB (27904 pages) 94 78 62 46 30 14
> 
> (kgdb) backtrace
> #0  doadump () at pcpu.h:165
> #1  0xc050fcb2 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
> #2  0xc050ff48 in panic (fmt=0xc06c6078 "%s")
>     at /usr/src/sys/kern/kern_shutdown.c:555
> #3  0xc06a0c00 in trap_fatal (frame=0xc6e43c64, eva=112)
>     at /usr/src/sys/i386/i386/trap.c:831
> #4  0xc06a096b in trap_pfault (frame=0xc6e43c64, usermode=0, eva=112)
>     at /usr/src/sys/i386/i386/trap.c:742
> #5  0xc06a05a9 in trap (frame=
>       {tf_fs = -1006895096, tf_es = 167968808, tf_ds = 168099880, tf_edi
> = -1059907712, tf_esi = -1060580736, tf_ebp = -958120780, tf_isp =
> -958120816, tf_ebx = -1061533440, tf_edx = -1061533440, tf_ecx =
> -1059907712, tf_eax = 0, tf_trapno = 12, tf_err = -1065091072, tf_eip =
> -1067522530, tf_cs = -1060634592, tf_eflags = 66178, tf_esp = 0, tf_ss =
> -1061533440})
>     at /usr/src/sys/i386/i386/trap.c:432
> #6  0xc06903ba in calltrap () at /usr/src/sys/i386/i386/exception.s:139
> #7  0xc05ee61e in ipsec_process_done (m=0xc0b6e100, isr=0xc0ba4900)
>     at /usr/src/sys/netipsec/ipsec_output.c:96
> #8  0xc05fbe29 in esp_output_cb (crp=0xc0d31780)
>     at /usr/src/sys/netipsec/xform_esp.c:919
> #9  0xc061c5d8 in crypto_ret_proc ()
>     at /usr/src/sys/opencrypto/crypto.c:1227
> #10 0xc04f9c48 in fork_exit (callout=0xc061c4c4 <crypto_ret_proc>, arg=0x0,
>     frame=0xc6e43d38) at /usr/src/sys/kern/kern_fork.c:789
> #11 0xc069041c in fork_trampoline ()
>     at /usr/src/sys/i386/i386/exception.s:208
> 
> 
> PATCH
> 
> --- sys/netipsec/ipsec_output.c.orig    Mon Apr  3 17:58:32 2006
> +++ sys/netipsec/ipsec_output.c Mon Apr  3 17:57:52 2006
> @@ -93,6 +93,13 @@
> 
>         IPSEC_ASSERT(m != NULL, ("null mbuf"));
>         IPSEC_ASSERT(isr != NULL, ("null ISR"));
> +
> +       /* XXX This happens. Figure out why. */
> +       if (!isr->sav) {
> +               m_freem (m);
> +               return ENOBUFS;
> +       }
> +
>         sav = isr->sav;
>         IPSEC_ASSERT(sav != NULL, ("null SA"));
>         IPSEC_ASSERT(sav->sah != NULL, ("null SAH"));
> 

This is indicative of an SA being recycled while traffic is active (e.g. 
IKE rekeying of an active tunnel).  You'll note the assert just below 
where things blew up.  This means something has changed in the stack 
such that the locking is no longer covering state changes.
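
As a rough illustration of the failure mode described above, here is a minimal single-threaded sketch of one common way to keep an SA alive across an asynchronous crypto request: the dispatcher takes its own reference, and the completion callback uses that pinned pointer instead of re-reading it from the request. This is not code from the FreeBSD tree and not necessarily how the problem should be fixed there; every structure and function name below (struct sa, struct request, struct crypto_op, sa_ref, crypto_dispatch, and so on) is hypothetical, and the locking a real kernel would need is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real netipsec structures. */
struct sa {
	int refcnt;		/* number of holders of this SA */
};

struct request {
	struct sa *sav;		/* SA bound to the request; may be cleared */
};

struct crypto_op {
	struct sa *sav;		/* private reference pinned for the in-flight op */
};

static int sa_free_count;	/* counts SAs released, for inspection */

static struct sa *
sa_alloc(void)
{
	struct sa *sa = malloc(sizeof(*sa));

	if (sa != NULL)
		sa->refcnt = 1;	/* the creator holds the first reference */
	return (sa);
}

static void
sa_ref(struct sa *sa)
{
	assert(sa->refcnt > 0);
	sa->refcnt++;
}

static void
sa_unref(struct sa *sa)
{
	assert(sa->refcnt > 0);
	if (--sa->refcnt == 0) {
		sa_free_count++;
		free(sa);
	}
}

/* Simulates the key daemon tearing down the SA bound to a request. */
static void
request_rekey(struct request *req)
{
	if (req->sav != NULL) {
		sa_unref(req->sav);
		req->sav = NULL;
	}
}

/* Start an asynchronous operation: pin the SA before handing off. */
static int
crypto_dispatch(struct request *req, struct crypto_op *op)
{
	if (req->sav == NULL)
		return (-1);	/* no SA to use; caller drops the packet */
	sa_ref(req->sav);
	op->sav = req->sav;
	return (0);
}

/* Completion callback: use the pinned pointer, never req->sav. */
static int
crypto_done(struct crypto_op *op)
{
	struct sa *sav = op->sav;

	if (sav == NULL)
		return (-1);
	/* ... finish processing the packet with sav here ... */
	op->sav = NULL;
	sa_unref(sav);
	return (0);
}
```

In this sketch, a rekey that lands between crypto_dispatch() and crypto_done() clears the request's pointer but cannot free the SA, because the in-flight operation still holds a reference. In a real kernel the reference counts and the request's pointer would also have to be protected by the appropriate locks.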

	Sam


