Date:      Sat, 30 Aug 2008 10:52:08 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        julian@FreeBSD.org, current@FreeBSD.org
Subject:   Re: rtentry panic with FIB
Message-ID:  <alpine.BSF.1.10.0808301049420.59527@fledge.watson.org>
In-Reply-To: <200808291636.10656.jhb@FreeBSD.org>
References:  <200808291636.10656.jhb@FreeBSD.org>


On Fri, 29 Aug 2008, John Baldwin wrote:

> Unfortunately it hung trying to dump, so all I have is the stack trace from 
> DDB.  This is recent HEAD running stress2
>
> panic: _mtx_lock_sleep: recursed on non-recursive mutex rtentry @ ../../1

Kip and I have theorized that increased parallelism at higher layers of the 
network stack is exposing route locking and reference counting to more stress 
than before, and that as a result we are triggering races in the routing code 
more often than we used to.  While I wouldn't rule out a FIB-related bug, it 
seems more likely to me that we've hit a general locking/reference-counting 
bug in the Ethernet link layer / ARP, and that we need to take a careful look 
at what's going on throughout that layer.

Unfortunately, that's not something I have time to work on currently, so it 
would be great if people with an existing interest in the routing code (Julian 
and Qing have done the most work there recently?) could spend a few hours 
looking really carefully at what is happening.

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> cpuid = 1
> KDB: enter: panic
> [thread pid 14025 tid 100928 ]
> Stopped at      kdb_enter+0x3d: movq    $0,0x435054(%rip)
> db> tr
> Tracing pid 14025 tid 100928 td 0xffffff0003773360
> kdb_enter() at kdb_enter+0x3d
> panic() at panic+0x14b
> _mtx_lock_flags() at _mtx_lock_flags
> _mtx_lock_flags() at _mtx_lock_flags+0xc3
> rt_check_fib() at rt_check_fib+0x1ea
> arpresolve() at arpresolve+0x77
> ether_output() at ether_output+0x180
> ip_output() at ip_output+0xb4f
> udp_send() at udp_send+0x47d
> sosend_dgram() at sosend_dgram+0x1fa
> soo_write() at soo_write+0x30
> dofilewrite() at dofilewrite+0x7a
> kern_writev() at kern_writev+0x52
> write() at write+0x4d
> syscall() at syscall+0x1bf
> Xfast_syscall() at Xfast_syscall+0xab
> --- syscall (4, FreeBSD ELF64, write), rip = 0x80071cb7c, rsp =
> 0x7fffffffe628,-
> db> c
> Uptime: 1h39m18s
> Physical memory: 2038 MB
> Dumping 263 MB:pid 14025 (udp), uid 26840, was killed: exceeded maximum CPU
> limit
> pid 14099 (udp), uid 26840, was killed: exceeded maximum CPU limit
> pid 14100 (udp), uid 26840, was killed: exceeded maximum CPU limit
>
> -- 
> John Baldwin
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>


