Date: Fri, 23 Feb 2007 15:05:12 +0000 (GMT)
From: Robert Watson
To: Pramod Srinivasan
Cc: freebsd-net@freebsd.org
Subject: Re: sleeping thread
Message-ID: <20070223145802.S88189@fledge.watson.org>
In-Reply-To: <5EB31780BD297F46812C8F495FA08F620A86FF54@electron.jnpr.net>

On Thu, 22 Feb 2007, Pramod Srinivasan wrote:

> I am coming across a weird issue with FreeBSD 6.1, any help appreciated.
>
> The problem is the following:
>
> One thread (1) does a setsockopt, grabs a lock in udp_usrreq, calls copyin
> which hits a pagefault, this leads to that thread sleeping by calling
> msleep.

Performing copyin/copyout while holding a mutex, especially one also required
from interrupt or software interrupt context, is a bug for precisely the
reason you describe here: interrupt context can get blocked waiting on an
unbounded operation such as a disk read. However, I'm slightly confused by
your stacktrace: in FreeBSD 6.1, there is no udp_ctloutput(), only a
udp_ctlinput().

This aside, there have been a number of problems relating to ip_ctloutput()
holding locks over calls to copy in and out socket buffer arguments. I
believe these are mostly fixed in 7.x, and actually largely fixed in 6.x,
although possibly after 6.1. The basic approach to fixing this is to either
not acquire the locks until after the copy operation, or release them before
the operation. This turns out to be a bit tricky, because certain pointers
remain stable only while the locks are held, so the pointers may need to be
re-derived or re-validated with the locks acquired. Diffing ip_output.c
between 6.1 and 6.2 will most likely give you a sense of what is required:

http://fxr.watson.org/fxr/diff/netinet/ip_output.c?v=RELENG61;diffval=RELENG62;diffvar=v

In particular, look at the comment at the start of ip_ctloutput().

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> msleep(f01cd488,c09fe6a0,44,c0956c79,0) at
> bwait(f01cd488,44,c0956c79) at
> vnode_pager_input_smlfs(c10487bc,c2740ae0,0,1,fcd6d918) at
> vnode_pager_generic_getpages(ccd0bcf0,fcd6da50,1000,0,fcd6d978) at
> vop_stdgetpages(fcd6d98c) at
>
> Another thread (netisr) which is processing some udp packet tries to
> grab the same lock but since it's already held by thread 1, tries to
> propagate the priority and panics because there is a check in the code
> in propagate_priority which causes the panic
>
> /*
>  * If the thread is asleep, then we are probably about
>  * to deadlock.
>  * To make debugging this easier, just
>  * panic and tell the user which thread misbehaved so
>  * they can hopefully get a stack trace from the truly
>  * misbehaving thread.
>  */
> if (TD_IS_SLEEPING(td)) {
>         printf(
>             "Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n",
>             td->td_tid, td->td_proc->p_pid);
> #ifdef DDB
>         db_trace_thread(td, -1);
> #endif
>         panic("sleeping thread");
> }
>
> Below is the output with witness turned on....
>
> Not sure how to go forward with this, any pointers?
>
> Thanks,
> Pramod
>
> lock order reversal: (sleepable after non-sleepable)
> 1st 0xc0a20a8c udp (udp) @ src/sys/netinet/udp_usrreq.c:1523
> 2nd 0xccdbee54 user map (user map) @ src/sys/vm/vm_map.c:3005
> KDB: stack backtrace:
> kdb_backtrace(0,ffffffff,c09c1b40,c09c16e0,c0978c6c) at
> witness_checkorder(ccdbee54,9,c09305b4,bbd) at
> _sx_xlock(ccdbee54,c09305a8,bbd) at
> _vm_map_lock_read(ccdbee10,c09305a8,bbd,1d6d9b4,ccdd76a8) at
> vm_map_lookup(fcd6da40,8097000,1,fcd6da44,fcd6da34) at
> vm_fault(ccdbee10,8097000,1,0,ccdd5000) at
> trap_pfault(fcd6db08,0,8097940) at
> trap(fcd60008,ccdd0028,28,fcd6db94,8097940) at
> calltrap() at
> --- trap 0xc, eip = 0xc08a5e06, esp = 0xfcd6db48, ebp = 0xfcd6db68 ---
> slow_copyin(fcd6dc88,fcd6db94,4,4,fcd6db98) at
> ip_ctloutput(ccdfd4ec,fcd6dc88,0,c054f464,0) at
> udp_ctloutput(ccdfd4ec,fcd6dc88,246,c0977524,ccdf3c2c) at
> sosetopt(ccdfd4ec,fcd6dc88,ccde5090,1,0) at
> kern_setsockopt(ccdd5000,6,0,6d,8097940) at
> setsockopt(ccdd5000,fcd6dd04,5,2,292) at
> syscall(3b,3b,3b,0,7a6c) at
> Xint0x80_syscall() at
> --- syscall (105, FreeBSD ELF32, setsockopt), eip = 0x881d1787, esp = 0xbfbfddec, ebp = 0xbfbfde48 ---
> Acquiring lockmgr lock "isofs" with the following non-sleepable locks held:
> exclusive sleep mutex udp r = 0 (0xc0a20a8c) locked @ src/sys/netinet/udp_usrreq.c:1523
> KDB: stack backtrace:
> kdb_backtrace(1,1,1,3041,ccd0bd6c) at
> witness_warn(5,c09b211c,c09416bb,c09406a1) at
> lockmgr(ccd0bd48,3041,ccd0bd6c,ccdd5000,fcd6d918) at
> vop_stdlock(fcd6d938,3041,ccd0bcf0,fcd6d954,c058e20c) at
> VOP_LOCK_APV(c09704e0,fcd6d938) at
> vn_lock(ccd0bcf0,3041,ccdd5000,ccd0bcf0,3041) at
> vget(ccd0bcf0,3041,ccdd5000) at
> vnode_pager_lock(ccdf1840,ccdf1840,ccdf1840,0,c0930058) at
> vm_fault(ccdbee10,8097000,1,0,ccdd5000) at
> trap_pfault(fcd6db08,0,8097940) at
> trap(fcd60008,ccdd0028,28,fcd6db94,8097940) at
> calltrap() at
> --- trap 0xc, eip = 0xc08a5e06, esp = 0xfcd6db48, ebp = 0xfcd6db68 ---
> slow_copyin(fcd6dc88,fcd6db94,4,4,fcd6db98) at
> ip_ctloutput(ccdfd4ec,fcd6dc88,0,c054f464,0) at
> udp_ctloutput(ccdfd4ec,fcd6dc88,246,c0977524,ccdf3c2c) at
> sosetopt(ccdfd4ec,fcd6dc88,ccde5090,1,0) at
> kern_setsockopt(ccdd5000,6,0,6d,8097940) at
> setsockopt(ccdd5000,fcd6dd04,5,2,292) at
> syscall(3b,3b,3b,0,7a6c) at
> Xint0x80_syscall() at
> --- syscall (105, FreeBSD ELF32, setsockopt), eip = 0x881d1787, esp = 0xbfbfddec, ebp = 0xbfbfde48 ---
> Sleeping on "vnsrd" with the following non-sleepable locks held:
> exclusive sleep mutex udp r = 0 (0xc0a20a8c) locked @ src/sys/netinet/udp_usrreq.c:1523
> KDB: stack backtrace:
> kdb_backtrace(1,1,1,ccdd763c,ccdd5000) at
> witness_warn(5,c09fe6a0,c0942b95,c0956c79) at
> msleep(f01cd488,c09fe6a0,44,c0956c79,0) at
> bwait(f01cd488,44,c0956c79) at
> vnode_pager_input_smlfs(c10487bc,c2740ae0,0,1,fcd6d918) at
> vnode_pager_generic_getpages(ccd0bcf0,fcd6da50,1000,0,fcd6d978) at
> vop_stdgetpages(fcd6d98c) at
> VOP_GETPAGES_APV(c09704e0,fcd6d98c) at
> vnode_pager_getpages(c10487bc,fcd6da50,1,0) at
> vm_fault(ccdbee10,8097000,1,0,ccdd5000) at
> trap_pfault(fcd6db08,0,8097940) at
> trap(fcd60008,ccdd0028,28,fcd6db94,8097940) at
> calltrap() at
> --- trap 0xc, eip = 0xc08a5e06, esp = 0xfcd6db48, ebp = 0xfcd6db68 ---
> slow_copyin(fcd6dc88,fcd6db94,4,4,fcd6db98) at
> ip_ctloutput(ccdfd4ec,fcd6dc88,0,c054f464,0) at
> udp_ctloutput(ccdfd4ec,fcd6dc88,246,c0977524,ccdf3c2c) at
> sosetopt(ccdfd4ec,fcd6dc88,ccde5090,1,0) at
> kern_setsockopt(ccdd5000,6,0,6d,8097940) at
> setsockopt(ccdd5000,fcd6dd04,5,2,292) at
> syscall(3b,3b,3b,0,7a6c) at
> Xint0x80_syscall() at
> --- syscall (105, FreeBSD ELF32, setsockopt), eip = 0x881d1787, esp = 0xbfbfddec, ebp = 0xbfbfde48 ---
> Sleeping thread (tid 100087, pid 4302) owns a non-sleepable lock
> panic: sleeping thread
> db_log_stack_trace_cmd(c09b2de0) at 0
> panic(c0943cde,c08f8bec,186f7,10ce,c09b2ae0) at 0
> propagate_priority(cc727c00,c09b6bf0,c0a20a8c,cc727c00,c09089b0) at 0
> turnstile_wait(c0a20a8c,ccdd5000,c0a20a8c,2,c08f60ec,225) at 0
> _mtx_lock_sleep(c0a20a8c,cc727c00,0,c09089b0,10c) at 0
> _mtx_lock_flags(c0a20a8c,0,c09089b0,10c,0) at 0
> udp_input(ccd34b00,14,0,4,4) at 0
> ip_input(cca7d180,ccd34b00,1,c08f60ec,c0a1be10) at 0
> netisr_processqueue(c0a1b738) at 0
> swi_net(0) at 0
> ithread_execute_handlers(cc726428,cc724500) at 0
> ithread_loop(cc70e740,f8de1d38,cc70e740,c051a45a,0) at 0
> fork_exit(c051a45a,cc70e740,f8de1d38) at 0
> fork_trampoline() at 0
> --- trap 0x1, eip = 0, esp = 0xf8de1d6c, ebp = 0 ---
> KDB: enter: panic
> [thread pid 14 tid 100002 ]
> Stopped at kdb_enter+0x37: pushl $-0x1
> db>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>
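
The two fixes described in the reply (do the copy before acquiring the lock, or drop the lock before copying) can be illustrated with a small userland sketch. This is not the FreeBSD code: struct demo_pcb, demo_copyin(), demo_copyout(), demo_set_ttl() and demo_get_ttl() are all hypothetical names, a pthread mutex stands in for the non-sleepable udp lock, and memcpy() stands in for copyin(9)/copyout(9), which in the kernel may fault and sleep.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a protocol control block guarded by a mutex. */
struct demo_pcb {
	pthread_mutex_t	lock;	/* plays the role of the non-sleepable lock */
	int		closed;	/* set once the pcb has been torn down */
	int		ttl;	/* the option being read or written */
};

/* Hypothetical stand-ins for copyin(9)/copyout(9); the real ones may sleep. */
static int
demo_copyin(const void *uaddr, void *kaddr, size_t len)
{
	if (uaddr == NULL)
		return (EFAULT);
	memcpy(kaddr, uaddr, len);
	return (0);
}

static int
demo_copyout(const void *kaddr, void *uaddr, size_t len)
{
	if (uaddr == NULL)
		return (EFAULT);
	memcpy(uaddr, kaddr, len);
	return (0);
}

/* "Set" path: copy the argument in first, take the lock afterwards. */
static int
demo_set_ttl(struct demo_pcb *pcb, const void *uval, size_t len)
{
	int error, optval;

	assert(pcb != NULL);
	if (len != sizeof(optval))
		return (EINVAL);

	/* No locks are held here, so a page fault may sleep safely. */
	error = demo_copyin(uval, &optval, sizeof(optval));
	if (error != 0)
		return (error);

	/* Re-validate state under the lock; it may have changed meanwhile. */
	pthread_mutex_lock(&pcb->lock);
	if (pcb->closed) {
		pthread_mutex_unlock(&pcb->lock);
		return (ECONNRESET);
	}
	pcb->ttl = optval;
	pthread_mutex_unlock(&pcb->lock);
	return (0);
}

/* "Get" path: snapshot under the lock, release it, then copy out. */
static int
demo_get_ttl(struct demo_pcb *pcb, void *uval, size_t len)
{
	int optval;

	assert(pcb != NULL);
	if (len != sizeof(optval))
		return (EINVAL);

	pthread_mutex_lock(&pcb->lock);
	if (pcb->closed) {
		pthread_mutex_unlock(&pcb->lock);
		return (ECONNRESET);
	}
	optval = pcb->ttl;	/* local copy stays valid after unlock */
	pthread_mutex_unlock(&pcb->lock);

	/* The lock has been dropped, so sleeping in the copy is harmless. */
	return (demo_copyout(&optval, uval, sizeof(optval)));
}
```

The point in both functions is that the mutex is never held across the call that might fault. The closed check after locking is the re-validation step the reply mentions; the real kernel code has more state to re-derive than this single flag.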