Date: Mon, 1 Oct 2012 08:07:44 -0400 (EDT)
From: Rick Macklem
To: Norbert Aschendorff
Cc: freebsd-stable@freebsd.org
Subject: panic "Sleeping thread owns a non-sleepable lock" via cv_timedwait_signal, was "rsync over NFS"

Norbert Aschendorff wrote:
> Hi,
>
> My FreeBSD-9/stable machine (FreeBSD freebsd-tower.goebo.site
> 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #2 r241044M: Sat Sep 29 12:52:01
> CEST 2012 lbo@freebsd-tower.goebo.site:/usr/obj/usr/src/sys/GENERIC
> i386) crashes reproducibly when rsync-ing files to an NFSv4 share on
> the FreeBSD machine. The crash makes the system reboot. The crash
> creates files in /var/crash which may be obtained here: [1].
>
> This problem is not limited to the self-compiled kernel/world
> (stable/9) but also appears on the pre-compiled 9.1-PRERELEASE.
> I did not test 9.0-RELEASE.
>
> If I do not use rsync on this NFS share, everything works completely
> fine.
>
> Workaround: Use rsync over SSH.
>
> --Norbert
>
> [1] http://lbo.spheniscida.de/Files/nfs-rsync-crash.tgz (25K), vmcore
> of around 300M (90M gzipped, 64M LZMA'd) not included

From a quick look, the panic is:
  Sleeping thread (tid 100099, pid 1599) owns a non-sleepable lock
called from the server-side krpc via cv_timedwait_sig().

I assume this means that another mutex (or similar lock) is held in
addition to the one passed in as an argument to cv_timedwait_sig()?
(I'll keep looking, but I can't spot where another one might be held
by the NFS or krpc code.)

I'm not knowledgeable when it comes to gdb and crash dumps. Is there an
easy command Norbert can type to see all the locks held by tid 100099,
pid 1599?

Is the NFS client using Kerberos or AUTH_SYS for the mount? (And if you
are using Kerberos, have you tried the rsync with an AUTH_SYS mount?)

Does anyone happen to know of outstanding issues (or problems with
WITNESS) for cv_timedwait_sig() called with a locked mutex as the
argument lock? (The mutex will probably get locked by another thread
related to the same pid once sleepq_timedwait_sig() unlocks the
argument mutex.)
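Rick's reasoning rests on how a condition-variable wait treats its argument lock: the mutex passed in is dropped while the thread sleeps and re-taken before the call returns, so other threads can lock it in the meantime, whereas any other mutex the sleeper holds stays locked for the whole sleep. The sketch below is only a userspace analogue built on POSIX threads (pthread_cond_timedwait()), not the kernel's cv_timedwait_sig()/sleepqueue code; the names wait_for_work(), producer(), lk, cv and work_ready are hypothetical, and the timeout is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical shared state; `lk` is the mutex handed to the wait. */
static pthread_mutex_t lk = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int work_ready = 0;

/* Plays the role of "another thread related to the same pid". */
static void *producer(void *arg)
{
    (void)arg;
    /* Can only succeed once the waiter sleeps, since the wait releases lk. */
    pthread_mutex_lock(&lk);
    work_ready = 1;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&lk);
    return NULL;
}

/* Waits up to `secs` seconds for work; returns 1 if it arrived, 0 on timeout. */
static int wait_for_work(int secs)
{
    struct timespec deadline;
    pthread_t tid;
    int rc, got;

    /* CLOCK_REALTIME is the default clock for a condition variable. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += secs;

    pthread_mutex_lock(&lk);
    work_ready = 0;
    /* Started while we hold lk, so it blocks until we go to sleep. */
    rc = pthread_create(&tid, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;
    while (!work_ready) {
        /* Atomically drops lk while asleep and re-acquires it before
         * returning. Any *other* mutex held here would stay locked. */
        if (pthread_cond_timedwait(&cv, &lk, &deadline) == ETIMEDOUT)
            break;
    }
    got = work_ready;
    pthread_mutex_unlock(&lk);
    pthread_join(tid, NULL);
    return got;
}
```

Calling wait_for_work(5) should return 1, because the producer can lock lk only after the waiter has released it inside the wait. If the waiter also held a second mutex across the wait, that mutex would not be released. In the kernel, a sleeping thread that owns a non-sleepable lock is roughly the situation the panic message above reports.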
Here's the backtrace from the crash info he referenced, in case someone
else can gain more insight from it:

  Unread portion of the kernel message buffer:
  Sleeping thread (tid 100099, pid 1599) owns a non-sleepable lock
  KDB: stack backtrace of thread 100099:
  #0 0xc0aae034 at mi_switch+0xe4
  #1 0xc0ae3799 at sleepq_switch+0xd9
  #2 0xc0ae3c06 at sleepq_catch_signals+0x3d6
  #3 0xc0ae3d04 at sleepq_timedwait_sig+0x14
  #4 0xc0a591ff at _cv_timedwait_sig+0x17f
  #5 0xc0c87f7e at svc_run_internal+0x7ce
  #6 0xc0c87706 at svc_run+0xc6
  #7 0xc09f24c4 at nfsrvd_nfsd+0x1d4
  #8 0xc0a00ad9 at nfssvc_nfsd+0x109
  #9 0xc0c70c58 at sys_nfssvc+0x98
  #10 0xc0dfd288 at syscall+0x378
  #11 0xc0de64b1 at Xint0x80_syscall+0x21
  panic: sleeping thread
  cpuid = 0

rick