From owner-freebsd-current@FreeBSD.ORG Tue Aug  3 05:31:34 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B6B6816A4CE; Tue, 3 Aug 2004 05:31:34 +0000 (GMT)
Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 59CB043D2F; Tue, 3 Aug 2004 05:31:34 +0000 (GMT) (envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.11/8.12.11) with ESMTP id i735VDNL077963; Mon, 2 Aug 2004 22:31:22 -0700 (PDT) (envelope-from truckman@FreeBSD.org)
Message-Id: <200408030531.i735VDNL077963@gw.catspoiler.org>
Date: Mon, 2 Aug 2004 22:31:13 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: boris@brooknet.com.au
In-Reply-To: <1091504341.729.25.camel@dirk.no.domain>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
cc: freebsd-current@FreeBSD.org
cc: pjd@FreeBSD.org
Subject: Re: processes freezing when writing to gstripe'd device
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Tue, 03 Aug 2004 05:31:34 -0000

On 3 Aug, Sam Lawrance wrote:

>> +> I am observing processes performing operations on a gstripe device
>> +> freeze in state 'bufwait'. An 'rm' process is stuck right now. The
>> +> rest of the system is fine.
>> +>
>> +> What's the best way to look into this? I can't attach to rm with gdb
>> +> (it just ends up waiting for something). I can drop to kdb, but have
>> +> no idea where to go from there.
>>
>> You could use the 'ps' command from DDB to see which processes are
>> asleep. Then you can run 'tr <pid>', where <pid> is the PID of a
>> sleeping process. Look for processes related somehow to this problem.
>>
>> It'll also be great if you can provide the exact procedure which will
>> allow me to reproduce this problem.
>
> Okay, I updated to current as of yesterday and I'm still seeing the same
> problem. I'm new to these bits of the kernel, but it looks like a
> locking problem. This is what I am doing:
>
> dd if=/dev/zero of=sd0 count=20480
> cp sd0 sd1
> mdconfig -a -t vnode -f sd0
> mdconfig -a -t vnode -f sd1
> gstripe label bork md0 md1
> newfs /dev/stripe/bork
> mkdir teststripe
> mount /dev/stripe/bork teststripe
> cd teststripe
>
> Now I repeatedly 'cvs checkout' and 'rm -rf' the FreeBSD src tree.
> Usually it freezes during the first checkout.
>
> SIGINFO shows:
>
> load: 1.14  cmd: cvs 801 [biowr] 0.33u 3.35s 14% 2840k
>
> A trace of the frozen cvs process 801 shows:
>
> KDB: enter: manual escape to debugger
> [thread 100006]
> Stopped at kdb_enter+0x2b: nop
> db> tr 801
> sched_switch(c1a87580,0) at sched_switch+0x12b
> mi_switch(1,0) at mi_switch+0x24d
> sleepq_switch(c63ee500,d0c83814,c06030e9,c63ee500,0) at sleepq_switch+0xe0
> sleepq_wait(c63ee500,0,0,0,c07f3ab7) at sleepq_wait+0xb
> msleep(c63ee500,c08ddd80,4c,c07f40d1,0) at msleep+0x375
> bwait(c63ee500,4c,c07f40d1) at bwait+0x47
> bufwait(c63ee500,c088f1a0,c1cd6318,c63ee500,0) at bufwait+0x2d
> ibwrite(c63ee500,d0c838d8,c071906e,c63ee500,a00) at ibwrite+0x3e2
> bwrite(c63ee500,a00,0,ee,c19b1834) at bwrite+0x32
> ffs_update(c19c3738,1,0,c19b808c,c19c3738) at ffs_update+0x302
> ufs_makeinode(81a4,c199f840,d0c83bf8,d0c83c0c) at ufs_makeinode+0x3a3
> ufs_create(d0c83a74,d0c83b30,c0655238,d0c83a74,c08b8c00) at ufs_create+0x26
> ufs_vnoperate(d0c83a74) at ufs_vnoperate+0x13
> vn_open_cred(d0c83be4,d0c83ce4,1a4,c1d47700,8) at vn_open_cred+0x174
> vn_open(d0c83be4,d0c83ce4,1a4,8,c08ad240) at vn_open+0x1e
> kern_open(c1a87580,8199430,0,602,1b6) at kern_open+0xd2
> open(c1a87580,d0c83d14,3,1be,292) at open+0x18
> syscall(2f,bfbf002f,bfbf002f,8,2836c7f8) at syscall+0x217
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (5, FreeBSD ELF32, open), eip = 0x282e2437, esp = 0xbfbfdd7c,
> ebp = 0xbfbfdda8 ---
>
> I'll keep poking around - if you have any further suggestions or need
> other information, fire away.

I'm not too familiar with this area of the kernel, but I'd be suspicious
that one or more of the geom kernel threads are getting wedged and
keeping the I/O that the cvs process is waiting for from completing.

I would think that the vnode locks on sd0 and sd1 need to be obtained in
order to do the I/O on the md devices. Maybe it's a deadly embrace, where
one thread holds a lock on sd0, another thread holds a lock on sd1, and
each wants to grab a lock on the other vnode ...

Try 'ps lax | grep g_' and get a DDB backtrace of the g_up, g_down, and
g_event threads.
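
Roughly the sequence I have in mind (just a sketch -- the PIDs below are
placeholders, use whatever your ps output shows, and drop into DDB the
same way you did for the cvs trace):

ps lax | grep g_          <- note the PIDs of g_up, g_down, and g_event

db> ps                    <- the geom threads should show up here as well
db> tr <pid of g_up>
db> tr <pid of g_down>
db> tr <pid of g_event>
db> c                     <- continue, if the machine is still usable

If one of those threads is sitting on a lock or a sleep channel, its
backtrace should point at what is holding things up.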