Date: Mon, 24 Apr 2006 22:04:57 +0400 (MSD)
From: Dmitry Morozovsky
To: Kris Kennaway
Cc: stable@freebsd.org
Subject: Re: fsck_ufs locked in snaplk
In-Reply-To: <20060424091803.L20593@woozle.rinet.ru>
Message-ID: <20060424215650.P36233@woozle.rinet.ru>
References: <20060423193208.N1187@woozle.rinet.ru> <20060423201732.GA74905@xor.obsecurity.org> <20060424091803.L20593@woozle.rinet.ru>
List-Id: Production branch of FreeBSD source code

On Mon, 24 Apr 2006, Dmitry Morozovsky wrote:

DM> KK> > one of my servers had to be rebooted uncleanly and then I have backgrounded
DM> KK> > fsck locked for more than an hour in snaplk:
DM> KK> >
DM> KK> > 742 root 1 -4 4 1320K 688K snaplk 0:02 0.00% fsck_ufs
DM> KK> >
DM> KK> > File system in question is 200G gmirror on SATA.
DM> KK> > Usually making a snapshot (e.g., for making dumps) consumes 3-4 minutes
DM> KK> > for that fs, so it seems to me that filesystem is in a deadlock.
DM> KK>
DM> KK> Is the process performing I/O? Background fsck deliberately runs at a
DM> KK> slow rate so it does not destroy I/O performance on the rest of the
DM> KK> system.
DM>
DM> Nope. For that case, 50+ smbds had been locked in 'ufs' state, so I've been
DM> urged to revive the machine and reboot, turning off bgfsck.
DM>
DM> This night, dump -L locks in the same position on the same filesystem:
DM>
DM> 0 2887 2886 0 -4 0 1260 692 snaplk D ?? 0:01.28
DM> /sbin/mksnap_ffs root 0.0 0.1 5:19AM
DM>
DM> it has been started at 5:19am, and now is 9:20 - no disk activity
DM>
DM>
DM> For the reference: it's fresh RELENG_6_1/i386.

Just rechecked it: I ran mksnap_ffs on an otherwise idle file system:

marck@office:/> mksnap_ffs /st /st/.snap/test_snapshot
load: 0.02 cmd: mksnap_ffs 4012 [biord] 0.00u 0.04s 0% 696k
load: 0.04 cmd: mksnap_ffs 4012 [biord] 0.00u 0.44s 0% 696k
load: 0.21 cmd: mksnap_ffs 4012 [snaprdb] 0.00u 1.17s 0% 696k
load: 0.20 cmd: mksnap_ffs 4012 [snaprdb] 0.00u 1.23s 0% 696k
load: 0.13 cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
load: 0.08 cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
load: 0.01 cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k

(I hit ^T several times)

The biord phase takes about 1.5-2 minutes, the snaprdb phase about 30-40
seconds, and then the process died. Most disk requests succeed; however,
accessing /st/.snap locks the process in the ufs state forever.

What bothers me most is that this is the only machine that reproducibly hangs
in snapshots, and it did not hang before the RELENG_5 -> RELENG_6 upgrade.
Other RELENG_6 machines do snapshot backups flawlessly (knock-on-wood!)
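
For anyone who wants to compare ^T traces from several machines, here is a minimal sketch that parses status lines of the shape shown in this message and reports the wait channel a process appears stuck in. The names STATUS_RE, parse_status and stalled_wchan are made up for illustration, and the "stalled" rule (same wait channel and no growth in system CPU time over the last few samples) is only my assumption, not an official diagnostic.

```python
import re

# Shape of the status line printed on ^T (SIGINFO), as seen in this message:
#   load: 0.02 cmd: mksnap_ffs 4012 [biord] 0.00u 0.04s 0% 696k
STATUS_RE = re.compile(
    r"load:\s+(?P<load>[\d.]+)\s+cmd:\s+(?P<cmd>\S+)\s+(?P<pid>\d+)\s+"
    r"\[(?P<wchan>[^\]]+)\]\s+(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s"
)

# The seven samples from the mksnap_ffs run reported above.
SAMPLE = """\
load: 0.02 cmd: mksnap_ffs 4012 [biord] 0.00u 0.04s 0% 696k
load: 0.04 cmd: mksnap_ffs 4012 [biord] 0.00u 0.44s 0% 696k
load: 0.21 cmd: mksnap_ffs 4012 [snaprdb] 0.00u 1.17s 0% 696k
load: 0.20 cmd: mksnap_ffs 4012 [snaprdb] 0.00u 1.23s 0% 696k
load: 0.13 cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
load: 0.08 cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
load: 0.01 cmd: mksnap_ffs 4012 [snaplk] 0.00u 1.30s 0% 696k
"""


def parse_status(text):
    """Return a list of (wait channel, system CPU seconds), one per status line."""
    return [(m.group("wchan"), float(m.group("sys")))
            for m in STATUS_RE.finditer(text)]


def stalled_wchan(samples, min_repeats=2):
    """Return the wait channel if the last min_repeats samples share it and
    show no growth in system CPU time; otherwise return None."""
    if len(samples) < min_repeats:
        return None
    tail = samples[-min_repeats:]
    same_channel = len({wchan for wchan, _ in tail}) == 1
    no_cpu_growth = len({sys_time for _, sys_time in tail}) == 1
    return tail[0][0] if same_channel and no_cpu_growth else None


if __name__ == "__main__":
    print(stalled_wchan(parse_status(SAMPLE)))
```

The rule is deliberately crude: the two biord samples share a wait channel but their system time still grows, so they are not flagged, while the trailing snaplk samples are.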

Sincerely,
D.Marck                                     [DM5020, MCK-RIPE, DM3-RIPN]
------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
------------------------------------------------------------------------