Date: Fri, 1 Jul 2005 13:18:44 -0500 (CDT)
From: Chris Dillon
To: Skylar Thompson
Cc: fs@freebsd.org
Subject: Re: Snapshot problems
Message-ID: <20050701130315.C52686@duey.wolves.k12.mo.us>
In-Reply-To: <20050626182031.GA5268@quark.cs.earlham.edu>
References: <20050626182031.GA5268@quark.cs.earlham.edu>

On Sun, 26 Jun 2005, Skylar Thompson wrote:

> I've discovered a repeatable problem with FreeBSD's UFS2 snapshots.
> If I create several snapshots, and then do heavy disk I/O on the
> original filesystem (deletions, creations, simple touches, etc.) I
> can cause the I/O system to crash. There is no kernel panic, and the
> machine still answers pings, but no disk I/O occurs.

I see a similar problem (I think stuck processes show they are in the
'disk wait' state in top/ps), but my problem occurs on a production
box so I've been unable to debug it. I've taken to rebooting the box
automatically every day about 15 minutes before the snapshots are
scheduled to be made using Ralf S. Engelschall's snapshot scripts
(http://people.freebsd.org/~rse/snapshot/). The daily reboot seems to
prevent the problem from happening. If I don't reboot the system
daily, I can only go one to three days without a problem.

> I can replicate this on a dual-processor beige-box system with a
> Mylex RAID controller and a RAID-5 set, and also on a dual-processor
> Dell Poweredge 2650 with a PERC 3/i RAID controller and a RAID-5 set
> and RAID-1 set. FreeBSD 5.4-RELEASE is installed on both systems,
> and SMP is enabled as well, with HTT disabled on the Poweredge. I
> have DDB compiled in, so I can get debug information but I don't
> know what to look for.

I'm using FreeBSD 5.4-STABLE on a relatively new dual-processor HP
DL380 G3 with integrated SmartArray 5i+ and a 7-disk RAID5 array.
Given the range of hardware we are seeing the problem on, it doesn't
seem to be hardware or driver related.

In the meantime, try rebooting the box at a scheduled time every day
to see if that helps alleviate your problem.

-- 
 Chris Dillon - cdillon(at)wolves.k12.mo.us
 FreeBSD: The fastest, most open, and most stable OS on the planet
 - Available for IA32, IA64, AMD64, PC98, Alpha, and UltraSPARC architectures
 - PowerPC, ARM, MIPS, and S/390 under development
 - http://www.freebsd.org

Q: Because it reverses the logical flow of conversation.
A: Why is putting a reply at the top of the message frowned upon?