From: "Vlad Galu"
To: "Eric Anderson"
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Date: Tue, 31 Oct 2006 16:34:59 +0200
Subject: Re: Frequent VFS crashes with RELENG_6
In-Reply-To: <45475DEA.2030506@centtech.com>
References: <200610010015.k910F6Ba001594@cwsys.cwsent.com> <45475DEA.2030506@centtech.com>

On 10/31/06, Eric Anderson wrote:
> On 10/31/06 08:03, Vlad Galu wrote:
> > On 10/1/06, Cy Schubert wrote:
> >> In message ,
> >> "Vlad GALU" writes:
> >>> On 9/30/06, Martin Blapp wrote:
> >>>> Hi,
> >>>>
> >>>> 1.) Bad ram ? Have you run some memory tester ?
> >>>    Yes, memtest86 didn't show anything weird.
> >>>
> >>>> 2.) Have you background fsck running on this disk ? If
> >>>> so try to boot into single user and do a full fsck on this
> >>>> disk.
> >>>>
> >>>    I have background_fsck="NO" in rc.conf and I checked the whole disk
> >>> several times.
> >>>    Something I forgot to mention earlier: the crash is easier to
> >>> reproduce when running rtorrent. The machine did crash without running
> >>> it as well, but far more seldom.
> >> I've been experiencing the same problem as well. I discovered that the
> >> disk holding the filesystem had some bad sectors, which caused
> >> dump -0Lauf to fail while taking a snapshot and the system to panic.
> >> Running smartctl on the device indicated that there were bad sectors
> >> 40% of the way into the surface scan being performed by SMART. The
> >> drive, an 80 GB Maxtor, was replaced with a 250 GB Western Digital
> >> (for a very good price, so good a price I purchased two of them). It
> >> was 906 days old, having only been powered off maybe a dozen times
> >> over the last three years.
> >
> >    During the last 2 weeks I ran the same system with WITNESS turned
> > on. The fact that the purpose of this machine is not I/O dependent
> > allowed me to run bonnie++ and iozone every second day for the whole
> > 24 hours. At the same time I ran several instances of rtorrent. This
> > morning I rebooted to a non-WITNESS kernel (the same sources from 2
> > weeks ago) and the exact same crash occurred within a few hours from
> > bootup. In all this time, smartd didn't report anything suspicious.
> > WITNESS only reported a LOR related to kqueue that is already known.
> >    Any ideas for further stress testing would be welcome. I am
> > familiar with a few parts of the kernel, but VFS is a total stranger
> > to me.
> >
>
> Did you get a crash dump? If not, you might want to start with adding
> all the debugger options into the kernel.

   Yes, but for objective reasons I can't publish it :( The only
debugging option that I didn't use was INVARIANTS. However, I posted the
output of "bt full" at the beginning of this thread. See
http://lists.freebsd.org/pipermail/freebsd-stable/2006-September/028985.html.
>
> Eric
>
> --
> ------------------------------------------------------------------------
> Eric Anderson        Sr. Systems Administrator        Centaur Technology
> Anything that works is better than anything that doesn't.
> ------------------------------------------------------------------------
>

-- 
If it's there, and you can see it, it's real.
If it's not there, and you can see it, it's virtual.
If it's there, and you can't see it, it's transparent.
If it's not there, and you can't see it, you erased it.