From owner-freebsd-fs@FreeBSD.ORG Sun Aug 31 10:26:18 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP id ED29416A4BF
        for ; Sun, 31 Aug 2003 10:26:18 -0700 (PDT)
Received: from web41805.mail.yahoo.com (web41805.mail.yahoo.com [66.218.93.139])
        by mx1.FreeBSD.org (Postfix) with SMTP id 8B2BD43FEA
        for ; Sun, 31 Aug 2003 10:26:18 -0700 (PDT)
        (envelope-from neoninternet@yahoo.com)
Message-ID: <20030831172618.95711.qmail@web41805.mail.yahoo.com>
Received: from [68.2.118.193] by web41805.mail.yahoo.com via HTTP;
        Sun, 31 Aug 2003 10:26:18 PDT
Date: Sun, 31 Aug 2003 10:26:18 -0700 (PDT)
From: Kevin Bockman
To: freebsd-fs@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Subject: Filesystem problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
X-List-Received-Date: Sun, 31 Aug 2003 17:26:19 -0000

Hi.

I have been experiencing some filesystem problems for the last month or
so. I was running 4.8-STABLE and updated to 5.1-RELEASE-p2.

While I was running 4.8, whenever I ran a command that required hard
disk activity, the process would 'hang' and I would no longer be able to
ssh or telnet in; I would get stuck after typing in my login.

Running 5.1 is a different story. I did a clean install of 5.1-RELEASE
and cvsup'd to -p2. Every time I do this, it's great for a day or so,
then it acts up. Before, once it had started, it would start up again
immediately even if I rebooted. On 5.1, it is only hanging for that
process and everything else is fine. I can still log in, the webserver
responds, etc.
Here is a little info:

FreeBSD devel.neoninternet.net 5.1-RELEASE-p2 FreeBSD 5.1-RELEASE-p2 #0: Sat Aug 23 20:12:41 PDT 2003 kevin@devel.ph.cox.net:/usr/src/sys/i386/compile/SLURPEE i386

CPU: AMD Athlon(tm) XP 2600+ (2086.51-MHz 686-class CPU)
real memory = 1073676288 (1023 MB)
ad0: 117246MB [238216/16/63] at ata0-master UDMA133

root 1173  0.0  0.1  1436  916  p3  D+  6:38PM  0:00.00 man vmstat
root  784  0.0  0.1   752  636  d0  D   4:34PM  0:00.02 make all DIRPRFX=i386/libi386/
root  847  0.0  0.0   312  212  d0  D   4:34PM  0:00.00 (cc)
root  848  0.0  0.3  4104 3488  d0  D   4:34PM  0:00.01 (cc1)
root  849  0.0  0.1   928  668  d0  D   4:34PM  0:00.00 /usr/bin/as -o comconsole.o -

last pid: 1252; load averages: 0.00, 0.00, 0.00  up 0+02:37:22 19:04:48
64 processes: 1 running, 63 sleeping
CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 34M Active, 23M Inact, 38M Wired, 204K Cache, 22M Buf, 906M Free
Swap: 2048M Total, 2048M Free

devel# vmstat
 procs      memory      page                    disks     faults      cpu
 r b w     avm    fre  flt  re  pi  po  fr  sr ad0 da0   in   sy  cs us sy id
 1 7 0  144612 928056   16   0   0   0   9   0   0   0  331    0 254  0  0 100

Anyone have any suggestions? I cannot control-C out of 'man vmstat'.
While doing 'make' in /usr/src/sys/boot it was hanging on as; when I
restarted it, it got to i386/libi386 and will not do anything else. I'm
running that through the serial console, and it let me ^C out of that.
I tried going into single user mode and running umount; now it just
sits there and I can't ^C.

I have no ideas, this was all working yesterday!! :-) Any ideas on what
else to check or other helpful hints would help bunches. Sorry for the
cross-posts. Just not sure where to go with this one.

Thanks,
Kevin

From owner-freebsd-fs@FreeBSD.ORG Tue Sep 2 13:10:35 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP id 1788816A4BF
        for ; Tue, 2 Sep 2003 13:10:35 -0700 (PDT)
Received: from freebsd.org (dhcp065-024-168-078.columbus.rr.com [65.24.168.78])
        by mx1.FreeBSD.org (Postfix) with SMTP id B510243F93
        for ; Tue, 2 Sep 2003 12:55:49 -0700 (PDT)
        (envelope-from stuck_in_telnet@freebsd.org)
To: freebsd-fs@freebsd.org
From: stuck_in_telnet@no.where
Message-Id: <20030902195549.B510243F93@mx1.FreeBSD.org>
Subject: /sbin/newfs 4.8-STABLE_20030803 segfault and core
Date: Tue, 02 Sep 2003 20:10:35 -0000
X-Original-Date: Tue Sep 2 15:30:00 EDT
X-List-Received-Date: Tue, 02 Sep 2003 20:10:35 -0000

#        size   offset    fstype   [fsize bsize bps/cpg]
  h: 19925880        0    4.2BSD     2048 16384    89   # (Cyl. 0 - 19767*)

/tmp/newfs: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), for FreeBSD 4.8, statically linked, not stripped

/tmp/newfs -NU -i 3 ad0h
Warning: Block size restricts cylinders per group to 40200.
Warning: 1160 sector(s) in last cylinder unallocated
/dev/ad0h: 19925880 sectors in 4865 cylinders of 1 tracks, 4096 sectors
        9729.4MB in 1 cyl groups (40200 c/g, 80400.00MB/g, -48203008 i/g) SOFTUPDATES
super-block backups (for fsck -b #) at:
zsh: segmentation fault (core dumped)  /tmp/newfs -NU -i 3 ad0h

Core was generated by `newfs'.
Program terminated with signal 11, Segmentation fault.
#0  0x804c6c8 in initcg (cylno=0, utime=1062500000) at /usr/src/sbin/newfs/mkfs.c:835
(gdb) #0  0x804c6c8 in initcg (cylno=0, utime=1062500000) at /usr/src/sbin/newfs/mkfs.c:835
#1  0x804c080 in mkfs (pp=0x806eec4, fsys=0x8093bc0 "/dev/ad0h", fi=3, fo=-1) at /usr/src/sbin/newfs/mkfs.c:709
#2  0x8049581 in main (argc=1, argv=0xbfbffbc4) at /usr/src/sbin/newfs/newfs.c:617

The binary is from the sup date above, running on 4.8-RELEASE.

If run on smaller partitions, it simply emits a warning...

/tmp/newfs -NU -i 3 ad1b
Minimum bytes per inode is 773576

No biggie, now off to fix the mail client...

From owner-freebsd-fs@FreeBSD.ORG Wed Sep 3 14:36:30 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP id 2D87016A4BF
        for ; Wed, 3 Sep 2003 14:36:30 -0700 (PDT)
Received: from shiva.jussieu.fr (shiva.jussieu.fr [134.157.0.129])
        by mx1.FreeBSD.org (Postfix) with ESMTP id E2EBB43FDF
        for ; Wed, 3 Sep 2003 14:36:28 -0700 (PDT)
        (envelope-from arno@heho.snv.jussieu.fr)
Received: from heho.snv.jussieu.fr (heho.snv.jussieu.fr [134.157.184.22])
        by shiva.jussieu.fr (8.12.9/jtpda-5.4) with ESMTP id h83LaR9U001589
        for ; Wed, 3 Sep 2003 23:36:27 +0200 (CEST)
Received: from heho.snv.jussieu.fr (localhost [127.0.0.1])
        h83LaRTe027921
        for ; Wed, 3 Sep 2003 23:36:27 +0200 (MEST)
Received: (from arno@localhost)
        by heho.snv.jussieu.fr (8.12.9/8.12.9/Submit) id h83LaR94027918;
        Wed, 3 Sep 2003 23:36:27 +0200 (MEST)
To: freebsd-fs@freebsd.org
From: arno@heho.snv.jussieu.fr (Arno J. Klaassen)
Date: 03 Sep 2003 23:36:26 +0200
Lines: 51
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Antivirus: scanned by sophie at shiva.jussieu.fr
Subject: very slow fsck on bad disk
X-List-Received-Date: Wed, 03 Sep 2003 21:36:30 -0000

hello,

is this normal ? :

 - I have a 40G IBM Deskstar IDE disk, about 1-2 years old,
   connected to some -stable Pentium-Pro box
 - a while ago, I got lots of "disk errors", I decided to reboot
   the box, but then it did not recognise the disk any longer
   (so I pulled out the (data) disk)
 - ...
 - Time passed, I found out that the last backup of that disk
   was rather old, and I'd like to spend some time to get off the
   disk whatever is still readable
 - I managed to find an ASUS-AMD MB, running 5.1-RELEASE, whose BIOS
   accepts the disk as secondary master
 - I could not mount the disk, since "I/O Error" or something
   like that
 - I started :

     fsck_ffs -y -b 32 /dev/ad2s1e

I started this .... "last saturday" and it's still running.
When I look at the dmesg or /var/log messages, my eye
got triggered by this :

Sep 3 23:02:27 tabarnac kernel: ad2: hard error cmd=read fsbn 16405947 status=59 error=40
Sep 3 23:02:32 tabarnac kernel: ad2: hard error cmd=read fsbn 16405948 status=59 error=40
Sep 3 23:02:37 tabarnac kernel: ad2: hard error cmd=read fsbn 16405949 status=59 error=40
Sep 3 23:02:42 tabarnac kernel: ad2: hard error cmd=read fsbn 16405950 status=59 error=40
Sep 3 23:02:47 tabarnac kernel: ad2: hard error cmd=read fsbn 16769183 of 16769183-16769310 status=59 error=40
Sep 3 23:02:52 tabarnac kernel: ad2: hard error cmd=read fsbn 16769183 status=59 error=40
Sep 3 23:02:57 tabarnac kernel: ad2: hard error cmd=read fsbn 16769184 status=59 error=40
Sep 3 23:03:02 tabarnac kernel: ad2: hard error cmd=read fsbn 16769185 status=59 error=40

i.e., for a long time, every five seconds the error says "next block",
then suddenly it says "no more errors in between blocks 16405950 and
16769183" (i.e. 363233 blocks ....) and once again, 5 seconds later, it
says "next block bad as well".

Am I wrong or does this smell like "each surface error is queued
for syslog, syslog print is triggered every 5 seconds, progress
on fsck ata-disks is held until next syslog message is printed"?

Thank you very much in advance.
Arno

From owner-freebsd-fs@FreeBSD.ORG Wed Sep 3 21:27:14 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP id 649EF16A4E0
        for ; Wed, 3 Sep 2003 21:27:14 -0700 (PDT)
Received: from carver.gumbysoft.com (carver.gumbysoft.com [66.220.23.50])
        by mx1.FreeBSD.org (Postfix) with ESMTP id 9E12A43FE5
        for ; Wed, 3 Sep 2003 21:27:13 -0700 (PDT)
        (envelope-from dwhite@gumbysoft.com)
Received: by carver.gumbysoft.com (Postfix, from userid 1000)
        id 9074D72DA4; Wed, 3 Sep 2003 21:27:13 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1])
        by carver.gumbysoft.com (Postfix) with ESMTP
        id 8DBCA72DA3; Wed, 3 Sep 2003 21:27:13 -0700 (PDT)
Date: Wed, 3 Sep 2003 21:27:13 -0700 (PDT)
From: Doug White
To: "Arno J. Klaassen"
Message-ID: <20030903212556.C88884@carver.gumbysoft.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-fs@freebsd.org
Subject: Re: very slow fsck on bad disk
X-List-Received-Date: Thu, 04 Sep 2003 04:27:14 -0000

On Wed, 3 Sep 2003, Arno J. Klaassen wrote:

> fsck_ffs -y -b 32 /dev/ad2s1e
>
> I started this .... "last saturday" and it's still running.
> When I look at the dmesg or /var/log messages, my eye
> got triggered by this :
>
>
> Sep 3 23:02:27 tabarnac kernel: ad2: hard error cmd=read fsbn 16405947 status=59 error=40
> Sep 3 23:02:32 tabarnac kernel: ad2: hard error cmd=read fsbn 16405948 status=59 error=40

[... disk errors ...]

> Am I wrong or does this smell like "each surface error is queued
> for syslog, syslog print is triggered every 5 seconds, progress
> on fsck ata-disks is held until next syslog message is printed"

It isn't trying to log to the defective volume, is it?

And the delays are probably the disk resetting.
Errors and timeouts take a while.

I would say you are the proud owner of a new doorstop. :)

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 4 13:25:27 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
        by hub.freebsd.org (Postfix) with ESMTP id 3604D16A4BF;
        Thu, 4 Sep 2003 13:25:27 -0700 (PDT)
Received: from rwcrmhc11.comcast.net (rwcrmhc11.comcast.net [204.127.198.35])
        by mx1.FreeBSD.org (Postfix) with ESMTP id 0C7B943F3F;
        Thu, 4 Sep 2003 13:25:26 -0700 (PDT)
        (envelope-from julian@elischer.org)
Received: from interjet.elischer.org ([12.233.125.100])
        by attbi.com (rwcrmhc11) with ESMTP
        id <2003090420252501300g9gqve>; Thu, 4 Sep 2003 20:25:25 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1])
        by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id NAA42450;
        Thu, 4 Sep 2003 13:25:25 -0700 (PDT)
Date: Thu, 4 Sep 2003 13:25:23 -0700 (PDT)
From: Julian Elischer
To: Andrew Kinney
In-Reply-To: <3F573729.8917.53574D7@localhost>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-hackers@freebsd.org
cc: fs@freebsd.org
Subject: Re: 20TB Storage System (fsck????)
X-List-Received-Date: Thu, 04 Sep 2003 20:25:27 -0000

On Thu, 4 Sep 2003, Andrew Kinney wrote:

>
> Our experience has been that with 4GB of RAM (or more) you
> really must increase your KVA to 2GB, leaving only 2GB of UVA.
> So, I would concur with what Julian said.
>
> thrown> ;-)
>
> With the lack of third party filesystem support in FreeBSD, might
> you be better served by looking at a Linux system running
> ReiserFS or one of the other file systems designed for such
> behemoth disk systems?
>
> These days, I think Sun even gives away Solaris licenses with their
> low end x86 servers, so that might even be an option.
>
> UFS is great, but there are other filesystems out there that have
> already addressed such problems from their use in academic,
> government, and scientific computing where gigantic filesystems
> tend to be more prevalent.
>

UFS2 will make the filesystem. All we need is a way to FIX such a
filesystem.

My brief analysis of this indicates that a 'serial' fsck should be
possible. What this would do is read through the filesystem metadata,
creating several 'list' files on another filesystem. These would then
be duplicated and sorted on several different fields, and then
recombined in a 'merge' manner, to produce lists of unallocated files,
bad directory entries, duplicate allocated blocks, etc.

This would probably be workable in a similar order of magnitude of time
as a normal fsck, except 'offline' and able to handle a much larger
filesystem.

julian
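
[Editor's sketch, not part of the original thread.] The 'serial' fsck
idea above can be illustrated for one of the listed checks, duplicate
allocated blocks. What follows is only a toy sketch in Python under loud
assumptions: the "metadata" is a hypothetical in-memory table mapping
inode numbers to block numbers, not real UFS structures, and the names
block_claims, write_sorted_runs and duplicate_blocks are invented for
illustration. It writes (block, inode) records to small sorted 'list'
files in a scratch directory, then merges them in one sequential pass,
so only one run's worth of records is held in memory at a time.

```python
import heapq
import os
import tempfile
from itertools import groupby


def block_claims(inode_table):
    """Scan the (hypothetical) metadata once, emitting (block, inode) records."""
    for inode, blocks in inode_table.items():
        for block in blocks:
            yield block, inode


def write_sorted_runs(records, run_size, workdir):
    """Spill records into sorted 'list' files of at most run_size records each."""
    paths, run = [], []

    def flush():
        if not run:
            return
        run.sort()
        fd, path = tempfile.mkstemp(dir=workdir, suffix=".run")
        with os.fdopen(fd, "w") as f:
            for block, inode in run:
                f.write(f"{block} {inode}\n")
        paths.append(path)
        run.clear()

    for rec in records:
        run.append(rec)
        if len(run) >= run_size:
            flush()
    flush()
    return paths


def read_run(path):
    """Stream one sorted list file back as (block, inode) tuples."""
    with open(path) as f:
        for line in f:
            block, inode = line.split()
            yield int(block), int(inode)


def duplicate_blocks(records, run_size=4, workdir=None):
    """Merge the sorted runs and report blocks claimed by more than one inode."""
    with tempfile.TemporaryDirectory(dir=workdir) as tmp:
        runs = write_sorted_runs(records, run_size, tmp)
        merged = heapq.merge(*(read_run(p) for p in runs))
        dups = {}
        for block, group in groupby(merged, key=lambda r: r[0]):
            owners = sorted({inode for _, inode in group})
            if len(owners) > 1:
                dups[block] = owners
        return dups


# Toy metadata: inode 4 wrongly claims blocks 100 and 103.
inode_table = {2: [100, 101], 3: [102, 103], 4: [103, 104, 100], 5: [105]}
print(duplicate_blocks(block_claims(inode_table)))
# {100: [2, 4], 103: [3, 4]}
```

The other lists mentioned (unallocated files, bad directory entries)
would presumably come from sorting the same kind of records on different
fields and merging them the same way. A real tool would need a far more
compact on-disk record format than text lines.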