From owner-freebsd-questions@FreeBSD.ORG Sun Jul 28 05:54:47 2013
Date: Sun, 28 Jul 2013 07:54:47 +0200
From: Polytropon
To: Frank Leonhardt
Cc: freebsd-questions@freebsd.org
Subject: Re: Delete a directory, crash the system
Message-Id: <20130728075447.4d6e0468.freebsd@edvax.de>
In-Reply-To: <51F420ED.1050402@fjl.co.uk>
References: <51F3F290.9020004@cordula.ws> <51F420ED.1050402@fjl.co.uk>
Organization: EDVAX

And here, kids, you can see the strength of an open source operating
system: You can see _why_ something happens. :-)

On Sat, 27 Jul 2013 20:35:09 +0100, Frank Leonhardt wrote:
> On 27/07/2013 19:57, David Noel wrote:
> >> So the system panics in ufs_rmdir(). Maybe the filesystem is
> >> corrupt?
> >> Have you tried to fsck(8) it manually?
> > fsck worked, though I had to boot from a USB image because I couldn't
> > get into single user.. for some odd reason.
> >
> >> Even if the filesystem is corrupt, ufs_rmdir() shouldn't
> >> panic(), IMHO, but fail gracefully. Hmmm...
> > Yeah, I was pretty surprised. I think I tried it like 3 times to be
> > sure... and yeah, each time... kaboom! Who'd have thought. Do I just
> > post this to the mailing list and hope some benevolent developer
> > stumbles upon it and takes it upon him/herself to "fix" this, or where
> > do I find the FreeBSD Suggestion Box? I guess I should file a Problem
> > Report and see what happens from there.
> >
> >
> I was going to raise an issue when the discussion had died down to a
> consensus. I also don't think it's reasonable for the kernel to bomb
> when it encounters corruption on a disk.
>
> If you want to patch it yourself, edit sys/ufs/ufs/ufs_vnops.c at around
> line 2791 change:
>
>         if (dp->i_effnlink < 3)
>             panic("ufs_dirrem: Bad link count %d on parent",
>                 dp->i_effnlink);
>
> To
>
>         if (dp->i_effnlink < 3) {
>             error = EINVAL;
>             goto out;
>         }
>
> The ufs_link() call has a similar issue.
>
> I can't see why my mod will break anything, but there's always
> unintended consequences.

One of the core policies usually is to stop _any_ action that
has failed due to a "reason that cannot be" and make sure it
won't get worse. This can be seen for example in fsck's
behaviour: If there is a massive file system error that cannot
be repaired without further intervention that _could_ destroy
data or make its retrieval harder or impossible, the operator
will be requested to make the decision. There are options to
automate this process, but on the other hand, "always assume
'yes'" can then be a risk, as it could prevent recovery.
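
As a rough userland illustration of the two policies under discussion
(a sketch only, not the actual kernel code; struct fake_inode and both
function names are made up for this example, and abort() merely stands
in for panic()):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the in-kernel inode; only the field
 * mentioned in the quoted patch is modelled. */
struct fake_inode {
	int i_effnlink;		/* effective link count */
};

/* Fail-stop policy: an "impossible" link count halts everything,
 * so nothing can write to the damaged file system afterwards. */
int
check_failstop(const struct fake_inode *dp)
{
	assert(dp != NULL);	/* caller bug, not on-disk corruption */
	if (dp->i_effnlink < 3) {
		fprintf(stderr, "bad link count %d on parent\n",
		    dp->i_effnlink);
		abort();	/* userland analogue of panic() */
	}
	return (0);
}

/* Graceful policy: report the inconsistency to the caller and
 * leave the decision about what to do next to the code above. */
int
check_graceful(const struct fake_inode *dp)
{
	assert(dp != NULL);	/* caller bug, not on-disk corruption */
	if (dp->i_effnlink < 3)
		return (EINVAL);
	return (0);
}
```

The first variant guarantees that no further damage is done but takes
the whole system down; the second keeps the system running but relies
on every caller to handle the error and on the operator to notice it.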

My assumption is that the developers chose a similar approach
here: "We found a situation that should not be possible, so
we stop the system from messing up the file system even more."
This carries the attitude of not "hiding a problem for the
sake of convenience" by "being silent and going back to the
usual work". Of course it is debatable if this is the right
decision in _this_ particular case.

> By returning invalid argument, any code above
> it should already be handling that condition although the user will be
> scratching their head wondering what's wrong with it.

By determining the inode number and using the fsdb tool,
"internal data" about inodes can be examined. Will it also
show something that's basically impossible? :-)

> Returning ENOENT
> or EACCES or ENOTDIR may be better ("No such directory", "Access denied"
> or "Not a valid directory").

Depends on the applicable definition of those errors.

> The trouble is that it's tricky to test properly without finding a good
> way to corrupt the link count :-)

There is a _simple_ way to do this, and I have even mentioned
it. Use the fsdb program and manipulate the inode "manually".
Make sure that you actually understand that _what_ you are
doing there is creating severe file system inconsistency
errors. :-)

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...