Date:      Sun, 29 Mar 2015 12:46:30 -0700
From:      Kirk McKusick <mckusick@mckusick.com>
To:        Da Rock <freebsd-fs@herveybayaustralia.com.au>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Delete a directory, crash the system 
Message-ID:  <201503291946.t2TJkUMv054849@chez.mckusick.com>
In-Reply-To: <55172A18.70601@herveybayaustralia.com.au> 

> Date: Sun, 29 Mar 2015 08:24:24 +1000
> From: Da Rock <freebsd-fs@herveybayaustralia.com.au>
> To: Kirk McKusick <mckusick@mckusick.com>
> CC: Benjamin Kaduk <kaduk@MIT.EDU>, freebsd-fs@freebsd.org
> Subject: Re: Delete a directory, crash the system
> 
> On 03/29/15 08:02, Kirk McKusick wrote:
> 
>> SU without journaling will maintain consistency. It is just that you
>> will need to run fsck after a crash. That is the way FFS has been since
>> it was written in 1982 and will allow you to recover from media errors
>> which it appears your system is suffering from. SU+J is just a faster
>> way of restarting but only works when you do not have media errors.
> 
> I guess the point I'm driving at is that on a server this may be
> an ok solution, but if you have workstations/desktops with users
> who don't know how to do this properly, that is why the journalling
> is an important feature. So it's not just about faster restarts, but
> a simple reboot/boot and everything is basically ok for them.

Absent media errors, SU + fsck run at boot will always work without
any intervention on the part of the users. When you run with SU, the
default is to run fsck at every boot, so neither users nor administrators
need to do anything other than hit the power-on button.

> If there is any issue a system squawk at the sysadmin will then
> allow them to come in at some point to run a proper check. But in
> this case, we have a system which effectively crashes if there is
> a problem.
> 
> So that's why I mentioned the only other journal-type filesystems
> in FreeBSD, because in this scenario a journal is required and it
> appears these are the only alternatives that don't create such a
> catastrophic effect.

No journaling on any system can recover from media errors, neither
type on FreeBSD nor the one in Linux's ext4. The only way to recover
from media errors is to have redundant metadata in the filesystem.
ZFS has at least doubly and optionally triply redundant metadata.
If you want a system that will cleanly recover without any system
administrator intervention in the face of media errors, that is what
you should run. As you note, it is more resource hungry than FFS, but
based on your requirement for no intervention in the face of media
errors, that is what I would recommend. As long as you run on a 64-bit
processor and have at least 4GB of memory, it should have entirely
reasonable performance.
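
To illustrate why redundant metadata can survive a media error where a
journal cannot, here is a toy sketch in Python. It is not how ZFS or FFS
is implemented; the function names (write_block, read_block,
read_redundant) and the CRC32-prefixed block layout are assumptions made
up for this example. The idea is only that each copy carries a checksum,
so a damaged copy can be detected and a surviving copy used instead.

```python
import zlib


def write_block(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC32 so damage is detectable."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def read_block(raw: bytes):
    """Return the payload if its stored checksum matches, else None."""
    stored = int.from_bytes(raw[:4], "big")
    payload = raw[4:]
    return payload if zlib.crc32(payload) == stored else None


def read_redundant(copies):
    """Try each stored copy in turn and return the first that verifies."""
    for raw in copies:
        payload = read_block(raw)
        if payload is not None:
            return payload
    # With no intact copy left, only manual repair (fsck) can help.
    raise IOError("all metadata copies are damaged")


if __name__ == "__main__":
    good = write_block(b"inode 42")
    damaged = bytearray(good)
    damaged[6] ^= 0xFF  # simulate a media error in one copy
    # The damaged copy fails its checksum; the second copy is returned.
    print(read_redundant([bytes(damaged), good]))
```

With a single copy, the same damage leaves read_redundant nothing to
fall back on, which is the situation a journal alone cannot fix.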

> Having made my point, what could be done about it - and what can I
> do to help? Would drive details provide data required to pick up
> the solution?

Short of adding metadata redundancy to FFS, there is no solution. I
have actively avoided putting such features into FFS as FreeBSD already
has ZFS that does that (and many other things). My goal is to have a
highly performant filesystem with minimal resource requirements. It by
definition has limits, and administrator intervention in the face of
media errors is one of them.

	Kirk McKusick


