From: Kirk McKusick
To: Da Rock
Cc: freebsd-fs@freebsd.org
Subject: Re: Delete a directory, crash the system
Date: Sun, 29 Mar 2015 12:46:30 -0700
Message-Id: <201503291946.t2TJkUMv054849@chez.mckusick.com>
In-reply-to: <55172A18.70601@herveybayaustralia.com.au>

> Date: Sun, 29 Mar 2015 08:24:24 +1000
> From: Da Rock
> To: Kirk McKusick
> CC: Benjamin Kaduk, freebsd-fs@freebsd.org
> Subject: Re: Delete a directory, crash the system
>
> On 03/29/15 08:02, Kirk McKusick wrote:
>
>> SU without journaling will maintain consistency. It is just that you
>> will need to run fsck after a crash. That is the way FFS has been since
>> it was written in 1982 and will allow you to recover from media errors
>> which it appears your system is suffering from. SU+J is just a faster
>> way of restarting but only works when you do not have media errors.
>
> I guess the point I'm driving at is that on a server this may be
> an ok solution, but if you have workstations/desktops with users
> who don't know how to do this properly, that is why the journalling
> is an important feature. So its not just about faster restarts, but
> a simple reboot/boot and everything is basically ok for them.

Absent media errors, SU + fsck run at boot will always work without
any intervention on the part of the users. When you run with SU, the
default is to run fsck at every boot, so neither users nor
administrators need to do anything other than hit the power-on button.

> If there is any issue a system squawk at the sysadmin will then
> allow them to come in at some point to run a proper check. But in
> this case, we have a system which effectively crashes if there is
> a problem.
>
> So thats why I mentioned the only other journal type fs' in freebsd,
> because in this scenario a journal is required and it appears these
> are the only alternative that don't create such a catastrophic effect.

No journaling on any system can recover from media errors: neither
type on FreeBSD nor the one in Linux's ext4. The only way to recover
from media errors is to have redundant metadata in the filesystem.
ZFS has at least doubly and optionally triply redundant metadata.
If you want a system that will cleanly recover without any system
administrator intervention in the face of media errors, that is
what you should run. As you note, it is more resource hungry than
FFS, but based on your requirement for no intervention in the face
of media errors, that is what I would recommend. As long as you run
on a 64-bit processor and have at least 4GB of memory, it should
have entirely reasonable performance.

> Having made my point, what could be done about it - and what can I
> do to help? Would drive details provide data required to pick up
> the solution?

Short of adding metadata redundancy to FFS, there is no solution.
I have actively avoided putting such features into FFS, as FreeBSD
already has ZFS, which does that (and many other things). My goal is
to have a highly performant filesystem with minimal resource
requirements. By definition it has limits, and administrator
intervention in the face of media errors is one of them.

	Kirk McKusick
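
P.S. The point about redundant metadata can be shown with a toy model.
This is only a sketch under stated assumptions: it is not the on-disk
format of ZFS or FFS, and the names write_block, read_block, damage and
MediaError are hypothetical, made up for the illustration. Each
"metadata block" is stored as one or more replicas, each with a CRC32
in front of it; a read returns the first replica whose checksum still
verifies. With one replica a damaged block is simply lost, and no
journal replay changes that. With two, the read falls through to the
surviving copy.

```python
import zlib


class MediaError(Exception):
    """Raised when no stored replica of a block passes its checksum."""


def write_block(payload: bytes, copies: int) -> list[bytearray]:
    """Return `copies` independent replicas, each prefixed by a CRC32."""
    crc = zlib.crc32(payload).to_bytes(4, "big")
    return [bytearray(crc + payload) for _ in range(copies)]


def read_block(replicas: list[bytearray]) -> bytes:
    """Return the payload of the first replica whose checksum verifies."""
    for replica in replicas:
        stored_crc = int.from_bytes(replica[:4], "big")
        payload = bytes(replica[4:])
        if zlib.crc32(payload) == stored_crc:
            return payload
    raise MediaError("every replica failed its checksum")


def damage(replica: bytearray) -> None:
    """Simulate a media error by flipping one bit of the payload."""
    replica[4] ^= 0x01


if __name__ == "__main__":
    inode = b"hypothetical inode contents"

    single = write_block(inode, copies=1)   # no redundancy
    double = write_block(inode, copies=2)   # doubly redundant

    damage(single[0])
    damage(double[0])

    # The second replica is intact, so the read still succeeds.
    assert read_block(double) == inode

    # With a single copy there is nothing left to fall back on.
    try:
        read_block(single)
    except MediaError as err:
        print("single copy:", err)
```

The sketch detects the damage with a checksum and repairs nothing on
disk; a real filesystem would also rewrite the bad replica from the
good one.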