Date:      Sat, 19 Feb 2011 14:12:20 -0600 (CST)
From:      Robert Bonomi <bonomi@mail.r-bonomi.com>
To:        freebsd-questions@freebsd.org
Subject:   Re: ZFS-only booting on FreeBSD
Message-ID:  <201102192012.p1JKCKnP038248@mail.r-bonomi.com>
In-Reply-To: <AF8BFB811828E5E7EFD857A5@mac-pro.magehandbook.com>

> Date: Sat, 19 Feb 2011 10:35:35 -0500
> From: Daniel Staal <DStaal@usa.net>
> Subject: Re: ZFS-only booting on FreeBSD
>
  [[..  sneck  ..]]
>
> Basically, if a ZFS boot drive fails, you are likely to get the following 
> scenario:
> 1) 'What do I need to do to replace a disk in the ZFS pool?'
> 2) 'Oh, that's easy.'  Replaces disk.
> 3) System fails to boot at some later point.
> 4) 'Oh, right, you need to do this *as well* on the *boot* pool...'
>
> Where if a UFS boot drive fails on an otherwise ZFS system, you'll get:
> 1) 'What's this drive?'
> 2) 'Oh, so how do I set that up again?'
> 3) Set up replacement boot drive.
>
> The first situation hides that it's a special case, where the second one 
> doesn't.

"For any foolproof system, there exists a _sufficiently-determined_ fool
 capable of breaking it" applies.

> To avoid the first scenario you need to make sure your sysadmins are 
> following *local* (and probably out-of-band) docs, and aware of potential 
> problems.  And awake.  ;)  The scenario in the second situation presents 
> its problem as a unified package, and you can rely on normal levels of 
> alertness to be able to handle it correctly.  (The sysadmin will realize 
> it needs to be set up as a boot device because it's the boot device.  ;)  
> It may be complicated, but it's *obviously* complicated.)
>
> I'm still not clear on whether a ZFS-only system will boot with a failed 
> drive in the root ZFS pool.  Once booted, of course a decent ZFS setup 
> should be able to recover from the failed drive.  But the question is if 
> the FreeBSD boot process will handle the redundancy or not.  At this 
> point I'm actually guessing it will, which of course only exacerbates the 
> above surprise problem: 'The easy ZFS disk replacement procedure *did* 
> work in the past, why did it cause a problem now?'  (And conceivably it 
> could cause *major* data problems at that point, as ZFS will *grow* a 
> pool quite easily, but *shrinking* one is a problem.)

A non-ZFS boot drive results in immediate, _guaranteed_, down-time for
replacement if/when it fails.

A ZFS boot drive lets you replace the drive and *schedule* the down-time
(for a 'test' re-boot, to make *sure* everything works) at a convenient
time.

Failure to schedule the required down time is a management failure, not
a methodology issue.  One has located the requisite "sufficiently-
determined" fool, and the results thereof are to be expected.
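
For concreteness, the two steps under discussion (the "easy" pool
replacement, plus the extra step a *boot* pool needs) might look roughly
like the sketch below.  It assumes a GPT-partitioned replacement disk that
has already been partitioned to match the old one, with a freebsd-boot
partition at a known index.  The pool name, device names, and the
plan_replace helper are hypothetical, made up for illustration.  The
script only prints the commands; it does not run them.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for replacing a disk in a ZFS boot
# pool. Nothing is executed. All pool and device names are hypothetical.

plan_replace() {
    pool=$1      # pool name, e.g. zroot (assumed)
    old=$2       # failed pool member, e.g. ada1p3 (assumed)
    new=$3       # replacement partition, e.g. ada2p3 (assumed)
    disk=$4      # whole replacement disk, e.g. ada2 (assumed)
    bootidx=$5   # index of the freebsd-boot partition on that disk

    # Step 1: the "easy" part; ZFS resilvers onto the new partition.
    echo "zpool replace $pool $old $new"

    # Step 2: the part that is easy to forget on a boot pool; write the
    # protective MBR and the ZFS-aware boot code to the new disk.
    echo "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i $bootidx $disk"
}

plan_replace zroot ada1p3 ada2p3 ada2 1
```

Review the printed commands against the actual layout (gpart show,
zpool status) before running anything, and follow up with the scheduled
test re-boot.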





