Date:      Tue, 30 Apr 2002 22:19:59 -0400
From:      utsl@quic.net
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        "Andrew P. Lentvorski" <bsder@allcaps.org>, freebsd-fs@freebsd.org
Subject:   Re: Non-standard root filesystems
Message-ID:  <20020501021959.GA20232@quic.net>
In-Reply-To: <3CCF3D98.3495D84D@mindspring.com>
References:  <20020429153020.Q16532-100000@mail.allcaps.org> <3CCEC7D5.D22356A0@mindspring.com> <20020430204153.GB3603@quic.net> <3CCF3D98.3495D84D@mindspring.com>

On Tue, Apr 30, 2002 at 05:58:00PM -0700, Terry Lambert wrote:
> utsl@quic.net wrote:
> > On Tue, Apr 30, 2002 at 09:35:33AM -0700, Terry Lambert wrote:
> > > FreeBSD treats root mounts as "special", relative to all other
> > > mounts.  This is a design error, but overcoming it requires a
> > > reorganization of the mount code that's not really politically
> > > easy to accomplish, even though it's technically very easy.
> > >
> > > Some of the stuff Poul is doing right now will probably help
> > > you in the future with assembling things like RAID-able
> > > volumes -- but not help you right now.
> > 
> > Linux has a syscall (pivot_root) to swap the root with another mounted
> > filesystem. It is occasionally quite useful, and I've been wondering
> > about implementing it (or something similar) on FreeBSD.
> > 
> > Possibly you can tell me why that wouldn't work, or would be a bad
> > idea.
> 
> Doing that would be very hard.  The way mount points work
> won't exactly make it impossible, but it won't make it easy.
> 
> Here's the architectural fix:
> 
> 1)	Separate the mount point covering code from the per-FS
> 	mounting code.

I'm not sure what you're talking about here. Could you point me at 
files and functions to read? I'm not particularly familiar with VFS. (My
kernel hacking days were years ago, and not on a Unix or even Unix-like
kernel...)

> 2)	Add a separate VOP for setting the "mounted on" information
> 	into the superblock. Some FS's, like FFS, like to record
> 	the "last mounted on" information; this is actually not
> 	used for anything that I've ever seen (right now), so it
> 	would probably be OK to rip out completely for now. It
> 	could later be useful for automounting and getting rid of
> 	/etc/fstab entirely.

I'd think this wouldn't be necessary. I've never seen the last mounted
tag used for anything, either.

I'm not sure why you'd want to get rid of /etc/fstab.

> 3)	When mounting an FS at the VFS_MOUNT layer, simply get a
> 	pointer into the list of mounted file systems.  *DO NOT*
> 	deal with the mount point covering at all in the per FS
> 	code!
> 
> 4)	Deal with the mount point covering in the higher level
> 	code; this reduces the amount of crap you have to
> 	parse in a per FS manner anyway.  The covering is done
> 	by referencing the FS in the system mounted FS layer
> 	from #3 (above).
> 
> At this point, from the VFS perspective, all mounts -- root and
> non-root -- are exactly the same: you implement the one type of
> mount (the "fill in this mount table entry and set up the in core
> mount structure data" kind), and it's taken care of... the only
> difference between a root and a non-root mount is the vnode
> covering code for the mount, and that all uses the same code at
> a higher layer.

Hmm. Sounds like there's some complexity I missed. In any case, this is
well beyond me. It sounds like you're saying there's some code that I
haven't seen that would need to be refactored between VFS and FS.
Changes like these should be made by someone who knows what he's doing,
and I clearly don't. 
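
As a rough illustration of the split described in steps 1-4, here is a toy
user-space model in C. It is not FreeBSD VFS code, and every name in it
(toy_mount, toy_vnode, toy_fs_mount, toy_cover, toy_mount_at) is made up for
the example. The only point it tries to show is that the per-FS step just
fills in a mount table entry, while the covering of a vnode happens in one
shared place, so a root mount and a non-root mount would take the same path.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MOUNTS 8

/* Hypothetical in-core mount record: all the per-FS code fills in. */
struct toy_mount {
    const char *fstype;   /* e.g. "ffs" */
    const char *device;   /* e.g. "ad0s1a" */
    int in_use;
};

/* Hypothetical vnode: only the "covered by" pointer matters here. */
struct toy_vnode {
    const char *name;
    struct toy_mount *mounted_here;   /* NULL while not covered */
};

static struct toy_mount mount_table[MAX_MOUNTS];

/*
 * Step 3: the per-FS mount. It claims a slot in the mount table and
 * returns a pointer to it. It knows nothing about mount points.
 */
struct toy_mount *toy_fs_mount(const char *fstype, const char *device)
{
    assert(fstype != NULL && device != NULL);
    for (size_t i = 0; i < MAX_MOUNTS; i++) {
        if (!mount_table[i].in_use) {
            mount_table[i].fstype = fstype;
            mount_table[i].device = device;
            mount_table[i].in_use = 1;
            return &mount_table[i];
        }
    }
    return NULL;   /* table full */
}

/*
 * Step 4: the covering code, shared by every file system type.
 * Fails if the vnode is already covered.
 */
int toy_cover(struct toy_vnode *vp, struct toy_mount *mp)
{
    if (vp == NULL || mp == NULL || vp->mounted_here != NULL)
        return -1;
    vp->mounted_here = mp;
    return 0;
}

/* Root and non-root mounts use this same path; only the vnode differs. */
int toy_mount_at(struct toy_vnode *vp, const char *fstype, const char *device)
{
    struct toy_mount *mp = toy_fs_mount(fstype, device);

    if (mp == NULL)
        return -1;
    if (toy_cover(vp, mp) != 0) {
        mp->in_use = 0;   /* release the table entry on failure */
        return -1;
    }
    return 0;
}
```

In this model, mounting "/" and mounting "/usr" are the same call with a
different vnode, which is the property the four steps seem to be after.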

> This would also make your "pivot" FS work correctly... to do that,
> you would have to cover an opaque vnode.  You could actually do
> this with any vnode, by revoking the vnode, and making it a deadfs
> vnode.

I'm not sure what you mean by "cover an opaque vnode." I don't think I
know enough about how VFS mounts work in FreeBSD to discuss this
intelligently. Maybe after a lot of reading...

> > In my case, I have production systems running Linux with software RAID.
> > I would much rather run hardware RAID and FreeBSD, but I have no budget
> > to buy SCSI RAID controllers. Switching to FreeBSD+Vinum would be a
> > reasonable solution, but I can't mirror root, and that creates a
> > political problem. I get, "If FreeBSD and Vinum will be better, how come
> > you can't mirror the root filesystem?"
> 
> How does mirroring the root FS recover after an error?  If you
> can't load the kernel to load the software RAID, then you can't
> run the software RAID to recover from a failure, right?

Assuming RAID 1, there's a 50% chance that the disk that fails is the
primary. If the secondary disk fails (the one that isn't first to boot),
shut down, replace it, and reboot. On most systems nowadays, it's possible
to set a boot order so that the BIOS will try to boot the second drive if
the first drive doesn't boot. That will work sometimes.

So if you have to, you boot from something else. (The other disk most likely,
or possibly floppy, CD, or network.) At least there'd be something there
to recover. With a root mirror, the worst case is still much less
painful than a complete restore from tape.

> How does Linux solve this problem?  *Does* Linux solve this
> problem, or are we really talking about an unrecoverable
> condition that Linux lets you get yourself into, but FreeBSD
> doesn't?

About the way I described above. It's more of a problem for firmware
and/or boot loader than OS.

As for unrecoverable: I'd much rather drive in, swap a disk, reboot from
floppy, and get to go home when the mirror resyncs, than have to do a
restore from backup. I _hate_ restoring root filesystems from backups.

	---Nathan




