Date:      Wed, 21 Jan 2004 12:12:32 +0100
From:      Pawel Jakub Dawidek <nick@garage.freebsd.pl>
To:        Poul-Henning Kamp <phk@phk.freebsd.dk>
Cc:        arch@freebsd.org
Subject:   Re: About removable disks, mountroot and sw-raid
Message-ID:  <20040121111232.GF393@garage.freebsd.pl>
In-Reply-To: <12416.1074036757@critter.freebsd.dk>
References:  <12416.1074036757@critter.freebsd.dk>



On Wed, Jan 14, 2004 at 12:32:37AM +0100, Poul-Henning Kamp wrote:
+> There have been some discussions about how we handle removable disks
+> and mountroot, raid engines and such stuff.

I've thought this over a bit and...

+> First I will present the scenarios from which I have analyzed the
+> situation.
+>
+> A:  Normal boot.
+> 	Machine boots kernel, in relatively short time all disks
+> 	are found.
+>
+> B:  Slow boot
+> 	Machine boots kernel, disks dribble in at a rate of one
+> 	every 20 seconds as the cabinet powers them up.
+>
+> C:  Boot with failed root disk.
+> 	Machine boots kernel, in relatively short time all disks
+> 	are found, root disk is not one of them.  (This is a strange
+> 	scenario I know, but it is important for the analysis).
+>
+> D:  Machine boots, all raid disks present.
+>
+> E:  Machine boots, one raid disk missing.
+>
+> F:  Machine running.  Operator plugs complete raid-set in, one disk
+>     at a time.
+>
+> The solution:
+> -------------
+>
+> I want to add a counter (protected by a mutex) which the disk drivers
+> and GEOM will increment while they are configuring devices.
+>
+> That means that as soon as the ata-disk system notices that there
+> _may_ be a disk on a cable, it will increment this counter.
+>
+> If it subsequently determines that there wasn't a disk after all,
+> it decrements by one again.
+>
+> If it finds a disk, it hands it off to GEOM/disk_create(9) before
+> decrementing the counter.
+>
+> GEOM will similarly hold reference counts until all tasting has
+> settled down, so all geom classes have had their chance to do
+> their thing.
+>
+> mount_root will stall on this counter being non-zero, and when it
+> goes to zero, try to open the root dev and fail if it is not found.
+>
+> This solves scenario A.
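
The counter scheme above can be modelled in userland C with POSIX threads. This is a minimal sketch under stated assumptions: probe_begin(), probe_end() and wait_for_probes_settled() are hypothetical names, and the upper bound on the wait exists only so the sketch can be exercised; the proposal itself stalls mount_root until the count reaches zero. A kernel version would presumably use mtx(9) and msleep(9)/wakeup(9) rather than pthreads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical global "devices still being configured" counter. */
static pthread_mutex_t probe_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t probe_cv = PTHREAD_COND_INITIALIZER;
static int probes_pending = 0;

/* A disk driver calls this as soon as there _may_ be a disk on a cable;
 * GEOM calls it while a provider is still being tasted. */
void probe_begin(void)
{
	pthread_mutex_lock(&probe_mtx);
	probes_pending++;
	pthread_mutex_unlock(&probe_mtx);
}

/* Called once the driver has ruled the disk out or handed it to GEOM,
 * and by GEOM once every class has had its chance to taste it. */
void probe_end(void)
{
	pthread_mutex_lock(&probe_mtx);
	assert(probes_pending > 0);
	if (--probes_pending == 0)
		pthread_cond_broadcast(&probe_cv);   /* wake mount_root */
	pthread_mutex_unlock(&probe_mtx);
}

/* What mount_root would do: stall while the counter is non-zero.
 * Returns true if it reached zero, false if max_wait_sec ran out first. */
bool wait_for_probes_settled(int max_wait_sec)
{
	struct timespec deadline;
	bool settled;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += max_wait_sec;

	pthread_mutex_lock(&probe_mtx);
	while (probes_pending > 0) {
		/* Loop also absorbs spurious wakeups. */
		if (pthread_cond_timedwait(&probe_cv, &probe_mtx,
		    &deadline) == ETIMEDOUT)
			break;
	}
	settled = (probes_pending == 0);
	pthread_mutex_unlock(&probe_mtx);
	return settled;
}
```

Once wait_for_probes_settled() returns true, mount_root would try to open the root device and fail if it is not there, which is the behaviour described for scenario A.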

I think this is already done via SYSINIT().
We can wait for every driver to tell us "I've finished looking for devices"
and then run our function to make use of every tasted disk.
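
The SYSINIT() argument relies on ordered initialisation: every function registered in an earlier subsystem runs before any registered in a later one. The sketch below models that ordering in userland; the subsystem numbers only echo FreeBSD's SI_SUB_DRIVERS and SI_SUB_ROOT_CONF in spirit and are made up, and ata_attach(), scsi_attach() and use_tasted_disks() are hypothetical stand-ins. Note the assumption it bakes in: the ordering only guarantees "every disk is known" if each driver finishes probing synchronously inside its init function.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative ordering keys, not the real SI_SUB_* values. */
enum { SUB_DRIVERS = 100, SUB_ROOT_CONF = 200 };

struct init_entry {
	int subsystem;          /* coarse ordering key */
	int order;              /* fine ordering within a subsystem */
	void (*func)(void);
};

static int disks_found;
static int disks_seen_at_root_conf = -1;

/* Hypothetical drivers that probe synchronously. */
static void ata_attach(void)  { disks_found += 2; }
static void scsi_attach(void) { disks_found += 1; }

/* Hypothetical "make use of every tasted disk" hook. */
static void use_tasted_disks(void) { disks_seen_at_root_conf = disks_found; }

static int cmp_entry(const void *a, const void *b)
{
	const struct init_entry *x = a, *y = b;

	if (x->subsystem != y->subsystem)
		return (x->subsystem > y->subsystem) -
		    (x->subsystem < y->subsystem);
	return (x->order > y->order) - (x->order < y->order);
}

/* Sort by (subsystem, order) and run each entry, as the boot code does. */
void run_inits(struct init_entry *tab, size_t n)
{
	qsort(tab, n, sizeof(*tab), cmp_entry);
	for (size_t i = 0; i < n; i++) {
		assert(tab[i].func != NULL);
		tab[i].func();
	}
}
```

Registration order does not matter; use_tasted_disks() sees all three disks because its subsystem sorts last. If a driver instead discovers disks asynchronously (scenario B), this ordering alone is not sufficient.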

+> Scenario B is only solvable with outside knowledge.  I propose to
+> add a tunable which says either how long to wait in total or, maybe
+> more usefully, how long to wait after the count went to zero before we
+> give up looking for the root dev.
+>
+> This means that the system will "stick around for a while" hoping
+> the missing disk appears, and after the timeout, it will fail.
+>
+> A timeout of 40 seconds after the last disk appeared
+> sounds like a good shot at a default to me.

I'm not sure about that.
What would we be waiting for? If every driver has said that there are no
more disks on it, there is nothing left to wait for, IMHO.

+> Provided what the user wants for scenario C is for mount_root to
+> fail, we have also solved that.  A magic timer configuration of -1
+> could mean "never", that caters for alternative desires.
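
The give-up rule for the root device can be written as a pure decision function. This is a sketch assuming the "time since the count went to zero" variant of the tunable; root_should_give_up() and ROOT_WAIT_FOREVER are hypothetical names, and timestamps are plain seconds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tunable value meaning "never give up". */
#define ROOT_WAIT_FOREVER (-1)

/*
 * root_found:   the root device has been seen
 * pending:      current value of the probe counter
 * now:          current time, in seconds
 * last_settled: time the counter last dropped to zero
 * timeout:      -1 (never), 0 (fail at once), or seconds to linger
 */
bool root_should_give_up(bool root_found, int pending, long now,
    long last_settled, int timeout)
{
	assert(timeout >= ROOT_WAIT_FOREVER);
	if (root_found)
		return false;   /* nothing to give up on: mount it */
	if (pending > 0)
		return false;   /* disks are still being configured */
	if (timeout == ROOT_WAIT_FOREVER)
		return false;   /* operator asked us to wait indefinitely */
	return now - last_settled >= timeout;
}
```

With a timeout of 0 this fails as soon as the counter settles without a root device (scenario C); with 40 it "sticks around" for 40 seconds (scenario B).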

The function called from SYSINIT() can make this decision.

+> Now about sw-RAID (and mirror, and stripe and ...)
+>=20
+> In general these methods must collect tributaries until they are
+> satisfied they can run.
+>=20
+> For non-redundant configs this is trivial: all bits must be present.
+>=20
+> For redundant methods, the administrator will have to set a policy
+> and I can imagine the following policies:
+> 	1. Run when you have all tributaries.
+> 	2. Run when you have quorum (i.e., one copy of a mirror, etc.)
+> 	3. When you have quorum, run if no further tributaries have
+> 	   arrived in N seconds.
+>
+> Again a simple tunable integer can configure this (-1, 0, >0) and
+> maybe for simplicity we should use the same as we use for mountroot.
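
The three policies can share one integer tunable. The mapping used here is an assumption, not stated in the message: -1 selects policy 1, 0 selects policy 2, and a positive N selects policy 3 with N seconds of quiet. raid_should_run() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed mapping of the tunable onto the policies:
 *   -1  run only when every tributary is present   (policy 1)
 *    0  run as soon as quorum is reached           (policy 2)
 *   >0  at quorum, run once no new tributary has
 *       arrived for that many seconds              (policy 3)
 */
bool raid_should_run(int present, int total, int quorum,
    long secs_since_last_arrival, int policy)
{
	assert(0 < quorum && quorum <= total);
	if (present >= total)
		return true;    /* complete set: every policy runs */
	if (present < quorum)
		return false;   /* not enough tributaries to run at all */
	if (policy < 0)
		return false;   /* policy 1: keep waiting for the rest */
	/* policy 0 is immediate because elapsed time is never negative */
	return secs_since_last_arrival >= policy;
}
```

For a non-redundant method such as a stripe, quorum equals total, so the function reduces to "all bits must be present" whatever the policy.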

IMHO the solution should be more general.
The action taken should differ depending on whether a module is loaded
at boot time or later as a kld module.
My solution is:
Set a timeout in the function called from SYSINIT() or in the function
called on module load. After this timeout, the GEOM class assumes that
all disks present in the system have been tasted and starts with what
it has got.

That's why I proposed the G_TA_FIRST/G_TA_LAST flags for taste events.
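
One way a class might react to such flags is sketched below. Only the names G_TA_FIRST and G_TA_LAST come from the proposal; their numeric values, their exact semantics (first and last taste of a batch), struct mirror_state and mirror_taste() are all assumptions for illustration, not GEOM API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for a taste event. */
#define G_TA_FIRST 0x01   /* first taste of a batch (boot scan or kld load) */
#define G_TA_LAST  0x02   /* no further taste events follow in this batch */

/* State of an imaginary mirror class collecting tributaries. */
struct mirror_state {
	int  members_seen;
	bool running;
};

/* is_member stands in for "this provider carries our metadata". */
void mirror_taste(struct mirror_state *st, bool is_member, int flags)
{
	assert(st != NULL);
	if ((flags & G_TA_FIRST) && !st->running)
		st->members_seen = 0;   /* start a fresh collection */
	if (is_member)
		st->members_seen++;
	if ((flags & G_TA_LAST) && st->members_seen > 0)
		st->running = true;     /* batch done: start with what we got */
}
```

The point of the flag is that the class no longer has to guess, via a timeout, when the batch of tributaries is complete.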

Note that even if we solve the boot problem, this problem (when to stop
waiting for taste events) still exists, so such a timeout or an additional
flag is still needed.

We also should not run taste events before we reach the proper boot stage;
then the additional taste-event flags will make sense at boot time as well.
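
Holding taste events back until the right boot stage could look like the following queue-and-replay sketch. boot_stage_ready(), provider_arrived(), do_taste() and the fixed-size arrays are hypothetical; providers are reduced to integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 16

static bool boot_stage_reached;        /* hypothetical "ready to taste" gate */
static int deferred[MAX_EVENTS];       /* providers waiting to be tasted */
static size_t ndeferred;
static int tasted[MAX_EVENTS];         /* record of tastes, in order */
static size_t ntasted;

static void do_taste(int provider)
{
	assert(ntasted < MAX_EVENTS);
	tasted[ntasted++] = provider;
}

/* A new provider appeared: taste it now if allowed, otherwise queue it. */
void provider_arrived(int provider)
{
	if (boot_stage_reached) {
		do_taste(provider);
		return;
	}
	assert(ndeferred < MAX_EVENTS);
	deferred[ndeferred++] = provider;
}

/* Open the gate and replay queued providers in arrival order. */
void boot_stage_ready(void)
{
	boot_stage_reached = true;
	for (size_t i = 0; i < ndeferred; i++)
		do_taste(deferred[i]);
	ndeferred = 0;
}
```

Because the replay happens in one pass, the first and last queued events are known exactly, which is where first/last flags on taste events would naturally be attached.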

What do you think?

--
Pawel Jakub Dawidek                       pawel@dawidek.net
UNIX Systems Programmer/Administrator     http://garage.freebsd.pl
Am I Evil? Yes, I Am!                     http://cerber.sourceforge.net
