Date:      Sun, 24 Apr 2005 19:04:15 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Paul Mather <paul@gromit.dlib.vt.edu>
Cc:        freebsd-geom@freebsd.org
Subject:   Re: Is there a "disconnected" state for geom_mirror providers?
Message-ID:  <20050424170415.GC837@darkness.comp.waw.pl>
In-Reply-To: <1114360313.77313.14.camel@zappa.Chelsea-Ct.Org>
References:  <1114308801.71938.2.camel@zappa.Chelsea-Ct.Org> <20050424094148.GZ837@darkness.comp.waw.pl> <1114360313.77313.14.camel@zappa.Chelsea-Ct.Org>

On Sun, Apr 24, 2005 at 12:31:53PM -0400, Paul Mather wrote:
+> > If gmirror gets an error for READ or WRITE operation, it assumes provider
+> > is broken. This is very important - if it will be marked only as stale,
+> > it will be connected, resynchronization will start, but because there
+> > was an error on provider, it probably will be disconnected again and we
+> > have endless loop.
+>
+> I guess it depends on what caused the disconnection in the first place.
+> If it was a READ of a bad sector, it could be that subsequent
+> resynchronisation will force a block reallocation of the bad block and
+> the drive will no longer be "broken."

So you want me to count the number of failures for every sector and mark
a component as broken if I have 2 failures related to the same sector, or
something like that? :)
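
For illustration only, the per-sector counting floated above might look like the following Python sketch. The class name `SectorFailureTracker`, its method, and the threshold of 2 are hypothetical; this is not how gmirror behaves, since (as stated in this thread) gmirror marks a provider as broken on any I/O error.

```python
from collections import Counter


class SectorFailureTracker:
    """Hypothetical policy: tolerate one error per sector, give up on the second."""

    def __init__(self, threshold=2):
        self.threshold = threshold   # assumed value, taken from the "2 failures" remark
        self.failures = Counter()    # sector number -> number of I/O errors seen

    def record_error(self, sector):
        """Record an I/O error; return True if the component should be marked broken."""
        self.failures[sector] += 1
        return self.failures[sector] >= self.threshold


tracker = SectorFailureTracker()
first = tracker.record_error(1024)    # first error on sector 1024
other = tracker.record_error(2048)    # unrelated sector, counted separately
second = tracker.record_error(1024)   # second error on the same sector
```

Even this small version needs per-sector state for every component, which is the kind of extra bookkeeping the reply is questioning.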

+> > Stale provider is when it is hot-plug and you remove it; when you use
+> > 'deactivate' command to disconnect it; when it doesn't show up on mirror
+> > start, but later.
+> >
+> > The rule is simple: when an error was returned on I/O operation, provider
+> > is marked as broken.
+>
+> Thanks for the clarification.  That makes sense.  I just need to
+> remember "gmirror forget" before I attempt to add back in the disk in my
+> "TIMEOUT - WRITE_DMA" not-really-broken broken disk case. :-)

If reallocation happens here, there should be no I/O error visible to
gmirror.
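
The stale-versus-broken rule quoted above can be summarised in a short Python sketch. The `Event` and `State` names and the `classify` function are hypothetical labels for the cases listed in this thread, not gmirror identifiers, and the sketch covers only those cases.

```python
from enum import Enum


class Event(Enum):
    IO_ERROR = "READ or WRITE returned an error"
    HOT_UNPLUG = "hot-plug device was removed"
    DEACTIVATE = "'deactivate' command was used"
    LATE_ARRIVAL = "missing at mirror start, showed up later"


class State(Enum):
    STALE = "stale"
    BROKEN = "broken"


def classify(event):
    """Map a disconnection cause to the state described in the thread."""
    # Any I/O error means broken; every other listed cause means stale.
    return State.BROKEN if event is Event.IO_ERROR else State.STALE
```

Under this rule a disk that returned an error is never treated as stale, which avoids the reconnect, resynchronize, fail-again loop described earlier in the thread.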

+> The shame about it being deleted from the mirror as opposed to marked as
+> "broken" is you lose info (shown in "gmirror list") about the broken
+> component priority, etc., which is useful for when you add a replacement
+> device (or re-add the same one, as in my case).

You can use 'gmirror dump /dev/<your_component>'.

+> If you marked a component as "broken" (but still listed as part of the
+> mirror), you could add a "-f" option to "gmirror rebuild" to force
+> rebuilding onto it a la RAIDframe. :-)

This is not so simple. I don't store any information on a broken component
saying that it is broken, because, for example, the bad sector could be the
sector holding the metadata.
The other components are informed that something is wrong.
How could one remove such a broken component for good? Let's say you were able
to read the metadata from the component, but you cannot write to it any more.
How can you easily replace this component?
This complicates things a lot, and I don't need more complications if I
want gmirror to stay reliable (which I hope it is now).

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20050424170415.GC837>