Date:      Sat, 23 Apr 2005 22:13:21 -0400
From:      Paul Mather <paul@gromit.dlib.vt.edu>
To:        freebsd-geom@freebsd.org
Subject:   Is there a "disconnected" state for geom_mirror providers?
Message-ID:  <1114308801.71938.2.camel@zappa.Chelsea-Ct.Org>

Sadly, the "TIMEOUT - WRITE_DMA"-induced disk disconnections have
returned on my -CURRENT system since I upgraded to ATA Mk.III. :-(
However, I've noticed that when a drive is marked as failed and the
device detached, the provider also disappears from the geom_mirror it is
part of, instead of being marked as a "stale" or "disconnected" or
"missing" component of the remaining mirror components.  Is this the
correct behaviour?

In the latest failure to occur, ad0 was detached:

ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=49981679
ad0: FAILURE - device detached
subdisk0: detached
ad0: detached
GEOM_MIRROR: Cannot update metadata on disk ad0 (error=5).
GEOM_MIRROR: Cannot update metadata on disk ad0 (error=6).
GEOM_MIRROR: Device raid1: provider ad0 disconnected.
GEOM_MIRROR: Request failed (error=6). ad0[WRITE(offset=3847741440, length=16384)]


I performed an "atacontrol detach 0" followed by an "atacontrol attach
0" to "re-discover" the "failed" ad0 as part of the existing
geom_mirror.  This yielded the following:

acd0: detached
(cd0:ata0:0:1:0): lost device
(cd0:ata0:0:1:0): removing device entry
atapicam0: detached
stray irq14
ad0: 24405MB <IBM DJNA-352500 J51OA30K> at ata0-master UDMA33
GEOM_MIRROR: Component ad0 (device raid1) broken, skipping.
GEOM_MIRROR: Cannot add disk ad0 to raid1 (error=22).
acd0: DVDR <LITE-ON DVDRW SOHW-832S/VS08> at ata0-slave UDMA33
cd0 at ata0 bus 0 target 1 lun 0
cd0: <LITE-ON DVDRW SOHW-832S VS08> Removable CD-ROM SCSI-0 device 
cd0: 33.000MB/s transfers
cd0: cd present [1 x 2048 byte records]


The provider ad0 did not show up as a "stale" provider of my "raid1"
mirror (from which it had disappeared when it was detached due to the
"TIMEOUT - WRITE_DMA" failure).  I had to do a "gmirror forget raid1"
before a "gmirror insert raid1 ad0" would allow me to re-insert it so I
could perform a "gmirror rebuild raid1 ad0" to kick off synchronisation.
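The recovery sequence described above can be sketched as a small shell
script. This is only an illustration of the steps reported in this
message, not a recommended procedure: the mirror name "raid1" and the
provider "ad0" come from the message, while the run() helper, the
recover_component() function, and the DRY_RUN variable are hypothetical
names introduced here. By default the script only prints the commands it
would run.

```shell
#!/bin/sh
# Sketch of the manual recovery steps from the message above.
# run(), recover_component() and DRY_RUN are illustrative names (assumptions),
# not part of gmirror. With DRY_RUN=1 (the default) nothing is executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command instead of executing it.
        echo "$*"
    else
        "$@"
    fi
}

recover_component() {
    mirror=$1
    disk=$2
    # Drop the mirror's record of components that are no longer connected.
    run gmirror forget "$mirror"
    # Add the re-attached disk back to the mirror.
    run gmirror insert "$mirror" "$disk"
    # Request synchronisation of the re-inserted component.
    run gmirror rebuild "$mirror" "$disk"
}

recover_component raid1 ad0
```

Run as-is, the script prints the three gmirror commands in order; setting
DRY_RUN=0 would execute them, which should only be done as root on a
system where the disk has already been re-attached.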

What is the definition of a "broken" component?  What is the difference
between a "stale" and a "broken" component?

If I were to detach and remove a hot-plug geom_mirror component and
subsequently re-attach it, would the component be considered "stale" or
"broken"?

This is not a major inconvenience (well, the return of the "TIMEOUT -
WRITE_DMA" errors is :), but I was just wondering why my failed
providers now disappear instead of being marked as stale, as happened
in the past.

BTW, my system is a fairly recent -CURRENT: FreeBSD 6.0-CURRENT #0: Mon
Apr 18 12:25:24 EDT 2005.

Cheers,

Paul.
-- 
e-mail: paul@gromit.dlib.vt.edu

"Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid."
        --- Frank Vincent Zappa


