Date:      Fri, 28 Dec 2007 15:55:47 +0100
From:      Ulrich Spoerlein <uspoerlein@gmail.com>
To:        stable@freebsd.org, Hidetoshi Shimokawa <simokawa@freebsd.org>
Subject:   Re: sbp(4) write error wedging GEOM mirror
Message-ID:  <20071228145547.GC1532@roadrunner.spoerlein.net>
In-Reply-To: <20071228125437.GB1532@roadrunner.spoerlein.net>
References:  <20071228125437.GB1532@roadrunner.spoerlein.net>

On Fri, 28.12.2007 at 13:54:37 +0100, Ulrich Spoerlein wrote:
> [Ramblings about sbp(4) wedging geom mirror]

OK, it looks like sbp(4) is off the hook. I tried the rebuild again,
this time attaching da0 via umass(4) instead of sbp(4). It also
eventually wedges, but umass can recover from this situation on its own:

umass0: Prolific PL-3507C USB Storage Device, rev 2.00/0.01, addr 2
da0 at umass-sim0 bus 0 target 0 lun 0
da0: <SAMSUNG SP2514N VF10> Fixed Direct Access SCSI-0 device
da0: 40.000MB/s transfers
da0: 238475MB (488397168 512 byte sectors: 255H 63S/T 30401C)
GEOM_MIRROR: Component da0s1 (device gm0) broken, skipping.
GEOM_MIRROR: Cannot add disk da0s1 to gm0 (error=22).
GEOM_MIRROR: Component da0s2 (device gm1) broken, skipping.
GEOM_MIRROR: Cannot add disk da0s2 to gm1 (error=22).
GEOM_MIRROR: Component da0s1 (device gm0) broken, skipping.
GEOM_MIRROR: Cannot add disk da0s1 to gm0 (error=22).
GEOM_MIRROR: Component da0s1 (device gm0) broken, skipping.
GEOM_MIRROR: Cannot add disk da0s1 to gm0 (error=22).
GEOM_MIRROR: Device gm0: provider da0s1 detected.
GEOM_MIRROR: Device gm0: provider da0s1 is stale.
GEOM_MIRROR: Device gm1: provider da0s2 detected.
GEOM_MIRROR: Device gm1: provider da0s2 is stale.
GEOM_MIRROR: Device gm0: provider da0s1 disconnected.
GEOM_MIRROR: Device gm0: provider da0s1 detected.
GEOM_MIRROR: Device gm0: rebuilding provider da0s1.
fwohci0: BUS reset
fwohci0: node_id=0xc800ffc1, gen=2, CYCLEMASTER mode
firewire0: 2 nodes, maxhop <= 1, cable IRM = 1 (me)
firewire0: bus manager 1 (me)
fwohci0: txd err=14 ack busy_X
fwohci0: txd err=14 ack busy_X
fwohci0: txd err=14 ack busy_X
fwohci0: BUS reset
fwohci0: node_id=0xc800ffc1, gen=3, CYCLEMASTER mode
firewire0: 2 nodes, maxhop <= 1, cable IRM = 1 (me)
firewire0: bus manager 1 (me)
firewire0: New S400 device ID:0050770e013023f0
da1 at sbp0 bus 0 target 0 lun 0
da1: <Prolific PL-3507C Drive 2804> Fixed Simplified Direct Access SCSI-4 device
da1: 50.000MB/s transfers
da1: 381554MB (781422768 512 byte sectors: 255H 63S/T 48641C)
GEOM_MIRROR: Device gm2: provider da1 detected.
GEOM_MIRROR: Device gm2: rebuilding provider da1.
GEOM_MIRROR: Device gm0: rebuilding provider da0s1 finished.
GEOM_MIRROR: Device gm0: provider da0s1 activated.
GEOM_MIRROR: Device gm1: provider da0s2 disconnected.
GEOM_MIRROR: Device gm1: provider da0s2 detected.
GEOM_MIRROR: Device gm1: rebuilding provider da0s2.
(14:08:27) root@coyote: ~# gmirror status
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
GEOM_MIRROR: Cannot write metadata on da0s1 (device=gm0, error=5).
GEOM_MIRROR: Synchronization request failed (error=5). da0s2[WRITE(offset=23111270400, length=131072)]
GEOM_MIRROR: Cannot update metadata on disk da0s1 (error=5).
GEOM_MIRROR: Device gm1: provider da0s2 disconnected.
GEOM_MIRROR: Device gm1: rebuilding provider da0s2 stopped.
GEOM_MIRROR: Device gm0: provider da0s1 disconnected.
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
Expensive timeout(9) function: 0xc09623a9(0xc32de800) 0.006188295 s
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
... (multiple pages)
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
(da0:umass-sim0:0:0:0): Synchronize cache failed, status == 0x4, scsi status == 0x0
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
... (multiple pages)
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
      Name    Status  Components
mirror/gm2  DEGRADED  ad1
                      da1 (12%)
mirror/gm0  DEGRADED  ad0s1
mirror/gm1  DEGRADED  ad0s2
(14:14:46) root@coyote: ~#
(14:14:46) root@coyote: ~# gmirror status
      Name    Status  Components
mirror/gm2  DEGRADED  ad1
                      da1 (16%)
mirror/gm0  DEGRADED  ad0s1
mirror/gm1  DEGRADED  ad0s2

(14:41:22) root@coyote: ~# fdisk -s /dev/da0
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
Expensive timeout(9) function: 0xc0690e74(0xc342a000) 0.007737115 s
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
fdisk: can't open device /dev/da0
fdisk: cannot open disk /dev/da0: Input/output error
Exit 1
(14:41:54) root@coyote: ~# camcontrol rescan 1
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
(da0:umass-sim0:0:0:0): lost device
(da0:umass-sim0:0:0:0): removing device entry
Re-scan of bus 1 was successful

So as you can see, after lots of stalled transfers GEOM mirror does
the right thing and kicks out the failing components, something it cannot
do when the disk is attached via sbp(4).
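
To check from a script whether a component has been kicked out or is
still rebuilding, the "gmirror status" output above could be parsed
roughly like this. This is only a minimal sketch: it assumes the exact
column layout shown in this message (other gmirror versions may print
it differently), and the function name parse_gmirror_status is made up
for illustration.

```python
import re


def parse_gmirror_status(text):
    """Parse `gmirror status` output in the layout shown in this message.

    Returns {mirror: {"status": str, "components": [(name, percent_or_None)]}}.
    The layout is an assumption; other gmirror versions may differ.
    """
    mirrors = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Name":
            continue  # skip blank lines and the header row
        if fields[0].startswith("mirror/"):
            # First line of a mirror: name, status, first component.
            current = fields[0]
            mirrors[current] = {"status": fields[1], "components": []}
            fields = fields[2:]
        if current is None or not fields:
            continue
        # A component is a provider name, optionally followed by "(NN%)"
        # while it is being rebuilt.
        match = re.match(r"\((\d+)%\)$", fields[1]) if len(fields) > 1 else None
        percent = int(match.group(1)) if match else None
        mirrors[current]["components"].append((fields[0], percent))
    return mirrors


if __name__ == "__main__":
    sample = """\
      Name    Status  Components
mirror/gm2  DEGRADED  ad1
                      da1 (12%)
mirror/gm0  DEGRADED  ad0s1
mirror/gm1  DEGRADED  ad0s2
"""
    for name, info in parse_gmirror_status(sample).items():
        print(name, info["status"], info["components"])
```

With the status output from above, gm0 and gm1 each show up with only
their ad0 slice left, and gm2 lists da1 with its rebuild percentage.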

Is this behaviour of sbp(4) tweakable?

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.


