Date:      Sat, 2 Jan 1999 20:55:53 +1030
From:      Greg Lehey <grog@lemis.com>
To:        Bernd Walter <ticso@cicely.de>, freebsd-scsi@FreeBSD.ORG
Subject:   Re: new Quirk candidate and vinum behavour
Message-ID:  <19990102205553.G66110@freebie.lemis.com>
In-Reply-To: <19990102105138.35033@cicely.de>; from Bernd Walter on Sat, Jan 02, 1999 at 10:51:38AM +0100
References:  <19990102105138.35033@cicely.de>

On Saturday,  2 January 1999 at 10:51:38 +0100, Bernd Walter wrote:
>
> One of my hosts has crashed from time to time.
> Today I got a crash after setting logs to another volume:
>
> Jan  2 03:30:16 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan  2 03:30:18 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> Jan  2 03:30:32 cicely7 syslogd: /var/log/messages: Input/output error
> Jan  2 03:30:32 cicely7 syslogd: /var/log/all.log: Input/output error
> Jan  2 03:30:32 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan  2 03:30:32 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x0
> Jan  2 03:30:32 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 1 SCBs aborted
> Jan  2 03:30:32 cicely7 /kernel: vinum: subdisk var.p0.s0 is crashed
> Jan  2 03:30:32 cicely7 /kernel: vinum: plex var.p0 is degraded
> Jan  2 03:30:32 cicely7 /kernel: vinum: subdisk var.p0.s0 is stale
> Jan  2 03:30:32 cicely7 /kernel: vinum: plex var.p0 is down
> Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> Jan  2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan  2 03:30:48 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> Jan  2 03:30:48 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 31 SCBs aborted
> Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> Jan  2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan  2 03:30:48 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> Jan  2 03:30:48 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 31 SCBs aborted
> Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> Jan  2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan  2 03:30:48 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> Jan  2 03:30:48 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 31 SCBs aborted
> Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan  2 03:30:48 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 31
> Jan  2 03:30:48 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan  2 03:30:49 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x0, SEQ_FLAGS == 0x40
> Jan  2 03:30:49 cicely7 /kernel: ahc0: Bus Device Reset on A:1. 16 SCBs aborted
> Jan  2 03:31:04 cicely7 /kernel: (da0:ahc0:0:1:0): tagged openings now 32
> Jan  2 03:31:04 cicely7 /kernel: ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> Jan  2 03:31:04 cicely7 /kernel: SAVED_TCL == 0x10, ARG_1 == 0x20, SEQ_FLAGS == 0x40
> and so on ...
>
> As you can see, the host had not really crashed, but it was unusable
> afterwards.  The problem with da0:ahc0:0:1:0 happens every time the
> tagged openings are increased.  The side effect is that I'm now
> running /var on a vinum volume on da1 and da2, which are drives on
> the same channel, and it looks like the BDR, or something between the
> tag increase and the BDR, is the reason for the subdisk crash.

On the face of it, of course, this is a SCSI problem, not a Vinum
problem.  Vinum reacted correctly to the error (this time :-).  But
we've seen a surprising number of problems of this kind in connection
with Vinum, and I think the reason is that Vinum tickles otherwise
unseen hardware problems in SCSI chains.  It's quite common for Vinum
to issue a series of I/O commands on a number of devices on a chain
(for example, with striped or RAID-5 volumes which require accessing
several drives at a time for a single user request).  You might like
to set debug flag 1:

vinum -> debug 1

This will log details of all transfers via syslogd; in combination with
the log you show, it might help the SCSI guys figure out where things
are going wrong.  But I have a suspicion that the real problem is
hardware (less than perfect SCSI chain, for whatever reason) rather
than software.
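
As a rough illustration of how such logs might be correlated, here is a
minimal Python sketch.  It is not part of Vinum or the ahc driver; the
function names, the 20-second window and the year 1999 (syslog omits
the year) are all assumptions made for the example.  It scans syslog
lines in the format quoted above and reports each BUS DEVICE RESET that
follows a "tagged openings" change within the window.  It matches on
time only and does not check that the reset targets the same device.

```python
import re
from datetime import datetime

# Syslog prefix as seen in the quoted log: "Jan  2 03:30:16 host message"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+(.*)$")
OPENINGS_RE = re.compile(
    r"\((\w+):\w+:\d+:\d+:\d+\): tagged openings now (\d+)")
RESET_RE = re.compile(r"issuing BUS DEVICE RESET")


def parse_time(stamp, year=1999):
    # syslog timestamps carry no year; the default is an assumption
    normalized = " ".join(stamp.split())
    return datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")


def resets_after_opening_changes(lines, window_seconds=20):
    """Return (device, openings, seconds_until_reset) for each reset that
    follows a 'tagged openings' change within window_seconds."""
    last_change = None  # (time, device, openings) of the latest change
    hits = []
    for line in lines:
        # Drop mail quoting ("> ") and trailing whitespace before matching
        m = LINE_RE.match(line.lstrip("> ").rstrip())
        if not m:
            continue
        when = parse_time(m.group(1))
        message = m.group(3)
        opening = OPENINGS_RE.search(message)
        if opening:
            last_change = (when, opening.group(1), int(opening.group(2)))
        elif RESET_RE.search(message) and last_change is not None:
            delta = int((when - last_change[0]).total_seconds())
            if delta <= window_seconds:
                hits.append((last_change[1], last_change[2], delta))
            last_change = None  # count each change at most once
    return hits
```

Fed the quoted log (for example via open("/var/log/messages")), this
would list the device, the new number of openings, and the delay before
each reset.  Interleaving the output of "debug 1" in the same way is
left as an exercise.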

Greg
--
See complete headers for address, home page and phone numbers
finger grog@lemis.com for PGP public key

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message


