Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 9 Jul 2013 08:30:58 -0700
From:      Jeremy Chadwick <jdc@koitsu.org>
To:        Outback Dingo <outbackdingo@gmail.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Stable/9 from today mpssas_scsiio timeouts
Message-ID:  <20130709153058.GA8769@icarus.home.lan>
In-Reply-To: <CAKYr3zzTGKaW=AfEMY5vdRMKBsbsQsBVVHCufhtF-FV-pOyMCA@mail.gmail.com>
References:  <CAKYr3zwHFiJzO-F3BJ1RrRWJkKWmqPJDvEmMigH0Fg8Epy7Znw@mail.gmail.com> <20130709123900.GA5828@icarus.home.lan> <CAKYr3zxCkk0Cqv%2B-K-G0RQFBiCaTNZdDUzPiyJEP-_LR1i2rQQ@mail.gmail.com> <CAKYr3zxLcS4pGgA=o_xVqDKFUzHGuhDcjYaCu42Kn3rkPvXfxA@mail.gmail.com> <20130709144614.GA7538@icarus.home.lan> <CAKYr3zzTGKaW=AfEMY5vdRMKBsbsQsBVVHCufhtF-FV-pOyMCA@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Jul 09, 2013 at 11:20:45AM -0400, Outback Dingo wrote:
> On Tue, Jul 9, 2013 at 10:46 AM, Jeremy Chadwick <jdc@koitsu.org> wrote:
> 
> > On Tue, Jul 09, 2013 at 09:47:01AM -0400, Outback Dingo wrote:
> > > On Tue, Jul 9, 2013 at 9:44 AM, Outback Dingo <outbackdingo@gmail.com
> > >wrote:
> > > > On Tue, Jul 9, 2013 at 8:39 AM, Jeremy Chadwick <jdc@koitsu.org>
> > wrote:
> > > >
> > > >> On Tue, Jul 09, 2013 at 05:32:39AM -0400, Outback Dingo wrote:
> > > >> > as of stable today im seeing alot of new mps time outs
> > > >> >
> > > >> > 9.1-STABLE FreeBSD 9.1-STABLE #0 r253035M: Mon Jul  8 16:34:28 UTC
> > 2013
> > > >> > root@:/usr/obj/nas/usr/src/sys/
> > > >> >
> > > >> > mps1@pci0:130:0:0:      class=0x010700 card=0x30201000
> > chip=0x00721000
> > > >> > rev=0x03 hdr=0x00
> > > >> >     vendor     = 'LSI Logic / Symbios Logic'
> > > >> >     device     = 'SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]'
> > > >> >     class      = mass storage
> > > >> >     subclass   = SAS
> > > >> >
> > > >> >
> > > >> > mps0: mpssas_scsiio_timeout checking sc 0xffffff8002145000 cm
> > > >> > 0xffffff80021a6b78
> > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36
> > SMID
> > > >> 983
> > > >> > command timeout cm 0xffffff80021a6b78 ccb 0xfffffe002bb5f800
> > > >> > mps0: mpssas_alloc_tm freezing simq
> > > >> > mps0: timedout cm 0xffffff80021a6b78 allocated tm 0xffffff80021587b0
> > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36
> > SMID
> > > >> 983
> > > >> > completed timedout cm 0xffffff80021a6b78 ccb 0xfffffe002bb5f800
> > during
> > > >> > recovery ioc 8048 scsi 0 state c xfer 0
> > > >> > (noperiph:mps0:0:40:0): SMID 6 abort TaskMID 983 status 0x4a code
> > 0x0
> > > >> count
> > > >> > 1
> > > >> > (noperiph:mps0:0:40:0): SMID 6 finished recovery after aborting
> > TaskMID
> > > >> 983
> > > >> > mps0: mpssas_free_tm releasing simq
> > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00
> > > >> > (probe40:mps0:0:40:0): CAM status: Command timeout
> > > >> > (probe40:mps0:0:40:0): Retrying command
> > > >> > mps1: mpssas_scsiio_timeout checking sc 0xffffff8002384000 cm
> > > >> > 0xffffff80023e5b78
> > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36
> > SMID
> > > >> 983
> > > >> > command timeout cm 0xffffff80023e5b78 ccb 0xfffffe002be14800
> > > >> > mps1: mpssas_alloc_tm freezing simq
> > > >> > mps1: timedout cm 0xffffff80023e5b78 allocated tm 0xffffff80023977b0
> > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36
> > SMID
> > > >> 983
> > > >> > completed timedout cm 0xffffff80023e5b78 ccb 0xfffffe002be14800
> > during
> > > >> > recovery ioc 8048 scsi 0 state c xfer 0
> > > >> > (noperiph:mps1:0:37:0): SMID 6 abort TaskMID 983 status 0x4a code
> > 0x0
> > > >> count
> > > >> > 1
> > > >> > (noperiph:mps1:0:37:0): SMID 6 finished recovery after aborting
> > TaskMID
> > > >> 983
> > > >> > mps1: mpssas_free_tm releasing simq
> > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00
> > > >> > (probe292:mps1:0:37:0): CAM status: Command timeout
> > > >> > (probe292:mps1:0:37:0): Retrying command
> > > >>
> > > >> 1. What revision were you running before (i.e. what were you on prior
> > to
> > > >> the upgrade)?
> > > >>
> > > >
> > > >
> > > > Sorry I was on 252595 from July 3
> >
> > And does rolling back to r252595 resolve the problem for you?
> >
> > Because the only commit I see between r253035 and r252595 that might
> > account for some kind of behavioural change, unless I missed one while
> > skimming the commit history, is the following:
> >
> > r252730 -- http://www.freshbsd.org/commit/freebsd/r252730
> >
> > If at all possible, please try updating to r253037 or newer to see
> > if that has some effect/improvement.  Why I mention that commit:
> >
> > r253037 -- http://www.freshbsd.org/commit/freebsd/r253037
> >
> > Because the only mps(4) changes done in recent days are:
> >
> > http://svnweb.freebsd.org/base/stable/9/sys/dev/mps/mps_sas.c?view=log
> >
> > r253037
> > r251899
> > r251874
> >
> 
> i can say this its between July 4, and 253048, im rolling back to 252723 to
> validate a good known working state

Looking at your dmesg, it looks like the "errors" might be for SAS ports
which don't have any actual devices (disks) attached to them, yet parts
of the kernel (not sure which layer) are still trying to submit INQUIRY
commands to those ports as if they did have disks attached.

It looks like you see this behaviour on boot up, and then later during
normal operation at some point (a LUN scan or rescan or "bus taste"
might cause this to happen; for example I know that "zpool import" in
effect can sometimes cause this behaviour -- on one of my systems "zpool
import" would cause the servers' floppy drive to spin up/chunk briefly).

I'm hoping Steven or mav@ might be able to confirm/deny my theory here.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130709153058.GA8769>