Date:      Wed, 11 Jul 2018 14:50:02 -0600
From:      slm@freebsd.org
To:        Ken Merry <ken@freebsd.org>, Oliver Sech <crimsonthunder@gmx.net>
Cc:        FreeBSD-scsi <freebsd-scsi@freebsd.org>
Subject:   RE: problems with SAS JBODs 2
Message-ID:  <6bc79bf80dbfbba8e77bb40d5b1a0512@mail.gmail.com>
In-Reply-To: <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org>
References:  <trinity-14d18077-ea73-40f6-9e87-d2d4000b1f7e-1530620937871@3c-app-gmx-bs01> <CAOtMX2h8r31AeNCKyckK2P0VLn1CKFogo9bWom2So1x2ngpa4A@mail.gmail.com> <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net> <b785fe02-9242-c95f-56cb-2130f90e17f5@gmx.net> <3caf8ccd6fde8cfc4db25bae5327c46b@mail.gmail.com> <0af047d477d15ec364140653bd967c89@mail.gmail.com> <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org>

I think this is either a mapping table problem or the use_phy_num problem.
I'm having Oliver set the use_phy_num sysctl values to 0 and then use your
script to clear out the controller mapping entries to see what happens.

Steve

> -----Original Message-----
> From: Ken Merry [mailto:ken@freebsd.org]
> Sent: Wednesday, July 11, 2018 2:35 PM
> To: Stephen Mcconnell; Oliver Sech
> Cc: FreeBSD-scsi
> Subject: Re: problems with SAS JBODs 2
>
> Yes, I agree, Oliver's problem looks different.
>
> Oliver, for your second set of files (freebsd_sas2.zip) it looks like you may
> have devices that aren't completely going away, even from a SAS standpoint.
>
> Here are the 25 target IDs that show up in 2_shelf_connected_dmesg.txt:
>
> mpr0: mprsas_add_device: Target ID for added device is 467.
> mpr0: mprsas_add_device: Target ID for added device is 468.
> mpr0: mprsas_add_device: Target ID for added device is 469.
> mpr0: mprsas_add_device: Target ID for added device is 470.
> mpr0: mprsas_add_device: Target ID for added device is 471.
> mpr0: mprsas_add_device: Target ID for added device is 472.
> mpr0: mprsas_add_device: Target ID for added device is 473.
> mpr0: mprsas_add_device: Target ID for added device is 474.
> mpr0: mprsas_add_device: Target ID for added device is 475.
> mpr0: mprsas_add_device: Target ID for added device is 476.
> mpr0: mprsas_add_device: Target ID for added device is 477.
> mpr0: mprsas_add_device: Target ID for added device is 478.
> mpr0: mprsas_add_device: Target ID for added device is 479.
> mpr0: mprsas_add_device: Target ID for added device is 480.
> mpr0: mprsas_add_device: Target ID for added device is 481.
> mpr0: mprsas_add_device: Target ID for added device is 482.
> mpr0: mprsas_add_device: Target ID for added device is 483.
> mpr0: mprsas_add_device: Target ID for added device is 484.
> mpr0: mprsas_add_device: Target ID for added device is 485.
> mpr0: mprsas_add_device: Target ID for added device is 486.
> mpr0: mprsas_add_device: Target ID for added device is 487.
> mpr0: mprsas_add_device: Target ID for added device is 488.
> mpr0: mprsas_add_device: Target ID for added device is 489.
> mpr0: mprsas_add_device: Target ID for added device is 490.
> mpr0: mprsas_add_device: Target ID for added device is 503.
>
> Here are the 8 target IDs that disappear in
> 3_shelf_disconnected_dmesg.txt:
>
> mpr0: mprsas_prepare_remove: Sending reset for target ID 467
> mpr0: mprsas_prepare_remove: Sending reset for target ID 468
> mpr0: mprsas_prepare_remove: Sending reset for target ID 469
> mpr0: mprsas_prepare_remove: Sending reset for target ID 470
> mpr0: mprsas_prepare_remove: Sending reset for target ID 471
> mpr0: mprsas_prepare_remove: Sending reset for target ID 472
> mpr0: mprsas_prepare_remove: Sending reset for target ID 473
> mpr0: mprsas_prepare_remove: Sending reset for target ID 474
>
> And here are the same 8 target IDs getting added in
> 4_shelf_reconnected_dmesg.txt:
>
> mpr0: mprsas_add_device: Target ID for added device is 467.
> mpr0: mprsas_add_device: Target ID for added device is 468.
> mpr0: mprsas_add_device: Target ID for added device is 469.
> mpr0: mprsas_add_device: Target ID for added device is 470.
> mpr0: mprsas_add_device: Target ID for added device is 471.
> mpr0: mprsas_add_device: Target ID for added device is 472.
> mpr0: mprsas_add_device: Target ID for added device is 473.
> mpr0: mprsas_add_device: Target ID for added device is 474.
>
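
The three lists above can be cross-checked mechanically. The following is a minimal Python sketch, assuming each dmesg capture has been read into a string. The function names (`target_ids`, `compare_captures`) are hypothetical and not part of mpr(4) or any FreeBSD tool. The two regular expressions are taken from the log lines quoted above.

```python
import re

# Patterns copied from the mpr(4) debug lines quoted in this thread.
ADD_RE = re.compile(r"mprsas_add_device: Target ID for added device is (\d+)")
REMOVE_RE = re.compile(r"mprsas_prepare_remove: Sending reset for target ID (\d+)")


def target_ids(log_text, pattern):
    """Return the set of target IDs matched by pattern in a dmesg capture."""
    return {int(m.group(1)) for m in pattern.finditer(log_text)}


def compare_captures(connected, disconnected, reconnected):
    """Compare the connect, disconnect, and reconnect captures.

    Hypothetical helper: each argument is the full text of one dmesg file.
    """
    added = target_ids(connected, ADD_RE)
    removed = target_ids(disconnected, REMOVE_RE)
    readded = target_ids(reconnected, ADD_RE)
    return {
        # Added on connect, but no removal was logged on disconnect.
        "never_removed": sorted(added - removed),
        # Removed on disconnect, but not added again on reconnect.
        "removed_not_readded": sorted(removed - readded),
        # Added on reconnect without a preceding removal.
        "readded_without_remove": sorted(readded - removed),
    }
```

Applied to the lines quoted above, `never_removed` would hold the 17 target IDs (475 through 490, plus 503) that were added on connect but never logged a removal.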
> Oliver, what happens when you try to do I/O to the devices that don't go
> away after you pull the cable?  Does that cause the devices to go away?
>
> Looking at the mprutil output, it also shows the devices sticking around from
> the adapter's standpoint.
>
> You can also try a 'camcontrol rescan all' or a 'camcontrol rescan N' (where N
> is the scbus number shown by 'camcontrol devlist -v').  That will do some
> basic probes for each of the devices and should in theory cause them to go
> away if they aren't accessible.
>
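
The rescan suggestion above can be scripted. This is a minimal Python sketch, assuming it runs as root on the affected FreeBSD host. The helper names (`rescan_command`, `rescan`) are hypothetical, and only the two command forms mentioned above ('camcontrol rescan all' and 'camcontrol rescan N') are built.

```python
import subprocess


def rescan_command(bus="all"):
    """Build the argv for 'camcontrol rescan all' or 'camcontrol rescan N'.

    Hypothetical helper: bus is either the string "all" or a non-negative
    scbus number as shown by 'camcontrol devlist -v'.
    """
    if bus != "all":
        bus = int(bus)  # raises ValueError for anything that is not a number
        if bus < 0:
            raise ValueError("scbus number must be non-negative")
    return ["camcontrol", "rescan", str(bus)]


def rescan(bus="all", dry_run=True):
    """Return the command; execute it only when dry_run is False."""
    cmd = rescan_command(bus)
    if not dry_run:
        # check=True raises CalledProcessError if camcontrol exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd
```

By default the sketch only returns the command it would run. Passing `dry_run=False` executes it, which is meant for a test system since a rescan probes every device on the bus.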
> It seems like the adapter may not be recognizing that the devices in question
> have gone.
>
> Steve, do you have any ideas what could be going on?
>
> Ken
> --
> Ken Merry
> ken@FreeBSD.ORG
>
>
>
> > On Jul 10, 2018, at 11:48 AM, Stephen Mcconnell via freebsd-scsi
> > <freebsd-scsi@freebsd.org> wrote:
> >
> > Ken, I looked at the logs and I don't see anything in them that suggests
> > that the driver is not adding any of the devices. In fact, I don't see
> > anything that looks strange at all. This looks like a different problem than
> > the other one you mentioned. What do you think?
> >
> > Steve
> >
> >> -----Original Message-----
> >> From: Stephen Mcconnell [mailto:stephen.mcconnell@broadcom.com]
> >> Sent: Tuesday, July 10, 2018 9:28 AM
> >> To: 'Oliver Sech'; 'FreeBSD-scsi'
> >> Subject: RE: problems with SAS JBODs 2
> >>
> >> Hi Oliver, I can't get to your links. Can you try to send the logs in
> >> another way?
> >>
> >> Steve
> >>
> >>> -----Original Message-----
> >>> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-scsi@freebsd.org]
> >>> On Behalf Of Oliver Sech
> >>> Sent: Tuesday, July 10, 2018 9:14 AM
> >>> To: FreeBSD-scsi
> >>> Subject: Re: problems with SAS JBODs 2
> >>>
> >>> I tested a few additional things. I don't think this is a multipath, daisy
> >>> chain, or SAS wide ports problem.
> >>> I can reproduce the problem with just a single connection to an
> >>> Expander/JBOD.
> >>>
> >>> Test:
> >>> * physically disconnect all shelves
> >>> * reboot system
> >>> * connect one shelf via SAS cable
> >>> * check number of disks (after a reboot everything always shows up)
> >>> * disconnect the shelf and wait (geom disk list still shows most disks.)
> >>> * connect the shelf (missing disks)
> >>>
> >>> Tested Hardware:
> >>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (internal daisy chain + wide links)
> >>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (straight HBA <-> EXPANDER connection; no wide links, no daisy chain)
> >>> * Supermicro SAS2 SC847E26-RJBOD1  + SAS3 LSI 9305-16e (internal daisy chain)
> >>> * Promise    SAS2 VTrak 830        + SAS3 LSI 9305-16e (straight HBA <-> EXPANDER connection)
> >>>
> >>>
> >>>
> >>> On 07/04/2018 12:15 PM, Oliver Sech wrote:
> >>>>> 1) Are the expanders daisy chained?  Some SAS expanders don't work reliably
> >>>>> when daisy chained.   Best to direct connect each one to the server.
> >>>> At the moment I have 1 JBOD connected to 1 HBA port with 1 cable (4 lanes?).
> >>>> Unfortunately the JBOD has 24 slots in the front and 20 in the back, and
> >>>> those are connected via an internal SAS daisy chain.
> >>>> I could rewire and connect each backplane directly to the server, but
> >>>> unfortunately I do not have enough ports.
> >>>>
> >>>> JBOD Model: Supermicro 847E2C-R1K28JBOD
> >>>>
> >>>>> 2) Are the expanders connected in multipath or single path?  You need
> >>>>> geom_multipath if you're going to do that.
> >>>> See answer 1. There is a single path from the host to the first expander.
> >>>>
> >>>>> 3) Are you attempting to use wide ports (two SAS cables connecting each
> >>>>> expander to the HBA)?  If so, you'll need to make sure that each pair of
> >>>>> SAS cables goes to the same HBA chip (not merely the same card, as some
> >>>>> cards contain two HBA chips).
> >>>> See 1. The last time I opened one of those JBODs there were 8 SAS cables
> >>>> between the front and back expander. I assume that wide ports are being
> >>>> used.
> >>>> (2 expanders per backplane as well)
> >>>>
> >>>>> 4) Are you trying to remove an expander while ZFS is active on that
> >>>>> expander?  That will suspend your pool, and ZFS doesn't always recover from
> >>>>> a suspended state.
> >>>> I'm testing with a new unused disk shelf that was never part of the ZFS
> >>>> pool. There were
> >>>> _______________________________________________
> >>>> freebsd-scsi@freebsd.org mailing list
> >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> >>>> To unsubscribe, send any mail to
> >>>> "freebsd-scsi-unsubscribe@freebsd.org"