Date:      Fri, 14 Sep 2012 22:39:38 -0600
From:      "Kenneth D. Merry" <ken@FreeBSD.ORG>
To:        John <jwd@FreeBSD.ORG>
Cc:        FreeBSD iSCSI <freebsd-scsi@FreeBSD.ORG>
Subject:   Re: How to force a reset of a device (disk) in an enclosure slot
Message-ID:  <20120915043938.GA71754@nargothrond.kdm.org>
In-Reply-To: <20120915040907.GA5458@FreeBSD.org>
References:  <20120915022437.GA90210@FreeBSD.org> <20120915023329.GA55292@nargothrond.kdm.org> <20120915031305.GA97685@FreeBSD.org> <20120915032826.GA63349@nargothrond.kdm.org> <20120915040907.GA5458@FreeBSD.org>

On Sat, Sep 15, 2012 at 04:09:07 +0000, John wrote:
> ----- Kenneth D. Merry's Original Message -----
> > On Sat, Sep 15, 2012 at 03:13:05 +0000, John wrote:
> > > ----- Kenneth D. Merry's Original Message -----
> > > > On Sat, Sep 15, 2012 at 02:24:37 +0000, John wrote:
> > > > > Hi Folks,
> > > > > 
> > > > >    I've been poking around and can't seem to find a way to reset and
> > > > > hopefully acquire access to a disk device in an enclosure. For instance:
> > > > > 
> > > > > FreeBSD 9.1-PRERELEASE
> > > > > 
> > > > > # camcontrol smpphylist ses4
> > > > > 37 PHYs:
> > > > > PHY  Attached SAS Address
> > > > >   0  0x5000039368233602   <HP EG0600FBDSR HPD4>             (pass105,da98)
> > > > >   1  0x5000039368238e3e   <HP EG0600FBDSR HPD4>             (pass106,da99)
> > > > >   2  0x500003936823bca2   <HP EG0600FBDSR HPD4>             (pass107,da100)
> > > > >   3  0x500003936819507e   <HP EG0600FBDSR HPD4>             (pass108,da101)
> > > > >   4  0x5000039368197d5a   <HP EG0600FBDSR HPD4>             (pass109,da102)
> > > > >   5  0x5000039368197c6e   <HP EG0600FBDSR HPD4>             (pass110,da103)
> > > > >   6  0x500003936818770e   <HP EG0600FBDSR HPD2>             (pass111,da104)
> > > > >   7  0x5000039368238eba   <HP EG0600FBDSR HPD4>             (pass112,da105)
> > > > >   8  0x5000039368232f42   <HP EG0600FBDSR HPD4>             (pass113,da106)
> > > > >   9  0x0000000000000000
> > > > >  10  0x500003936813c31e
> > > > >  11  0x5000039368233892   <HP EG0600FBDSR HPD4>             (pass114,da107)
> > > > >  12  0x500003936813c2ca   <HP EG0600FBDSR HPD4>             (pass115,da108)
> > > > > ...
> > > > > 
> > > > > Note, bay/slot 10 has a listed device address. If I were to pull the
> > > > > drive and re-insert it, it would show up (as da390 in this case).
> > > > > The above is after a fresh reboot. Note da106 to da107 skipping
> > > > > slot 10 (slot 9 is empty).
> > > > > 
> > > > > The smp utils provide a similar view:
> > > > > 
> > > > > # smp_discover /dev/ses4 
> > > > >   phy   0:D:attached:[5000039368233602:00  t(SSP)]  6 Gbps
> > > > >   phy   1:D:attached:[5000039368238e3e:00  t(SSP)]  6 Gbps
> > > > >   phy   2:D:attached:[500003936823bca2:00  t(SSP)]  6 Gbps
> > > > >   phy   3:D:attached:[500003936819507e:00  t(SSP)]  6 Gbps
> > > > >   phy   4:D:attached:[5000039368197d5a:00  t(SSP)]  6 Gbps
> > > > >   phy   5:D:attached:[5000039368197c6e:00  t(SSP)]  6 Gbps
> > > > >   phy   6:D:attached:[500003936818770e:00  t(SSP)]  6 Gbps
> > > > >   phy   7:D:attached:[5000039368238eba:00  t(SSP)]  6 Gbps
> > > > >   phy   8:D:attached:[5000039368232f42:00  t(SSP)]  6 Gbps
> > > > >   phy  10:D:attached:[500003936813c31e:00  t(SSP)]  6 Gbps
> > > > >   phy  11:D:attached:[5000039368233892:00  t(SSP)]  6 Gbps
> > > > >   phy  12:D:attached:[500003936813c2ca:00  t(SSP)]  6 Gbps
> > > > > ...
> > > > > 
> > > > > The address of slot 10 matches. There is a disk in the slot - just
> > > > > isn't recognized and attached.
> > > > > 
> > > > > Back to the basic question. How can I issue a command to the enclosure
> > > > > to force a re-initialization of the device to recover it without
> > > > > having to physically pull & insert it. Even if the device numbers
> > > > > are not sequential, I need access to the drive...
> > > > 
> > > > You can try sending a link reset:
> > > > 
> > > > camcontrol smppc ses4 -p 10 -o linkreset
> > > > 
> > > > It may or may not work.  You can also try disabling the PHY (-o disable)
> > > > and then sending a link reset to re-enable the link.  You can also try a
> > > > hard reset (-o hardreset).
> > > 
> > > Hi Ken,
> > > 
> > > Well, I hadn't tried to actually disable the device. That did bring some
> > > reaction:
> > > 
> > > # camcontrol smppc ses4 -p 10 -o disable
> > > # camcontrol smpphylist ses4
> > > 37 PHYs:
> > > PHY  Attached SAS Address
> > >   0  0x5000039368233602   <HP EG0600FBDSR HPD4>             (pass105,da98)
> > > ....
> > >   8  0x5000039368232f42   <HP EG0600FBDSR HPD4>             (pass113,da106)
> > >   9  0x0000000000000000
> > >  10  0x0000000000000000
> > >  11  0x5000039368233892   <HP EG0600FBDSR HPD4>             (pass114,da107)
> > > ...
> > > 
> > > The device is gone.
> > > 
> > > # camcontrol smppc ses4 -p 10 -o hardreset
> > > root@vprzfs01p:/root # camcontrol smpphylist ses4
> > > 37 PHYs:
> > > PHY  Attached SAS Address
> > >   0  0x5000039368233602   <HP EG0600FBDSR HPD4>             (pass105,da98)
> > > ....
> > >   8  0x5000039368232f42   <HP EG0600FBDSR HPD4>             (pass113,da106)
> > >   9  0x0000000000000000
> > >  10  0x500003936813c31e
> > >  11  0x5000039368233892   <HP EG0600FBDSR HPD4>             (pass114,da107)
> > > ...
> > > 
> > > The device is back, but not attached - This msg:
> > > 
> > > kernel: mps1: mpssas_alloc_tm freezing simq
> > > kernel: mps1: mpssas_remove_complete on handle 0x0069, IOCStatus= 0x0
> > > kernel: mps1: mpssas_free_tm releasing simq
> > > kernel: _mapping_add_new_device: failed to add the device with handle 0x0069 to persistent table because there is no free space available - entry 0
> > 
> > That message is harmless, it won't prevent the drive from attaching.
> > 
> > > From a debug statement in the driver: MaxPersistentEntries == 128, but I
> > > have more than 128 devices per LSI card and they normally all show up -
> > > though I do get a bunch of the above messages in dmesg..
> > 
> > You might try turning on some of the debugging in the mps(4) driver and
> > disabling and resetting the link again.
> > 
> > Try:
> > 
> > sysctl -w dev.mps.0.debug_level=0xf
> > 
> > You might get a lot of output, so be prepared to reset it back to 4:
> > 
> > sysctl -w dev.mps.0.debug_level=4
> 
> Hi Ken,
> 
> I don't see anything obvious. Hopefully you're more familiar with the
> code and have better eyes than I do... Here's everything from messages
> after the -o disable. There are some "unknown/unhandled"s showing up.

Here is where the drive shows up:

> kernel: mps_intr_locked sc 0xffffff8001353000 writing postindex 243
> kernel: mps_enqueue_request SMID 653 cm 0xffffff80013ca4a8 ccb 0
> kernel: mps_intr_locked sc 0xffffff8001353000 starting with replypostindex 243
> kernel: mps_intr_locked sc 0xffffff8001353000 writing postindex 244
> kernel: SAS Address from SAS device page0 = 500003936811feae
> kernel: Found device <401<SspTarg>,End Device> <6.0Gbps> <0x0078> <4/36>
> kernel: mpssas_rescan_target targetid 255
> kernel: mpssas_rescan
> kernel: 
> kernel: Target id 0xff added

It finds the device, with target ID 255 (which is a little suspicious) and
queues a rescan, but nothing happens after that.

You might try doing a manual rescan of that device to see what happens:

camcontrol rescan X:255:0

Where X is the scbus number from camcontrol devlist.
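
If it helps, the bus number can be pulled out of the devlist output with a
small helper. This is only a sketch: it assumes the bus header lines in
"camcontrol devlist -v" look like "scbus7 on mps1 bus 0:", and the function
name scbus_for_sim and the SIM name mps1 are just placeholders for
illustration.

```shell
# Print the CAM bus number for a given SIM (e.g. mps1), reading
# `camcontrol devlist -v` output on stdin.
# Assumption: bus header lines have the form "scbus7 on mps1 bus 0:".
scbus_for_sim() {
    awk -v sim="$1" '
        $1 ~ /^scbus[0-9]+$/ && $2 == "on" && $3 == sim {
            sub(/^scbus/, "", $1)   # strip the prefix, keep the number
            print $1
            exit
        }
    '
}

# Usage on the affected host (not executed here):
#   bus=$(camcontrol devlist -v | scbus_for_sim mps1)
#   camcontrol rescan "${bus}:255:0"
```

If the helper prints nothing, the header format on your system probably
differs from what I assumed, so read the bus number off the devlist output
by hand.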

If that doesn't work, then we need to figure out what the maximum number of
targets supported by the adapter is.  To do that, set this in
/boot/loader.conf and reboot:

hw.mps.debug_level=1

That should result in the IOCFacts page getting printed on boot.

How many drives and other devices are currently attached to that
controller?  What controller model is it, and do you have IT or IR
firmware on it?
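
For the device count, something along these lines should give a rough
number per controller. Again a sketch under assumptions: it expects
"camcontrol devlist -v" to print a "scbusN on <sim> bus 0:" header followed
by one line per device starting with "<", and it skips the empty "<>"
wildcard entry. The name count_devices_on_sim is made up for this example.

```shell
# Count devices listed under a given SIM (e.g. mps1) in
# `camcontrol devlist -v` output read from stdin.
# Assumptions: header lines look like "scbusN on <sim> bus 0:" and each
# device line starts with "<"; the "<>" wildcard entry is not counted.
count_devices_on_sim() {
    awk -v sim="$1" '
        $1 ~ /^scbus[0-9]+$/ && $2 == "on" { in_sim = ($3 == sim); next }
        in_sim && /^</ && !/^<>/ { n++ }
        END { print n + 0 }
    '
}

# Usage on the affected host (not executed here):
#   camcontrol devlist -v | count_devices_on_sim mps1
```

That counts every peripheral CAM lists on the bus, so enclosure (ses)
devices are included along with the disks.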

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG


