Date:      Wed, 4 Jul 2018 12:28:29 +0200
From:      Oliver Sech <crimsonthunder@gmx.net>
To:        "Kenneth D. Merry" <ken@FreeBSD.ORG>
Cc:        freebsd-scsi@freebsd.org, slm@freebsd.org
Subject:   Re: problems with SAS JBODs 2
Message-ID:  <fcdf9aae-96fc-af08-264e-379c09a1c789@gmx.net>
In-Reply-To: <20180703142629.GF26046@mithlond.kdm.org>
References:  <trinity-14d18077-ea73-40f6-9e87-d2d4000b1f7e-1530620937871@3c-app-gmx-bs01> <20180703142629.GF26046@mithlond.kdm.org>

> The most likely issue is that the mapping table stored on the card is messed
> up.  Can you send dmesg output with the following loader tunable set:
> 
> hw.mpr.debug_level=0x203
> 
> That will turn on debugging for the mapping code and may show the problem.
> 
> If you see messages like this:
> 
> mpr0: Attempting to reuse target id 63 handle 0x000b
> mpr0: Attempting to reuse target id 64 handle 0x000c
> mpr0: Attempting to reuse target id 65 handle 0x000d
> mpr0: Attempting to reuse target id 66 handle 0x000e
> mpr0: Attempting to reuse target id 67 handle 0x000f
> mpr0: Attempting to reuse target id 68 handle 0x0010
> mpr0: Attempting to reuse target id 69 handle 0x0011
> mpr0: Attempting to reuse target id 70 handle 0x0012
> mpr0: Attempting to reuse target id 66 handle 0x000e
> 
> It indicates that the mapping code is preventing some of the drives from
> fully probing because there are collisions in the table.
> 
> Unfortunately we have not yet fixed the problem in the other situation.
> (He is running with multipathing, which could be contributing to the
> problem.)
> 
> I have a script and utility that will clear the mapping table in the card,
> but that hasn't been enough to fix the other situation.  If you do have a
> mapping problem, I can give you the script/utility to clear the table and
> we can see whether it fixes your problem.
> 
> If not, it'll probably have to wait until Steve gets back from vacation.
> 
> Ken

I set the "hw.mpr.debug_level" tunable as suggested and collected logs covering the whole connect -> disconnect -> connect cycle.
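For anyone reproducing this: the tunable belongs in /boot/loader.conf (hw.mpr.debug_level="0x203") so it is applied before mpr(4) attaches, and a saved dmesg capture can then be grepped for the collision messages. A minimal sketch (the sample lines are copied from Ken's message above; the file name is just illustrative):

```shell
# Sketch: scan a saved dmesg capture for the mapping-table collision
# messages described above.  The two sample lines are taken verbatim
# from Ken's message; /tmp/dmesg.sample is a hypothetical file name.
cat > /tmp/dmesg.sample <<'EOF'
mpr0: Attempting to reuse target id 63 handle 0x000b
mpr0: Attempting to reuse target id 64 handle 0x000c
EOF

# Count the collision messages; a nonzero count suggests the mapping
# table on the card has conflicting entries.
grep -c 'Attempting to reuse target id' /tmp/dmesg.sample   # prints 2
```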

logs collected:
first connect log: https://paste.docker.ist.ac.at/?6ec80dde0e1f236f#NufbXSs6o+dTDTPgZgWbU8vRQ6B47tMbQ8LHPkMXfIg=
first connect sesutil: https://paste.docker.ist.ac.at/?256810338f87adc1#/N3m6iFH304SxSxpnHCt0ocOeAU8zkBennul2/BcKpQ=

disconnected shelf log: https://paste.docker.ist.ac.at/?07ff1129a6cb6117#8WH8AjO1sO2hZlHE39h314CoQxxFZmBVZNo+Q8+qp4Q=
disconnected shelf mprutil: https://paste.docker.ist.ac.at/?eebaee72dc9e1cfe#WTlnO5vlPb7997lJCMswWfwtcq1rN04CaFbxmMWHqrU=

second connect log: https://paste.docker.ist.ac.at/?684ff32c6dae185b#nZ32x023ApRvNKrVUhvCr7xi5cYJnPhs9XNTfEW6sMw=
second connect sesutil: https://paste.docker.ist.ac.at/?f0302ce3aa8e55d7#+ZaJsCUiLh/7VsqBJ5oPHxZtRbM1dVS2RankrXePikw=
second connect mprutil: https://paste.docker.ist.ac.at/?4b8d347aed941c1f#wX7y0cjtb2gYKLU99IIftmDcFpKiV2QqjcC7YN96nB0=


If you are interested in investigating this further, I can try to organize a "test environment", as I'm fairly sure this issue is not limited to my hardware.

best regards,
Oliver


