Date:      Wed, 8 Jun 2016 00:28:38 +0100
From:      Steven Hartland <killing@multiplay.co.uk>
To:        freebsd-scsi@freebsd.org
Subject:   Re: Avago LSI SAS 3008 & Intel SSD Timeouts
Message-ID:  <73dd23bd-7989-6dde-f3ff-e6e51610390a@multiplay.co.uk>
In-Reply-To: <99b3b075-3158-29aa-3a33-311594fb9270@mindpackstudios.com>
References:  <30c04d8b-80cb-c637-26dc-97caebad3acb@mindpackstudios.com> <b30f968c-cc41-f7de-5a54-35bed961e65a@multiplay.co.uk> <08C01646-9AF3-4E89-A545-C051A284E039@sarenet.es> <986e03a7-5dc8-f5e0-5a17-4bf49459f905@mindpackstudios.com> <2823D96D-881D-4D40-B610-FC8292FA2FC5@sarenet.es> <4072b65d-25d4-2a79-5911-573517b0ee57@mindpackstudios.com> <583dddc6-4614-9900-88f7-27347866d7aa@mindpackstudios.com> <331da785-c88b-d74e-512a-37bdb618d512@multiplay.co.uk> <d8c3284c-97aa-7ae0-48e2-2d6b3e5dcf39@mindpackstudios.com> <94380b81-fcd7-511c-bc35-b8c5459d2ea4@multiplay.co.uk> <99b3b075-3158-29aa-3a33-311594fb9270@mindpackstudios.com>

If that works, I'd swap the 3008 into the machine that currently has the 
2008 and retest.  That will help confirm whether or not the 3008 card 
and driver are a potential issue.

On 07/06/2016 23:43, list-news wrote:
> No, it threw errors on both da6 and da7 and then I stopped it.
>
> Your last e-mail gave me thoughts though.  I have a server with 2008 
> controllers (entirely different backplane design, cpu, memory, etc).  
> I've moved the 4 drives to that and I'm running the test now.
>
> # uname = FreeBSD 10.2-RELEASE-p12 #1 r296215
> # sysctl dev.mps.0
> dev.mps.0.spinup_wait_time: 3
> dev.mps.0.chain_alloc_fail: 0
> dev.mps.0.enable_ssu: 1
> dev.mps.0.max_chains: 2048
> dev.mps.0.chain_free_lowwater: 1176
> dev.mps.0.chain_free: 2048
> dev.mps.0.io_cmds_highwater: 510
> dev.mps.0.io_cmds_active: 0
> dev.mps.0.driver_version: 20.00.00.00-fbsd
> dev.mps.0.firmware_version: 17.00.01.00
> dev.mps.0.disable_msi: 0
> dev.mps.0.disable_msix: 0
> dev.mps.0.debug_level: 3
> dev.mps.0.%parent: pci5
> dev.mps.0.%pnpinfo: vendor=0x1000 device=0x0072 subvendor=0x1000 
> subdevice=0x3020 class=0x010700
> dev.mps.0.%location: slot=0 function=0
> dev.mps.0.%driver: mps
> dev.mps.0.%desc: Avago Technologies (LSI) SAS2008
>
> About 1.5 hours has passed at full load, no errors.
>
> gstat drive busy% seems to hang out around 30-40 instead of ~60-70.  
> Overall throughput seems to be 20-30% less with my rough benchmarks.
>
> I'm not sure if this gets us closer to the answer, but if this doesn't 
> time out on the 2008 controller, it looks like one of these:
> 1) The Intel drive firmware is being overloaded somehow when connected 
> to the 3008.
> or
> 2) The 3008 firmware or driver has an issue reading drive responses, 
> sporadically thinking the command timed-out (when maybe it really 
> didn't).
>
> Puzzle pieces:
> A) Why does setting tags of 1 on drives connected to the 3008 fix the 
> problem?
> B) With tags of 255, why does postgres (presumably because of its large 
> fsync count) seem to cause the problem within minutes, while other 
> heavy I/O commands (zpool scrub, bonnie++, fio), all of which show 
> similarly high or higher IOPS, take hours to cause the problem (if ever)?
>
> I'll let this continue to run to further test.
>
> Thanks again for all the help.
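
When repeating this test on the other controller, it may help to compare the driver counters mechanically rather than by eye. Below is a minimal Python sketch that parses `sysctl dev.mps.0` output like the dump quoted above and pulls out a few counters. The function names (`parse_mps_sysctl`, `summarize`) are hypothetical, and the reading of the counters is an assumption taken from their names, not from the driver source: `chain_free_lowwater` is taken to be the lowest number of free chain frames seen, and `io_cmds_highwater` the peak number of outstanding commands. The sample values are copied from the quoted output.

```python
# Sample lines copied from the quoted `sysctl dev.mps.0` output.
SAMPLE = """\
> dev.mps.0.chain_alloc_fail: 0
> dev.mps.0.max_chains: 2048
> dev.mps.0.chain_free_lowwater: 1176
> dev.mps.0.chain_free: 2048
> dev.mps.0.io_cmds_highwater: 510
> dev.mps.0.io_cmds_active: 0
> dev.mps.0.driver_version: 20.00.00.00-fbsd
> dev.mps.0.firmware_version: 17.00.01.00
"""


def parse_mps_sysctl(text):
    """Parse `sysctl dev.mps.N` output into {short_name: value}.

    Leading "> " quote markers are tolerated so that text pasted from
    an e-mail can be fed in directly.
    """
    values = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()
        if not line.startswith("dev.mps.") or ": " not in line:
            continue
        key, value = line.split(": ", 1)
        # "dev.mps.0.chain_free" -> "chain_free"
        short = key.split(".", 3)[3]
        values[short] = int(value) if value.isdigit() else value
    return values


def summarize(values):
    """Derive a few figures of interest (interpretation is an assumption)."""
    return {
        # Non-zero would suggest the driver ran out of chain frames.
        "chain_alloc_failures": values.get("chain_alloc_fail", 0),
        # Peak chain frames in use, if lowwater is the minimum free count.
        "chains_used_peak": values["max_chains"] - values["chain_free_lowwater"],
        # Peak number of commands outstanding at once.
        "io_cmds_highwater": values["io_cmds_highwater"],
    }


if __name__ == "__main__":
    print(summarize(parse_mps_sysctl(SAMPLE)))
```

Running the same script against the corresponding sysctl output for the 3008 would give a side-by-side view of how hard each controller was driven; which sysctl node that is on the 3008 machine is not shown in this thread.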



