Date: Wed, 8 Jun 2016 00:28:38 +0100
From: Steven Hartland <killing@multiplay.co.uk>
To: freebsd-scsi@freebsd.org
Subject: Re: Avago LSI SAS 3008 & Intel SSD Timeouts
Message-ID: <73dd23bd-7989-6dde-f3ff-e6e51610390a@multiplay.co.uk>
In-Reply-To: <99b3b075-3158-29aa-3a33-311594fb9270@mindpackstudios.com>
References: <30c04d8b-80cb-c637-26dc-97caebad3acb@mindpackstudios.com> <b30f968c-cc41-f7de-5a54-35bed961e65a@multiplay.co.uk> <08C01646-9AF3-4E89-A545-C051A284E039@sarenet.es> <986e03a7-5dc8-f5e0-5a17-4bf49459f905@mindpackstudios.com> <2823D96D-881D-4D40-B610-FC8292FA2FC5@sarenet.es> <4072b65d-25d4-2a79-5911-573517b0ee57@mindpackstudios.com> <583dddc6-4614-9900-88f7-27347866d7aa@mindpackstudios.com> <331da785-c88b-d74e-512a-37bdb618d512@multiplay.co.uk> <d8c3284c-97aa-7ae0-48e2-2d6b3e5dcf39@mindpackstudios.com> <94380b81-fcd7-511c-bc35-b8c5459d2ea4@multiplay.co.uk> <99b3b075-3158-29aa-3a33-311594fb9270@mindpackstudios.com>

If that works I'd switch the 3008 into the machine with 2008 in
currently and retest. That will help to confirm the 3008 card and
driver is or isn't a potential issue.

On 07/06/2016 23:43, list-news wrote:
> No, it threw errors on both da6 and da7 and then I stopped it.
>
> Your last e-mail gave me thoughts though. I have a server with 2008
> controllers (entirely different backplane design, cpu, memory, etc).
> I've moved the 4 drives to that and I'm running the test now.
>
> # uname = FreeBSD 10.2-RELEASE-p12 #1 r296215
> # sysctl dev.mps.0
> dev.mps.0.spinup_wait_time: 3
> dev.mps.0.chain_alloc_fail: 0
> dev.mps.0.enable_ssu: 1
> dev.mps.0.max_chains: 2048
> dev.mps.0.chain_free_lowwater: 1176
> dev.mps.0.chain_free: 2048
> dev.mps.0.io_cmds_highwater: 510
> dev.mps.0.io_cmds_active: 0
> dev.mps.0.driver_version: 20.00.00.00-fbsd
> dev.mps.0.firmware_version: 17.00.01.00
> dev.mps.0.disable_msi: 0
> dev.mps.0.disable_msix: 0
> dev.mps.0.debug_level: 3
> dev.mps.0.%parent: pci5
> dev.mps.0.%pnpinfo: vendor=0x1000 device=0x0072 subvendor=0x1000 subdevice=0x3020 class=0x010700
> dev.mps.0.%location: slot=0 function=0
> dev.mps.0.%driver: mps
> dev.mps.0.%desc: Avago Technologies (LSI) SAS2008
>
> About 1.5 hours has passed at full load, no errors.
>
> gstat drive busy% seems to hang out around 30-40 instead of ~60-70.
> Overall throughput seems to be 20-30% less with my rough benchmarks.
>
> I'm not sure if this gets us closer to the answer, if this doesn't
> time-out on the 2008 controller, it looks like one of these:
> 1) The Intel drive firmware is being overloaded somehow when connected
> to the 3008.
> or
> 2) The 3008 firmware or driver has an issue reading drive responses,
> sporadically thinking the command timed-out (when maybe it really
> didn't).
>
> Puzzle pieces:
> A) Why does setting tags of 1 on drives connected to the 3008 fix the
> problem?
> B) With tags of 255. Why does postgres (and assuming a large fsync
> count), seem to cause the problem within minutes? While running other
> heavy i/o commands (zpool scrub, bonnie++, fio), all of which show
> similarly high or higher iops take hours to cause the problem (if ever).
>
> I'll let this continue to run to further test.
>
> Thanks again for all the help.
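
For anyone comparing the 2008 and 3008 runs side by side, the queue-related counters in the quoted `sysctl dev.mps.0` output can be pulled out with a small script. This is only a rough sketch: the function names `parse_sysctl` and `queue_summary` are made up for illustration, and reading `max_chains - chain_free_lowwater` as the peak number of chain frames in use is an assumption about what the low-water counter means, not something confirmed in this thread.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines from sysctl output into a dict."""
    values = {}
    for line in text.splitlines():
        # Drop any mail quote prefix ("> ") and surrounding whitespace.
        line = line.lstrip("> ").strip()
        name, sep, value = line.partition(": ")
        if sep and name.startswith("dev."):
            values[name] = value.strip()
    return values


def queue_summary(values, unit="dev.mps.0"):
    """Summarise queue counters for one controller (hypothetical helper)."""
    def get(key):
        return int(values[f"{unit}.{key}"])

    return {
        "chain_alloc_fail": get("chain_alloc_fail"),
        # Assumption: low-water is the fewest free chain frames ever seen,
        # so the difference is the peak number in use.
        "chains_used_peak": get("max_chains") - get("chain_free_lowwater"),
        "io_cmds_highwater": get("io_cmds_highwater"),
    }


# Values copied from the sysctl output quoted above.
SAMPLE = """\
> dev.mps.0.chain_alloc_fail: 0
> dev.mps.0.max_chains: 2048
> dev.mps.0.chain_free_lowwater: 1176
> dev.mps.0.chain_free: 2048
> dev.mps.0.io_cmds_highwater: 510
> dev.mps.0.io_cmds_active: 0
"""

summary = queue_summary(parse_sysctl(SAMPLE))
print(summary)
```

Running the same summary against the 3008 box (with `unit` set to whichever device node that controller attaches as) would give a like-for-like view of how deep the queues get before the timeouts appear.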