From: Steven Hartland
Date: Wed, 8 Jun 2016 00:28:38 +0100
To: freebsd-scsi@freebsd.org
Subject: Re: Avago LSI SAS 3008 & Intel SSD Timeouts
Message-ID: <73dd23bd-7989-6dde-f3ff-e6e51610390a@multiplay.co.uk>
In-Reply-To: <99b3b075-3158-29aa-3a33-311594fb9270@mindpackstudios.com>

If that works I'd switch the 3008 into the machine that currently has
the 2008 and retest. That will help to confirm whether or not the 3008
card and driver are a potential issue.

On 07/06/2016 23:43, list-news wrote:
> No, it threw errors on both da6 and da7 and then I stopped it.
>
> Your last e-mail gave me thoughts though.  I have a server with 2008
> controllers (entirely different backplane design, cpu, memory, etc).
> I've moved the 4 drives to that and I'm running the test now.
>
> # uname = FreeBSD 10.2-RELEASE-p12 #1 r296215
> # sysctl dev.mps.0
> dev.mps.0.spinup_wait_time: 3
> dev.mps.0.chain_alloc_fail: 0
> dev.mps.0.enable_ssu: 1
> dev.mps.0.max_chains: 2048
> dev.mps.0.chain_free_lowwater: 1176
> dev.mps.0.chain_free: 2048
> dev.mps.0.io_cmds_highwater: 510
> dev.mps.0.io_cmds_active: 0
> dev.mps.0.driver_version: 20.00.00.00-fbsd
> dev.mps.0.firmware_version: 17.00.01.00
> dev.mps.0.disable_msi: 0
> dev.mps.0.disable_msix: 0
> dev.mps.0.debug_level: 3
> dev.mps.0.%parent: pci5
> dev.mps.0.%pnpinfo: vendor=0x1000 device=0x0072 subvendor=0x1000
> subdevice=0x3020 class=0x010700
> dev.mps.0.%location: slot=0 function=0
> dev.mps.0.%driver: mps
> dev.mps.0.%desc: Avago Technologies (LSI) SAS2008
>
> About 1.5 hours have passed at full load, no errors.
>
> gstat drive busy% seems to hang out around 30-40 instead of ~60-70.
> Overall throughput seems to be 20-30% less with my rough benchmarks.
>
> I'm not sure if this gets us closer to the answer.  If this doesn't
> time out on the 2008 controller, it looks like one of these:
> 1) The Intel drive firmware is being overloaded somehow when connected
> to the 3008.
> or
> 2) The 3008 firmware or driver has an issue reading drive responses,
> sporadically thinking the command timed out (when maybe it really
> didn't).
>
> Puzzle pieces:
> A) Why does setting tags of 1 on drives connected to the 3008 fix the
> problem?
> B) With tags of 255, why does postgres (presumably with a large fsync
> count) seem to cause the problem within minutes, while other heavy
> i/o commands (zpool scrub, bonnie++, fio), all of which show
> similarly high or higher iops, take hours to cause the problem (if
> ever)?
>
> I'll let this continue to run to further test.
>
> Thanks again for all the help.
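
For comparing the two controllers once the 3008 has been retested, a
small script can pull the interesting counters out of the sysctl dump
quoted above. This is only a rough sketch: the helper names
(parse_sysctl, summarize) are made up for illustration, and it assumes
that chain_free_lowwater is the lowest number of free chain frames seen
so far, so that max_chains minus chain_free_lowwater approximates the
peak number of chain frames in use.

```python
# Sample lines copied from the quoted `sysctl dev.mps.0` output.
SAMPLE = """\
dev.mps.0.chain_alloc_fail: 0
dev.mps.0.max_chains: 2048
dev.mps.0.chain_free_lowwater: 1176
dev.mps.0.io_cmds_highwater: 510
dev.mps.0.io_cmds_active: 0
dev.mps.0.driver_version: 20.00.00.00-fbsd
dev.mps.0.firmware_version: 17.00.01.00
dev.mps.0.%desc: Avago Technologies (LSI) SAS2008
"""


def parse_sysctl(text):
    """Parse "name: value" lines (as printed by sysctl) into a dict."""
    values = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, sep, value = line.partition(": ")
        if sep and name.startswith("dev."):
            values[name] = value
            last_key = name
        elif last_key is not None:
            # Treat anything else as the continuation of a wrapped value.
            values[last_key] += " " + line
    return values


def summarize(values, unit="dev.mps.0"):
    """Collect the counters worth comparing between controllers.

    Assumption (not confirmed by the thread): chain_free_lowwater is the
    minimum count of free chain frames observed, so the difference from
    max_chains is the peak usage.
    """
    def num(key):
        return int(values[unit + "." + key])

    return {
        "desc": values[unit + ".%desc"],
        "driver_version": values[unit + ".driver_version"],
        "firmware_version": values[unit + ".firmware_version"],
        "chain_alloc_fail": num("chain_alloc_fail"),
        "peak_chains_used": num("max_chains") - num("chain_free_lowwater"),
        "io_cmds_highwater": num("io_cmds_highwater"),
    }


if __name__ == "__main__":
    for key, value in summarize(parse_sysctl(SAMPLE)).items():
        print(key, "=", value)
```

Running the same script against the output from the 3008 machine (the
unit name there would be different, so pass it as the `unit` argument)
gives a like-for-like view of firmware and driver versions and queue
depth high-water marks on each controller.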