From: "Nenhum_de_Nos"
To: freebsd-stable@freebsd.org
Date: Sat, 26 May 2012 09:53:09 -0300
Subject: Re: siis_timeout with port multiplier on 9.0R
References: <4FBCF2B6.1060200@sentex.net> <460e1bd626613f125b878f5be65a6b6e.squirrel@eternamente.info>

On Wed, May 23, 2012 17:07, Nenhum_de_Nos wrote:
>
> On Wed, May 23, 2012 12:54, Nenhum_de_Nos wrote:
>>
>> On Wed, May 23, 2012 11:22, Mike Tancsa wrote:
>>> On 5/21/2012 9:04 PM, Matthew Gamble wrote:
>>>> We have a box with 3 SiI3124 SATA controllers and 9 CFI-B53PM 5 Port Backplane port multipliers
>>>> (the "backblaze storage pod").
>>>> Under intense IO (ZFS rebuild, presently) the system will lock
>>>> up all IO for 3-4 minutes and the following entry appears in the dmesg:
>>>>
>>>> siisch11: Timeout on slot 30
>>>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>>>> siisch11: ... waiting for slots 25000000
>>>> siisch11: Timeout on slot 26
>>>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>>>> siisch11: ... waiting for slots 21000000
>>>> siisch11: Timeout on slot 29
>>>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>>>> siisch11: ... waiting for slots 01000000
>>>> siisch11: Timeout on slot 24
>>>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>>>>
>>>> The errors are on different siisch devices so it's not likely to be a SATA cable issue unless
>>>> multiple cables all went bad at the same time. On the advice of some other posts to the
>>>> mailing list I've already tried locking the SATA rev to one with the following in /boot/loader.conf
>>>> which didn't
>>>
>>> If they are on different siisch devices then yes, it does not sound like
>>> a bad cable. However, I have had that issue with similar errors above
>>> that were fixed by using new cables. If you are using 9.0R, I would
>>> suggest upgrading to stable. There have been a few bug fixes /
>>> improvements to the drivers as well as various parts of the disk
>>> subsystem.
>>> I have RELENG8 right now and it's quite stable for me on a
>>> 25TB system which is for the most part similar to 9.x
>>>
>>> # zpool status
>>>   pool: zbackup1
>>>  state: ONLINE
>>>   scan: scrub repaired 0 in 11h11m with 0 errors on Mon Jul 25 19:51:11 2011
>>> config:
>>>
>>>         NAME        STATE     READ WRITE CKSUM
>>>         zbackup1    ONLINE       0     0     0
>>>           raidz1-0  ONLINE       0     0     0
>>>             ada14   ONLINE       0     0     0
>>>             ada16   ONLINE       0     0     0
>>>             ada13   ONLINE       0     0     0
>>>             ada15   ONLINE       0     0     0
>>>           raidz1-1  ONLINE       0     0     0
>>>             ada0    ONLINE       0     0     0
>>>             ada1    ONLINE       0     0     0
>>>             ada2    ONLINE       0     0     0
>>>             ada3    ONLINE       0     0     0
>>>           raidz1-2  ONLINE       0     0     0
>>>             ada4    ONLINE       0     0     0
>>>             ada5    ONLINE       0     0     0
>>>             ada6    ONLINE       0     0     0
>>>             ada7    ONLINE       0     0     0
>>>           raidz1-3  ONLINE       0     0     0
>>>             ada9    ONLINE       0     0     0
>>>             ada10   ONLINE       0     0     0
>>>             ada11   ONLINE       0     0     0
>>>             ada12   ONLINE       0     0     0
>>>
>>> errors: No known data errors
>>> # zpool get all zbackup1
>>> NAME      PROPERTY       VALUE                SOURCE
>>> zbackup1  size           25.4T                -
>>> zbackup1  capacity       68%                  -
>>> zbackup1  altroot        -                    default
>>> zbackup1  health         ONLINE               -
>>> zbackup1  guid           917659042733882722   default
>>> zbackup1  version        28                   default
>>> zbackup1  bootfs         -                    default
>>> zbackup1  delegation     on                   default
>>> zbackup1  autoreplace    off                  default
>>> zbackup1  cachefile      -                    default
>>> zbackup1  failmode       wait                 default
>>> zbackup1  listsnapshots  on                   local
>>> zbackup1  autoexpand     off                  default
>>> zbackup1  dedupditto     0                    default
>>> zbackup1  dedupratio     1.00x                -
>>> zbackup1  free           7.95T                -
>>> zbackup1  allocated      17.4T                -
>>> zbackup1  readonly       off                  -
>>> zbackup1  comment        -                    default
>>>
>>> This is on an adonics adaptor.
>>
>> My adapter is this adonics as well, and my luck is not the same. Is the host card also a
>> SiI3124 PCI?
>>
>> I will upgrade to 9-STABLE and try.
>>
>> thanks,
>>
>> matheus
>
> Mike,
>
> I saw FreeBSD webcvs info on siis.c.
> The only change in 9-STABLE is this:
>
> Revision 1.43.2.2
> Sat Dec 31 15:31:34 2011 UTC (4 months, 3 weeks ago) by hselasky
> Branches: RELENG_9
> Changes since revision 1.43.2.1: +2 -7 lines
>
> SVN rev 229118 on 2011-12-31 15:31:34Z by hselasky
>
> MFC r227701, r227847 and r227849:
> Move the device_delete_all_children() function from usb_util.c
> to kern/subr_bus.c. Simplify this function so that it no longer
> depends on malloc() to execute. Rename device_delete_all_children()
> into device_delete_children(). Identify a few other places where
> it makes sense to use device_delete_children().
>
> 9.0R already has all the other changes. As I don't know this stuff, I can't tell how much it
> would affect my issue (and the other Matheus/Matthew as well), but I imagine not much, as it
> says something about USB :)
>
> As I'm not at home, I will try the cabling thing when I get home.
>
> thanks,
>
> matheus

Finished, unfortunately the same result :(

I've also changed cables, used brand new ones, and the same thing happened :(

thanks,

matheus

>>> ---Mike
>>>>
>>>> hint.siisch.0.sata_rev=1
>>>> hint.siisch.1.sata_rev=1
>>>> hint.siisch.2.sata_rev=1
>>>> hint.siisch.3.sata_rev=1
>>>> hint.siisch.4.sata_rev=1
>>>> hint.siisch.5.sata_rev=1
>>>> hint.siisch.6.sata_rev=1
>>>> hint.siisch.7.sata_rev=1
>>>> hint.siisch.8.sata_rev=1
>>>> hint.siisch.9.sata_rev=1
>>>> hint.siisch.10.sata_rev=1
>>>> hint.siisch.11.sata_rev=1
>>>>
>>>> From time to time this is also causing one of the attached drives to go offline:
>>>>
>>>> siisch0: siis_timeout is 00040000 ss 40000000 rs 40000000 es 00000000 sts 801f2000 serr 00000000
>>>> (ada0:siisch0:0:0:0): lost device
>>>> (ada0:siisch0:0:0:0): removing device entry
>>>> ada0 at siisch0 bus 0 scbus0 target 0 lun 0
>>>> ada0: ATA-8 SATA 3.x device
>>>> ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
>>>> ada0: Command Queueing enabled
>>>> ada0: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
>>>> ada0: Previously was known as ad4
>>>> siisch11: Timeout on slot 30
>>>>
>>>> When the drive goes offline that causes the ZFS rebuild to restart, and so it's never
>>>> finishing the rebuild of the array. Does anyone have any insight into what could be causing
>>>> the timeouts and what we can do to resolve them? Right now my priority is to get the system
>>>> a bit more stable so the current ZFS rebuild can complete – right now it's been doing the
>>>> same rebuild for just over 6 days and the timeouts and drive drop offs are causing it to
>>>> restart constantly.
>>>>
>>>> _______________________________________________
>>>> freebsd-stable@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>>>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>>>
>>>
>>> --
>>> -------------------
>>> Mike Tancsa, tel +1 519 651 3400
>>> Sentex Communications, mike@sentex.net
>>> Providing Internet services since 1994 www.sentex.net
>>> Cambridge, Ontario Canada http://www.tancsa.com/
>>>
>>
>>
>> --
>> We will call you Cygnus,
>> The God of balance you shall be
>>
>> A: Because it messes up the order in which people normally read text.
>> Q: Why is top-posting such a bad thing?
>>
>> http://en.wikipedia.org/wiki/Posting_style

--
We will call you Cygnus,
The God of balance you shall be

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

http://en.wikipedia.org/wiki/Posting_style
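
A note on reading the dmesg output quoted in this thread: the hex values after "ss" and "waiting for slots" appear to be 32-bit masks with bit N set for command slot N. That reading is an assumption, not something confirmed in the thread, but it matches the log: each "Timeout on slot N" line is followed by a "waiting for slots" value with bit N cleared. The helper name slots_from_mask below is made up for illustration; this is only a sketch for decoding the values by hand.

```python
def slots_from_mask(mask_hex):
    """Return the slot numbers whose bits are set in a 32-bit hex mask.

    Assumption: bit N of the mask corresponds to command slot N.
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]


# Values copied from the siisch11 lines quoted in this thread.
samples = [
    ("ss", "65000000"),
    ("waiting for slots (after slot 30)", "25000000"),
    ("waiting for slots (after slot 26)", "21000000"),
    ("waiting for slots (after slot 29)", "01000000"),
]

for label, value in samples:
    print(f"{label}: {value} -> slots {slots_from_mask(value)}")
```

Under that assumption, "ss 65000000" means slots 24, 26, 29 and 30 were outstanding, which are the same four slots that time out one after another in the quoted log.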