From owner-freebsd-hardware@FreeBSD.ORG Mon Jul 22 11:06:44 2013
Return-Path:
Delivered-To: freebsd-hardware@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EEFEA44C for ; Mon, 22 Jul 2013 11:06:44 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id C6C152499 for ; Mon, 22 Jul 2013 11:06:44 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r6MB6iuF053689 for ; Mon, 22 Jul 2013 11:06:44 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r6MB6i5S053687 for freebsd-hardware@FreeBSD.org; Mon, 22 Jul 2013 11:06:44 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 22 Jul 2013 11:06:44 GMT
Message-Id: <201307221106.r6MB6i5S053687@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-hardware@FreeBSD.org
Subject: Current problem reports assigned to freebsd-hardware@FreeBSD.org
X-BeenThere: freebsd-hardware@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: General discussion of FreeBSD hardware
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 22 Jul 2013 11:06:45 -0000

Note: to view an individual PR, use:
http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
f kern/156241  hardware   [mfi] 'zfs send' does not prevents disks to suspend if

1 problem total.

From owner-freebsd-hardware@FreeBSD.ORG Mon Jul 22 14:36:13 2013
Return-Path:
Delivered-To: freebsd-hardware@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 3A274D3C for ; Mon, 22 Jul 2013 14:36:13 +0000 (UTC) (envelope-from Bob.Bawn@nirvanix.com)
Received: from db9outboundpool.messaging.microsoft.com (mail-db9lp0250.outbound.messaging.microsoft.com [213.199.154.250]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A24F82219 for ; Mon, 22 Jul 2013 14:36:12 +0000 (UTC)
Received: from mail128-db9-R.bigfish.com (10.174.16.244) by DB9EHSOBE033.bigfish.com (10.174.14.96) with Microsoft SMTP Server id 14.1.225.22; Mon, 22 Jul 2013 14:36:04 +0000
Received: from mail128-db9 (localhost [127.0.0.1]) by mail128-db9-R.bigfish.com (Postfix) with ESMTP id D229240252 for ; Mon, 22 Jul 2013 14:36:04 +0000 (UTC)
X-Forefront-Antispam-Report: CIP:208.84.97.55; KIP:(null); UIP:(null); IPV:NLI; H:CORPEX001.nirvanix.com; RD:mail.nirvanix.com; EFVD:NLI
X-SpamScore: 10
X-BigFish: VPS10(zz103dKzz1f42h208ch1ee6h1de0h1fdah2073h1202h1e76h1d1ah1d2ah1fc6hzz177df4h17326ah1de097h1de096h8275bhf73b6uz2dh2a8h668h839h944hd25hf0ah1220h1288h12a5h12a9h12bdh137ah13b6h1441h14ddh1504h1537h153bh15d0h162dh1631h1758h18e1h1946h19b5h19ceh1b0ah1d0ch1d2eh1d3fh1dc1h1dfeh1dffh1e1dh1155h)
Received: from mail128-db9 (localhost.localdomain [127.0.0.1]) by mail128-db9 (MessageSwitch) id 1374503762848179_9687; Mon, 22 Jul 2013 14:36:02 +0000 (UTC)
Received: from DB9EHSMHS019.bigfish.com (unknown [10.174.16.240]) by mail128-db9.bigfish.com (Postfix) with ESMTP id CADE32201FF for ; Mon, 22 Jul 2013 14:36:02 +0000 (UTC)
Received: from CORPEX001.nirvanix.com
(208.84.97.55) by DB9EHSMHS019.bigfish.com (10.174.14.29) with Microsoft SMTP Server (TLS) id 14.16.227.3; Mon, 22 Jul 2013 14:36:00 +0000
Received: from CORPEX001.nirvanix.com ([::1]) by CORPEX001.nirvanix.com ([::1]) with mapi id 14.01.0355.002; Mon, 22 Jul 2013 07:35:58 -0700
From: Bob Bawn
To: "freebsd-hardware@freebsd.org"
Subject: Reset Problem with SATA Port Multiplier
Thread-Topic: Reset Problem with SATA Port Multiplier
Thread-Index: Ac6G6MSbMPl46FAoQOey01iNxNo2cA==
Date: Mon, 22 Jul 2013 14:35:57 +0000
Message-ID: <94969AC586B81A4BBD2484F9862736A80CDAE28E@CORPEX001.nirvanix.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [208.14.191.60]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: nirvanix.com
X-FOPE-CONNECTOR: Id%0$Dn%*$RO%0$TLS%0$FQDN%$TlsDn%
X-BeenThere: freebsd-hardware@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: General discussion of FreeBSD hardware
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 22 Jul 2013 14:36:13 -0000

Hello,

I'm testing high-density SATA storage with FreeBSD 9.1-STABLE. The hardware is:

Drives: 45 * Seagate Altos ST3000NC002
Port Multipliers: 9 * SiI3826
SATA Controller: 3 * Marvell 88SX7042

After a few hours of a database-like workload over ZFS (NCQ enabled, disk write caches disabled), a disk becomes unresponsive (we think due to a drive firmware problem):

Jun 14 21:39:54 adlax12st002 root: sysbench tests are now underway
Jun 15 12:12:07 adlax12st002 kernel: mvsch1: SNTF 15
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: Timeout on slot 12
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: iec 00000000 sstat 00000123 serr 00400000 edma_s 00000024 dma_c 10000708 dma_s 00000008 rs 08c81408 status 40
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: ... waiting for slots 08c80408
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: Timeout on slot 3
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: iec 00000000 sstat 00000123 serr 00400000 edma_s 00000024 dma_c 10000708 dma_s 00000008 rs 08c81408 status 40
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: ... waiting for slots 08c80400

After a few timeout/reset cycles, the afflicted device is removed:

Jun 15 12:13:41 adlax12st002 kernel: (aprobe1:mvsch1:0:1:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jun 15 12:13:41 adlax12st002 kernel: (aprobe1:mvsch1:0:1:0): CAM status: Command timeout
Jun 15 12:13:41 adlax12st002 kernel: (aprobe1:mvsch1:0:1:0): Error 5, Retry was blocked
Jun 15 12:13:41 adlax12st002 kernel: (ada6:mvsch1:0:1:0): lost device
Jun 15 12:13:41 adlax12st002 kernel: (pass7:mvsch1:0:1:0): lost device
Jun 15 12:13:41 adlax12st002 kernel: (pass7:mvsch1:0:1:0): removing device entry
Jun 15 12:13:41 adlax12st002 kernel: mvsch1: MVS reset: device ready after 500ms

All of that seems like reasonable OS behavior when a drive is unresponsive. In fact Linux/CentOS/ZoL behaves pretty much the same up to this point.

The problem is that the other four drives behind the port multiplier start timing out and get removed, one at a time, in target order, over the next few minutes:

# grep "lost device" adlax12st002-messages.log
Jun 15 12:13:41 adlax12st002 kernel: (ada6:mvsch1:0:1:0): lost device
Jun 15 12:13:41 adlax12st002 kernel: (pass7:mvsch1:0:1:0): lost device
Jun 15 12:16:16 adlax12st002 kernel: (ada7:mvsch1:0:2:0): lost device
Jun 15 12:16:16 adlax12st002 kernel: (pass8:mvsch1:0:2:0): lost device
Jun 15 12:18:50 adlax12st002 kernel: (ada8:mvsch1:0:3:0): lost device
Jun 15 12:18:50 adlax12st002 kernel: (pass9:mvsch1:0:3:0): lost device
Jun 15 12:22:23 adlax12st002 kernel: (ada9:mvsch1:0:4:0): lost device
Jun 15 12:22:23 adlax12st002 kernel: (pass10:mvsch1:0:4:0): lost device
Jun 15 12:26:57 adlax12st002 kernel: (ada5:mvsch1:0:0:0): lost device
Jun 15 12:26:57 adlax12st002 kernel: (pass6:mvsch1:0:0:0): lost device

It looks like the timeout/reset/recovery sequence for the initial frozen disk has somehow broken connectivity to all the drives behind the port multiplier. This part does not happen on Linux.

Sometimes the entire machine is locked up after the "lost device" sequence. In all cases, a full power cycle is required to make the devices available again. When I soft reset the box over IPMI, the boot process gets stuck in a loop with "mvsch2: MVS reset" and "mvsch2: Wait status d0".

Full /var/log/messages are at:
http://pastebin.com/xCJyfvSN

Unfortunately, I failed to grab the dmesg output and the box has since been re-imaged. Here is a dmesg from a machine which I believe to be identical to the test box:
http://pastebin.com/NYjezuMX

/var/log/messages for the CentOS/Linux case is at:
http://pastebin.com/qrWm0HJ0

Maybe this is a topic for a different post, but has anybody successfully used high-density port-multiplied SATA platforms with FreeBSD? I've heard lots of anecdotes about hardware and/or driver flakiness (like the above), undocumented hardware, etc.
(Actually, I've heard similar complaints from Linux folks.) SAS machines seem to handle this workload without any problems.

We have tried 9.1-RELEASE and the behavior was worse.

We're actually more interested in archive type workloads than this database workload and we have not observed the problem with an archive workload. However, we're worried that general single-drive failures could turn into unavailability of five drives regardless of workload.

Any guidance would be appreciated.

Thanks!
Bob Bawn

From owner-freebsd-hardware@FreeBSD.ORG Mon Jul 22 23:17:15 2013
Return-Path:
Delivered-To: freebsd-hardware@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B72BEB1F for ; Mon, 22 Jul 2013 23:17:15 +0000 (UTC) (envelope-from dieterbsd@gmail.com)
Received: from mail-ie0-x229.google.com (mail-ie0-x229.google.com [IPv6:2607:f8b0:4001:c03::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 90CC52A32 for ; Mon, 22 Jul 2013 23:17:15 +0000 (UTC)
Received: by mail-ie0-f169.google.com with SMTP id at20so5795825iec.14 for ; Mon, 22 Jul 2013 16:17:15 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=4ammA2XK3GVuAvPaDs8KisV8h2hgwcRVH8nuwmLb0/8=; b=AZp+whDcqh01nil1KgJlCyc7snYzeeUPLwjFJ/7ZhVheV9mhQKIWlcG5nLDP7Olwg9 NHL4r3/qF9dmlvSC0OVtxSg7iCfVg7qNHe3HVIrjx6Z/plsBKwUiaJQ9lNKW3V2wbria L6urmpHAmoUEo3/avumYDGyweV2wRa/jYB2aw2w91PRZLEPnF3iJgk7ac8LgiSkKoiWP PBX0zyJ1O3pRTfWohlIrN2ybit14ar2+JiKXiXi5htTXbt7VgqscHLJ9PrChVIjxAp13 1m1cHkU7o7cHHAhlH1o3tPuDkdDbt2Ofzy7v2FEZL/F8adzoegvzOJu6Wx8QPSiGMRZj FIQg==
MIME-Version: 1.0
X-Received: by 10.50.47.107 with SMTP id c11mr20345510ign.52.1374535035088; Mon, 22 Jul 2013 16:17:15 -0700 (PDT)
Received: by 10.64.135.33 with HTTP; Mon, 22 Jul 2013 16:17:14 -0700 (PDT)
Date: Mon, 22 Jul 2013 16:17:14 -0700
Message-ID:
Subject: Re: Reset Problem with SATA Port Multiplier
From: Dieter BSD
To: freebsd-hardware@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-BeenThere: freebsd-hardware@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: General discussion of FreeBSD hardware
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 22 Jul 2013 23:17:15 -0000

> Drives: 45 * Seagate Altos ST3000NC002
> Port Multipliers: 9 * SiI3826
> SATA Controller: 3 * Marvell 88SX7042
>
> After a few hours of a database-like workload over ZFS (NCQ enabled, disk
> write caches disabled), a disk becomes unresponsive (we think due to a
> drive firmware problem):

I have an 8.2 machine with SiI3132 controllers and a SiI3726 pm with a variety of drives. I have been getting the "Timeout on slot " followed by "lost device". Sometimes the device reappears. (Although the /dev/ufs/label does *not* reappear. :-( ) I have not seen the other drives on the pm get removed, or had to power cycle to recover.

Seagate ST3000DM001 with CC4B firmware seems especially bad. ST3000DM001 with CC24 firmware have been ok. So your theory that the drive firmware has a problem seems promising.

Sounds like FreeBSD is doing something bad to the pm, which Linux isn't doing. Perhaps log the commands the OS sends to the controller (over the network to a 2nd machine, or to a local disk not on a pm) and compare BSD to Linux? Perhaps start logging when you get the first timeout, to save hours of commands to wade through.

Alternately you could stare at the driver sources until enlightenment occurs.

AFAIK FreeBSD has never gotten a proper workaround for the quirk in the 1st generation SiI SATA controllers, while they run fine on NetBSD. There might be a bug/quirk in the pm's firmware that FreeBSD triggers but Linux doesn't.

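A rough way to act on the "start at the first timeout" idea with the logs already collected is to trim a saved /var/log/messages so that only the part from the first "Timeout on slot" onward remains. Note that this does not capture the commands sent to the controller; it only cuts down the existing messages log. The Python sketch below is an illustration only: the function name from_first_timeout and the regular expression are assumptions derived from the log excerpts quoted in this thread, not part of any FreeBSD tool.

```python
import re

# Pattern assumed from the excerpts in this thread, e.g.
# "Jun 15 12:12:37 adlax12st002 kernel: mvsch1: Timeout on slot 12"
TIMEOUT_RE = re.compile(r"kernel: (\w+): Timeout on slot \d+")


def from_first_timeout(lines, channel=None):
    """Return the log lines starting at the first 'Timeout on slot' message.

    If channel is given (for example "mvsch1"), only a timeout reported by
    that channel starts the window. Returns an empty list if none is found.
    """
    for i, line in enumerate(lines):
        m = TIMEOUT_RE.search(line)
        if m and (channel is None or m.group(1) == channel):
            return lines[i:]
    return []


# Sample lines copied from the original report.
sample = [
    "Jun 14 21:39:54 adlax12st002 root: sysbench tests are now underway",
    "Jun 15 12:12:07 adlax12st002 kernel: mvsch1: SNTF 15",
    "Jun 15 12:12:37 adlax12st002 kernel: mvsch1: Timeout on slot 12",
    "Jun 15 12:13:41 adlax12st002 kernel: (ada6:mvsch1:0:1:0): lost device",
]

for line in from_first_timeout(sample):
    print(line)
```

The same function could be applied to a whole file with `open(path).read().splitlines()`; passing `channel="mvsch1"` restricts the starting point to timeouts on that channel.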
From owner-freebsd-hardware@FreeBSD.ORG Tue Jul 23 13:50:27 2013 Return-Path: Delivered-To: freebsd-hardware@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 189DF8C7 for ; Tue, 23 Jul 2013 13:50:27 +0000 (UTC) (envelope-from dan@3geeks.org) Received: from mail-oa0-x232.google.com (mail-oa0-x232.google.com [IPv6:2607:f8b0:4003:c02::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id D64182F71 for ; Tue, 23 Jul 2013 13:50:26 +0000 (UTC) Received: by mail-oa0-f50.google.com with SMTP id k7so11417817oag.9 for ; Tue, 23 Jul 2013 06:50:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=3geeks.org; s=google; h=from:content-type:content-transfer-encoding:subject:date:message-id :to:mime-version:x-mailer; bh=kiiCHn8F6ehHknsnQ3GG/W1woBhzSvoxdGEfTlzClu4=; b=j/97BRCDFB5FG3oHtZfqGZF0saBSi4qn1u/Nsjdor0jp4nmwv27k3Jfps46cvpF3XU OsV9hLNCBtABlSfbLMz7mwBJUu0E9ML5xy+dD4XN6dL16zjDYZKP9YI5ifdz+a65PFg0 WjJ0XYU3SjcDcjPd/Ecyxwc0peZAU+SnOnyHk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=from:content-type:content-transfer-encoding:subject:date:message-id :to:mime-version:x-mailer:x-gm-message-state; bh=kiiCHn8F6ehHknsnQ3GG/W1woBhzSvoxdGEfTlzClu4=; b=U4ZR5pbE/19ArssTXDxB1ShZdl0j8qqFpjvZYQm2+3M+I3NNBf07vuZgiVomn9XrCU B2vCvtP4aC0CnD6h/Pl1CcqUs5vrjhyUjhDG37vGsyMaNjFh/E4ewyZIitb9ArqBX4vH C0s1evYPdy/s9fl9XoTPazf+fO76HBDkWa3bBGHjpv/PQ6M8z50BmB8P1eRnXSiEDEN/ eJLm7rVFIqrC4xhlgLf93ZbcMWqCe3W3Vqz10p8sz0vFPEL1Rx6yO7m0i1qLG2LmoizT 5VuARCb1/Fl1ALNA/v2bhgcUrQgVQZXDNTUaywmF67cYMDAruHD7chjCT0MXICiCmDoY eCDg== X-Received: by 10.182.241.71 with SMTP id wg7mr24927646obc.50.1374587425931; Tue, 23 Jul 2013 06:50:25 -0700 (PDT) Received: from treehunter.3geeks.org (99-126-192-237.lightspeed.austtx.sbcglobal.net. 
[99.126.192.237]) by mx.google.com with ESMTPSA id fk3sm40224047obb.2.2013.07.23.06.50.24 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 23 Jul 2013 06:50:25 -0700 (PDT)
From: Daniel Mayfield
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Subject: S2882D hard freeze 45-90 seconds after boot
Date: Tue, 23 Jul 2013 08:50:23 -0500
Message-Id: <4688B8DA-9BC9-4351-ABC1-19B2D69868C8@3geeks.org>
To: freebsd-hardware@freebsd.org
Mime-Version: 1.0 (Apple Message framework v1085)
X-Mailer: Apple Mail (2.1085)
X-Gm-Message-State: ALoCoQn8SZNHMnxwefRxZVPZX1H9quPmHewOVdIFkIAlSzDNBLJT+q9o4Ui8mojkfxyF9SqHALDa
X-BeenThere: freebsd-hardware@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: General discussion of FreeBSD hardware
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 23 Jul 2013 13:50:27 -0000

I have a Tyan S2882D based machine with an ARECA-1160 that can sit on the 9.1-RELEASE install CD kernel for days on end. But if you boot the installed kernel, it will just freeze 30-90 seconds after booting. This happens regardless of NIC config, storage config (UFS on RAID on the card, ZFS on JBOD on the card, etc).

Thoughts?

Dan

From owner-freebsd-hardware@FreeBSD.ORG Thu Jul 25 10:41:14 2013
Return-Path:
Delivered-To: freebsd-hardware@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 6F75B3CB for ; Thu, 25 Jul 2013 10:41:14 +0000 (UTC) (envelope-from lev@FreeBSD.org)
Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [46.4.40.135]) by mx1.freebsd.org (Postfix) with ESMTP id 30E522F66 for ; Thu, 25 Jul 2013 10:41:14 +0000 (UTC)
Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:ad72:2c3d:f785:a64c]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPSA id 31D324AC58; Thu, 25 Jul 2013 14:41:12 +0400 (MSK)
Date: Thu, 25 Jul 2013 14:41:07 +0400
From: Lev Serebryakov
Organization: FreeBSD
X-Priority: 3 (Normal)
Message-ID: <1964575447.20130725144107@serebryakov.spb.ru>
To: Bob Bawn
Subject: Re: Reset Problem with SATA Port Multiplier
In-Reply-To: <94969AC586B81A4BBD2484F9862736A80CDAE28E@CORPEX001.nirvanix.com>
References: <94969AC586B81A4BBD2484F9862736A80CDAE28E@CORPEX001.nirvanix.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: "freebsd-hardware@freebsd.org"
X-BeenThere: freebsd-hardware@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
Reply-To: lev@FreeBSD.org
List-Id: General discussion of FreeBSD hardware
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 25 Jul 2013 10:41:14 -0000

Hello, Bob.
You wrote on 22 July 2013, 18:35:57:

BB> Drives: 45 * Seagate Altos ST3000NC002
BB> Port Multipliers: 9 * SiI3826
BB> SATA Controller: 3 * Marvell 88SX7042

I've heard that only the SiI3132 (2-port controller) works really well with port multipliers.
But we should wait for an answer from Alexander Motin (mav@) to be sure :)

-- 
// Black Lion AKA Lev Serebryakov

From owner-freebsd-hardware@FreeBSD.ORG Thu Jul 25 21:42:26 2013
Return-Path:
Delivered-To: freebsd-hardware@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id C302BC45 for ; Thu, 25 Jul 2013 21:42:26 +0000 (UTC) (envelope-from dieterbsd@gmail.com)
Received: from mail-ob0-x233.google.com (mail-ob0-x233.google.com [IPv6:2607:f8b0:4003:c01::233]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 9231421A3 for ; Thu, 25 Jul 2013 21:42:26 +0000 (UTC)
Received: by mail-ob0-f179.google.com with SMTP id xk17so2699479obc.38 for ; Thu, 25 Jul 2013 14:42:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=GcuUkKFY8pJNhcak5ASW2kvq066J+MS61mn0C4FGU2A=; b=fGtkXuYPGGYd9iQTi0vfBqkspU6iomJx4LQL3mequBW4Lam7jAAGAry2ZKv3HRTgjW JHoE7JblwT2k0/fQHxdbV2yxFiam8fbcE6oKbwU7OBAwbnhir0Hauh4jhq52HbmWpvxy RMG1Z45eq/XfI956MpBf5HmKGwuNzoP60IiNQiNX8APg9ZAbRjyXHNmTHzrJfijAWHvd mfjoNgxCcEpXyJYT3bHPJX73p+CoQ745Rug0LTwxQJ8FzuQioncFVAototMngGIgxCXq ixW+9hJ9BB07L22hPAjWozYq6kFtFTl2Zv3ruwlC8fumA66SHk/Ipf1yNxrxIODFqPrH pIgg==
MIME-Version: 1.0
X-Received: by 10.43.137.131 with SMTP id io3mr19932228icc.79.1374788545805; Thu, 25 Jul 2013 14:42:25 -0700 (PDT)
Received: by 10.64.238.97 with HTTP; Thu, 25 Jul 2013 14:42:25 -0700 (PDT)
Date: Thu, 25 Jul 2013 14:42:25 -0700
Message-ID:
Subject: Re: Reset Problem with SATA Port Multiplier
From: Dieter BSD
To: freebsd-hardware@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-BeenThere: freebsd-hardware@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: General discussion of FreeBSD hardware
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 25 Jul 2013 21:42:26 -0000

> I've heard that only the SiI3132 (2-port controller) works really well with
> port multipliers.
> But we should wait for an answer from Alexander Motin (mav@) to be sure :)

I've heard the theory that you are better off matching a Silicon Image port multiplier with a Silicon Image controller (e.g. 3132 or 3124), but with no data to back it up. For me, the SiI3726 pm seems to work as well with a JMB363 (ahci(4)) controller as with a SiI3132 (siis(4)).

My theory is that as long as everything is working normally, it works fine. But if you get some glitch (caused by hardware, firmware, whatever), the recovery may not be as smooth as it should be, or it may not recover at all. Anyone have a way to inject various SATA faults on demand?

BTW, would the -drivers list be a better place to discuss this?

From owner-freebsd-hardware@FreeBSD.ORG Fri Jul 26 00:07:05 2013 Return-Path: Delivered-To: freebsd-hardware@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 9B1D87FD for ; Fri, 26 Jul 2013 00:07:05 +0000 (UTC) (envelope-from sfourman@gmail.com) Received: from mail-vb0-x22e.google.com (mail-vb0-x22e.google.com [IPv6:2607:f8b0:400c:c02::22e]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 5B605286E for ; Fri, 26 Jul 2013 00:07:05 +0000 (UTC) Received: by mail-vb0-f46.google.com with SMTP id w8so845191vbf.19 for ; Thu, 25 Jul 2013 17:07:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=+cAxiYYQhakHNEAGhb5prZi2rCRZywIjf3NJemIKhfQ=; b=zFD/U9qxFZwPmSn8K6bfrqIX31wprFbTkQvYHl3blcgwSnYztgfmIhn7iVj4Kxx3iJ YZXrZfL23u5+5mz2n6BiZlOdipQZC0Ndxli3lYBbHX5Z+b8pIUD6Htz0f2TvyMFUbd13 ssrK30Xtvp0mVr2CeLupbsNJxPD9fd2IWUMGHPKwYfwbsogUC0UVHKBvNmYMJeIC2eul 08SkJstKpYXLp9Z0Kec7ZrpYHR0iHCcGSeQSHUazTl24douamldktAhX8PtECUozdNiI LpbmvBn0We+xDpRGQw8OZjpNbcj++zX/+9QPx7zlusZtIfIwG3fW+tZprlZ/3/jd3JuT JwDA== MIME-Version: 1.0 X-Received: by 10.52.248.166 with SMTP id yn6mr15939142vdc.41.1374797224476; Thu, 25 Jul 2013 17:07:04 -0700 (PDT) Received: by 10.220.96.78 with HTTP; Thu, 25 Jul 2013 17:07:04 -0700 (PDT) In-Reply-To: References: Date: Thu, 25 Jul 2013 20:07:04 -0400 Message-ID: Subject: Re: Reset Problem with SATA Port Multiplier From: "Sam Fourman Jr." 
To: Dieter BSD
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
Cc: freebsd-hardware@freebsd.org
X-BeenThere: freebsd-hardware@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: General discussion of FreeBSD hardware
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 26 Jul 2013 00:07:05 -0000

On Thu, Jul 25, 2013 at 5:42 PM, Dieter BSD wrote:

> > I've heard that only the SiI3132 (2-port controller) works really well with
> > port multipliers.
> > But we should wait for an answer from Alexander Motin (mav@) to be sure :)
>
> I've heard the theory that you are better off matching a Silicon Image
> port multiplier with a Silicon Image controller (e.g. 3132 or 3124),
> but with no data to back it up. For me, the SiI3726 pm seems to work as
> well with a JMB363 (ahci(4)) controller as with a SiI3132 (siis(4)).

I can confirm I have this SAME exact problem on -HEAD. Both of my mirrored ZFS disks disappear seemingly at random during a buildworld. There is a dmesg for the motherboard in question in a different post. I also have pciconf output. I hope there is a patch for this problem.

-- 
Sam Fourman Jr.

From owner-freebsd-hardware@FreeBSD.ORG Fri Jul 26 00:07:34 2013 Return-Path: Delivered-To: freebsd-hardware@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id D58EC841 for ; Fri, 26 Jul 2013 00:07:34 +0000 (UTC) (envelope-from sfourman@gmail.com) Received: from mail-vb0-x232.google.com (mail-vb0-x232.google.com [IPv6:2607:f8b0:400c:c02::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 957E02876 for ; Fri, 26 Jul 2013 00:07:34 +0000 (UTC) Received: by mail-vb0-f50.google.com with SMTP id x13so421162vbb.23 for ; Thu, 25 Jul 2013 17:07:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=LdFFKFKR6BmjYT/8aWkI2/9jetcK838Q3VoALX5JSNA=; b=cuL/Sl5arI77kXdcWTEQ5QirLSE6GvvyufZ0QBUL8EZnCjxpeGyq36yBjA0TmeyiUb Zy7pioevH39bKF+2okrBEEiKFSRN4+ioxt5Czu4YMBChLTIgAsryrICWD877+nbAGh+d a117prRSgYJdQffeqFHU0nSL9cnxk6l9im3/f0RdgqYWngERvo5I6hHswsBDoX2hwQ0G V6Zj0kIWFNSb2Srx2zwausiQdkcSjgTkXYKAPAA1h1VkyaQRfBBoq1CAOvvXHGk3PHxm FpOtEFtWhyJ9BU1IgGGtz5vLrcGZOhMpMU42OH+6q+AMQdjT2BCs3TXrm2s5m1iEaZWm zalA== MIME-Version: 1.0 X-Received: by 10.52.32.133 with SMTP id j5mr16064456vdi.103.1374797253778; Thu, 25 Jul 2013 17:07:33 -0700 (PDT) Received: by 10.220.96.78 with HTTP; Thu, 25 Jul 2013 17:07:33 -0700 (PDT) In-Reply-To: References: Date: Thu, 25 Jul 2013 20:07:33 -0400 Message-ID: Subject: Re: Reset Problem with SATA Port Multiplier From: "Sam Fourman Jr." 
To: Dieter BSD
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
Cc: freebsd-hardware@freebsd.org
X-BeenThere: freebsd-hardware@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: General discussion of FreeBSD hardware
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 26 Jul 2013 00:07:34 -0000

On Thu, Jul 25, 2013 at 8:07 PM, Sam Fourman Jr. wrote:

> On Thu, Jul 25, 2013 at 5:42 PM, Dieter BSD wrote:
>
>> > I've heard that only the SiI3132 (2-port controller) works really well with
>> > port multipliers.
>> > But we should wait for an answer from Alexander Motin (mav@) to be sure :)
>>
>> I've heard the theory that you are better off matching a Silicon Image
>> port multiplier with a Silicon Image controller (e.g. 3132 or 3124),
>> but with no data to back it up. For me, the SiI3726 pm seems to work as
>> well with a JMB363 (ahci(4)) controller as with a SiI3132 (siis(4)).
>
> I can confirm I have this SAME exact problem on -HEAD. Both of my
> mirrored ZFS disks disappear seemingly at random during a buildworld.
> There is a dmesg for the motherboard in question in a different post. I
> also have pciconf output. I hope there is a patch for this problem.

Forgot to paste the link :)
http://lists.freebsd.org/pipermail/freebsd-current/2013-July/043239.html

-- 
Sam Fourman Jr.

From owner-freebsd-hardware@FreeBSD.ORG Fri Jul 26 22:16:43 2013 Return-Path: Delivered-To: freebsd-hardware@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id C665FC8C for ; Fri, 26 Jul 2013 22:16:43 +0000 (UTC) (envelope-from Bob.Bawn@nirvanix.com) Received: from db9outboundpool.messaging.microsoft.com (mail-db9lp0250.outbound.messaging.microsoft.com [213.199.154.250]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 5CD9823AD for ; Fri, 26 Jul 2013 22:16:42 +0000 (UTC) Received: from mail110-db9-R.bigfish.com (10.174.16.225) by DB9EHSOBE021.bigfish.com (10.174.14.84) with Microsoft SMTP Server id 14.1.225.22; Fri, 26 Jul 2013 22:01:26 +0000 Received: from mail110-db9 (localhost [127.0.0.1]) by mail110-db9-R.bigfish.com (Postfix) with ESMTP id 8E08280137 for ; Fri, 26 Jul 2013 22:01:26 +0000 (UTC) X-Forefront-Antispam-Report: CIP:208.84.97.55; KIP:(null); UIP:(null); IPV:NLI; H:CORPEX001.nirvanix.com; RD:mail.nirvanix.com; EFVD:NLI X-SpamScore: -2 X-BigFish: VPS-2(zz98dIfecIzz1f42h208ch1ee6h1de0h1fdah2073h1202h1e76h1d1ah1d2ah1fc6hzz1de098h8275bh1de097hz2dh2a8h668h839h944hd25hf0ah1220h1288h12a5h12a9h12bdh137ah13b6h1441h14ddh1504h1537h153bh15d0h162dh1631h1758h18e1h1946h19b5h1b0ah1d0ch1d2eh1d3fh1dfeh1dffh1e1dh1155h) Received: from mail110-db9 (localhost.localdomain [127.0.0.1]) by mail110-db9 (MessageSwitch) id 1374876085110720_24846; Fri, 26 Jul 2013 22:01:25 +0000 (UTC) Received: from DB9EHSMHS006.bigfish.com (unknown [10.174.16.249]) by mail110-db9.bigfish.com (Postfix) with ESMTP id 0A0B72E0049 for ; Fri, 26 Jul 2013 22:01:25 +0000 (UTC) Received: from CORPEX001.nirvanix.com (208.84.97.55) by DB9EHSMHS006.bigfish.com (10.174.14.16) with Microsoft SMTP Server (TLS) id 14.16.227.3; Fri, 26 Jul 2013 22:01:24 
+0000
Received: from CORPEX001.nirvanix.com ([::1]) by CORPEX001.nirvanix.com ([::1]) with mapi id 14.01.0355.002; Fri, 26 Jul 2013 15:01:22 -0700
From: Bob Bawn
To: "freebsd-hardware@freebsd.org"
Subject: Re: Reset Problem with SATA Port Multiplier
Thread-Topic: Reset Problem with SATA Port Multiplier
Thread-Index: AQHOikupn7cNLZX6kUmNhLgF8q/2JQ==
Date: Fri, 26 Jul 2013 22:01:21 +0000
Message-ID: <94969AC586B81A4BBD2484F9862736A80CDAE9A2@CORPEX001.nirvanix.com>
References:
In-Reply-To:
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [10.0.8.13]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: nirvanix.com
X-FOPE-CONNECTOR: Id%0$Dn%*$RO%0$TLS%0$FQDN%$TlsDn%
X-BeenThere: freebsd-hardware@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: General discussion of FreeBSD hardware
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 26 Jul 2013 22:16:43 -0000

On Thu, Jul 25, 2013, Dieter BSD wrote:

> Sounds like FreeBSD is doing something bad to the pm, which Linux isn't
> doing. Perhaps log the commands the OS sends to the controller (over the
> network to a 2nd machine, or to a local disk not on a pm) and compare
> BSD to Linux? Perhaps start logging when you get the first timeout, to
> save hours of commands to wade through.

Yes, that occurred to me. I was hoping to avoid learning how to build kernels with debug messages but I suppose it's a good skill to have. :-)

> Alternately you could stare at the driver sources until enlightenment
> occurs.

I did a little of this and superficially it does seem like there could be differences between FreeBSD and Linux in the treatment of the mysterious 6th port (SEMB) on the 5-port multiplier. Hopefully, the logging you suggest will clarify the situation.

Thanks for your help.

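When comparing runs, it may help to reduce the "lost device" messages from the original report to one row per port-multiplier target, with the elapsed time since the first removal. The Python sketch below is only an illustration under stated assumptions: the helper name lost_devices and the regular expression are hypothetical and derived from the log lines quoted earlier in this thread, the CAM path is read as (periph:sim:bus:target:lun), and the year 2013 is assumed because syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Matches lines such as
# "Jun 15 12:13:41 adlax12st002 kernel: (ada6:mvsch1:0:1:0): lost device"
LOST_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: "
    r"\((\w+):(\w+):(\d+):(\d+):(\d+)\): lost device"
)


def lost_devices(lines, year=2013):
    """Return (timestamp, channel, target, periph) for each lost disk.

    passN entries are skipped because they repeat the adaN event for the
    same target. The year is an assumption (syslog omits it).
    """
    events = []
    for line in lines:
        m = LOST_RE.match(line)
        if not m:
            continue
        stamp, periph, channel, _bus, target, _lun = m.groups()
        if periph.startswith("pass"):
            continue
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, channel, int(target), periph))
    return events


# The grep output from the original report.
sample = [
    "Jun 15 12:13:41 adlax12st002 kernel: (ada6:mvsch1:0:1:0): lost device",
    "Jun 15 12:13:41 adlax12st002 kernel: (pass7:mvsch1:0:1:0): lost device",
    "Jun 15 12:16:16 adlax12st002 kernel: (ada7:mvsch1:0:2:0): lost device",
    "Jun 15 12:16:16 adlax12st002 kernel: (pass8:mvsch1:0:2:0): lost device",
    "Jun 15 12:18:50 adlax12st002 kernel: (ada8:mvsch1:0:3:0): lost device",
    "Jun 15 12:18:50 adlax12st002 kernel: (pass9:mvsch1:0:3:0): lost device",
    "Jun 15 12:22:23 adlax12st002 kernel: (ada9:mvsch1:0:4:0): lost device",
    "Jun 15 12:22:23 adlax12st002 kernel: (pass10:mvsch1:0:4:0): lost device",
    "Jun 15 12:26:57 adlax12st002 kernel: (ada5:mvsch1:0:0:0): lost device",
    "Jun 15 12:26:57 adlax12st002 kernel: (pass6:mvsch1:0:0:0): lost device",
]

events = lost_devices(sample)
first = events[0][0]
for when, channel, target, periph in events:
    elapsed = int((when - first).total_seconds())
    print(f"{channel} target {target} ({periph}): +{elapsed}s")
```

On the sample above, the targets on mvsch1 go away in the order 1, 2, 3, 4, 0 at roughly 0, 155, 309, 522 and 796 seconds after the first removal.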
From owner-freebsd-hardware@FreeBSD.ORG Fri Jul 26 22:29:52 2013 Return-Path: Delivered-To: freebsd-hardware@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id C07D6E53 for ; Fri, 26 Jul 2013 22:29:52 +0000 (UTC) (envelope-from sfourman@gmail.com) Received: from mail-vc0-x22d.google.com (mail-vc0-x22d.google.com [IPv6:2607:f8b0:400c:c03::22d]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 7FB5B2407 for ; Fri, 26 Jul 2013 22:29:52 +0000 (UTC) Received: by mail-vc0-f173.google.com with SMTP id id13so861570vcb.18 for ; Fri, 26 Jul 2013 15:29:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=fw9c6R/Z2WSdIULTnQ7LBMqEiZdBuN5mq5YsR++DvLM=; b=cvVIs9EMo073Ec5CHW/jBxulku7DBuC/BKwE/EpHSLKehRvUotDoqSWy/OHpIGsypF VJVE+xum8CynwxArZchLtVnih9tvv/dz1Yxbm5qTt45jY1cAS6fZ/i1nbbk04GKOfanI uCU1J9JFvLReHJZN0ASMxRwxhqfWFrG5W1hovg7rs/4oS2X+hgQ8Y78m/+x4AQGa93Gb 7kl981UsAKSAMF786fc3pi22HVjp5yBHTebUDpGBh/1PpY2W8/ZypoHNwyx0t8rdJgJ5 7y6jH82ZlamuBga0xNPrvoIk/Op8pZppG1+YQadwaPcFEfSTm/TNm11l09O2TJkbhmrQ DAeg== MIME-Version: 1.0 X-Received: by 10.58.100.234 with SMTP id fb10mr14881516veb.5.1374877791466; Fri, 26 Jul 2013 15:29:51 -0700 (PDT) Received: by 10.220.96.78 with HTTP; Fri, 26 Jul 2013 15:29:51 -0700 (PDT) In-Reply-To: <94969AC586B81A4BBD2484F9862736A80CDAE9A2@CORPEX001.nirvanix.com> References: <94969AC586B81A4BBD2484F9862736A80CDAE9A2@CORPEX001.nirvanix.com> Date: Fri, 26 Jul 2013 18:29:51 -0400 Message-ID: Subject: Re: Reset Problem with SATA Port Multiplier From: "Sam Fourman Jr." 
To: Bob Bawn
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
Cc: "freebsd-hardware@freebsd.org"
X-BeenThere: freebsd-hardware@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: General discussion of FreeBSD hardware
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 26 Jul 2013 22:29:52 -0000

> I did a little of this and superficially it does seem like there could
> be differences between FreeBSD and Linux in the treatment of the
> mysterious 6th port (SEMB) on the 5-port multiplier. Hopefully, the
> logging you suggest will clarify the situation.
>
> Thanks for your help.

If you have a script or a way to build a kernel to help debug this, I will run it if you post it here. I have the same issue on a 3-port multiplier using -HEAD.

-- 
Sam Fourman Jr.