From: Dieter BSD
To: freebsd-hardware@freebsd.org
Date: Mon, 22 Jul 2013 16:17:14 -0700
Subject: Re: Reset Problem with SATA Port Multiplier

> Drives: 45 * Seagate Altos ST3000NC002
> Port Multipliers: 9 * SiI3826
> SATA Controller: 3 * Marvell 88SX7042
>
> After a few hours of a database-like workload over ZFS (NCQ enable, disk
> write caches disabled), a disk becomes unresponsive (we think due to a
> drive firmware problem):

I have an 8.2 machine with SiI3132 controllers and a SiI3726 pm with a
variety of drives. I have been getting the "Timeout on slot " followed
by "lost device". Sometimes the device reappears. (Although the
/dev/ufs/label does *not* reappear. :-( ) I have not seen the other
drives on the pm get removed, or had to power cycle to recover.

Seagate ST3000DM001 with CC4B firmware seems especially bad.
ST3000DM001 with CC24 firmware has been ok. So your theory that the
drive firmware has a problem seems promising.

Sounds like FreeBSD is doing something bad to the pm, which Linux
isn't doing. Perhaps log the commands the OS sends to the controller
(over the network to a 2nd machine, or to a local disk not on a pm)
and compare BSD to Linux? Perhaps start logging when you get the first
timeout, to save wading through hours of commands. Alternatively you
could stare at the driver sources until enlightenment occurs.

AFAIK FreeBSD has never gotten a proper workaround for the quirk in
the 1st generation SiI SATA controllers, while they run fine on NetBSD.
There might be a bug/quirk in the pm's firmware that FreeBSD triggers
but Linux doesn't.
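
If you do capture command logs on both systems, the comparison could be
scripted roughly as below. This is only a sketch: it assumes each log is
plain text with one command per line, and the line contents in the usage
note (SOFT_RESET, pmp=15 and so on) are made up for illustration, not
real driver output. The function names are hypothetical too. It drops
everything before the first timeout and then diffs the two logs.

```python
import difflib


def trim_to_first_timeout(lines, marker="Timeout on slot"):
    """Return the lines from the first one containing `marker` onward.

    If the marker never appears, the whole log is returned unchanged.
    """
    for i, line in enumerate(lines):
        if marker in line:
            return list(lines[i:])
    return list(lines)


def diff_command_logs(bsd_lines, linux_lines):
    """Unified diff of two command logs (lists of strings, no newlines)."""
    return list(
        difflib.unified_diff(
            bsd_lines,
            linux_lines,
            fromfile="freebsd",
            tofile="linux",
            lineterm="",
        )
    )
```

For example, trim_to_first_timeout(["READ slot=3", "Timeout on slot 3",
"SOFT_RESET pmp=15"]) keeps only the last two lines, and
diff_command_logs() then marks with "+" or "-" any command that only one
of the two systems issued during recovery.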