From: Alexander Motin
Date: Sun, 28 Jul 2013 13:23:43 +0300
To: Dieter BSD
Cc: freebsd-hardware@freebsd.org
Subject: Re: Reset Problem with SATA Port Multiplier
Message-ID: <51F4F12F.80003@FreeBSD.org>

On 28.07.2013 03:08, Dieter BSD wrote:
> Bob writes:
>> After a few hours of a database-like workload
>
> A faster way to trigger the problem would be useful.
>
>> We're actually more interested in archive type workloads than this
>> database workload and we have not observed the problem with an archive
>> workload.
>
> So perhaps something about the timing triggers the bug?
>
> Sam writes
>> if you have a script or a way to build a kernel to help debug this I will
>> run it if you post it here... I have the same issue on a 3 port multiplier
>> using -HEAD
>
> Can you share the make and model of this 3 port multiplier?
> If it is happening with more than one model of pm, it is more likely
> some generic problem, rather than triggering some model-specific quirk/bug.
> Has anyone seen this problem with an older OS release? (say 7.x or 8.x?)
> If the problem was introduced recently, we might be able to find it
> by looking at what changed in the source code. I haven't seen the
> problem with 8.2 or earlier.
>
> Looks like a verbose boot will give a little more info.
> But I suspect that adding more log(9) statements will be needed.
> Unless mav has a better idea?

There are two sides to this problem: the original issue and imperfect
error recovery.

The first one is a big question. I can't say what is actually going on
there that causes the problem. Just recently I made one more attempt to
get some documentation on SATA controllers from Marvell. But even after
signing an NDA the process stalled again, since I am not buying
thousands of their chips as a vendor, and they do not support end
users. The situation is similar with other vendors.

As for recovery, the problem is that neither CAM nor the mvs driver
currently tracks the faulty status of devices. So if some disk's
firmware gets stuck and starts causing endless timeouts, that will
substantially disrupt operation of the other devices sharing that SATA
port. The mechanism for dropping a faulty device could probably be
improved somehow.

As for SAS, mentioned here: that is a quite different, more expensive
market. And even though the protocols there are much more
sophisticated, and the hardware, firmware and software are much better
tested, situations still sometimes happen where a single misbehaving
device takes down the whole fabric.

-- 
Alexander Motin
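
To make the "track faulty status and drop the device" idea concrete,
here is a rough user-space model in Python. It is only a sketch under
assumptions, not CAM or mvs(4) code: the class name PortHealth, its
methods, and the threshold MAX_CONSECUTIVE_TIMEOUTS are all
hypothetical and exist only for illustration. The policy shown is one
possible choice: count consecutive command timeouts per device behind
the port multiplier, and stop issuing commands to a device once its
streak reaches the threshold, so the other devices on the port are not
stalled by it.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: consecutive command timeouts tolerated before a
# device behind the port multiplier is dropped. Not a real CAM/mvs tunable.
MAX_CONSECUTIVE_TIMEOUTS = 3


@dataclass
class PortHealth:
    """Per-device timeout bookkeeping for one SATA port shared via a multiplier."""

    timeouts: dict = field(default_factory=dict)  # device id -> consecutive timeouts
    dropped: set = field(default_factory=set)     # device ids taken out of service

    def command_done(self, dev: int) -> None:
        """A command completed normally: clear the device's timeout streak."""
        if dev not in self.dropped:
            self.timeouts[dev] = 0

    def command_timeout(self, dev: int) -> bool:
        """Record a timeout; return True only when this call drops the device."""
        if dev in self.dropped:
            return False
        self.timeouts[dev] = self.timeouts.get(dev, 0) + 1
        if self.timeouts[dev] >= MAX_CONSECUTIVE_TIMEOUTS:
            self.dropped.add(dev)
            return True
        return False

    def may_issue(self, dev: int) -> bool:
        """Commands are queued only to devices that have not been dropped."""
        return dev not in self.dropped
```

In this model a successful completion resets the streak, so a device
with occasional isolated timeouts stays in service, while one that
times out repeatedly in a row is removed. A real driver would also need
to decide when, if ever, to retry a dropped device; that is left out
here.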