From owner-freebsd-scsi@freebsd.org Sun Jul 8 21:01:12 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4E93510311A4 for ; Sun, 8 Jul 2018 21:01:12 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id C77328B225 for ; Sun, 8 Jul 2018 21:01:11 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: by mailman.ysv.freebsd.org (Postfix) id 871141031181; Sun, 8 Jul 2018 21:01:11 +0000 (UTC) Delivered-To: scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 75A801031177 for ; Sun, 8 Jul 2018 21:01:11 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from mxrelay.ysv.freebsd.org (mxrelay.ysv.freebsd.org [IPv6:2001:1900:2254:206a::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.ysv.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D24888B21B for ; Sun, 8 Jul 2018 21:01:10 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.ysv.freebsd.org (Postfix) with ESMTPS id 1274F19B73 for ; Sun, 8 Jul 2018 21:01:10 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from kenobi.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id w68L197o070961 for ; Sun, 8 Jul 2018 21:01:09 GMT (envelope-from bugzilla-noreply@FreeBSD.org) Received: (from bugzilla@localhost) by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id 
w68L19uu070950 for scsi@FreeBSD.org; Sun, 8 Jul 2018 21:01:09 GMT (envelope-from bugzilla-noreply@FreeBSD.org) Message-Id: <201807082101.w68L19uu070950@kenobi.freebsd.org> X-Authentication-Warning: kenobi.freebsd.org: bugzilla set sender to bugzilla-noreply@FreeBSD.org using -f From: bugzilla-noreply@FreeBSD.org To: scsi@FreeBSD.org Subject: Problem reports for scsi@FreeBSD.org that need special attention Date: Sun, 8 Jul 2018 21:01:09 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.27 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Jul 2018 21:01:12 -0000

To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.

Status      |    Bug Id | Description
------------+-----------+---------------------------------------------------
New         |    221952 | cam iosched: Fix trim statistics

1 problem total for which you should take action.
From owner-freebsd-scsi@freebsd.org Tue Jul 10 15:13:57 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5554A102F9D0 for ; Tue, 10 Jul 2018 15:13:57 +0000 (UTC) (envelope-from crimsonthunder@gmx.net) Received: from mout.gmx.net (mout.gmx.net [212.227.17.22]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "mout.gmx.net", Issuer "TeleSec ServerPass DE-2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id AF2517B47E for ; Tue, 10 Jul 2018 15:13:56 +0000 (UTC) (envelope-from crimsonthunder@gmx.net) Received: from [10.12.22.246] ([193.170.152.64]) by mail.gmx.com (mrgmx102 [212.227.17.168]) with ESMTPSA (Nemesis) id 0LeSOH-1gQa1J1yij-00q7ur for ; Tue, 10 Jul 2018 17:13:48 +0200 Subject: Re: problems with SAS JBODs 2 From: Oliver Sech References: <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net> To: FreeBSD-scsi Message-ID: Date: Tue, 10 Jul 2018 17:13:47 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-Provags-ID: V03:K1:r02C5jHAr7PPxvUvgH1h798kBhZAKUUfXSeIkK3F4w4ZD0rIBaD Hb5SVz3ZA4b0MsBzu+hELGF64RjWMDcQumJZ3dmS76M+v5dBfqi2lM6jz7sRlehfYsBJ9Wg yz2ddWXOUgKXH5mVLVprdcLkjm9DKbGFqJ/iMXJSyfhRpSbt7BhJZGXvEgh5NEta3ESl7Zg PC6kE5sR5JMLfoowj7WOA== X-UI-Out-Filterresults: notjunk:1;V01:K0:/7z0975Yk5M=:cA5gEiAshfzdlJjAppK67R Oz+7tyxU8gPnhauSCp0nJ41Dwmm7E6+FC0oKLm70Sjw8Tv2CcOnLmpIQQMJzRGFG3NPZ17olZ 7zxaB69siZ0oqFjL+8m2l3ZGRVj1R2XRnhNZSwxxaM9p/ZyvaJPZFH70oWf00mJtTVx0GRGQC tqqBCtU1qAhrf0K0y7M/M01JeBLOZ1TwVOlLZWjOHt8ASjI5g1nuQwesDM2rlnHkuLC9qBkpd FBpg4iTerjj1pUZzDfXaD3iN+S3KILNudrZMFQFWEKV8ukzGL4JMhnTZBs/qICe2CrqnXoMs4 
Xo3zBkZxEVfENfkq72raYglmGegcGOtHW48gSQZ0XI8J9rRAgD5L8uMqQNBmAqRyuELQP6lmc n1lUHqyvvIkQRmgQPJZbNQgpSQbljIRKwX5MoVrTzKoAoMcKGrW24Jq0MiZoLalss55XbcCCN AD0/a4jqo9vfkI0WNetfaTiTOYU9Gj5Ubgpne8bRBdfq6CApadL2mq4zTbPl2QZ5GGKXOUYjM gePxlz/KLEH8qCfd9LuWaPsiOiCx3QJkc8VVzEy97c4XI2QELPyBDoaviOp2VcPjAND7Q/dL8 tf7GPcxGIaUicHUSQjwVACZlCCL+IqhrdLjkyD2tV51E12Fw8NdNMQeHTD9lCaunWm2MYnIt5 UGNHFp3rgXUeQ1+wKsCjm/pdiOD/xGM9zXHykmsAJ5yRjGEkXDwzU3rsyeE3XaOedN9Yf/Grr ow9DpzgtljzJHf8O1FGSfZXyuhnLskjM2kQLyjwE30F7ym1I91EHhpIaxx2XPx3nxVabSMNLW jGTRrHY X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jul 2018 15:13:57 -0000

I tested a few additional things. I don't think this is a multipath, daisy chain, or SAS wide port problem.
I can reproduce the problem with just a single connection to an Expander/JBOD.

Test:
* physically disconnect all shelves
* reboot system
* connect one shelf via SAS cable
* check number of disks (after a reboot everything always shows up)
* disconnect the shelf and wait (geom disk list still shows most disks)
* connect the shelf (missing disks)

Tested Hardware:
* Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (internal daisy chain + wide links)
* Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (straight HBA <-> EXPANDER connection; no wide links, no daisy chain)
* Supermicro SAS2 SC847E26-RJBOD1 + SAS3 LSI 9305-16e (internal daisy chain)
* Promise SAS2 VTrak 830 + SAS3 LSI 9305-16e (straight HBA <-> EXPANDER connection)

On 07/04/2018 12:15 PM, Oliver Sech wrote:
>> 1) Are the expanders daisy chained? Some SAS expanders don't work reliably
>> when daisy chained. Best to direct connect each one to the server.
> At the moment I have 1 JBOD connected to 1 HBA Port with 1 cable (4 lanes?).
> Unfortunately the JBOD has 24 slots in the front and 20 in the back, and those are connected via internal SAS daisy chaining.
> I could rewire and connect each backplane directly to the server, but unfortunately I do not have enough ports.
>
> JBOD Model: Supermicro 847E2C-R1K28JBOD
>
>> 2) Are the expanders connected in multipath or single path? You need
>> geom_multipath if you're going to do that.
> See answer 1. There is a single path from the host to the first expander.
>
>> 3) Are you attempting to use wide ports (two SAS cables connecting each
>> expander to the HBA). If so, you'll need to make sure that each pair of
>> SAS cables goes to the same HBA chip (not merely the same card, as some
>> cards contain two HBA chips).
> See 1. The last time I opened one of those JBODs there were 8 SAS cables between the front and back expanders. I assume that wide ports are being used.
> (2 expanders per backplane as well)
>
>> 4) Are you trying to remove an expander while ZFS is active on that
>> expander? That will suspend your pool, and ZFS doesn't always recover from
>> a suspended state.
> I'm testing with a new unused disk shelf that was never part of the ZFS pool.
There were > _______________________________________________ > freebsd-scsi@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-scsi > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" From owner-freebsd-scsi@freebsd.org Tue Jul 10 15:28:01 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 343151030B75 for ; Tue, 10 Jul 2018 15:28:01 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: from mail-it0-x22e.google.com (mail-it0-x22e.google.com [IPv6:2607:f8b0:4001:c0b::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id BFA137BB7F for ; Tue, 10 Jul 2018 15:28:00 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: by mail-it0-x22e.google.com with SMTP id s7-v6so31057483itb.4 for ; Tue, 10 Jul 2018 08:28:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:references:in-reply-to:mime-version :thread-index:date:message-id:subject:to; bh=8JKS9kLc8sx3h7ROseTAjQcXHeXH1u6wpZHsZSgT0+8=; b=dJrC3C8v6DjzejHqmq0iQaJ1muH5+FBgUbtOPx5OEHaLVyP8vcQ5hIl4cCnoRgyu7Y P8lOJgJL4xi6ceKijEJfxmFYJ9LRmOIcVo1ggzcllTwRe1liOeg97/c0CZ+H1h+ATYUf EA/YolCz5b+aA6IFOsmyAbqAM6jfU84utwYQXkqnCqK7QsGCnKbeIsSZxZXL7Yxjjt6h eVC3u9DJZh0t0iaCji3AjuRvBJ0dGy3oK9wSei77P16aWD/ZeGv/s23uJQKvmWnOBFFY Ao+/X6q63ABxV3cLuWZxacSRg1g4OARfVbOEO9njRXQScCRn1dCFTESI7G+g0nioxW7U 5Gvg== X-Gm-Message-State: APt69E2+01YH9sFFaqJE8XwZMFJCJzG8cGeeledmkgSS0P6+RBP+1YJI lgIAxTDQNfmpN8tqtb4JgaKze6qKHWDTkE6fTU1rBA== X-Google-Smtp-Source: AAOMgpdq/vXldhG/5by/Zpa4vUMFMXF41QT8snno5cNZIx+3KiR4uWHmAjQ4UOqDQJ4+LDzRDvT/qf7ZpFIM0wftpj4= X-Received: by 2002:a24:3a87:: with SMTP id 
m129-v6mr20177298itm.90.1531236479801; Tue, 10 Jul 2018 08:27:59 -0700 (PDT) From: Stephen Mcconnell References: <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net> In-Reply-To: MIME-Version: 1.0 X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQHJZ/UmTT9Y1rodqvzH7TRwbPT2YALnpLW+Ap4aqgkCOUyURaRf/K0A Date: Tue, 10 Jul 2018 09:27:58 -0600 Message-ID: <3caf8ccd6fde8cfc4db25bae5327c46b@mail.gmail.com> Subject: RE: problems with SAS JBODs 2 To: Oliver Sech , FreeBSD-scsi Content-Type: text/plain; charset="UTF-8" X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jul 2018 15:28:01 -0000 Hi Oliver, I can't get to your links. Can you try to send the logs in another way? Steve > -----Original Message----- > From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd- > scsi@freebsd.org] On Behalf Of Oliver Sech > Sent: Tuesday, July 10, 2018 9:14 AM > To: FreeBSD-scsi > Subject: Re: problems with SAS JBODs 2 > > I tested a few additional things. I don't think this is a multipath, daisy > chain > nor a SAS wide ports problem. > I can reproduce the problem with just a single connection to an > Expander/JBOD. > > Test: > * physically disconnect all shelves > * reboot system > * connect one shelf via SAS cable > * check number of disks (after a reboot everything always shows up) > * disconnect the shelf and wait (geom disk list still shows most disks.) > * connect the shelf (missing disks) > > Tested Hardware: > * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e ( internal > daisy > chain + wide links) > * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (straight HBA > <- > > EXPANDER connection. (no wide links, no daisy chain)) > * Supermicro SAS2 SC847E26-RJBOD1 + SAS3 LSI 9305-16e (internal daisy > chain) > * Promise SAS2 VTrak 830 + SAS3 LSI 9305-16e (straight HBA > <-> > EXPANDER connection.) 
>
> On 07/04/2018 12:15 PM, Oliver Sech wrote:
> >> 1) Are the expanders daisy chained? Some SAS expanders don't work reliably
> >> when daisy chained. Best to direct connect each one to the server.
> > At the moment I have 1 JBOD connected to 1 HBA Port with 1 cable (4 lanes?).
> > Unfortunately the JBOD has 24 slots in the front and 20 in the back, and those are connected via internal SAS daisy chaining.
> > I could rewire and connect each backplane directly to the server, but unfortunately I do not have enough ports.
> >
> > JBOD Model: Supermicro 847E2C-R1K28JBOD
> >
> >> 2) Are the expanders connected in multipath or single path? You need
> >> geom_multipath if you're going to do that.
> > See answer 1. There is a single path from the host to the first expander.
> >
> >> 3) Are you attempting to use wide ports (two SAS cables connecting each
> >> expander to the HBA). If so, you'll need to make sure that each pair of
> >> SAS cables goes to the same HBA chip (not merely the same card, as some
> >> cards contain two HBA chips).
> > See 1. The last time I opened one of those JBODs there were 8 SAS cables between the front and back expanders. I assume that wide ports are being used.
> > (2 expanders per backplane as well)
> >
> >> 4) Are you trying to remove an expander while ZFS is active on that
> >> expander? That will suspend your pool, and ZFS doesn't always recover from
> >> a suspended state.
> > I'm testing with a new unused disk shelf that was never part of the ZFS pool.
There were > > _______________________________________________ > > freebsd-scsi@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-scsi > > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-scsi@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-scsi > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" From owner-freebsd-scsi@freebsd.org Tue Jul 10 15:48:31 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4A4C71032E45 for ; Tue, 10 Jul 2018 15:48:31 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: from mail-it0-x233.google.com (mail-it0-x233.google.com [IPv6:2607:f8b0:4001:c0b::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 073617CA93 for ; Tue, 10 Jul 2018 15:48:30 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: by mail-it0-x233.google.com with SMTP id w16-v6so9825749ita.0 for ; Tue, 10 Jul 2018 08:48:30 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:references:in-reply-to:mime-version :thread-index:date:message-id:subject:to; bh=l/mkfNvLY1riVT1IboUU7yHXw1OXCvBL2Z4sdcKd2EA=; b=JMgvOBwru//nzHDLHO6QYrKBFY8KnkqmqOU5jau/BdDubZyCtl99l+YY1FeNsinK5E akbO22mdBrTM+2TP0+c48nhl+sA0Oy5/kwD/ADb+Fio8rbbjSrZeHczUfMDLhcqgs0As FiAsgzF7N0tWLfbruJZ8gIf1ZrSO84q0J56GkIcX8iHuabGUJS026GwUrBo8827SC+jY WVSj0bj6HJC3bGU7Wr2z0pexYloGMOze3MDVEwpDHVaKoUsao5rWwuW1IVak+OLjqCdo W4KfQ5STLpw0DlSk6gqtUeEaEzjMrxFXkJAgQuAv+czUDGQUN5txcqA68XzPUyJeHaHy chhA== X-Gm-Message-State: 
APt69E0zwp31d343f679HADWRuS9x07NBslVOZ2Panh0YfVXU/McXzdy zpkJlQ8uWLBDTfNPigyybqZ+VUC88cubPt+ZQTCxhA== X-Google-Smtp-Source: AAOMgpeM8ZUaISuH/DsKYBiREJ9buYTg3tMVniyttx/2GVEaUfTALv/6ultkHTt85vaUdJFc6rjcnST13VjQmkAbgpI= X-Received: by 2002:a24:edce:: with SMTP id r197-v6mr12288983ith.23.1531237710325; Tue, 10 Jul 2018 08:48:30 -0700 (PDT) From: Stephen Mcconnell References: <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net> 3caf8ccd6fde8cfc4db25bae5327c46b@mail.gmail.com In-Reply-To: 3caf8ccd6fde8cfc4db25bae5327c46b@mail.gmail.com MIME-Version: 1.0 X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQHJZ/UmTT9Y1rodqvzH7TRwbPT2YALnpLW+Ap4aqgkCOUyURaRf/K0AgAAFNrA= Date: Tue, 10 Jul 2018 09:48:29 -0600 Message-ID: <0af047d477d15ec364140653bd967c89@mail.gmail.com> Subject: RE: problems with SAS JBODs 2 To: Oliver Sech , FreeBSD-scsi Content-Type: text/plain; charset="UTF-8" X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jul 2018 15:48:31 -0000 Ken, I looked at the logs and I don't see anything in them that suggests that the driver is not adding any of the devices. In fact, I don't see anything that looks strange at all. This looks like a different problem than the other one you mentioned. What do you think? Steve > -----Original Message----- > From: Stephen Mcconnell [mailto:stephen.mcconnell@broadcom.com] > Sent: Tuesday, July 10, 2018 9:28 AM > To: 'Oliver Sech'; 'FreeBSD-scsi' > Subject: RE: problems with SAS JBODs 2 > > Hi Oliver, I can't get to your links. Can you try to send the logs in > another > way? > > Steve > > > -----Original Message----- > > From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd- > > scsi@freebsd.org] On Behalf Of Oliver Sech > > Sent: Tuesday, July 10, 2018 9:14 AM > > To: FreeBSD-scsi > > Subject: Re: problems with SAS JBODs 2 > > > > I tested a few additional things. 
I don't think this is a multipath, > > daisy > chain > > nor a SAS wide ports problem. > > I can reproduce the problem with just a single connection to an > > Expander/JBOD. > > > > Test: > > * physically disconnect all shelves > > * reboot system > > * connect one shelf via SAS cable > > * check number of disks (after a reboot everything always shows up) > > * disconnect the shelf and wait (geom disk list still shows most disks.) > > * connect the shelf (missing disks) > > > > Tested Hardware: > > * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e ( internal > daisy > > chain + wide links) > > * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (straight HBA > <- > > > EXPANDER connection. (no wide links, no daisy chain)) > > * Supermicro SAS2 SC847E26-RJBOD1 + SAS3 LSI 9305-16e (internal > > daisy > > chain) > > * Promise SAS2 VTrak 830 + SAS3 LSI 9305-16e (straight HBA > > <-> > > EXPANDER connection.) > > > > > > > > On 07/04/2018 12:15 PM, Oliver Sech wrote: > > >> 1) Are the expanders daisy chained? Some SAS expanders don't work > > reliably > > >> when daisy chained. Best to direct connect each one to the server. > > > At the moment I have 1 JBOD connected to 1 HBA Port with 1 cable (4 > > lanes?). > > > Unfortunately the JBOD has 24 slots in the front and 20 in the back > > > and, > > those are connected via a internal SAS daisy chaining. > > > I could rewire and connect each backplane directly to the server, but > > unfortunately I do not have enough ports.. > > > > > > JOBD Model: Supermicro 847E2C-R1K28JBOD > > > > > >> 2) Are the expanders connected in multipath or single path? You need > > >> geom_multipath if you're going to do that. > > > See answer 1. There is a single path from the host to the first > > > expander. > > > > > >> 3) Are you attempting to use wide ports (two SAS cables connecting > each > > >> expander to the HBA). 
If do, you'll need to make sure that each pair > > >> of > > >> SAS cables goes to the same HBA chip (not merely the same card, as > some > > >> cards contain two HBA chips). > > > see 1. The last time I opened one of those JBODs there were 8 SAS > > > cables > > between the Front and Back expander. I assume that wide ports are being > > used. > > > (2 expanders per backplane as well) > > > > > >> 4) Are you trying to remove an expander while ZFS is active on that > > >> expander? That will suspend your pool, and ZFS doesn't always > > >> recover > > from > > >> a suspended state. > > > I'm testing with a new unused disk shelf that was never part of the > > > ZFS > > pool. There were > > > _______________________________________________ > > > freebsd-scsi@freebsd.org mailing list > > > https://lists.freebsd.org/mailman/listinfo/freebsd-scsi > > > To unsubscribe, send any mail to > > > "freebsd-scsi-unsubscribe@freebsd.org" > > _______________________________________________ > > freebsd-scsi@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-scsi > > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" From owner-freebsd-scsi@freebsd.org Tue Jul 10 17:32:28 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DCF91103EFF5 for ; Tue, 10 Jul 2018 17:32:27 +0000 (UTC) (envelope-from crimsonthunder@gmx.net) Received: from mout.gmx.net (mout.gmx.net [212.227.17.22]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "mout.gmx.net", Issuer "TeleSec ServerPass DE-2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3E60681DF7 for ; Tue, 10 Jul 2018 17:32:27 +0000 (UTC) (envelope-from crimsonthunder@gmx.net) Received: from [10.12.22.246] ([193.170.152.64]) by mail.gmx.com (mrgmx101 [212.227.17.168]) with ESMTPSA (Nemesis) id 
0Lmvlg-1gHVRz3lnQ-00h6eO; Tue, 10 Jul 2018 19:32:23 +0200 Subject: Re: problems with SAS JBODs 2 To: Stephen Mcconnell , FreeBSD-scsi References: <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net> <0af047d477d15ec364140653bd967c89@mail.gmail.com> From: Oliver Sech Message-ID: Date: Tue, 10 Jul 2018 19:32:23 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: <0af047d477d15ec364140653bd967c89@mail.gmail.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Provags-ID: V03:K1:vJlQGIsGuWSol0GZc/SGMDcStKitTIyqAdiP7ULzS7UcPA/XL3B 4tQU5WM0t7E/eu4WkoEc9DTurvdiw3HyRrFOI+JCX3E4KF2xqw8TBYHGeOUTaBGn4ZHKdFC IWcNXv9zT7kOpkkT1bpx9wrul3k5oaCnMu2GpdoYr7UQFluBGdAXEqTJMG89VKftTgQgR8Z 6YZHuvyRP76RjxSGiUXyA== X-UI-Out-Filterresults: notjunk:1;V01:K0:TGMrTMRuaHo=:/tuvrkT6sbZhUIdEhW5o/r Kcrp3QAeCEdSDB6sRUV7YleEWwfPhhiA/Ac8Aq/iBnOmpu3UVLjftlJ3U4AXIYMDeFEMH9Yre 1sLtvXOgKelQl6k6gTl79lsaueZTdRFwe/2RUncX26TcgcNXKzyHMlbyWq+v1+VVikFruRVWX 796VVIxS8wPrVqWHqXKCs62+ATPhsIb9h49/bslQp89oMl+eD6lwfHmP+20SwMNGTDrin3j80 lydpyZjP6Mg1in9/875JbHn+rRlQubSk7aL6gqveGGD1122Ai2pXvmyDjs/dCg1P7oBVTyalB bHNqFP4JjfgZPC/AlRbLqphKer+m6bJ+poJyti44ueEffqijj1+d71WkyVCJqpxwIInsJfl6u BMuBp1atCKnw5JlDgSeKBPRJvcZ3oZYHGxWd1Byd3Zl7rGJBmMHC1A7bHKkVPS7iFGXgNAVga dext3FwigpQ+6CYEBJ/xpXRrrCYJi840oSsPs8yQIkrxKoLxCSQyQ2tx4a3ZVPKruazhCoIub 6Kq4HwOgsjY0Znx45fhEVRdhwe+2CwfZbxv6Es5maFo3mnR2vStsGcvpBXmpE7ekBVTa7ceCW gsHnzfsoyXGP5PQIjJ6yVFLr/WFZCBLVpo5wgn9hu5TMQbFTwXNbIrWda6GwMKdLwLjMxx6PQ VWGB4DjEo74FW+MZf/TK4eEmFTglrR0BcFCu9b9UD/DsHBjtffy7IXRThLlFISkRRQMPPpRcA L4ipCrT8NQYxWDU52TVyXl4sVt8SQ9m2lLjeJ5LMGV3qUSE5/fQDE3RFFtMzPVdzx0BV446d3 Yy3Cz6s X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jul 2018 17:32:28 -0000 sorry for sending dead links earlier... 
(Here is a link for the previous files: https://www.dropbox.com/s/5dlwizrzy48vme3/freebsd_sas.zip?dl=0 )

Here is the link for the new logs: https://www.dropbox.com/s/7bbt1fipg2a50oq/freebsd_sas2.zip?dl=0

Notes:

logfile: "1_clean_boot_without_shelves_dmesg"
While booting with no shelves attached, it actually resets something:
mpr0: mpr_mapping_check_devices: Enclosure XX is missing from the topology. Update its missing count.
mpr0: _mapping_commit_enc_entry: Writing DPM entry XX for enclosure.

logfile: "3_shelf_disconnected_geom"
The only disks that really are connected are ada0, ada1, and da0; everything else cannot be accessed.

Hardware: Promise SAS2 VTrak 830 (full of SATA disks) + LSI 9305-16e

Oliver

On 07/10/2018 05:48 PM, Stephen Mcconnell wrote:
> Ken, I looked at the logs and I don't see anything in them that suggests
> that the driver is not adding any of the devices. In fact, I don't see
> anything that looks strange at all. This looks like a different problem than
> the other one you mentioned. What do you think?
>
> Steve
>
>> -----Original Message-----
>> From: Stephen Mcconnell [mailto:stephen.mcconnell@broadcom.com]
>> Sent: Tuesday, July 10, 2018 9:28 AM
>> To: 'Oliver Sech'; 'FreeBSD-scsi'
>> Subject: RE: problems with SAS JBODs 2
>>
>> Hi Oliver, I can't get to your links. Can you try to send the logs in another way?
>>
>> Steve
>>
>>> -----Original Message-----
>>> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of Oliver Sech
>>> Sent: Tuesday, July 10, 2018 9:14 AM
>>> To: FreeBSD-scsi
>>> Subject: Re: problems with SAS JBODs 2
>>>
>>> I tested a few additional things. I don't think this is a multipath, daisy chain
>>> nor a SAS wide ports problem.
>>> I can reproduce the problem with just a single connection to an Expander/JBOD.
>>> >>> Test: >>> * physically disconnect all shelves >>> * reboot system >>> * connect one shelf via SAS cable >>> * check number of disks (after a reboot everything always shows up) >>> * disconnect the shelf and wait (geom disk list still shows most disks.) >>> * connect the shelf (missing disks) >>> >>> Tested Hardware: >>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e ( internal >> daisy >>> chain + wide links) >>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (straight HBA >> <- >>>> EXPANDER connection. (no wide links, no daisy chain)) >>> * Supermicro SAS2 SC847E26-RJBOD1 + SAS3 LSI 9305-16e (internal >>> daisy >>> chain) >>> * Promise SAS2 VTrak 830 + SAS3 LSI 9305-16e (straight HBA >>> <-> >>> EXPANDER connection.) >>> >>> >>> >>> On 07/04/2018 12:15 PM, Oliver Sech wrote: >>>>> 1) Are the expanders daisy chained? Some SAS expanders don't work >>> reliably >>>>> when daisy chained. Best to direct connect each one to the server. >>>> At the moment I have 1 JBOD connected to 1 HBA Port with 1 cable (4 >>> lanes?). >>>> Unfortunately the JBOD has 24 slots in the front and 20 in the back >>>> and, >>> those are connected via a internal SAS daisy chaining. >>>> I could rewire and connect each backplane directly to the server, but >>> unfortunately I do not have enough ports.. >>>> >>>> JOBD Model: Supermicro 847E2C-R1K28JBOD >>>> >>>>> 2) Are the expanders connected in multipath or single path? You need >>>>> geom_multipath if you're going to do that. >>>> See answer 1. There is a single path from the host to the first >>>> expander. >>>> >>>>> 3) Are you attempting to use wide ports (two SAS cables connecting >> each >>>>> expander to the HBA). If do, you'll need to make sure that each pair >>>>> of >>>>> SAS cables goes to the same HBA chip (not merely the same card, as >> some >>>>> cards contain two HBA chips). >>>> see 1. The last time I opened one of those JBODs there were 8 SAS >>>> cables >>> between the Front and Back expander. 
I assume that wide ports are being >>> used. >>>> (2 expanders per backplane as well) >>>> >>>>> 4) Are you trying to remove an expander while ZFS is active on that >>>>> expander? That will suspend your pool, and ZFS doesn't always >>>>> recover >>> from >>>>> a suspended state. >>>> I'm testing with a new unused disk shelf that was never part of the >>>> ZFS >>> pool. There were >>>> _______________________________________________ >>>> freebsd-scsi@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi >>>> To unsubscribe, send any mail to >>>> "freebsd-scsi-unsubscribe@freebsd.org" >>> _______________________________________________ >>> freebsd-scsi@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi >>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" From owner-freebsd-scsi@freebsd.org Wed Jul 11 15:37:21 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 16D481048805 for ; Wed, 11 Jul 2018 15:37:21 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id A6DA67791F for ; Wed, 11 Jul 2018 15:37:20 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: by mailman.ysv.freebsd.org (Postfix) id 63EBD1048802; Wed, 11 Jul 2018 15:37:20 +0000 (UTC) Delivered-To: scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 511401048801 for ; Wed, 11 Jul 2018 15:37:20 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from mxrelay.ysv.freebsd.org (mxrelay.ysv.freebsd.org [IPv6:2001:1900:2254:206a::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) 
(Client CN "mxrelay.ysv.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E16CC7791E for ; Wed, 11 Jul 2018 15:37:19 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.ysv.freebsd.org (Postfix) with ESMTPS id 260071CBB9 for ; Wed, 11 Jul 2018 15:37:19 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id w6BFbJgj038032 for ; Wed, 11 Jul 2018 15:37:19 GMT (envelope-from bugzilla-noreply@freebsd.org) Received: (from www@localhost) by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id w6BFbJBQ038029 for scsi@FreeBSD.org; Wed, 11 Jul 2018 15:37:19 GMT (envelope-from bugzilla-noreply@freebsd.org) X-Authentication-Warning: kenobi.freebsd.org: www set sender to bugzilla-noreply@freebsd.org using -f From: bugzilla-noreply@freebsd.org To: scsi@FreeBSD.org Subject: [Bug 219857] panic in scsi_cd code Date: Wed, 11 Jul 2018 15:37:18 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: CURRENT X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: ken@FreeBSD.org X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: ken@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: attachments.created Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jul 2018 15:37:21 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219857

--- Comment #10 from Kenneth D. Merry ---
Created attachment 195055
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=195055&action=edit
Proposed patch to make the media check async

Here is a proposed patch against FreeBSD/head as of revision 335477 on June 21st, 2018.

This makes the media check process asynchronous, so we no longer block in cdstrategy() to check for media.

This still needs some additional testing in various situations. I think there may still be one panic (GEOM assertion I believe) I saw with an audio CD in a drive, and I have not tested this with a SCSI CD/DVD drive. I have only tested so far with a SATA drive running ATAPI.

In any case, give this a try and let me know if it improves things for you.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-scsi@freebsd.org Wed Jul 11 18:35:00 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F27231033251 for ; Wed, 11 Jul 2018 18:34:59 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 886618007D for ; Wed, 11 Jul 2018 18:34:59 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: by mailman.ysv.freebsd.org (Postfix) id 3F2791033250; Wed, 11 Jul 2018 18:34:59 +0000 (UTC) Delivered-To: scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2DA4F103324F for ; Wed, 11 Jul 2018 18:34:59 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from
mxrelay.ysv.freebsd.org (mxrelay.ysv.freebsd.org [IPv6:2001:1900:2254:206a::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.ysv.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C2DCE80077 for ; Wed, 11 Jul 2018 18:34:58 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.ysv.freebsd.org (Postfix) with ESMTPS id 2AD781E4A3 for ; Wed, 11 Jul 2018 18:34:58 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id w6BIYwgP091581 for ; Wed, 11 Jul 2018 18:34:58 GMT (envelope-from bugzilla-noreply@freebsd.org) Received: (from www@localhost) by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id w6BIYw1f091580 for scsi@FreeBSD.org; Wed, 11 Jul 2018 18:34:58 GMT (envelope-from bugzilla-noreply@freebsd.org) X-Authentication-Warning: kenobi.freebsd.org: www set sender to bugzilla-noreply@freebsd.org using -f From: bugzilla-noreply@freebsd.org To: scsi@FreeBSD.org Subject: [Bug 219857] panic in scsi_cd code Date: Wed, 11 Jul 2018 18:34:58 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: CURRENT X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: decui@microsoft.com X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: ken@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated 
MIME-Version: 1.0 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jul 2018 18:35:00 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219857

--- Comment #11 from Dexuan Cui ---
(In reply to Kenneth D. Merry from comment #10)
Thanks very much! We'll test it and report back.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-scsi@freebsd.org Wed Jul 11 20:35:30 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 811721041E95 for ; Wed, 11 Jul 2018 20:35:30 +0000 (UTC) (envelope-from ken@freebsd.org) Received: from mithlond.kdm.org (mithlond.kdm.org [96.89.93.250]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mithlond.kdm.org", Issuer "mithlond.kdm.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id EFCFA86255 for ; Wed, 11 Jul 2018 20:35:29 +0000 (UTC) (envelope-from ken@freebsd.org) Received: from [10.0.0.26] (mbp2013.int.kdm.org [10.0.0.26]) (authenticated bits=0) by mithlond.kdm.org (8.15.2/8.14.9) with ESMTPSA id w6BKZR1h002450 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 11 Jul 2018 16:35:28 -0400 (EDT) (envelope-from ken@freebsd.org) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 11.4 \(3445.8.2\)) Subject: Re: problems with SAS JBODs 2 From: Ken Merry In-Reply-To: <0af047d477d15ec364140653bd967c89@mail.gmail.com> Date: Wed, 11 Jul 2018 16:35:23 -0400 Cc: FreeBSD-scsi Content-Transfer-Encoding: quoted-printable Message-Id: <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org> References: <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net> <3caf8ccd6fde8cfc4db25bae5327c46b@mail.gmail.com> <0af047d477d15ec364140653bd967c89@mail.gmail.com> To: Stephen Mcconnell , Oliver Sech X-Mailer: Apple Mail (2.3445.8.2) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.4.3 (mithlond.kdm.org [96.89.93.250]); Wed, 11 Jul 2018 16:35:28 -0400 (EDT) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jul 2018 20:35:30 -0000

Yes, I agree, Oliver’s problem looks different.

Oliver, for your second set of files (freebsd_sas2.zip) it looks like you may have devices that aren’t completely going away, even from a SAS standpoint.

Here are the 25 target IDs that show up in 2_shelf_connected_dmesg.txt:

mpr0: mprsas_add_device: Target ID for added device is 467.
mpr0: mprsas_add_device: Target ID for added device is 468.
mpr0: mprsas_add_device: Target ID for added device is 469.
mpr0: mprsas_add_device: Target ID for added device is 470.
mpr0: mprsas_add_device: Target ID for added device is 471.
mpr0: mprsas_add_device: Target ID for added device is 472.
mpr0: mprsas_add_device: Target ID for added device is 473.
mpr0: mprsas_add_device: Target ID for added device is 474.
mpr0: mprsas_add_device: Target ID for added device is 475.
mpr0: mprsas_add_device: Target ID for added device is 476.
mpr0: mprsas_add_device: Target ID for added device is 477.
mpr0: mprsas_add_device: Target ID for added device is 478.
mpr0: mprsas_add_device: Target ID for added device is 479.
mpr0: mprsas_add_device: Target ID for added device is 480.
mpr0: mprsas_add_device: Target ID for added device is 481.
mpr0: mprsas_add_device: Target ID for added device is 482.
mpr0: mprsas_add_device: Target ID for added device is 483.
mpr0: mprsas_add_device: Target ID for added device is 484.
mpr0: mprsas_add_device: Target ID for added device is 485.
mpr0: mprsas_add_device: Target ID for added device is 486.
mpr0: mprsas_add_device: Target ID for added device is 487.
mpr0: mprsas_add_device: Target ID for added device is 488.
mpr0: mprsas_add_device: Target ID for added device is 489.
mpr0: mprsas_add_device: Target ID for added device is 490.
mpr0: mprsas_add_device: Target ID for added device is 503.

Here are the 8 target IDs that disappear in 3_shelf_disconnected_dmesg.txt:

mpr0: mprsas_prepare_remove: Sending reset for target ID 467
mpr0: mprsas_prepare_remove: Sending reset for target ID 468
mpr0: mprsas_prepare_remove: Sending reset for target ID 469
mpr0: mprsas_prepare_remove: Sending reset for target ID 470
mpr0: mprsas_prepare_remove: Sending reset for target ID 471
mpr0: mprsas_prepare_remove: Sending reset for target ID 472
mpr0: mprsas_prepare_remove: Sending reset for target ID 473
mpr0: mprsas_prepare_remove: Sending reset for target ID 474

And here are the same 8 target IDs getting added in 4_shelf_reconnected_dmesg.txt:

mpr0: mprsas_add_device: Target ID for added device is 467.
mpr0: mprsas_add_device: Target ID for added device is 468.
mpr0: mprsas_add_device: Target ID for added device is 469.
mpr0: mprsas_add_device: Target ID for added device is 470.
mpr0: mprsas_add_device: Target ID for added device is 471.
mpr0: mprsas_add_device: Target ID for added device is 472.
mpr0: mprsas_add_device: Target ID for added device is 473.
mpr0: mprsas_add_device: Target ID for added device is 474.

Oliver, what happens when you try to do I/O to the devices that don’t go away after you pull the cable? Does that cause the devices to go away?

Looking at the mprutil output, it also shows the devices sticking around from the adapter’s standpoint.

You can also try a ‘camcontrol rescan all’ or a ‘camcontrol rescan N’ (where N is the scbus number shown by ‘camcontrol devlist -v’).
That will do some basic probes for each of the devices and should in theory cause them to go away if they aren’t accessible.

It seems like the adapter may not be recognizing that the devices in question have gone.

Steve, do you have any ideas what could be going on?

Ken
-- 
Ken Merry
ken@FreeBSD.ORG

> On Jul 10, 2018, at 11:48 AM, Stephen Mcconnell via freebsd-scsi wrote:
>
> Ken, I looked at the logs and I don't see anything in them that suggests
> that the driver is not adding any of the devices. In fact, I don't see
> anything that looks strange at all. This looks like a different problem than
> the other one you mentioned. What do you think?
>
> Steve
>
>> -----Original Message-----
>> From: Stephen Mcconnell [mailto:stephen.mcconnell@broadcom.com]
>> Sent: Tuesday, July 10, 2018 9:28 AM
>> To: 'Oliver Sech'; 'FreeBSD-scsi'
>> Subject: RE: problems with SAS JBODs 2
>>
>> Hi Oliver, I can't get to your links. Can you try to send the logs in another way?
>>
>> Steve
>>
>>> -----Original Message-----
>>> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of Oliver Sech
>>> Sent: Tuesday, July 10, 2018 9:14 AM
>>> To: FreeBSD-scsi
>>> Subject: Re: problems with SAS JBODs 2
>>>
>>> I tested a few additional things. I don't think this is a multipath, daisy chain
>>> nor a SAS wide ports problem.
>>> I can reproduce the problem with just a single connection to an Expander/JBOD.
>>>
>>> Test:
>>> * physically disconnect all shelves
>>> * reboot system
>>> * connect one shelf via SAS cable
>>> * check number of disks (after a reboot everything always shows up)
>>> * disconnect the shelf and wait (geom disk list still shows most disks.)
>>> * connect the shelf (missing disks)
>>>
>>> Tested Hardware:
>>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (internal daisy chain + wide links)
>>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (straight HBA <-> EXPANDER connection. (no wide links, no daisy chain))
>>> * Supermicro SAS2 SC847E26-RJBOD1 + SAS3 LSI 9305-16e (internal daisy chain)
>>> * Promise SAS2 VTrak 830 + SAS3 LSI 9305-16e (straight HBA <-> EXPANDER connection.)
>>>
>>> On 07/04/2018 12:15 PM, Oliver Sech wrote:
>>>>> 1) Are the expanders daisy chained? Some SAS expanders don't work reliably
>>>>> when daisy chained. Best to direct connect each one to the server.
>>>> At the moment I have 1 JBOD connected to 1 HBA Port with 1 cable (4 lanes?).
>>>> Unfortunately the JBOD has 24 slots in the front and 20 in the back and, those are connected via a internal SAS daisy chaining.
>>>> I could rewire and connect each backplane directly to the server, but unfortunately I do not have enough ports..
>>>>
>>>> JOBD Model: Supermicro 847E2C-R1K28JBOD
>>>>
>>>>> 2) Are the expanders connected in multipath or single path? You need
>>>>> geom_multipath if you're going to do that.
>>>> See answer 1. There is a single path from the host to the first expander.
>>>>
>>>>> 3) Are you attempting to use wide ports (two SAS cables connecting each
>>>>> expander to the HBA). If do, you'll need to make sure that each pair of
>>>>> SAS cables goes to the same HBA chip (not merely the same card, as some
>>>>> cards contain two HBA chips).
>>>> see 1. The last time I opened one of those JBODs there were 8 SAS cables between the Front and Back expander. I assume that wide ports are being used.
>>>> (2 expanders per backplane as well)
>>>>
>>>>> 4) Are you trying to remove an expander while ZFS is active on that
>>>>> expander?
>>>>> That will suspend your pool, and ZFS doesn't always recover from
>>>>> a suspended state.
>>>> I'm testing with a new unused disk shelf that was never part of the ZFS pool. There were
>>>> _______________________________________________
>>>> freebsd-scsi@freebsd.org mailing list
>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>>>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"

From owner-freebsd-scsi@freebsd.org Wed Jul 11 20:56:32 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EFB2510443DD for ; Wed, 11 Jul 2018 20:56:31 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: from mail-io0-f182.google.com (mail-io0-f182.google.com [209.85.223.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 807B986FD8 for ; Wed, 11 Jul 2018 20:56:31 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: by mail-io0-f182.google.com with SMTP id r24-v6so25643271ioh.9 for ; Wed, 11 Jul 2018 13:56:31 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:references:in-reply-to:mime-version :thread-index:date:message-id:subject:to:cc :content-transfer-encoding; 
bh=OodQzPoCYQZhb17TiwZnJlfwChFvuJUI8DwTTJl9CI0=; b=QRO9OeX5eQ0BCPm2E5ZgKVFj69T8PG3U4wnKK2HqprdqP2wVDXm709e4wWmFNBPQUN /pDgJDSOaik+TTbONLw8Wq3X2Kd++GaBFnol3J7WlvTB9XQYXQvi+UKtR7SciOs4I+lG wLMtOWJeYRvTjexOqSHndN89OUNl3niU2lyZd9iM6Kv58Edg/KSanuyGxheLOcipHfIg HNRdUKZjJM5bdLiBz/nyoFhJAgcYCqkh0fZI5HIKjStpnq1ArP3mCm+w+QNxZz/uU7le nmIjT8DsDRdsSbYhznPeH2AM4XAq6Qy6qTpwVc4LzOHq/dlM1JvZs7CJFiv5D5ih3Oiv Y19A== X-Gm-Message-State: AOUpUlFKBeYnKyUlg2jMX95AbrA2rm/krTvZNdwhA191PADDd+KfYsxj GYSPxQwn1oIFxkgiCMMGIjx1h8bB X-Google-Smtp-Source: AAOMgpdAmH79mL9aMflVUfBLiUopzgjopadVHSiH+GIFhw5AdizEHycCt7LRGh9r98sFwRVrKxKjNw== X-Received: by 2002:a6b:660e:: with SMTP id a14-v6mr584017ioc.339.1531342204427; Wed, 11 Jul 2018 13:50:04 -0700 (PDT) Received: from mail-it0-f54.google.com (mail-it0-f54.google.com. [209.85.214.54]) by smtp.gmail.com with ESMTPSA id t66-v6sm1527764ita.24.2018.07.11.13.50.03 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 11 Jul 2018 13:50:04 -0700 (PDT) Received: by mail-it0-f54.google.com with SMTP id p4-v6so4470311itf.2 for ; Wed, 11 Jul 2018 13:50:03 -0700 (PDT) X-Received: by 2002:a02:8952:: with SMTP id u18-v6mr110026jaj.13.1531342203259; Wed, 11 Jul 2018 13:50:03 -0700 (PDT) From: slm@freebsd.org References: <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net> <3caf8ccd6fde8cfc4db25bae5327c46b@mail.gmail.com> <0af047d477d15ec364140653bd967c89@mail.gmail.com> <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org> In-Reply-To: <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQHJZ/UmTT9Y1rodqvzH7TRwbPT2YALnpLW+Ap4aqgkCOUyURQHo0+HHAeMuJHcCKy6uYKQyLw/A Date: Wed, 11 Jul 2018 14:50:02 -0600 X-Gmail-Original-Message-ID: <6bc79bf80dbfbba8e77bb40d5b1a0512@mail.gmail.com> Message-ID: <6bc79bf80dbfbba8e77bb40d5b1a0512@mail.gmail.com> Subject: RE: problems with SAS JBODs 2 To: Ken Merry , Oliver Sech Cc: FreeBSD-scsi Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 
quoted-printable X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Jul 2018 20:56:32 -0000

I think this is a mapping table problem or the use_phy_num problem. I'm having Oliver change the use_phy_num sysctl values to 0 and then use your script to clear out the controller mapping entries to see what happens.

Steve

> -----Original Message-----
> From: Ken Merry [mailto:ken@freebsd.org]
> Sent: Wednesday, July 11, 2018 2:35 PM
> To: Stephen Mcconnell; Oliver Sech
> Cc: FreeBSD-scsi
> Subject: Re: problems with SAS JBODs 2
>
> Yes, I agree, Oliver’s problem looks different.
> [...]
> Steve, do you have any ideas what could be going on?
>
> Ken

From owner-freebsd-scsi@freebsd.org Thu Jul 12 10:00:46 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3316B104927B for ; Thu, 12 Jul 2018 10:00:46 +0000 (UTC) (envelope-from crimsonthunder@gmx.net) Received: from mout.gmx.net (mout.gmx.net [212.227.17.22]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "mout.gmx.net", Issuer "TeleSec ServerPass DE-2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 93F89872C7; Thu, 12 Jul 2018 10:00:45 +0000 (UTC) (envelope-from crimsonthunder@gmx.net) Received: from [10.12.22.246] ([193.170.152.64]) by 
mail.gmx.com (mrgmx103 [212.227.17.168]) with ESMTPSA (Nemesis) id 0MKHik-1fbti331Gw-001k9W; Thu, 12 Jul 2018 12:00:41 +0200 Subject: Re: problems with SAS JBODs 2 To: Ken Merry , Stephen Mcconnell Cc: FreeBSD-scsi References: <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net> <3caf8ccd6fde8cfc4db25bae5327c46b@mail.gmail.com> <0af047d477d15ec364140653bd967c89@mail.gmail.com> <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org> From: Oliver Sech Message-ID: <9e0bf18f-0689-b2a0-1da4-b70c497b2f14@gmx.net> Date: Thu, 12 Jul 2018 12:00:41 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-Provags-ID: V03:K1:eTB5jW7biihbF9I8CDtdTIu87j4ViglNKsrBno24dTijl0k5xED K1WJEi00JHRjq5C6Wnjo4QT+2De6RUbCSaZMq7+ir++whzWA0LAKcD0Fm/zQf40F2H1/fj9 oBodhKjsZt9oobytOxEPaFdJRY726C9PrRnEuT3U5Kdbcoavks38SX+o/+5DAJPH8NWhMDu HJdI9YhPm99yTHeem2wTQ== X-UI-Out-Filterresults: notjunk:1;V01:K0:azlQ65a6PYI=:TY0Y+miLb7z9GWSrmj9gzl GXhLcG/qihEWRS4DINjcPGNC9JDj5KdhVFEviY3ERggL8e/4JEorEyNjg5cDtIuhPUq6j2nSB 7RhZqlmu/9wQxbwpFXm14tzOH20CC9IDteuKCoMfmcZlG6YCbu5HdyWLzHJXfPxQ/Urm/gz1R AQI5De3QlkEbEX+ppX2LKrJrhDAD4AtO13fR5OKOBnmjqw/RZXBCvlrjdzT43ci1y7KeU+5Dk aIpCS1LshJyZwSgphhrUXmS5k5zR6JhGtXTX13xy5HJ0kDq6aOI3aYcwB1pz+C+Fs2/r1hRpT tjy1sZ81GLftj+foYOI/0RshAHO6SafU0o1kmEnoBun3rep2Dy5dNT1p8rEhwppaM7WXIMC3H V5H3rZihs7ytFHWwSjKl9oA85aJtQF/9NsoGqzZ3prnXcW1TKiobGL/umkSCpo92Q0u2wmXOF 6rm5MgC4u6sH4Lb/H306dClYVfeVrYSFJijd7I8f5iUE+wO0cfdVsEXCTY4AcjD7ORRAEzTFZ IpaTHHBhzVVx+Xx8hJm+ebcauQlQS79AXKSORWmZGyOMQaJREJtFs7YEnJFt5C7AFlCU3MCpC oHM/ehtFKVpHKz62UUbxBlOt3hLzy4Ra2Zb/5IzXPtfQ4P51a3VJR1H25bflIBXxsNDfRBGkN Oh/ZgiraD9bJOWDprLHIWBDvJxib6BKSfSFLmXV4N+vLu/EWyMBSEvBLJ11pgsDg6zR0fHR5W vqiYHL8FEbQbqgXtzXj2PT5As3X7QAxw/buC82/bAdMSWe5GkJ2A73haPhwuBtPDeajmsmvwf sFssUw8 X-BeenThere: freebsd-scsi@freebsd.org 
X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Jul 2018 10:00:46 -0000

On 07/11/2018 10:35 PM, Ken Merry wrote:
> Oliver, what happens when you try to do I/O to the devices that don’t go away after you pull the cable? Does that cause the devices to go away?

I ran 'dd if=/dev/daX of=/dev/null bs=1k count=1' and at least the "da" device disappears.

> Looking at the mprutil output, it also shows the devices sticking around from the adapter’s standpoint.
>
> You can also try a ‘camcontrol rescan all’ or a ‘camcontrol rescan N’ (where N is the scbus number shown by ‘camcontrol devlist -v’). That will do some basic probes for each of the devices and should in theory cause them to go away if they aren’t accessible.
>
> It seems like the adapter may not be recognizing that the devices in question have gone.

I'm pretty sure that I tried 'camcontrol rescan all' a few times. While I am no longer sure whether that cleans up the non-working devices, I am sure that no new devices were added.

Unfortunately I haven't yet gotten to Steve's 'clear controller mapping' script, but I did a few other things:

* The last time I tried to upgrade the firmware I had all sorts of problems. "sas3flash" reported bad checksums while flashing some of the files. So I reflashed both controllers with the DOS version of sas3flash. This was a challenge in itself because the DOS version of this utility does not seem to run on computers of this decade. (ERROR: Failed to initialize PAL. Exiting program.) The equivalent sas3flash.EFI version seems to be out of date and caused the checksum problems described before. (This time I wiped them before flashing with "sas3flash -o -e 6".)
* I tried to change the mpr tunable "use_phy_num" after that, but this has not improved the situation. I will retry and collect logs with Steve's script.
* I retried with the latest "mpr.ko" from the Broadcom download page. (Same problems, no "use_phy_num" tunable.)
* I retested this hardware with Linux (4.15 and 4.17)
** Some shelves could be replugged reliably (i.e. 45 disks show up, 45 disks disappear, 45 disks reappear)
** With the newest shelf, 2 disks were missing after replugging (i.e. 44 disks show up, 44 disks disappear, 42 disks reappear) (kernel log mpt3sas_cm0: "device is not present handle)
* I tried a different controller
** So far I used a Broadcom LSI SAS 9305-16e (Controller: SAS3216) (Firmware 16.00.01.00 or 15.00.00.00)
** Yesterday I switched to a new, fresh out-of-the-box Broadcom LSI 9305-24i (Controller: SAS3224) (Firmware 09.00.00.00 (or something similar with 09*))

With the new controller everything seems to work on Linux. It might be the old firmware?... On FreeBSD it is better with the new controller in the sense that I at least get one out of two /dev/sesX devices back. But disks are still missing and are not getting completely cleaned up...

This whole thing is a bit frustrating, especially since up until now I thought that HBAs were kind of "connect and forget" devices.

The next step is to set up a separate test environment and try to get it to work there. I will keep you updated and try to provide logs for all FreeBSD-related problems.
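
For further log collection, the by-hand comparison of target IDs made earlier in this thread (IDs added on connect, IDs reset on disconnect, IDs added again on reconnect) could be automated roughly as follows. This is only a sketch under assumptions: the two regular expressions are copied from the mpr0 lines quoted in the thread, the function names target_ids and summarize are made up for illustration, and the three inputs are assumed to be the text of dmesg captures taken after connecting, disconnecting, and reconnecting a shelf.

```python
import re
import sys

# Message formats as quoted from the mpr0 dmesg output in this thread.
ADD_RE = re.compile(r"mprsas_add_device: Target ID for added device is (\d+)")
REMOVE_RE = re.compile(r"mprsas_prepare_remove: Sending reset for target ID (\d+)")


def target_ids(log_text, pattern):
    """Return the set of target IDs that `pattern` matches in `log_text`."""
    return {int(m.group(1)) for m in pattern.finditer(log_text)}


def summarize(connected, disconnected, reconnected):
    """Compare three dmesg captures (hypothetical helper, not part of any tool).

    never_removed: IDs added on connect that saw no reset on disconnect.
    not_readded:   IDs reset on disconnect that did not show up on reconnect.
    """
    added = target_ids(connected, ADD_RE)
    removed = target_ids(disconnected, REMOVE_RE)
    readded = target_ids(reconnected, ADD_RE)
    return {
        "never_removed": sorted(added - removed),
        "not_readded": sorted(removed - readded),
    }


if __name__ == "__main__" and len(sys.argv) == 4:
    # Expects three file paths: connected, disconnected, reconnected captures.
    texts = []
    for path in sys.argv[1:4]:
        with open(path, errors="replace") as fh:
            texts.append(fh.read())
    for key, ids in summarize(*texts).items():
        print(f"{key}: {ids}")
```

Run against the three dmesg files named in Ken's message, the "never_removed" list would correspond to the devices that stick around after the cable is pulled. The script only reads log text; it does not query the controller.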
From owner-freebsd-scsi@freebsd.org Thu Jul 12 10:09:27 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AAA231049F88 for ; Thu, 12 Jul 2018 10:09:27 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 21F598834F; Thu, 12 Jul 2018 10:09:27 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: by mail-wr1-x436.google.com with SMTP id m1-v6so8367430wrg.5; Thu, 12 Jul 2018 03:09:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=hJcZgd98T2MHX8yfqohD5la4HRZSYC3JyEH8yiKH5Xc=; b=bFeb+ROrnUtj/ITb/eTEFAMeGicG8taB5UjR10+PINFKHqSfOFLSXtXHoCugFrdzJz kcEQ5XiUpee7duAU1Qrg/gA6PraYFWIiIhYU43Wd2EeeX+KXB2G9FIWvsxVkJfboAAPf PDB0UrScnyrsLgE/a0TWDmeFq9jVYWpdQU60QhDi46Zo4lrv+KzycLwGt9X/HFi0EDq6 UfxHI2EWTYnn7ltBPdWKj85Z9x2GwsEHXXHIMRVf75yz9DDO5pA/vxa6DknUKxAp0Bpz C9SnTAmqOQPQz0FHaQPVVg9wqwM5qB4RbNIOJLaRFGOTkY9RFErr7UBuTjMQWk1IzKZ+ Vc2g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=hJcZgd98T2MHX8yfqohD5la4HRZSYC3JyEH8yiKH5Xc=; b=twxQbLI1wD5ss/sgbVj8Y5Q4Yz1O1DVgonKyGkw44642zUq5iTOsdsF7C1mIZNDFHx gu7IbC4Y/bEBWCz6MaWx4Zv4p/qVyb8g8HyIOQ2/WUiehRpRPt4u1hKC+Q35AB6RoFAI f0+4NsogjvbLg3jOlISUnge7qMbvPomlPXAZwOAbSOGx10/BmzfTTUZEgfC7GkHNQOls wAt/gfwBpDMeQueTfy059Cb8Kb7DLJNF52jDfDnnmzIpZuam9Bo7s//fPDnv+nRxOTFz 
8efMQZDscaSjrbW8MkIGhtQgV5sBVmSckKJxWZsIi+CPYgzVnzC/YPHmUiOno5IhdJn5 tIYQ== X-Gm-Message-State: AOUpUlEewc+QE7xSO/+PNV19jiAeYj7H8TciWXK46QmDLHWQJSaMj8H/ o2gBu8nuUSZULurA1dru6Zw= X-Google-Smtp-Source: AAOMgpe0VpvJmH1+Hf99xod+2Wep2f4I0PML64whFP8E1ge7ZrGIKUlGvAVgOJsKdtAjz1a6MCansw== X-Received: by 2002:a5d:4210:: with SMTP id n16-v6mr1174922wrq.55.1531390166214; Thu, 12 Jul 2018 03:09:26 -0700 (PDT) Received: from bens-mac.home (LFbn-1-7163-89.w90-116.abo.wanadoo.fr. [90.116.94.89]) by smtp.gmail.com with ESMTPSA id r125-v6sm2668389wmb.27.2018.07.12.03.09.25 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Thu, 12 Jul 2018 03:09:25 -0700 (PDT) Content-Type: text/plain; charset=us-ascii; delsp=yes; format=flowed Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: problems with SAS JBODs 2 From: Ben RUBSON In-Reply-To: <9e0bf18f-0689-b2a0-1da4-b70c497b2f14@gmx.net> Date: Thu, 12 Jul 2018 12:09:24 +0200 Cc: Ken Merry , Stephen Mcconnell , FreeBSD-scsi Content-Transfer-Encoding: 7bit Message-Id: <628BB9ED-238C-4427-A075-7955A61DE311@gmail.com> References: <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net> <3caf8ccd6fde8cfc4db25bae5327c46b@mail.gmail.com> <0af047d477d15ec364140653bd967c89@mail.gmail.com> <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org> <9e0bf18f-0689-b2a0-1da4-b70c497b2f14@gmx.net> To: Oliver Sech X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Jul 2018 10:09:28 -0000 On 12 Jul 2018 12:00, Oliver Sech wrote: > It is better with the new controller on FreeBSD in that sense that I at > least get one out of two /dev/sesX devices back. But disks are still > missing and are not getting completely cleaned up... On 03 Jul 2018 14:54, Ben RUBSON wrote: > I faced same sort of issue but with iSCSI disks. 
> At least disks did not disconnect properly, and did not reconnect until a > reboot was performed. > Among the needed iSCSI patches, a GEOM one has been pushed : > https://github.com/freebsd/freebsd/commit/ea40366602be7548eba0bec35fec46ea4509dbb7 > (it's in 11.2, but not in your 11.1). > Perhaps this could help. Did you try with FreeBSD 11.2 or with the kernel patch I proposed above ? Ben From owner-freebsd-scsi@freebsd.org Thu Jul 12 13:38:42 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 200BF103A5B4 for ; Thu, 12 Jul 2018 13:38:42 +0000 (UTC) (envelope-from ken@freebsd.org) Received: from mithlond.kdm.org (mithlond.kdm.org [96.89.93.250]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mithlond.kdm.org", Issuer "mithlond.kdm.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id BAC1791DC7 for ; Thu, 12 Jul 2018 13:38:41 +0000 (UTC) (envelope-from ken@freebsd.org) Received: from [10.0.0.26] (mbp2013.int.kdm.org [10.0.0.26]) (authenticated bits=0) by mithlond.kdm.org (8.15.2/8.14.9) with ESMTPSA id w6CDcdTl017936 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 12 Jul 2018 09:38:39 -0400 (EDT) (envelope-from ken@freebsd.org) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 11.4 \(3445.8.2\)) Subject: Re: problems with SAS JBODs 2 From: Ken Merry In-Reply-To: <9e0bf18f-0689-b2a0-1da4-b70c497b2f14@gmx.net> Date: Thu, 12 Jul 2018 09:38:36 -0400 Cc: Stephen Mcconnell , FreeBSD-scsi Content-Transfer-Encoding: quoted-printable Message-Id: <7C1E630B-65AD-4FE8-BFDF-F13068070B5E@freebsd.org> References: <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net> <3caf8ccd6fde8cfc4db25bae5327c46b@mail.gmail.com> <0af047d477d15ec364140653bd967c89@mail.gmail.com> <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org> 
<9e0bf18f-0689-b2a0-1da4-b70c497b2f14@gmx.net> To: Oliver Sech X-Mailer: Apple Mail (2.3445.8.2) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.4.3 (mithlond.kdm.org [96.89.93.250]); Thu, 12 Jul 2018 09:38:40 -0400 (EDT) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Jul 2018 13:38:42 -0000

> On Jul 12, 2018, at 6:00 AM, Oliver Sech wrote:
>
> On 07/11/2018 10:35 PM, Ken Merry wrote:
>> Oliver, what happens when you try to do I/O to the devices that don’t go away after you pull the cable? Does that cause the devices to go away?
>
> I tried to 'dd if=/dev/daX of=/dev/null bs=1k count=1' and at least the "da" device disappears.

Ok, that’s good. Can you send the dmesg output and check with ‘camcontrol devlist -v’ to make sure the device has fully gone away?

The reason I ask is that I have spent lots of time over the years debugging device arrival and departure problems in CAM, GEOM and devfs, and I want to make sure we aren’t running into any non-SAS related problems.

>
>> Looking at the mprutil output, it also shows the devices sticking around from the adapter’s standpoint.
>>
>> You can also try a ‘camcontrol rescan all’ or a ‘camcontrol rescan N’ (where N is the scbus number shown by ‘camcontrol devlist -v’). That will do some basic probes for each of the devices and should in theory cause them to go away if they aren’t accessible.
>>
>> It seems like the adapter may not be recognizing that the devices in question have gone.
>
>
> I'm pretty sure that I tried this 'camcontrol rescan all' a few times. While I'm not sure anymore if that cleans up the non-working devices, I'm sure that no new devices were added.

If doing a read from the device with dd makes it go away, ‘camcontrol rescan all’ should make it go away as well. It sends a command to every device, and if the mpr(4) driver tells CAM the drive is no longer there, it’ll get removed.

If it doesn’t cause the device to get removed (and the rescan doesn’t hang), it means that you’re getting a response from a device that is no longer physically connected to the machine, which is impossible with SAS.

>
> Unfortunately I haven't yet gotten to Steve's 'clear controller mapping' script but I did a few other things:

Steve’s email made it sound like he was going to send it. I just sent it to you separately.

> * The last time I tried to upgrade the firmware I had all sorts of problems. "sas3flash" reported bad checksums while flashing some of the files.
> So I reflashed both controllers with the DOS version of sas3flash. This was basically a challenge in itself because the DOS version of this utility does not seem to run on computers of this decade. (ERROR: Failed to initialize PAL. Exiting program.)
> The equivalent sas3flash.EFI version seems to be out of date and caused the checksum problems described before.
> (This time I wiped them before flashing with "sas3flash -o -e 6".)

That is unfortunate… perhaps Steve has some insight.

>
> * I tried to change mpr tuneable "use_phy_num" after that but this has not improved the situation. I will retry and collect logs with Steve's script.

Changed it to what? I think it defaults to 1. Did you try 0?

> * I retried with the latest "mpr.ko" from the Broadcom download page. (Same problems, no "use_phy_num" tuneable.)
>
> * I retested this hardware with Linux (4.15 and 4.17)
> ** Some shelves could be replugged reliably (ie: 45 disks show up, 45 disks disappear, 45 disks reappear)
> ** On the newest shelf, 2 disks were missing after the replugging (ie: 44 disks show up, 44 disks disappear, 42 disks reappear) (kernel log mpt3sas_cm0: "device is not present handle)
>
> * I tried a different controller
> ** So far I used a Broadcom LSI SAS 9305-16e (Controller: SAS3216) (Firmware 16.00.01.00 or 15.00.00.00)
> ** Yesterday I switched to a new fresh out-of-the-box Broadcom LSI 9305-24i (Controller: SAS3224) (Firmware 09.00.00.00 (or something similar with 09*))
> With the new controller everything seems to work on Linux. It might be the old Firmware?...
> It is better with the new controller on FreeBSD in that sense that I at least get one out of two /dev/sesX devices back. But disks are still missing and are not getting completely cleaned up...

It does sound a bit like a mapping table problem. Clearing it might help, we’ll see.

> This whole thing is a bit frustrating, especially since up until now I thought that HBAs are kind of "connect and forget" devices. Next step is to set up a separate test environment and try to get it to work there. I will keep you updated and try to provide logs for all FreeBSD related problems.

Thanks for debugging this. Unfortunately there are a number of ways it can go wrong. The mapping code has been the source of some problems, sometimes enclosure vendors do the wrong thing, and sometimes there are other bugs.

Ken

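A minimal sketch of the check sequence discussed in this message (read from the device with dd, rescan, then look at the device list), assuming a FreeBSD host with camcontrol(8) available. The function names list_periphs, still_present and check_after_unplug are hypothetical and not part of any FreeBSD tool; only dd, 'camcontrol rescan all' and 'camcontrol devlist -v' come from the thread. It also assumes the usual devlist layout, where each device line ends with its peripheral names in parentheses, e.g. (pass0,da0).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of camcontrol(8).

# Read 'camcontrol devlist' output on stdin and print one peripheral
# name per line (da0, pass0, ses1, ...), sorted and de-duplicated.
list_periphs() {
    sed -n 's/.*(\([^)]*\))[[:space:]]*$/\1/p' |
        tr ',' '\n' |
        sed '/^$/d' |
        sort -u
}

# Read devlist output on stdin; print each name given as an argument
# that is still listed, i.e. was not cleaned up by CAM.
still_present() {
    periphs=$(list_periphs)
    for dev in "$@"; do
        if printf '%s\n' "$periphs" | grep -qxF "$dev"; then
            printf '%s\n' "$dev"
        fi
    done
}

# Run on the affected host after pulling the cable. Arguments are the
# da device names that lived on the unplugged shelf.
check_after_unplug() {
    for dev in "$@"; do
        # A small read; an error is expected if the disk is really gone.
        dd if="/dev/$dev" of=/dev/null bs=1k count=1 2>/dev/null
    done
    camcontrol rescan all
    camcontrol devlist -v | still_present "$@"
}
```

For example, running 'check_after_unplug da12 da13' (device names made up for illustration) after disconnecting the shelf would print whichever of those names CAM still lists. Empty output would suggest the devices were removed; any name printed is a candidate for the dmesg and mapping-table checks asked for above.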