From owner-freebsd-scsi@freebsd.org  Tue Feb 14 14:19:21 2017
Return-Path: <owner-freebsd-scsi@freebsd.org>
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1A509CDE399
 for <freebsd-scsi@mailman.ysv.freebsd.org>;
 Tue, 14 Feb 2017 14:19:21 +0000 (UTC) (envelope-from crest@rlwinm.de)
Received: from smtp.rlwinm.de (smtp.rlwinm.de [148.251.233.239])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id DB20818F3
 for <freebsd-scsi@freebsd.org>; Tue, 14 Feb 2017 14:19:20 +0000 (UTC)
 (envelope-from crest@rlwinm.de)
Received: from crest.local (unknown [87.253.189.132])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by smtp.rlwinm.de (Postfix) with ESMTPSA id 5FCF111156
 for <freebsd-scsi@freebsd.org>; Tue, 14 Feb 2017 15:19:13 +0100 (CET)
Subject: Re: multipath device never failing - loops over providers instead
To: freebsd-scsi@freebsd.org
References: <20170211045605.GA43225@FreeBSD.org>
From: Jan Bramkamp <crest@rlwinm.de>
Message-ID: <7dfa461d-ade3-0410-3ff1-540631561393@rlwinm.de>
Date: Tue, 14 Feb 2017 15:19:13 +0100
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:45.0)
 Gecko/20100101 Thunderbird/45.7.1
MIME-Version: 1.0
In-Reply-To: <20170211045605.GA43225@FreeBSD.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 14 Feb 2017 14:19:21 -0000

On 11/02/2017 05:56, John wrote:
> Hi Folks,
>
>    Running 10.3-STABLE  r308246 from Nov 3, 2016
>
>    I thought I saw a commit in this area a while back but I
> cannot seem to find it nor is google helping..
>
>    I have SAS drives behind 2 multiplexers (4 paths total) which
> are all configured similar to the following:
>
> # gmultipath status Z76
>          Name   Status  Components
> multipath/Z76  OPTIMAL  da92 (ACTIVE)
>                         da236 (PASSIVE)
>                         da428 (PASSIVE)
>                         da572 (PASSIVE)
>
>    For each path on the components above, the following sequence occurs:
>
> kernel: (da92:mpr0:0:399:0): READ(10). CDB: 28 00 0b a7 20 c0 00 00 10 00
> kernel: (da92:mpr0:0:399:0): CAM status: SCSI Status Error
> kernel: (da92:mpr0:0:399:0): SCSI status: Check Condition
> kernel: (da92:mpr0:0:399:0): SCSI sense: HARDWARE FAILURE asc:32,0 (No defect spare location available)
> kernel: (da92:mpr0:0:399:0): Info: 0xba720c0
> kernel: (da92:mpr0:0:399:0): Field Replaceable Unit: 157
> kernel: (da92:mpr0:0:399:0): Command Specific Info: 0x80010000
> kernel: (da92:mpr0:0:399:0): Actual Retry Count: 255
> kernel: (da92:mpr0:0:399:0): Retrying command (per sense data)
>
>    After each path has failed, the following is seen:
>
> kernel: GEOM_MULTIPATH: Error 5, da92 in Z76 marked FAIL
> kernel: GEOM_MULTIPATH: all paths in Z76 were marked FAIL, restore da572
> kernel: GEOM_MULTIPATH: all paths in Z76 were marked FAIL, restore da428
> kernel: GEOM_MULTIPATH: all paths in Z76 were marked FAIL, restore da236
> kernel: GEOM_MULTIPATH: da572 is now active path in Z76
>
>    and the entire failure loop occurs again. The multipath device
> itself is never failed (so the zfs pool can never go into degraded mode,
> the faulty drive replaced with a spare, etc).
>
>    Once I pulled the drive the multipath device Z76 fails and
> things sent as expected.
>
>    It seems g_multipath_fault() in this instance should just fail the device.
>
>    Does anyone have any pointers on this issue?

This is a known bug in GEOM multipath. There are at least two open PRs 
mentioning exactly this problem. When i encountered it even prevented my 
system from booting into single user mode. As soon as GEOM multipath 
found the metadata over one path it consumed that path. Too bad that the 
tasting on of the new multipath provider triggered a read error, because 
GEOM multipath just entered an infinite retry loop over all known paths. 
Because of this bug GEOM multipath is unusable for production. I suspect 
that it wouldn't be too hard to fix if there is a way to attach some 
state (e.g. a bitmap of failed paths) to each BIO request.

From owner-freebsd-scsi@freebsd.org  Wed Feb 15 20:34:32 2017
Return-Path: <owner-freebsd-scsi@freebsd.org>
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3E7C7CE00A5;
 Wed, 15 Feb 2017 20:34:32 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citapm.icyb.net.ua (citapm.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id 6335C7B6;
 Wed, 15 Feb 2017 20:34:27 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citapm.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id WAA05093;
 Wed, 15 Feb 2017 22:34:26 +0200 (EET) (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1ce6HS-000Eeb-1M; Wed, 15 Feb 2017 22:34:26 +0200
To: freebsd-geom@FreeBSD.org, freebsd-scsi@FreeBSD.org
From: Andriy Gapon <avg@FreeBSD.org>
Subject: cd + geom -> panic on media change
Message-ID: <6a59787c-ecba-80db-a669-d0987466e3f8@FreeBSD.org>
Date: Wed, 15 Feb 2017 22:33:05 +0200
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101
 Thunderbird/45.6.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 15 Feb 2017 20:34:32 -0000


I had an Audio CD in a DVD drive.
I booted with that CD in and this is how it was reported:
cd0: <Optiarc DVD RW AD-7191S 1.02> Removable CD-ROM SCSI device
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
cd0: 392MB (200919 2048 byte sectors)

I see that there is some special code in scsi_cd.c that sets sector size to 2352
for Audio CDs, but that was not reflected in the report quoted above.

Later I popped out the CD (using the physical eject button, if that matters) and
popped in a UDF formatted DVD disk.  That resulted in the following panic:

panic: wrong offset 472559440 for sectorsize 2048
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff8043517b = db_trace_self_wrapper+0x2b/frame
0xfffffe0504bc8870
kdb_backtrace() at 0xffffffff80685289 = kdb_backtrace+0x39/frame 0xfffffe0504bc8920
vpanic() at 0xffffffff8064ce8c = vpanic+0x14c/frame 0xfffffe0504bc8960
panic() at 0xffffffff8064cbd3 = panic+0x43/frame 0xfffffe0504bc89c0
g_io_request() at 0xffffffff805c3b81 = g_io_request+0x3e1/frame 0xfffffe0504bc8a00
g_read_data() at 0xffffffff805c48e7 = g_read_data+0x77/frame 0xfffffe0504bc8a40
g_part_gpt_probe() at 0xffffffff805df0d1 = g_part_gpt_probe+0x111/frame
0xfffffe0504bc8a80
G_PART_PROBE() at 0xffffffff805daa0b = G_PART_PROBE+0x4b/frame 0xfffffe0504bc8aa0
g_part_probe() at 0xffffffff805da366 = g_part_probe+0xc6/frame 0xfffffe0504bc8af0
g_part_taste() at 0xffffffff805d8de7 = g_part_taste+0x147/frame 0xfffffe0504bc8b30
g_new_provider_event() at 0xffffffff805c7a0b = g_new_provider_event+0x10b/frame
0xfffffe0504bc8b50
one_event() at 0xffffffff805c2a4f = one_event+0xff/frame 0xfffffe0504bc8b70
g_run_events() at 0xffffffff805c2875 = g_run_events+0x65/frame 0xfffffe0504bc8b90
g_event_procbody() at 0xffffffff805c4f38 = g_event_procbody+0x58/frame
0xfffffe0504bc8ba0
fork_exit() at 0xffffffff80614800 = fork_exit+0xd0/frame 0xfffffe0504bc8bf0
fork_trampoline() at 0xffffffff80837a7e = fork_trampoline+0xe/frame
0xfffffe0504bc8bf0

The actual DVD has size 680368 x 2048.
I did some basic math and 472559440 == 200919 * 2352 - 2048.
Given that the code was probing GPT, the offset is consistent with mediasize
still being the old size of the Audio CD while the sector size being 2048.
Seems like the disk properties were not properly updated before posting the new
provider event.

Patches and suggestions are welcome :)
Thank you!

-- 
Andriy Gapon

From owner-freebsd-scsi@freebsd.org  Wed Feb 15 20:38:40 2017
Return-Path: <owner-freebsd-scsi@freebsd.org>
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id A82F9CE012F;
 Wed, 15 Feb 2017 20:38:40 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citapm.icyb.net.ua (citapm.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id A92A2866;
 Wed, 15 Feb 2017 20:38:38 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citapm.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id WAA05104;
 Wed, 15 Feb 2017 22:38:37 +0200 (EET) (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1ce6LV-000Eeq-BF; Wed, 15 Feb 2017 22:38:37 +0200
Subject: Re: cd + geom -> panic on media change
To: freebsd-geom@FreeBSD.org, freebsd-scsi@FreeBSD.org
References: <6a59787c-ecba-80db-a669-d0987466e3f8@FreeBSD.org>
From: Andriy Gapon <avg@FreeBSD.org>
Message-ID: <07e08996-49d4-3817-cf2b-01504e25ca67@FreeBSD.org>
Date: Wed, 15 Feb 2017 22:37:41 +0200
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101
 Thunderbird/45.6.0
MIME-Version: 1.0
In-Reply-To: <6a59787c-ecba-80db-a669-d0987466e3f8@FreeBSD.org>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 15 Feb 2017 20:38:40 -0000

On 15/02/2017 22:33, Andriy Gapon wrote:
> 
> I had an Audio CD in a DVD drive.
> I booted with that CD in and this is how it was reported:
> cd0: <Optiarc DVD RW AD-7191S 1.02> Removable CD-ROM SCSI device
> cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
> cd0: 392MB (200919 2048 byte sectors)
> 
> I see that there is some special code in scsi_cd.c that sets sector size to 2352
> for Audio CDs, but that was not reflected in the report quoted above.
> 
> Later I popped out the CD (using the physical eject button, if that matters) and
> popped in a UDF formatted DVD disk.  That resulted in the following panic:
> 
> panic: wrong offset 472559440 for sectorsize 2048


By the way, I see that all requests going down pass through g_io_check() which
would reject such a request with EINVAL.
Why do we have the KASSERT in g_io_request() then?

> KDB: stack backtrace:
> db_trace_self_wrapper() at 0xffffffff8043517b = db_trace_self_wrapper+0x2b/frame
> 0xfffffe0504bc8870
> kdb_backtrace() at 0xffffffff80685289 = kdb_backtrace+0x39/frame 0xfffffe0504bc8920
> vpanic() at 0xffffffff8064ce8c = vpanic+0x14c/frame 0xfffffe0504bc8960
> panic() at 0xffffffff8064cbd3 = panic+0x43/frame 0xfffffe0504bc89c0
> g_io_request() at 0xffffffff805c3b81 = g_io_request+0x3e1/frame 0xfffffe0504bc8a00
> g_read_data() at 0xffffffff805c48e7 = g_read_data+0x77/frame 0xfffffe0504bc8a40
> g_part_gpt_probe() at 0xffffffff805df0d1 = g_part_gpt_probe+0x111/frame
> 0xfffffe0504bc8a80
> G_PART_PROBE() at 0xffffffff805daa0b = G_PART_PROBE+0x4b/frame 0xfffffe0504bc8aa0
> g_part_probe() at 0xffffffff805da366 = g_part_probe+0xc6/frame 0xfffffe0504bc8af0
> g_part_taste() at 0xffffffff805d8de7 = g_part_taste+0x147/frame 0xfffffe0504bc8b30
> g_new_provider_event() at 0xffffffff805c7a0b = g_new_provider_event+0x10b/frame
> 0xfffffe0504bc8b50
> one_event() at 0xffffffff805c2a4f = one_event+0xff/frame 0xfffffe0504bc8b70
> g_run_events() at 0xffffffff805c2875 = g_run_events+0x65/frame 0xfffffe0504bc8b90
> g_event_procbody() at 0xffffffff805c4f38 = g_event_procbody+0x58/frame
> 0xfffffe0504bc8ba0
> fork_exit() at 0xffffffff80614800 = fork_exit+0xd0/frame 0xfffffe0504bc8bf0
> fork_trampoline() at 0xffffffff80837a7e = fork_trampoline+0xe/frame
> 0xfffffe0504bc8bf0
> 
> The actual DVD has size 680368 x 2048.
> I did some basic math and 472559440 == 200919 * 2352 - 2048.
> Given that the code was probing GPT, the offset is consistent with mediasize
> still being the old size of the Audio CD while the sector size being 2048.
> Seems like the disk properties were not properly updated before posting the new
> provider event.
> 
> Patches and suggestions are welcome :)
> Thank you!
> 


-- 
Andriy Gapon