From: "James B. Byrne"
Reply-To: byrnejb@harte-lyne.ca
To: freebsd-questions@freebsd.org
Date: Mon, 9 Mar 2020 12:11:59 -0400
Subject: FreeBSD-12.1 ZFS boot wierdness
Message-ID: <31c48e162379e683098551521a528a25.squirrel@webmail.harte-lyne.ca>

Last week one of our hosts failed to reboot on a warm restart, reporting
I/O errors. This host was configured with 4 x 8 TB drives in a raidz2 with
root-on-ZFS. After going down a lot of rabbit holes we established that the
ZFS pool was intact.

As a last resort, after replicating the pool's contents to another host, we
pulled a spare unit with the same hardware configuration and installed the
HDDs from the first unit into it. That unit booted without problem.

Problem solved, right? Not so fast.

Before putting the replacement unit into service I carried out a series of
tests on that host to ensure that any combination of two drives would
actually boot. And this is where things get a little inexplicable, at least
for me.

Given four HDDs: A, B, C, D; and a host with four hot-swap drive bays:
0, 1, 2, 3; I can boot with any combination of two drives EXCEPT when the
drive in slot 2 does not have its companion in either slot 1 or 3.

For example 0A,1-,2-,3B will boot, as will the reverse 0B,1-,2-,3A. Any two
drives in positions 0 and 3 will boot, as will any two drives in 0 and 1,
or 1 and 3, or 2 and 3. What will not boot is 0X,1-,2Y,3-; for any values
of X and Y.

Does anyone have any idea what is going on?

--
*** e-Mail is NOT a SECURE channel ***
Do NOT transmit sensitive data via e-Mail
Do NOT open attachments nor follow links sent by e-Mail

James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3
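
For readers who want the two-drive test matrix spelled out, the slot combinations described in the message can be enumerated with a short Python sketch. This only restates the rule as worded in the post (a drive in slot 2 needs its companion in slot 1 or 3); it does not diagnose anything. The function name boots_per_stated_rule is hypothetical, and the 1+2 result is implied by that rule rather than explicitly reported in the message.

```python
from itertools import combinations

SLOTS = (0, 1, 2, 3)  # the four hot-swap bays named in the message


def boots_per_stated_rule(pair):
    """Return True if the rule as worded in the post predicts a boot.

    Rule (restated, not verified here): with two drives installed, the
    host boots unless one drive is in slot 2 and the other drive is in
    neither slot 1 nor slot 3.
    """
    if 2 in pair:
        other = pair[0] if pair[1] == 2 else pair[1]
        return other in (1, 3)
    return True


# Every way to populate exactly two of the four bays.
matrix = {pair: boots_per_stated_rule(pair) for pair in combinations(SLOTS, 2)}

for pair, ok in matrix.items():
    # Note: (1, 2) is predicted by the rule but not listed in the post.
    print(pair, "boots" if ok else "does not boot")
```

Under that reading, slots 0 and 2 together are the only failing pair out of the six possible two-drive layouts.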