Subject: Re: ZFS i/o error on boot unable to start system
To: freebsd-questions@freebsd.org
From: David Christensen
Date: Sat, 29 Feb 2020 23:16:30 -0800

On 2020-02-28 11:04, David Christensen wrote:
> On 2020-02-28 05:51, James B. Byrne via freebsd-questions wrote:
>> I have reported this on the forums as well.
>>
>> FreeBSD-12.1p2
>> raidz2 on 4x8TB HDD (reds)
>> root on zfs
>>
>> We did a hot restart of this host this morning and received the
>> following on the console:
>>
>> ZFS: i/o error - all block copies unavailable
>> ZFS: failed to read pool zroot directory object
>> gptzfsboot: failed to mount default pool zroot
>>
>> FreeBSD/x86 boot
>> ZFS: i/o error - all block copies unavailable
>> ZFS: can't find dataset 0
>> Default: zroot/<0x0>
>> boot:
>>
>> What has happened?  How do I get this system back up and online?
>>
>> My first thought is that in modifying rc.conf to change some ip4 address
>> assignments that I may have done something else inadvertently which
>> has caused this.  I cannot think of any other changes made since the
>> system was last restarted at noon yesterday.
>>
>> This is an urgent matter.  Any help is gratefully welcomed.
>
> So, you have a desktop computer with four Western Digital Red 8 TB SATA
> hard disk drives.  You installed FreeBSD-12.1-RELEASE-amd64 and ended up
> with one ZFS RAIDZ2 pool with everything in it -- boot, root, usr, var,
> tmp, home, whatever.
> You have since upgraded to 12.1-p2.  Yesterday, you edited
> /etc/rc.conf and now the system will not boot.
>
> The most likely explanation is that you broke rc.conf.
>
> One possible solution would be to boot a rescue shell or live system,
> import the RAIDZ2 pool, and fix rc.conf.  Be sure to export the pool
> each time you are done editing and before attempting to boot it.
>
> Let us know how it works out.

I put my operating systems on a single device -- typically a 2.5" SATA
SSD, but sometimes a USB 3.0 flash drive (which tends to work best in
USB 2.0 ports).  I use BIOS boot, MBR partitioning, ZFS boot, GELI random
key swap, and GELI passphrase ZFS root.

My bulk data is on 3.5" SATA HDDs, each with GPT, one large partition,
and GELI, fed into a ZFS pool; one data pool per computer.

All of my tower and rack computers have 2.5" racks for the system drive.
One has racks for the 3.5" data drives (the others are internal).  The
goal is to be able to mix and match computers, system drives, and data
pools as required.

I migrated two 2.5" SSD system disks and two sets of pool drives to
different computers this evening.  Initially, both computers failed to
boot with the OP's message "ZFS: i/o error - all block copies
unavailable".

Swapping the drives allowed one computer to boot.

The other produced a white screen of death at some late point in the
FreeBSD boot loader (third stage?).  Unracking the data drives and
resetting the CMOS settings to defaults allowed it to boot.  (My guess is
that the white screen of death was caused by incorrect CMOS Setup video
settings.)

I then shut down, racked the data drives, and rebooted.  The device node
names had changed, the system drive was no longer ada0 (it was ada2), and
root GELI was broken.  The solution was to reverse the order of the SATA
port connections, unrack the data drives, boot, shut down, rack the data
drives, and boot again.  (My guess is that /boot/zfs/zpool.cache is
involved?)

David
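
For reference, the rescue procedure suggested in the quoted reply above (boot a live system, import the pool, fix rc.conf, export) might look roughly like the sketch below.  It only prints the commands; nothing is run.  The pool name zroot comes from the OP's console output, but the /mnt alternate root and the zroot/ROOT/default root dataset are assumptions based on a default bsdinstall layout, and rescue_plan is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch only: prints the rescue steps instead of running them.
# The alternate root and the root dataset name are assumptions
# (bsdinstall defaults); adjust them to the actual system.

rescue_plan() {
    pool="${1:-zroot}"                  # pool name from the OP's console output
    altroot="${2:-/mnt}"                # assumed alternate root on the live system
    rootfs="${3:-$pool/ROOT/default}"   # assumed default root dataset

    # Import under an alternate root so the pool's mountpoints do not
    # shadow the live system's own filesystems.
    echo "zpool import -f -R $altroot $pool"
    # On a default layout the root dataset is canmount=noauto,
    # so it has to be mounted by hand.
    echo "zfs mount $rootfs"
    # rc.conf is sourced by sh(1); -n parses it without executing anything.
    # Fix any reported errors with an editor, then re-run the check.
    echo "sh -n $altroot/etc/rc.conf"
    # Export the pool before attempting to boot from it again.
    echo "zpool export $pool"
}

rescue_plan "$@"
```

Run without arguments, it prints the four commands for pool zroot under /mnt; a different pool name or alternate root can be passed as the first and second arguments.  Printing rather than executing is deliberate, since a forced import on the wrong machine is not something to do blindly.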