From owner-freebsd-stable@freebsd.org Mon Jul 22 00:43:05 2019
Date: Mon, 22 Jul 2019 09:11:07 +0900 (JST)
From: Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
Reply-To: junchoon@dec.sakura.ne.jp
To: freebsd-stable@freebsd.org
Cc: wollman@csail.mit.edu, trond.endrestol@ximalas.info, eugen@grosbein.net, mav@FreeBSD.org
Subject: Re: ZFS root mount regression
Message-Id: <20190722091107.910eed56a32e2fde506c013f@dec.sakura.ne.jp>
In-Reply-To: <841d26dd-7433-2e6d-9011-76ed7ad3d5d2@FreeBSD.org>
References: <23858.2573.932364.128957@khavrinen.csail.mit.edu> <73cddcd9-97f0-e73f-da9d-2a454fd3ea1a@grosbein.net> <841d26dd-7433-2e6d-9011-76ed7ad3d5d2@FreeBSD.org>
Organization: Junchoon corps
List-Id: Production branch of FreeBSD source code
Hi.

This may be a different problem (as mav@ noted) from Garrett's, but it is related to parallel mounting.

*For Garrett's problem, +1 to Trond's suggestion. For myself, I incorporate the drive type and number into the pool name to avoid a collision between the 2 physical drives (one working, and one for emergencies) in the same host.

After ZFS parallel mounting was committed (to both head and stable/12), auto-mounting from manually imported non-root pool(s) looks racy and usually fails: some datasets are shown as mounted, but are not accessible until a manual unmount/remount is performed.

*I'm experiencing the problem when I import another root pool by `zpool import -R /mnt -f poolname`. The patch from ZoL on bug 237517 [1] seems to fix the parallel-mounting race. (The ZoL fix was pointed out by fullermd.) As it seemed to be a race condition, I'm not 100% sure the patch is really correct.
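As an illustration, the kind of import that triggers the race for me, plus a serial-mount workaround sketch. The pool name `zroot2` and the `-N`/manual-mount approach are my own example here, not taken from the bug report:

```shell
# Import an alternate root pool under /mnt, but with -N so no datasets
# are auto-mounted (pool name "zroot2" is only an example).
zpool import -R /mnt -f -N zroot2

# Mount the datasets one at a time instead of relying on the parallel
# auto-mount; serial mounting sidesteps the race described above.
# (Errors from non-mountable datasets are suppressed.)
zfs list -H -o name -r zroot2 | while read ds; do
    zfs mount "$ds" 2>/dev/null
done

# Verify that every dataset reported as mounted is actually reachable;
# with the race, some show as mounted but are not accessible.
zfs mount | awk '{print $2}' | while read mp; do
    ls "$mp" > /dev/null || echo "not accessible: $mp"
done
```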
[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237517

On Sun, 21 Jul 2019 17:41:59 -0400 Alexander Motin wrote:

> Hi,
>
> I am not sure how the original description leads to the conclusion that
> the problem is related to parallel mounting. From my point of view it
> sounds like a problem where root pool mounting happens based on name,
> not on the pool GUID that needs to be passed from the loader. We have
> seen problems like that ourselves too when boot pool names collide. So
> I doubt it is a new problem; just nobody got to fixing it yet.
>
> On 20.07.2019 06:41, Eugene Grosbein wrote:
> > CC'ing Alexander Motin who committed the change.
> >
> > 20.07.2019 1:21, Garrett Wollman wrote:
> >
> >> I recently upgraded several file servers from 11.2 to 11.3. All of
> >> them boot from a ZFS pool called "tank" (the data is in a different
> >> pool). In a couple of instances (which caused me to have to take a
> >> late-evening 140-mile drive to the remote data center where they are
> >> located), the servers crashed at the root mount phase. In one case,
> >> it bailed out with error 5 (I believe that's [EIO]) to the usual
> >> mountroot prompt. In the second case, the kernel panicked instead.
> >>
> >> The root cause (no pun intended) on both servers was a disk which was
> >> supplied by the vendor with a label on it that claimed to be part of
> >> the "tank" pool, and for some reason the 11.3 kernel was trying to
> >> mount that (faulted) pool rather than the real one. The disks and
> >> pool configuration were unchanged from 11.2 (and probably 11.1 as
> >> well), so I am puzzled.
> >>
> >> Other than laboriously running "zpool labelclear -f /dev/somedisk" for
> >> every piece of media that comes into my hands, is there anything else
> >> I could have done to avoid this?
> >
> > Both the 11.3-RELEASE announcement and the Release Notes mention this:
> >
> >> The ZFS filesystem has been updated to implement parallel mounting.
> >
> > I strongly suggest reading the Release documentation in case of
> > troubles after an upgrade, at least. Or better, read it *before*
> > updating.
> >
> > I guess this parallelism created some race for your case.
> >
> > Unfortunately, a way to fall back to sequential mounting seems
> > undocumented. libzfs checks for the ZFS_SERIAL_MOUNT environment
> > variable to exist, with any value. I'm not sure how you set it for
> > mounting root; maybe it will use kenv, so try adding to
> > /boot/loader.conf:
> >
> > ZFS_SERIAL_MOUNT=1
> >
> > Alexander should have more knowledge on this.
> >
> > And of course, attaching an unrelated device with a label conflicting
> > with the root pool is asking for trouble. Re-label it ASAP.
>
> --
> Alexander Motin
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

--
Tomoaki AOKI
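P.S. For anyone finding this thread in the archives, the two workarounds discussed above as concrete commands. The device name below is only an example, and the loader.conf fallback is untested, as Eugene notes:

```shell
# Before attaching a vendor-supplied disk, wipe any stale ZFS label so
# it cannot shadow the real boot pool (device name is an example only):
zpool labelclear -f /dev/da9

# "zpool import" with no arguments lists importable pools visible to the
# system; a faulted duplicate of the boot pool's name would show up here:
zpool import

# Untested fallback per Eugene's suggestion, in case kenv variables
# reach libzfs -- add to /boot/loader.conf:
#   ZFS_SERIAL_MOUNT=1
```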