From: Peter Eriksson
To: Rick Macklem, Alexander Motin, "mmacy@ixsystems.com", "ryan@ixsystems.com", "pjd@freebsd.org", "freebsd-fs@freebsd.org"
Subject: Re: RFC: patching fsshare in ZFS
Date: Wed, 5 Jun 2019 09:50:06 +0200

Hi all!

I’ve been experimenting a little with adding support for a simple
Berkeley DB-based “exports” database to mountd in order to speed things up
for the ZFS share code. The changes to mountd are fairly simple, and the
corresponding changes were pretty simple to add to the ZFS code too, last I
tried it. It speeds things up quite a bit: no need to do linear searches
through the /etc/zfs/exports file and no need to rewrite the file for
changes either… With N*10000 NFS-shared filesystems like we have, this can
be pretty nice to have.

My current DB-based code supports multiple exports entries per filesystem
by separating the “rows” in the database entry for a filesystem with NUL
characters.

Let me know if there is some interest in this for others than just me.

- Peter

> 2 - Peter has some NFS servers with 20000-72000+ file systems being exported.
>     The current code in fsshare.c copies the exports file and then appends the new
>     entry for a file system and then replaces the exports file with the new one.
>     I think this file copying happens for every file system, which seems like a lot
>     of overhead. (I forget what Peter said w.r.t. how long this takes, but I think it
>     was quite a while.)
>     My guess is that Pawel did this so that the update to the file would happen
>     atomically.
>     It seems to me that if mountd held a read lock on the export file while reading it
>     and fsshare() held a write lock on the file while appending the new entry, that
>     the file copying could be avoided?
>     - The main problem I see w.r.t. doing this is that an old mountd binary that doesn't
>       read lock the file could be broken by the fsshare() change.
>       --> One way to avoid this would be to have the new mountd write more than
>           just the pid in the MOUNTD_PID file so that fsshare() could tell if mountd was
>           going to be read locking the file.
>       OR
>           Just don't MFC the change and assume that the new mountd would be
>           released when the new fsshare() is (in FreeBSD13?).
>
> Anyhow, I can tweak mountd.c and fsshare.c, but that's as far as I can take it.
>
> Others would need to do testing and whatever it takes to get a change to fsshare.c
> into the ZFS sources.
>
> So, what do you think about this?
>
> rick
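
For anyone who wants to picture the NUL-separated "rows" described at the top of this message, here is a minimal sketch in C. It is not the actual patch. The function names rows_append() and rows_next() are hypothetical, and the sketch assumes each row is stored with its terminating NUL, so the separator also ends the last row. The real code may lay the value out differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: append one exports row to the value buffer that
 * would be stored for a single filesystem. Rows are kept NUL-terminated
 * (an assumption of this sketch). Returns the possibly moved buffer and
 * updates *lenp, or returns NULL on allocation failure, in which case the
 * old buffer is left untouched.
 */
static char *
rows_append(char *buf, size_t *lenp, const char *row)
{
	size_t rlen;
	char *nbuf;

	assert(row != NULL && lenp != NULL);
	rlen = strlen(row) + 1;		/* include the NUL separator */
	nbuf = realloc(buf, *lenp + rlen);
	if (nbuf == NULL)
		return (NULL);
	memcpy(nbuf + *lenp, row, rlen);
	*lenp += rlen;
	return (nbuf);
}

/*
 * Hypothetical helper: return the row starting at *offp and advance *offp
 * past it. Returns NULL when the buffer is exhausted or when the tail is
 * not NUL-terminated (treated as malformed here).
 */
static const char *
rows_next(const char *buf, size_t len, size_t *offp)
{
	const char *row, *end;

	if (*offp >= len)
		return (NULL);
	row = buf + *offp;
	end = memchr(row, '\0', len - *offp);
	if (end == NULL)
		return (NULL);
	*offp = (size_t)(end - buf) + 1;
	return (row);
}
```

With a layout like this, the lookup key would presumably be the filesystem path, and the buffer plus its length would be the stored value, so a reader can walk all export rows for one filesystem with repeated rows_next() calls and no scan of the other filesystems.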