From: Peter Eriksson <pen@lysator.liu.se>
Subject: Re: ZFS snapdir readability (Crosspost)
Date: Fri, 8 Nov 2019 08:03:52 +0100
To: Chris Watson, Jan Behrens, Karli Sjöberg via freebsd-fs

Yes, and no. We use 'refquota' on the user filesystems to prevent a single user from using up all the free space in the zpool.

However, what happens is that ZFS will decrease the "transaction size" of writes when a filesystem is nearly at its quota, in order to prevent a transaction from writing more than the assigned quota (since it might write more than expected due to compression, snapshots, etc.). Now this is (apparently) by design, and it works.
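For reference, a minimal sketch of how such a cap is set and inspected; the dataset name and size below are made up for illustration:

    # Cap the space the filesystem itself may reference
    # (snapshots and descendants are not counted against refquota)
    zfs set refquota=25G tank/home/alice

    # Check the cap and current usage
    zfs get refquota,referenced,available tank/home/alice

With plain 'quota' instead of 'refquota', snapshot space would also count against the limit, which is why 'refquota' is the property used here.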
The effect is that a user trying to write to a nearly full (at-quota) filesystem will "see" an extremely slow file server (throughput dropping from MB/s to KB/s, or even B/s if it is _really_ full), which makes for interesting bug reports and problem diagnosis, since other users on the same server see no such slowdown at the same time.

_However_, other effects are that creating or destroying snapshots can _also_ take longer than usual. Or if you destroy a lot of snapshots at the same time as a user has deleted a lot of files, and you then reboot the server: when it starts up again and attempts to mount all filesystems, you will notice that it can take "forever", and it may look like "zfs mount -a" is stuck. Luckily for us, when this happened we had previously modified the system startup scripts so that all ZFS mounting is done in the background and in parallel, so we got a login prompt and could log in and run some commands (normally all filesystems are mounted before system logins are allowed…).

So we logged in and noticed that the machine was doing a lot of writes (zpool iostat) and was "stuck" attempting to mount one specific filesystem - a filesystem that happened to have 0 bytes of free refquota. When I added some 10 GB more refquota to that filesystem, things sped up a lot :-).

(The first time this happened we let the server finish mounting by itself, which took about 2.5 days…)

- Peter

> On 8 Nov 2019, at 01:31, Chris Watson wrote:
> 
> Peter, on your last point about 100% utilization, don't you use quotas/user quotas to prevent that?
> 
> Chris
> 
> Sent from my iPhone
> 
>> On Nov 7, 2019, at 4:06 PM, Peter Eriksson wrote:
>> 
>> The "easy" solution is to give each user (or group / project) their own ZFS filesystem. Then the ".zfs" directory would be inside the user's own $HOME, and you can set $HOME to mode 0700…
>> 
>> That is what we are doing. Granted, it generates a "few" filesystems (some 20000 per server - we have around 120k users - plus hourly snapshots on each as "icing" on the cake). Mounting all of those takes a bit of time, but luckily with the latest FreeBSD release things are much faster these days :-)
>> 
>> There are some other issues with that - like 100% full filesystems causing severe system slowdowns during writes… So you really want to have a monitoring system that warns about that.
>> 
>> - Peter
>> 
>> 
>>> 
>>> I recently noticed that all ZFS filesystems in FreeBSD allow access to
>>> the .zfs directory (snapdir) for all users of the system. It is
>>> possible to hide that directory using the snapdir option:
>> 
>> 
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"