Date: Mon, 6 Jul 2015 10:20:21 -0300 From: Renato Botelho <garga@FreeBSD.org> To: Mateusz Guzik <mjguzik@gmail.com> Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org> Subject: Re: rename() + fsync() implementation Message-ID: <4B5ABFF0-AAE5-4E74-BAD4-E7B301DC3F7D@FreeBSD.org> In-Reply-To: <20150704133034.GA3102@dft-labs.eu> References: <2770DA5F-D7F2-46D9-9158-10C86115F8AC@FreeBSD.org> <20150704133034.GA3102@dft-labs.eu>
next in thread | previous in thread | raw e-mail | index | archive | help
> On Jul 4, 2015, at 10:30, Mateusz Guzik <mjguzik@gmail.com> wrote: >=20 > On Fri, Jul 03, 2015 at 01:59:43PM -0300, Renato Botelho wrote: >> Some time ago we found a bug on pfSense and after investigating we = figured out the root cause was passwd / group related tools was not = checking if files were safe in disk, and system ended up with a 0 length = passwd/group db after a power cycle. There are more context at revision = [1]. >>=20 >> After that, bapt@ suggest to do similar fix for cap_mkdb and = services_mkdb, that also can be found at another review [2]. >>=20 >=20 > This definitely needs an explanation what is going on here. >=20 > When the problem from [1] is encountered, which file appears to be > zeroed? >=20 > If the new one, should not O_SYNC you added in several places take = care > of that? (btw, it would be nicer if that was fsynced before close > instead) According comment #11 of pfSense bug #4523 [3]: ---------- With SU, with or without J, you end up with 0 byte master.passwd, = passwd, group, pwd.db, spwd.db. Or some subset of those. Without SU, you = end up with master.passwd and/or group corrupted, containing parts of = other files in /etc/. It's replicable on stock FreeBSD 9.0 through 11-CURRENT by running the = following:=20 #!/bin/sh /usr/sbin/pw userdel -n 'admin' /usr/sbin/pw groupadd all -g 1998 /usr/sbin/pw groupadd admins -g 1999 /usr/sbin/pw groupmod all -g 1998 -M '' echo = \$6\$O/T6GYkcgYvOBTGm\$KvOh3zhFKiA6HMEPktuImAI8/cetwEFsgj7obXdeTcQvn6mhs50= HgkWt6nfnxNhTIb2w4Je6dqdKtARavxThP1 | /usr/sbin/pw usermod -q -n root -s = /bin/tcsh -H 0 echo = \$6\$O/T6GYkcgYvOBTGm\$KvOh3zhFKiA6HMEPktuImAI8/cetwEFsgj7obXdeTcQvn6mhs50= HgkWt6nfnxNhTIb2w4Je6dqdKtARavxThP1 | /usr/sbin/pw useradd -m -k = /etc/skel -o -q -u 0 -n admin -g wheel -s /bin/sh -d /root -c 'System = Administrator' -H 0 /usr/sbin/pw unlock admin -q /usr/sbin/pw groupmod all -g 1998 -M '0' /usr/sbin/pw groupmod admins -g 1999 -M '0' then power cycling the system. If using SU, you'll end up with 0 byte = files. Without SU, you'll have corrupted files containing parts of some = other file(s) in /etc. ---------- > If the old one, this still has the window (although miniscule compared > to 5 mins) since whatever crash/failure you experienced can happen > before you fsync. It may be ok enough in practice, but then the = question > is whether O_SYNC on the new file was of any significance. Or to state > differently, do callers have to fsync/O_SYNC the file they are passing > as an argument? >=20 > Of course it may be either one can appear truncated. The idea of using O_SYNC on temporary file, and call fsync() on = directory after rename came from Kirk McKusick: ---------- Applications that are properly written take these steps: 1. write new file to a temporary name. 2. fsync newly written file. 3. rename temporary name to file to be updated. 4. For faster results, fsync directory that has updated file to = commit the new file. ---------- I decided to change it on libutil because the functions are specific for = passwd and group operations, otherwise it would need to be changed on = all applications that calls it (usr.bin/chpass, usr.sbin/pw, = usr.sbin/rpc.yppasswdd). > Assuming the whole approach makes sense here are some comments about = the > code itself: >=20 >> /* >> * Make sure file is safe on disk. To improve performance we = will call >> * fsync() to the directory where file lies >> */ >> if (rename(tname, dbname) =3D=3D -1 || >> (dbname_dir =3D dirname(dbname)) =3D=3D NULL || >=20 > dirname returns a pointer to an internal buffer, so it is not suitable > for use in a library function (think: foo =3D dirname(...); = this(...);) What do you think about having dirname_r implemented? Using the same = approach we have today for basename/basename_r. > Since it can fail and does not depend on rename, it should have been > done earlier. >=20 > dbname_dir looks like a weird name. Something like dbdirname would be > better. OK >> (dbname_dir_fd =3D open(dbname_dir, O_RDONLY|O_DIRECTORY)) = =3D=3D -1 || >=20 > dbname_dir_fd is definitely bad. This is not a name, this is a fd. So > dbdirfd. OK >> fsync(dbname_dir_fd) !=3D 0) { >=20 > Why does this do '!=3D 0' instead of '=3D=3D -1=E2=80=99? No special reason, I=E2=80=99m going to change it. >> if (dbname_dir_fd !=3D -1) >> close(dbname_dir_fd); >> err(1, "Cannot rename `%s' to `%s'", tname, dbname); >=20 > It could be renamed succeeded, so this msg should be modified to state > what really failed. But as a library func it likely should return an > error instead, indicating which part failed. Agreed. >> } >>=20 >> if (dbname_dir_fd !=3D -1) >> close(dbname_dir_fd); >>=20 >=20 > At this point dbname_dir_fd is guaranteed to be !=3D -1. True, thanks for pointing this out. >> The idea is to implement a =E2=80=9Csync rename=E2=80=9D function to = do all these steps. I thought about it and IMO lib/libutil would be a = good place to implement it. But since I=E2=80=99m starting to touch src = now, I would like to hear more opinions about this. >>=20 >>=20 >> [1] https://reviews.freebsd.org/D2978 >> [2] https://reviews.freebsd.org/D2982 >=20 > --=20 > Mateusz Guzik <mjguzik gmail.com> [3] https://redmine.pfsense.org/issues/4523#note-11 -- Renato Botelho
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4B5ABFF0-AAE5-4E74-BAD4-E7B301DC3F7D>