Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 6 Jul 2015 10:20:21 -0300
From:      Renato Botelho <garga@FreeBSD.org>
To:        Mateusz Guzik <mjguzik@gmail.com>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: rename() + fsync() implementation
Message-ID:  <4B5ABFF0-AAE5-4E74-BAD4-E7B301DC3F7D@FreeBSD.org>
In-Reply-To: <20150704133034.GA3102@dft-labs.eu>
References:  <2770DA5F-D7F2-46D9-9158-10C86115F8AC@FreeBSD.org> <20150704133034.GA3102@dft-labs.eu>

next in thread | previous in thread | raw e-mail | index | archive | help
> On Jul 4, 2015, at 10:30, Mateusz Guzik <mjguzik@gmail.com> wrote:
>=20
> On Fri, Jul 03, 2015 at 01:59:43PM -0300, Renato Botelho wrote:
>> Some time ago we found a bug on pfSense and after investigating we =
figured out the root cause was passwd / group related tools was not =
checking if files were safe in disk, and system ended up with a 0 length =
passwd/group db after a power cycle. There are more context at revision =
[1].
>>=20
>> After that, bapt@ suggest to do similar fix for cap_mkdb and =
services_mkdb, that also can be found at another review [2].
>>=20
>=20
> This definitely needs an explanation what is going on here.
>=20
> When the problem from [1] is encountered, which file appears to be
> zeroed?
>=20
> If the new one, should not O_SYNC you added in several places take =
care
> of that? (btw, it would be nicer if that was fsynced before close
> instead)

According comment #11 of pfSense bug #4523 [3]:

----------
With SU, with or without J, you end up with 0 byte master.passwd, =
passwd, group, pwd.db, spwd.db. Or some subset of those. Without SU, you =
end up with master.passwd and/or group corrupted, containing parts of =
other files in /etc/.

It's replicable on stock FreeBSD 9.0 through 11-CURRENT by running the =
following:=20

#!/bin/sh

/usr/sbin/pw userdel -n 'admin'
/usr/sbin/pw groupadd all -g 1998
/usr/sbin/pw groupadd admins -g 1999
/usr/sbin/pw groupmod all -g 1998 -M ''
echo =
\$6\$O/T6GYkcgYvOBTGm\$KvOh3zhFKiA6HMEPktuImAI8/cetwEFsgj7obXdeTcQvn6mhs50=
HgkWt6nfnxNhTIb2w4Je6dqdKtARavxThP1 | /usr/sbin/pw usermod -q -n root -s =
/bin/tcsh -H 0
echo =
\$6\$O/T6GYkcgYvOBTGm\$KvOh3zhFKiA6HMEPktuImAI8/cetwEFsgj7obXdeTcQvn6mhs50=
HgkWt6nfnxNhTIb2w4Je6dqdKtARavxThP1 | /usr/sbin/pw useradd -m -k =
/etc/skel -o -q -u 0 -n admin -g wheel -s /bin/sh -d /root -c 'System =
Administrator' -H 0
/usr/sbin/pw unlock admin -q
/usr/sbin/pw groupmod all -g 1998 -M '0'
/usr/sbin/pw groupmod admins -g 1999 -M '0'

then power cycling the system. If using SU, you'll end up with 0 byte =
files. Without SU, you'll have corrupted files containing parts of some =
other file(s) in /etc.
----------

> If the old one, this still has the window (although miniscule compared
> to 5 mins) since whatever crash/failure you experienced can happen
> before you fsync. It may be ok enough in practice, but then the =
question
> is whether O_SYNC on the new file was of any significance. Or to state
> differently, do callers have to fsync/O_SYNC the file they are passing
> as an argument?
>=20
> Of course it may be either one can appear truncated.

The idea of using O_SYNC on temporary file, and call fsync() on =
directory after rename came from Kirk McKusick:

----------
Applications that are properly written take these steps:

	1. write new file to a temporary name.
	2. fsync newly written file.
	3. rename temporary name to file to be updated.
	4. For faster results, fsync directory that has updated file to =
commit the new file.
----------

I decided to change it on libutil because the functions are specific for =
passwd and group operations, otherwise it would need to be changed on =
all applications that calls it (usr.bin/chpass, usr.sbin/pw, =
usr.sbin/rpc.yppasswdd).

> Assuming the whole approach makes sense here are some comments about =
the
> code itself:
>=20
>>        /*
>>         * Make sure file is safe on disk. To improve performance we =
will call
>>         * fsync() to the directory where file lies
>>         */
>>        if (rename(tname, dbname) =3D=3D -1 ||
>>            (dbname_dir =3D dirname(dbname)) =3D=3D NULL ||
>=20
> dirname returns a pointer to an internal buffer, so it is not suitable
> for use in a library function (think: foo =3D dirname(...); =
this(...);)

What do you think about having dirname_r implemented? Using the same =
approach we have today for basename/basename_r.

> Since it can fail and does not depend on rename, it should have been
> done earlier.
>=20
> dbname_dir looks like a weird name. Something like dbdirname would be
> better.

OK

>>            (dbname_dir_fd =3D open(dbname_dir, O_RDONLY|O_DIRECTORY)) =
=3D=3D -1 ||
>=20
> dbname_dir_fd is definitely bad. This is not a name, this is a fd. So
> dbdirfd.

OK

>>            fsync(dbname_dir_fd) !=3D 0) {
>=20
> Why does this do '!=3D 0' instead of '=3D=3D -1=E2=80=99?

No special reason, I=E2=80=99m going to change it.

>>                if (dbname_dir_fd !=3D -1)
>>                        close(dbname_dir_fd);
>>                err(1, "Cannot rename `%s' to `%s'", tname, dbname);
>=20
> It could be renamed succeeded, so this msg should be modified to state
> what really failed. But as a library func it likely should return an
> error instead, indicating which part failed.

Agreed.

>>        }
>>=20
>>        if (dbname_dir_fd !=3D -1)
>>                close(dbname_dir_fd);
>>=20
>=20
> At this point dbname_dir_fd is guaranteed to be !=3D -1.

True, thanks for pointing this out.

>> The idea is to implement a =E2=80=9Csync rename=E2=80=9D function to =
do all these steps. I thought about it and IMO lib/libutil would be a =
good place to implement it. But since I=E2=80=99m starting to touch src =
now, I would like to hear more opinions about this.
>>=20
>>=20
>> [1] https://reviews.freebsd.org/D2978
>> [2] https://reviews.freebsd.org/D2982
>=20
> --=20
> Mateusz Guzik <mjguzik gmail.com>

[3] https://redmine.pfsense.org/issues/4523#note-11
--
Renato Botelho




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4B5ABFF0-AAE5-4E74-BAD4-E7B301DC3F7D>