Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 9 Apr 2015 08:04:31 -0500
From:      Guy Helmer <guy.helmer@gmail.com>
To:        Jilles Tjoelker <jilles@stack.nl>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: lockf() vs. flock() -- lockf() not locking?
Message-ID:  <8F0FB344-3858-462F-A67D-97805379514C@gmail.com>
In-Reply-To: <20150407221525.GA99106@stack.nl>
References:  <3950D855-0F4E-49E0-A388-4C7ED102B68B@gmail.com> <20150407221525.GA99106@stack.nl>

next in thread | previous in thread | raw e-mail | index | archive | help

> On Apr 7, 2015, at 5:15 PM, Jilles Tjoelker <jilles@stack.nl> wrote:
>=20
> On Mon, Apr 06, 2015 at 04:18:09PM -0500, Guy Helmer wrote:
>> Recently an application I use switched from using flock() for =
advisory
>> file locking to lockf() in the code that protects against concurrent
>> writes to a file that is being shared and updated by multiple
>> processes (not threads in a single process). The code seems reliable =
=E2=80=94
>> a lock manager class opens the file & obtains the lock, then the
>> read/update method opens the file using a separate file descriptor &
>> reads/writes the file, flushes & closes the second file descriptor,
>> and then destroys the lock manager object which unlocks the file &
>> closes the first file descriptor.
>=20
>> Surprisingly this simple change seems to have made the code =
unreliable
>> by allowing concurrent writers to the file and corrupting its
>> contents:
>=20
>> -    if (flock(fd, LOCK_EX) !=3D 0)
>> +    if (lockf(fd, F_LOCK, 0) !=3D 0)
>>         throw std::runtime_error("Failed to get a lock of " + =
filename);
>=20
>> . . .
>>     if (fd !=3D -1) {
>> -        flock(fd, LOCK_EX);
>> +        lockf(fd, F_ULOCK, 0);
>>         close(fd);
>>         fd =3D -1;
>>     }
>=20
>> =46rom my reading of the lockf(3) man page and reviewing the
>> implementation in lib/libc/gen/lockf.c, and corresponding code in
>> sys/kern/kern_descrip.c, it appears the lockf() call should be
>> successfully obtaining an advisory lock over the whole file like a
>> successful flock() did. However, I have a stress test that quickly
>> corrupts the target file using the lockf() implementation, and the
>> test fails to cause corruption using the flock() implementation. =
I=E2=80=99ve
>> instrumented the code, and it's clear that multiple processes are
>> simultaneously in the block of code after the =E2=80=9Clockf(fd, =
F_LOCK, 0)=E2=80=9D
>> line.
>=20
>> Am I missing something obvious? Any ideas?
>=20
> With lockf/fcntl locks, the close of the second file descriptor =
actually
> already unlocks the file. If there is another close and open in there,
> it would explain your problem. Both the lockf(3) and the fcntl(2) man
> pages mention these strange semantics, but only fcntl(2) clearly warns
> about them. With flock locks, opening the file another time will not
> cause problems.
>=20
> The second thing that will not work with lockf/fcntl locks is having a
> child process inherit them.
>=20
> Changing flock() to lockf() seems like a bad idea, particularly in a
> reusable "lock manager" class, since it is then harder to see what
> operations must be avoided to avoid losing the lock.
>=20
> There is a proposal in the Austin Group for the next version of POSIX =
to
> add a form of file lock that allows both range locking and proper
> (flock-style) semantics.

Thank you! I had forgotten the text in the fcntl(2) page about =
fcntl/lockf semantics (haven=E2=80=99t read that for a while). I have =
verified the method in question uses the lock manager to lock the file, =
opens the file to read the contents, closes the file [thus loosing the =
lockf lock], does some work, and then opens & writes the file to save =
the updates.

Regards,
Guy




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?8F0FB344-3858-462F-A67D-97805379514C>