Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 13 Jun 2019 21:44:01 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        "freebsd-fs@freebsd.org" <freebsd-fs@FreeBSD.org>
Cc:        "kib@freebsd.org" <kib@FreeBSD.org>, Alan Somers <asomers@freebsd.org>, Brooks Davis <brooks@freebsd.org>
Subject:   RFC: should the copy_file_range() syscall be atomic?
Message-ID:  <YQXPR01MB3128EA1CB774995DD5C774FCDDEF0@YQXPR01MB3128.CANPRD01.PROD.OUTLOOK.COM>

next in thread | raw e-mail | index | archive | help
When I first wrote the copy_file_range() syscall, the updating of the file =
was atomic,
because I locked both vnodes while doing it.
kib@ quickly pointed out that this could introduce a LOR/deadlock because t=
wo
userland threads could concurrently do the syscall with the file arguments =
reversed.

It turns out that locking two VREG vnodes concurrently to do this isn't eas=
y and
would require the implementation of non-blocking versions of:
  vn_rangelock_rlock() and vn_rangelock_wlock()
  - I am not sure how difficult doing this is, but I'll admit I'd rather no=
t do this.

Also, having the vnodes locked precludes use of VOP_IOCTL(..FIOSEEKDATA/
FIOSEEKHOLE..) calls to find holes in the byte range of the file being copi=
ed from.

Without the vnodes locked, it is possible for other threads to write to eit=
her of the
files concurrently with the copy_file_range(), resulting in an indeterminat=
e results.
(cp(1) has never guaranteed atomic copying of files, so is it needed in thi=
s syscall?)
In summary, doing the syscall non-atomically has the advantages of:
- vn_rdwr() takes care of the vnode locking issues, so any changes w.r.t. l=
ocking
  wouldn't require changes to this syscall's code.
- VOP_IOCTL() could be used to find holes.
- No new rangelock primitives need to be added to the system.
- If there were some combination of file system/storage unit where copying
  non-overlapping byte ranges concurrently could result in better performan=
ce,
  then that could be done. (An atomic syscall would serialize them.)

The advantage of an atomic syscall would be consistency with write(2) and r=
ead(2)
behaviour.

The following comments are copied from phabricator:
kib@ - So you relock range for each chunk ? This defeats the purpose of the=
 range locking. Should copy_file_range() be atomic WRT other reads and writ=
es ?

asomers@ - That has an unfortunate side effect: copy_file_range is no longe=
r atomic if you drop the vnode locks and range locks in the middle. It woul=
d be possible for two copy_file_range operations to proceed concurrently, l=
eaving the file in a state where each of the operations was partially succe=
ssful. A better solution would be to add rangelock_trywlock and rangelock_t=
ryrlock. I don't think it would be very hard (testing them, however, could =
be).

I don't see anything in the Linux man page w.r.t. atomicity, so I am now as=
king what
others think?
(I'll admit I'm biased towards non-atomic, since I have already coded it an=
d can use
 the VOP_IOCTL() calls to find the holes in the input file, but...)

rick



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YQXPR01MB3128EA1CB774995DD5C774FCDDEF0>