From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 22:49:04 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0EE961F7; Tue, 15 Jan 2013 22:49:04 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 79BE6EF6; Tue, 15 Jan 2013 22:49:03 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAJXb9VCDaFvO/2dsb2JhbABFhjq3YnOCHgEBBAEjBFIFFg4KAgINGQJZBogmBqYEgkCOc4EjjwKBEwOIYY0rkEmDE4IG X-IronPort-AV: E=Sophos;i="4.84,475,1355115600"; d="scan'208";a="9236221" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 15 Jan 2013 17:49:00 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 2F355B3F0D; Tue, 15 Jan 2013 17:49:00 -0500 (EST) Date: Tue, 15 Jan 2013 17:49:00 -0500 (EST) From: Rick Macklem To: Bruce Evans Message-ID: <1149390778.2023367.1358290140175.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20130115141019.H1444@besplex.bde.org> Subject: Re: [PATCH] Better handle NULL utimes() in the NFS client MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: Rick Macklem , fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Jan 2013 22:49:04 -0000 Bruce Evans wrote: > On Mon, 14 Jan 2013, Rick Macklem wrote: > > > John Baldwin wrote: > >> The NFS client tries to infer when an application has passed NULL > >> to > >> utimes() > >> so that it can let the server set the timestamp rather than using a > >> client- > >> supplied timestamp. It does this by checking to see if the desired > >> timestamp's second matches the current second. However, this breaks > >> applications that are intentionally trying to set a specific > >> timestamp > >> within > >> the current second. In addition, utimes() sets a flag to indicate > >> if > >> NULL was > >> passed to utimes(). The patch below changes the NFS client to check > >> this flag > >> and only use the server-supplied time in that case: > > It is certainly an error to not check VA_UTIMES_NULL at all. I think > the flag (or the NULL pointer) cannot be passed to the server, so the > best we can do for the VA_UTIMES_NULL case is read the current time on > the client and pass it to the server. Upper layers have already read > the current time, but have passed us VA_UTIMES_NULL so that we can > tell > that the pointer was originally null so that we can do the different > permissions checks for this case. > > >> Index: fs/nfsclient/nfs_clport.c > >> =================================================================== > >> --- fs/nfsclient/nfs_clport.c (revision 225511) > >> +++ fs/nfsclient/nfs_clport.c (working copy) > >> @@ -762,7 +762,7 @@ > >> *tl = newnfs_false; > >> } > >> if (vap->va_atime.tv_sec != VNOVAL) { > >> - if (vap->va_atime.tv_sec != curtime.tv_sec) { > >> + if (!(vap->va_vaflags & VA_UTIMES_NULL)) { > >> NFSM_BUILD(tl, u_int32_t *, 3 * NFSX_UNSIGNED); > >> *tl++ = txdr_unsigned(NFSV3SATTRTIME_TOCLIENT); > >> txdr_nfsv3time(&vap->va_atime, tl); > >> @@ -775,7 +775,7 @@ > >> *tl = txdr_unsigned(NFSV3SATTRTIME_DONTCHANGE); > >> ... > > Something mangled the patch so that it is hard to see what it does. It > just uses the flag instead of guessing. > > I can't see anything that does the different permissions check for > the VA_UTIMES_NULL case, and testing shows that this case is just > broken, > at least for an old version of the old nfs client -- the same > permissions > are required for all cases, but write permission is supposed to be > enough for the VA_UTIMES_NULL case (since write permission is > sufficient > for setting the mtime to the current time (plus epsilon) using > write(2) > and truncate(2). Setting the atime to the current time should require > no more and no less than read permission, since it can be done using > read(2), but utimes(NULL) requires write permission for that too). > I did a quick test on a -current client/server and it seems to work ok. The client uses SET_TIME_TO_SERVER and the server sets VA_UTIMES_NULL for this case. At least it works for a UFS exported volume. > > In the old days, a lot of NFS servers only stored times at a > > resolution of 1sec, which I think is why the code had the habit > > of comparing "seconds equal". > > I think this is not the reason for the check here. > > > If there is some app. out there > > that sets "current time" via utimes(2) with a curent time argument > > instead of a NULL argument would seem to be broken to me. > > (It is conceivable that some app. did this to avoid clock > > skew between the client and server, but I doubt it.) > > Apps have no alternative to using the NULL arg if they have write > permission > to the file but don't own it. > > Oops, on looking at the code I now think it _is_ possible to pass the > request to set the current time on the server, since in the > NFSV3SATTRTIME_TOSERVER case we just pass this case value and not > any time value to the server, so the server has no option but to use > its current time. It is not surprising that the permissions checks > for this don't work right. I thought that the client was responsible > for most permissions checks, but can't find many or the relevant one > here. The NFSV3SATTRTIME_TOSERVER code on the server sets > VA_UTIMES_NULL, so I would have thought that the permissions check on > the server does the right thing. > As noted above, it seems to work correctly for the new server in -current, at least for UFS exports. Normally a server will do permission checking for NFS RPCs. There is nothing stopping a client from doing a check and returning an error, but traditionally a server has not trusted a client to do so. (I'm not sure if adding a check in the client is what jhb@ was referring to in his reply to this?) > There are some large timestamping bugs nearby: > > - the old nfs server code for NFSV3SATTRTIME_TOSERVER uses > getnanotime() > to read the current time. This violates the system's policy set by > the vfs.timestamp precision in most cases, since using getnanotime() > is the worst supported policy and is not the defaul. > > The old nfs client uses the correct function to read the current > time, vfs_timestamp(), in nfs_create(), but this is the only use of > vfs_timestamp() in old nfs code. I think most cases use the server > time and thus use the correct function iff the leaf server file > system uses the correct function. > > - the new nfs server code for NFSV3SATTRTIME_TOSERVER macro-izes all > reads of the current time except 1 as NFSGETTIME(). This uses > getmicrotime(), so it violates the system's policy in all cases, > since using getmicrotime() is not a supported policy (using > microtime() is supported). The 1 exception is a hard-coded > getmicrotime() in fs/nfsclient/nfs_clport.c whose use is visible > in the above patch. This one really didn't matter, because only the > seconds part of curtime was used. It was just a micro-pessimization > and style bug. The (not quite) correct way to get the seconds part > is to use time_second, as is done in the old nfs client. > (This way is not quite correct because there are some races and > non-monotonicities reading the times. In the above check, > vap->va_atime.tv_sec might have been read by a more precise clock > than curtime.tv_sec. Then the check might give a false positive > or negative. But the check is only a heuristic, and is inherently > racy, so this doesn't rally matter. > With the above pathcm the check becomes a different pessimization and > style bug. The curtime variable becomes unused except for its > incorrect initialization. > In this case, after the patch is applied, curtime and getmicrotime() can just be deleted (as you noted, above). > New nfs code never uses the correct function vfs_timestamp(). This needs to be fixed. Until now, I would have had no idea what is the correct interface. (When I did the port, I just used a call that seemed to return what I wanted.;-) Having said that, after reading what you wrote below, it is not obvious to me what the correct fix is? (It seems to be a choice between microtime() and vfs_timestamp()?) > > Following the system pollcy for file timestamps causes some problems > for utimes(NULL) too. Old versions hard-coded microtime(). Current > versions use vfs_timestamp(). The latter is better, but tends to > give different results than times(non_NULL), since few or no > applications know anything about the system's policy. touch(1) > probably should know, but doesn't. So the simple "touch foo" gives > various results, depending: > - touch(1) starts with gettimeofday(). This gives microseconds > resolution and usually microseconds accuracy if its result is used. > - touch then tries utimes(non_NULL) with the current time that it > just read. This usually works, giving microseconds resolution, > etc. This is OK, but often different from the system policy. > - touch then tries utimes(NULL). If this works, then it follow the > system policy. > > Another problem is that not all file systems support nanoseconds > resolutions, so not all system policies or utimes() requests can > be honored. > > I would usually prefer the system's policy to be enforced as far as > possible. Thus if the system's policy is microseconds resolution, > then times with nanoseconds resolution should be rounded down to the > nearest microsecond. This case is most useful since utimes() cannot > preserve times with more than microseconds resolution. Utilities like > cp(1) blindly round the times given in nanoseconds by stat(2) to ones > that can be written by utimes(2), so this often happens in an > uncontrollable way anyway (POSIX is finally getting around to > specifying > permissible errors for unrepresentable resolutions). But sometimes I > want utimes() to preserve times as well as possible. > > Bruce