From owner-freebsd-arch@FreeBSD.ORG Thu May 8 11:19:01 2003 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8D3DD37B401; Thu, 8 May 2003 11:19:01 -0700 (PDT) Received: from beastie.mckusick.com (beastie.mckusick.com [209.31.233.184]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8876A43FF2; Thu, 8 May 2003 11:18:56 -0700 (PDT) (envelope-from mckusick@beastie.mckusick.com) Received: from beastie.mckusick.com (localhost [127.0.0.1]) by beastie.mckusick.com (8.12.8/8.12.3) with ESMTP id h48IItTh026354; Thu, 8 May 2003 11:18:55 -0700 (PDT) (envelope-from mckusick@beastie.mckusick.com) Message-Id: <200305081818.h48IItTh026354@beastie.mckusick.com> To: Bruce Evans In-Reply-To: Your message of "Thu, 08 May 2003 18:28:33 +1000." <20030508175350.S59316@gamplex.bde.org> Date: Thu, 08 May 2003 11:18:55 -0700 From: Kirk McKusick cc: arch@FreeBSD.org cc: Brian Buhrow cc: Jens Schweikhardt Subject: Re: Access times on executables (kern/25777) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 May 2003 18:19:01 -0000 Thanks for your very through analysis of the options! Since most filesystems are not mounted with -async, I am curious how the numbers change when you run them normally (ufs with soft updates and NFS client/server just normally) since that is what folks are likely to really do. Also, does option 1 actually result in access times being set on NFS, or are the times rejected because the binaries are not owned by the user? Kirk McKusick =-=-=-=-=-= Date: Thu, 8 May 2003 18:28:33 +1000 (EST) From: Bruce Evans To: Kirk McKusick cc: Jens Schweikhardt , arch@FreeBSD.org, Brian Buhrow Subject: Re: Access times on executables (kern/25777) In-Reply-To: <20030505205914.D8183@gamplex.bde.org> X-ASK-Info: Whitelist match I'm replying to an earlier article in this thread since I lost your last reply. On Sun, 4 May 2003, Kirk McKusick wrote: > Still, I think that VOP_READ is likely to be more efficient and work in > a higher percentage of the times (always on the local filesystem and > most of the time on NFS). Thus, I enclose my (new) proposed change below. > Index: sys/kern_exec.c > =================================================================== > RCS file: /usr/ncvs/src/sys/kern/kern_exec.c,v > retrieving revision 1.218 > diff -c -r1.218 kern_exec.c > *** kern_exec.c 1 Apr 2003 01:26:20 -0000 1.218 > --- kern_exec.c 5 May 2003 06:15:21 -0000 > *************** > *** 598,603 **** > --- 598,626 ---- > exec_setregs(td, imgp->entry_addr, > (u_long)(uintptr_t)stack_base, imgp->ps_strings); > > + /* > + * At this point, it counts as an access. > + * Ensure that atime gets updated if needed. > + */ > + if (!(textvp->v_mount->mnt_flag & MNT_NOATIME)) { > + struct uio auio; > + struct iovec aiov; > + char ch; > + > + auio.uio_iov = &aiov; > + auio.uio_iovcnt = 1; > + aiov.iov_base = &ch; > + aiov.iov_len = 1; > + auio.uio_resid = 1; > + auio.uio_offset = 0; > + auio.uio_segflg = UIO_SYSSPACE; > + auio.uio_rw = UIO_READ; > + auio.uio_td = td; > + vn_lock(textvp, LK_EXCLUSIVE | LK_RETRY, td); > + (void) VOP_READ(textvp, &auio, 0, p->p_ucred); > + VOP_UNLOCK(textvp, 0, td); > + } > + > done1: > /* > * Free any resources malloc'd earlier that we didn't use. I ran some benchmarks after fixing some bugs in the above (textvp is stale (maybe NULL the first time?), and as jhb pointed out, p->p_ucred should be td->td_ucred). I benchmarked the following methods: - method 1: my original method using VOP_SETATTR() updated - method 2: fixed version of above - method 3: version of the above that just calls vn_rdwr(). %%% Index: kern_exec.c =================================================================== RCS file: /home/ncvs/src/sys/kern/kern_exec.c,v retrieving revision 1.208 diff -u -2 -r1.208 kern_exec.c --- kern_exec.c 13 Jan 2003 23:04:31 -0000 1.208 +++ kern_exec.c 8 May 2003 07:11:05 -0000 @@ -1,2 +1,4 @@ +int exec_atime_method = -1; + /* * Copyright (c) 1993, David Greenman @@ -611,4 +599,49 @@ (u_long)(uintptr_t)stack_base, imgp->ps_strings); + /* + * At this point, it counts as an access. + * Ensure that atime gets updated if needed. + */ + if (!(ndp->ni_vp->v_mount->mnt_flag & MNT_NOATIME)) { + if (exec_atime_method == 1) { + struct vattr vattr; + struct mount *mp; + + if (vn_start_write(ndp->ni_vp, &mp, V_WAIT | PCATCH) != 0) + goto out; + VATTR_NULL(&vattr); + vfs_timestamp(&vattr.va_atime); + vattr.va_vaflags |= VA_EXECVE_ATIME; + VOP_LEASE(ndp->ni_vp, td, td->td_ucred, LEASE_WRITE); + vn_lock(ndp->ni_vp, LK_EXCLUSIVE | LK_RETRY, td); + (void) VOP_SETATTR(ndp->ni_vp, &vattr, td->td_ucred, td); + VOP_UNLOCK(ndp->ni_vp, 0, td); + vn_finished_write(mp); +out: ; + } else if (exec_atime_method == 2) { + struct iovec aiov; + struct uio auio; + char ch; + + auio.uio_iov = &aiov; + auio.uio_iovcnt = 1; + aiov.iov_base = &ch; + aiov.iov_len = 1; + auio.uio_resid = 1; + auio.uio_offset = 0; + auio.uio_segflg = UIO_SYSSPACE; + auio.uio_rw = UIO_READ; + auio.uio_td = td; + vn_lock(ndp->ni_vp, LK_EXCLUSIVE | LK_RETRY, td); + (void) VOP_READ(ndp->ni_vp, &auio, 0, p->p_ucred); + VOP_UNLOCK(ndp->ni_vp, 0, td); + } else if (exec_atime_method == 3) { + char ch; + + (void) vn_rdwr(UIO_READ, ndp->ni_vp, &ch, 1, 0, UIO_SYSSPACE, + 0, td->td_ucred, td->td_ucred, NULL, td); + } + } + done1: /* %%% The VOP_READ() methods had much more overhead than I expected or like (about 25% for a tiny statically linked program (`main() {}'). Even the VOP_SETATTR() method was faster. Raw output: %%% ufs-static-noexec 4.32 real 0.04 user 4.13 sys 4.23 real 0.07 user 4.01 sys 4.30 real 0.06 user 4.10 sys ufs-static-static method -1 7.02 real 0.06 user 6.77 sys 7.10 real 0.07 user 6.83 sys 7.09 real 0.06 user 6.83 sys 7.03 real 0.19 user 6.65 sys 7.11 real 0.22 user 6.68 sys 7.00 real 0.04 user 6.77 sys 7.12 real 0.19 user 6.75 sys 7.07 real 0.22 user 6.66 sys 7.07 real 0.07 user 6.81 sys method 1 7.72 real 0.23 user 7.29 sys 7.76 real 0.25 user 7.31 sys 7.71 real 0.30 user 7.21 sys 7.69 real 0.29 user 7.21 sys 7.70 real 0.31 user 7.18 sys 7.71 real 0.19 user 7.32 sys method 2 7.83 real 0.07 user 7.56 sys 7.84 real 0.27 user 7.37 sys 7.82 real 0.28 user 7.34 sys method 3 7.90 real 0.26 user 7.43 sys 7.89 real 0.05 user 7.62 sys 7.92 real 0.05 user 7.66 sys 7.86 real 0.05 user 7.61 sys 7.88 real 0.18 user 7.49 sys 7.84 real 0.21 user 7.43 sys 7.88 real 0.31 user 7.36 sys 7.82 real 0.28 user 7.34 sys 7.86 real 0.24 user 7.42 sys nfs-static-noexec 4.13 real 0.05 user 3.95 sys 4.20 real 0.17 user 3.89 sys 4.25 real 0.25 user 3.87 sys nfs-static-static method -1 11.43 real 0.04 user 9.18 sys 11.41 real 0.21 user 8.96 sys 11.43 real 0.27 user 8.94 sys method 1 11.54 real 0.28 user 8.88 sys 11.64 real 0.19 user 9.08 sys 11.53 real 0.17 user 8.97 sys method 2 11.77 real 0.09 user 9.49 sys 11.78 real 0.25 user 9.31 sys 11.80 real 0.33 user 9.26 sys method 3 11.94 real 0.24 user 9.46 sys 11.86 real 0.32 user 9.35 sys 11.81 real 0.28 user 9.35 sys %%% *fs-static-noexec tests 10000 forks with no exec of a statically linked program on *fs. *fs-static-static tests 10000 forks with exec of statically linked programs on *fs. The system is an old Celeron (the speed doesn't matter much; only compare the relative times). All file systems were mounted -async. The exec part of the fork-exec takes an average of about 270 usec for the ufs case. Setting the atime adds about 60 usec to this for method 1. Method 2 adds another 10 usec. Method 3 adds another 4 usec to method 2. Plain read() takes much less than 60 usec on this machine: $ dd if=/kernel of=/dev/null bs=1 count=1000000 1000000+0 records in 1000000+0 records out 1000000 bytes transferred in 17.703695 secs (56485 bytes/sec) so most of the overheads for exec must be indirect. I think they are just close() forcing update marks to timestamps. This happens after every exec, but not after every read in the dd benchmark. I think method 1 turns out to be fastest since it forces the update marks to timestamps directly. Bruce