From: Bruce Evans
Date: Mon, 26 Jun 2006 19:31:10 +1000 (EST)
To: Andrew Reilly
Cc: Steve Kargl, freebsd-arch@FreeBSD.org
Subject: Re: What's up with our stdout?
Message-ID: <20060626181131.G67741@delplex.bde.org>
In-Reply-To: <20060625213605.GA93766@duncan.reilly.home>
List-Id: Discussion related to FreeBSD architecture

On Mon, 26 Jun 2006, Andrew Reilly wrote:

> On Mon, Jun 26, 2006 at 01:10:38AM +1000, Bruce Evans wrote:
>> This doesn't seem to have anything to do with stdout.
>> F_SETLKW just seems to be broken on all regular files (and thus is
>> unsupported for all file types). The above works under the modified
>> version of FreeBSD-5.2 that I use, but it fails with the documented
>> errno EOPNOTSUPP under at least FreeBSD-6.0-STABLE. Replacing
>> STDOUT_FILENO by fd = open("foo", O_RDWR) gives the same failure.
>> Replacing F_SETLKW by F_SETLK or F_GETLK gives the same failure.
>
> Thanks for the clarification.
>
> Don't all of the databases rely on fcntl locks? How can this be
> broken?

The problem seems to be that the file system really doesn't support
locking. On freefall, both fcntl(2) locking and flock(2) fail in my
home directory but work in /tmp. My home directory on freefall is
nfs-mounted without nolockd, and with neither rpc.lockd nor rpc.statd
running. nfs without the rpc daemons really doesn't support remote
locking, so it is correct for it to fail, but rpc.lockd is buggy so it
is often not used. On my own machines I normally avoid nfs-locking
using nolockd (this gives locking that doesn't work (remotely) but
claims to work). On the FreeBSD cluster nfs-locking is apparently
normally avoided by nfs-mounting without nolockd and not starting the
rpc daemons (this gives locking that doesn't work and doesn't claim to
work).

Configuring locking for nfs is confusing and poorly documented.
Neither rpc.lockd nor rpc.statd gets started automatically when a file
system is nfs-mounted without nolockd. This wouldn't be easy to
automate, since the daemons must be started on both the clients and
servers. mount_nfs(8) doesn't say clearly which daemons must be
started where. rc.conf(5) says wrongly that rpc_lockd_enable and
rpc_statd_enable only apply to servers. Starting them both on clients
and servers seems to be needed.
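
The difference between a directory where locking works (/tmp above) and
one where it does not (the nfs-mounted home directory) can be checked
without involving stdout at all. The following is only a sketch, not
part of the test program in this thread: probe_fcntl_lock() is a
hypothetical helper name, and it assumes the caller may create the
probe file at the given path. It uses the non-blocking F_SETLK so that
it cannot hang the way a blocking request might.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original test program): try to
 * take and release a non-blocking whole-file write lock on `path`
 * using fcntl(F_SETLK).  Returns 0 if the lock was granted, otherwise
 * the errno value reported by open() or fcntl().
 */
int
probe_fcntl_lock(const char *path)
{
	struct flock fl;
	int error, fd;

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd == -1)
		return (errno);

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;		/* 0 means "through end of file" */

	error = 0;
	if (fcntl(fd, F_SETLK, &fl) == -1) {
		/* Save errno now; e.g. EOPNOTSUPP as seen in this thread. */
		error = errno;
	} else {
		fl.l_type = F_UNLCK;
		(void)fcntl(fd, F_SETLK, &fl);
	}
	(void)close(fd);	/* closing also drops any fcntl lock held */
	return (error);
}
```

Calling it once on a file under /tmp and once on a file in the
directory under suspicion, then passing any nonzero result to
strerror(3), should show whether the failure follows the file system
rather than the file descriptor.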
With a filesystem nfs-remounted without nolockd: there seem to be
ordering or timing requirements for starting them -- starting them
manually sometimes gave a useful error message for flock() attempts
when not all were started, but sometimes starting them all didn't stop
flock() from failing and other times gave a hung flock(). Killing and
restarting rpc.lockd on the client (while leaving the other daemons
running) usually worked to unhang flock() and make it work on the next
try.

A modified version of the NetBSD code to test both flock() and fcntl()
locking:

%%%
#include <sys/file.h>

#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

struct flock stdout_lock;

int
main(void)
{
	/* BSD-style whole-file lock and unlock via flock(2). */
	if (flock(STDOUT_FILENO, LOCK_EX) == -1)
		err(EXIT_FAILURE, "flock(...LOCK_EX): stdout");
	warnx("flock(...LOCK_EX): succeeded");
	if (flock(STDOUT_FILENO, LOCK_UN) == -1)
		err(EXIT_FAILURE, "flock(...LOCK_UN): stdout");
	warnx("flock(...LOCK_UN): succeeded");

	/* Blocking whole-file write lock via fcntl(2); l_len 0 is to EOF. */
	stdout_lock.l_len = 0;
	stdout_lock.l_start = 0;
	stdout_lock.l_type = F_WRLCK;
	stdout_lock.l_whence = SEEK_SET;
	if (fcntl(STDOUT_FILENO, F_SETLKW, &stdout_lock) == -1)
		err(EXIT_FAILURE, "fcntl(...F_SETLKW): stdout");
	warnx("fcntl(...F_SETLKW): succeeded");
	return (0);
}
%%%

The Minix regression tests showed too many other regressions. One was
that after mkdir()/open()/rmdir() of a directory, fstat() on the open
unlinked directory gave a wrong link count of 2. Another was that
after creation of a directory with LINK_MAX links, it was possible to
mkdir() another subdir in the directory, giving 1 more link than the
maximum possible:

drwxrwxrwx  32768 bde  bde  1048576 Jun 25 16:42 DIR_28/foo/

This is a more serious bug, since ffs uses the test (i_nlink <= 0) in
a couple of places, and since i_nlink_t is int16_t, 32768 for a link
count is unrepresentable (it overflows to -32768). This seems to be
caused by either a race in soft updates or using i_nlink where
i_effnlink should be used. These bugs show up on an nfs-mounted
directory on machines in the FreeBSD cluster.
I think nfs doesn't affect this, and the underlying file system is
ffs2 with soft updates. Normally I don't want to see bugs like this,
and I use ffs1 without soft updates on my own machines to avoid seeing
new ones.

Bruce
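
As a footnote to the link-count overflow described above, the effect
of storing 32768 in a 16-bit signed field can be shown in isolation.
This is an illustration only, not ffs code: nlink_as_stored() and
looks_unlinked() are hypothetical names, and the wraparound relies on
implementation-defined conversion behavior, which compilers such as
gcc and clang define as reduction modulo 2^16.

```c
#include <stdint.h>

/*
 * Illustration only (not ffs code): return the value that results
 * from storing a link count in a 16-bit signed field.  Converting an
 * out-of-range value to int16_t is implementation-defined in C; on
 * the usual two's-complement compilers it wraps modulo 2^16.
 */
int
nlink_as_stored(long nlink)
{
	int16_t stored = (int16_t)nlink;

	return (stored);
}

/* The kind of test quoted in the message: (i_nlink <= 0). */
int
looks_unlinked(long nlink)
{
	return (nlink_as_stored(nlink) <= 0);
}
```

With such a compiler, a count of 32767 is stored unchanged, while
32768 is stored as -32768 and so satisfies a (<= 0) test even though
the directory still has links.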