Date: Wed, 8 Mar 2006 19:57:22 -0500
From: Kris Kennaway
To: Miguel Lopes Santos Ramos
Cc: kuriyama@imgsrc.co.jp, freebsd-stable@freebsd.org, kris@obsecurity.org
Subject: Re: rpc.lockd brokenness (2)
Message-ID: <20060309005722.GA55432@xor.obsecurity.org>
In-Reply-To: <200603090026.k290Qihj002701@compaq.anjos.strangled.net>
References: <20060308224531.GA53611@xor.obsecurity.org> <200603090026.k290Qihj002701@compaq.anjos.strangled.net>

On Thu, Mar 09, 2006 at 12:26:44AM +0000, Miguel Lopes Santos Ramos wrote:
> > From: Kris Kennaway
> > Subject: Re: rpc.lockd brokenness (2)
> >
> > This is intentional. It's how pidfile_*() tests whether the process
> > is still running. The intention is that if someone tries to open the
> > pidfile again while the first process is still running, the lock
> > acquisition will fail and we'll know the other process is still alive,
> > and therefore avoid starting a second instance.
>
> No, no, you got me wrong. The pidfile is left locked after cron stopped
> running (with /etc/rc.d/cron stop). This behaviour must be wrong.

OK, I misunderstood. The rc.d script will signal cron to kill it,
which should be closing the file descriptors and causing rpc.lockd to
release the lock. Perhaps this part is broken.

OK, I tested this with daemon -p, and it indeed seems to be broken:

haessal# daemon -p pid_file sleep 100000
haessal# kill -KILL `cat pid_file`
haessal# ps -p `cat pid_file`
  PID  TT  STAT      TIME COMMAND
haessal# lockf -t 0 pid_file echo Yay
lockf: pid_file: already locked

> > There is a (known) lockd bug here though, which you isolated:
> >
>
> So, this really is bin/80389?

No, I don't think so. The missing ability to cancel locking requests
(i.e. unkillable process while blocked on a lock) has never been
implemented in FreeBSD's rpc.lockd (I'm not aware of a PR about it, so
I filed my own earlier tonight), and the problem above might be a
separate regression.

> I am a bit disappointed. First, this problem didn't cause me trouble before
> I went to 6-STABLE, now I must either disable cron or disable locking (which
> I can't).
> And I'm still not completely convinced. That problem, if I understand correctly,
> existed before January...

The pidfile_*() functions are new; before that, the pidfile handling
was done differently.

> There are two things...
> - cron.pid shouldn't be locked after cron terminated.
>   (this interaction was
>   fully saved as http://mega.ist.utl.pt/~mlsr/nfs-nofile.bin)

Actually the locking isn't traced here; I misunderstood how it works,
and the lock transactions are done on another UDP port. You have to
use rpcinfo to figure out which one it is, since it varies. Anyway,
the above sequence reproduces it.

> - cron shouldn't hang on startup just because the file is locked, since
>   pidfile_open opens it with O_NONBLOCK (unlike lockf).

I haven't been able to reproduce this, e.g. lockf -t 0 does O_NONBLOCK
locking and works correctly when the file is already locked. Perhaps
it's another locked file (not the pidfile) that was also leaked in the
same way, and is being opened without O_NONBLOCK.

> - cron shouldn't hang in such a way that it is not killable... (and should
>   not also the open system call in lockf be interruptible?)

This is the bug (really: missing feature) that I described in my
previous mail.

Kris
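
For readers following the thread, here is a rough illustration of the
lock-as-liveness test discussed above. It is only a sketch, not the
pidfile(3) code (which is C): it assumes flock(2)-style locks reached
through Python's fcntl module, and try_lock and demo are made-up names.
It runs against a local temporary file, so it shows the behaviour one
would expect (a non-blocking lock attempt is refused while the holder
is alive and succeeds once the holder's descriptor is closed), not the
NFS/rpc.lockd failure reproduced earlier.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path and try to take an exclusive lock without blocking.

    Returns the open file descriptor on success, or None if another
    open file description already holds the lock.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.pid")
    first = try_lock(path)    # the "running daemon" takes the lock
    second = try_lock(path)   # a second instance is refused
    os.close(first)           # the daemon exits; its descriptor closes
    third = try_lock(path)    # the lock should now be free again
    results = (first is not None, second is None, third is not None)
    if third is not None:
        os.close(third)
    return results
```

On a local filesystem all three checks in demo() should come out true.
The symptom in this thread is that the third step still reports
"already locked" when the file lives on NFS.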