Date:      Sun, 26 Oct 2003 01:44:58 -0400 (EDT)
From:      Robert Watson <rwatson@freebsd.org>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        cvs-all@freebsd.org
Subject:   Re: cvs commit: src/sys/kern kern_sig.c
Message-ID:  <Pine.NEB.3.96L.1031026014144.74063M-100000@fledge.watson.org>
In-Reply-To: <20031026145418.G16944@gamplex.bde.org>


On Sun, 26 Oct 2003, Bruce Evans wrote:

> On Sat, 25 Oct 2003, Alfred Perlstein top posted:
> 
> > This is bad, it's time to add a flag to the vnode to do this
> > properly instead of relying upon the underlying FS to implement
> > the locking.
> 
> How would a mere flag help fix the real complexities for nfs?

Well, the point of the locking originally introduced in the core dump code
was presumably to help avoid a common case scenario: corrupted core dumps
due to parallel dumping.  Given the difficulty in addressing the problem
in any thorough way (distributed locking, etc), I think I'd almost rather
go for the simplest possible mechanism.  Setting a flag on the vnode while a
core dump is in progress, and causing any other core dump attempt against
that vnode to be aborted while the flag is set, presents a pretty clean
solution to the single-host case.

> > * Robert Watson <rwatson@FreeBSD.org> [031025 09:14] wrote:
> > > rwatson     2003/10/25 09:14:09 PDT
> > >
> > >   FreeBSD src repository
> > >
> > >   Modified files:
> > >     sys/kern             kern_sig.c
> > >   Log:
> > >   When generating a core dump, use advisory locking in an advisory way:
> > >   if we do acquire an advisory lock, great!  We'll release it later.
> > >   However, if we fail to acquire a lock, we perform the coredump
> > >   anyway.
> 
> Er, advisory locking means that honoring the lock is not enforced, not
> that it is good to not honor it.

The comment was a bit flippant and inaccurate: if it's possible for the
locking request to succeed, we wait for it to succeed.  However, if we get
a fatal error (rather than blocking), then we plow on ahead.  By "advisory
locking", I mean that we're using the advisory locking facility.  By
"advisory way", I mean "if it works, use it, and if it's not available,
don't".

> > >   This problem became particularly visible with NFS after
> > >   the introduction of rpc.lockd: if the lock manager isn't running,
> > >   then locking calls will fail, aborting the core dump (resulting in
> > >   a zero-byte dump file).
> > >
> > >   Reported by:    Yogeshwar Shenoy <ynshenoy@alumni.cs.ucsb.edu>
> 
> There is only a problem if the lock manager is supposed to be running
> but is not.  That is a configuration error, or perhaps a transient
> error, so it should not be "fixed" by ignoring the failure.  If ignoring
> nfs locks is what is wanted in all cases, then it should be configured
> by mounting the file system with -L (= nolockd).  Maybe the lock request
> should hang for transient failures. 
> 
> Support for correct configuration of this is still mostly nonexistent in
> /etc/defaults/rc.conf and rc.conf(5).  The default for nfs mounts is
> lockd, but the default for rpc_lockd_enable is "NO".  Setting
> rpc_lockd_enable to "YES" is not sufficient to configure this.  The
> setting of at least rpc_statd_enable must also be changed. 
> 
> This stuff is misconfigured on all of the freebsd machines that I
> checked.  Some run 4.9, so nfs locking is not available.  beast and
> builder demonstrate the bug by giving empty core dumps.  bento avoids
> the bug by dumping cores in a non-nfs directory. 

Agreed.  The current condition of NFS locking is pretty pessimal: we still
have substantial bugs in the implementation of the lock manager, and
configuring locking correctly is difficult.  The default configuration is
particularly poor.  We should address most of these.  :-)
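For reference, a working client-side configuration along the lines Bruce
describes (assuming rpc_statd_enable is the status-daemon knob) would look
something like:

```
# /etc/rc.conf -- enable NFS client locking; both daemons are needed
rpc_lockd_enable="YES"
rpc_statd_enable="YES"

# Alternatively, opt out per mount so lock requests stay local:
#   mount_nfs -L server:/export /mnt
```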

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories
