Date: Thu, 29 Apr 1999 18:04:29 -0600 (MDT)
From: "David G. Andersen"
To: "David E. Cross"
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: ypserv
In-Reply-To: David E. Cross's message of Mon, April 12 1999 <199904121852.OAA22126@cs.rpi.edu>
References: <199904121852.OAA22126@cs.rpi.edu>
Message-ID: <14120.61942.222606.324094@torrey.cs.utah.edu>

Lo and Behold, David E. Cross said:
> Our ypserv processes have been dying a great deal lately (luckily they
> restart themselves, but not before all the clients rebind to another
> machine).  I have tracked the problem down to a stack corruption.  Apparently
> caused by a stack overflow (I am still working on it, don't get excited yet ;).

Sorry for taking so long to reply to this, David.  We've been pounding on
ypserv lately and have managed to get it to a fairly stable state on our
servers, but unfortunately there's one patch we haven't committed because it
breaks the semantics of the databases.

The problem you're most likely seeing is the database concurrent access
problem (bin/10971).
The solution to this is modifying the Berkeley DB routines to use pread(),
which has only recently been introduced into -current, and I'm not sure
whether the db routines have been modified accordingly.  We patched our DB
routines (ONLY for ypserv) to explicitly lock the database before doing
their reads, but since it's a read lock, it requires *write* access to the
database, which changes the semantics of how ypserv behaves.  It's not a
great solution, but it does the trick for us pretty well.

There's another fix we sent in to stop hangs due to an incorrect use of the
RPC library.  You'll want to apply it.  It's in bin/10970.

If you fix that and get things stable enough, you'll then run into another
problem, a bad length passed to strncmp, which is fixed in -current and
-stable (bin/11122).  Bill Paul committed this fix a few weeks ago, so if
you've got an up-to-date system you'll be OK on that score.

If you'd like, I can ship you a functioning ypserv binary for 3.0 with all
of our patches, statically linked against the locking DB routines.

   -Dave Andersen

--
work: danderse@cs.utah.edu                     me: angio@pobox.com
      University of Utah                            http://www.angio.net/
      Computer Science - Flux Research Group   "What's footnote FIVE?"
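
For illustration only, here is a rough sketch of why pread() helps with the
concurrent access problem.  It assumes (the message does not spell this out)
that the trouble comes from processes sharing one open database file: a
lseek() followed by a read() is two steps on a file offset that is shared
after fork() or dup(), so another process can move the offset in between.
The helper names read_at_seek() and read_at_pread() are hypothetical and are
not taken from the Berkeley DB sources or from the patches mentioned above.

```c
#define _XOPEN_SOURCE 700	/* makes pread() visible on strict builds */

#include <sys/types.h>
#include <unistd.h>

/*
 * Two-step positioned read (hypothetical helper).  lseek() and read() both
 * act on the file offset, which is shared by every descriptor referring to
 * the same open file, so a second process can change the offset between
 * the two calls and this read may return data from the wrong place.
 */
ssize_t
read_at_seek(int fd, void *buf, size_t len, off_t off)
{
	if (lseek(fd, off, SEEK_SET) == (off_t)-1)
		return (-1);
	return (read(fd, buf, len));
}

/*
 * One-step positioned read (hypothetical helper).  pread() takes the
 * offset as an argument and does not change the shared file offset, so
 * concurrent readers cannot disturb each other's position.
 */
ssize_t
read_at_pread(int fd, void *buf, size_t len, off_t off)
{
	return (pread(fd, buf, len, off));
}
```

With helpers like these, a forked child calling read_at_pread() leaves the
parent's position in the file alone, whereas read_at_seek() moves it.  This
is only a sketch of the idea, not the change that went into the db routines.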