From: Matthias Andree
Date: Wed, 23 Sep 2009 14:51:59 +0200
To: freebsd-current@freebsd.org
Subject: Re: LDAP server gone -> impossible to login locally!
Message-ID: <4ABA19EF.8010503@gmx.de>
In-Reply-To: <200909230856.n8N8u2hp062395@banyan.cs.ait.ac.th>
Olivier Nicole wrote:
>> > On a related note, why is slapd so damn fragile? It's a righteous pain
>> > in the bum the way you have to run db_recover-X.Y /var/db/openldap-data
>> > if slapd fails to start.
>> Yes, this is a lot of pain. I have had issues the same way and never
>> figured out what the reason was. /var/ is very often corrupted after a
>> crash, power failure or unclean reboot. Maybe not slapd is that fragile,
>> but db47 is.
>
> Last June, we had to shut down our openldap server every night, I
> noticed that a simple halt(8) would leave the bdb backend database in
> a corrupted state.
>
> It worked well if I /usr/local/etc/rc.d/slapd stop and sync(8) a couple
> of times before I halt(8).
>
> After that I wrote a small script that would take a backup of the ldap
> data every 2 hours and keep 5 days of backup.
>
> It seems that Berkeley DB has a lot of options that need to be
> configured to be working optimally with openldap. Maybe soft updates
> should be deactivated on the filesystem where the db files reside.

This has nothing to do with the filesystem, but with abuse of the application (read: the LDAP daemon) and/or its Berkeley DB support. If you kill the application before it has written everything it needs to write, you may corrupt your database, particularly if you catch it in the middle of a page split (which happens when a page in the DB file overflows) and your database isn't transactional (i.e. has no log.* files, which in turn requires application support).
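The page-split scenario described above can be illustrated with a toy model. This is a minimal sketch, not Berkeley DB's file format or API: split_page, recover, consistent and demo are hypothetical names, the "disk" is a Python dict, and the log records full page images (real BDB logging is more elaborate). It assumes the log has reached stable storage before the process is killed. A split needs two page writes; a kill between them leaves keys duplicated, while a write-ahead log (the role of BDB's log.* files) lets recovery redo the whole split.

```python
from typing import Dict, List, Optional, Tuple

Page = List[int]
LogRecord = Tuple[int, Page]  # (page id, full page image after the change)
COMMIT = -1                   # hypothetical page id used as a commit marker


class Killed(Exception):
    """Simulates the daemon being killed (e.g. by SIGKILL) mid-operation."""


def split_page(disk: Dict[int, Page], page_id: int, new_id: int,
               log: Optional[List[LogRecord]] = None,
               kill_after: int = -1) -> None:
    """Move the upper half of page page_id into a new page new_id.

    The split takes two separate page writes. If a log is given, both page
    images and a commit marker are appended to it BEFORE the pages are
    written (write-ahead logging). kill_after=k simulates a kill after k
    page writes; -1 means the split runs to completion.
    """
    old = disk[page_id]
    mid = len(old) // 2
    writes = [(new_id, old[mid:]), (page_id, old[:mid])]
    if log is not None:
        log.extend(writes)
        log.append((COMMIT, []))
    for done, (pid, image) in enumerate(writes):
        if done == kill_after:
            raise Killed()
        disk[pid] = list(image)


def recover(disk: Dict[int, Page], log: List[LogRecord]) -> None:
    """Redo the page images of every committed operation found in the log."""
    pending: List[LogRecord] = []
    for pid, image in log:
        if pid == COMMIT:
            for p, img in pending:
                disk[p] = list(img)  # full images, so redoing twice is harmless
            pending = []
        else:
            pending.append((pid, image))
    # Records after the last commit marker belong to an unfinished
    # operation and are ignored.


def consistent(disk: Dict[int, Page], keys: List[int]) -> bool:
    """True if every key is stored exactly once across all pages."""
    return sorted(k for page in disk.values() for k in page) == sorted(keys)


def demo(use_log: bool) -> bool:
    keys = [1, 2, 3, 4, 5, 6]
    disk: Dict[int, Page] = {0: list(keys)}
    log: Optional[List[LogRecord]] = [] if use_log else None
    try:
        split_page(disk, 0, 1, log, kill_after=1)  # killed between the two writes
    except Killed:
        pass
    if log is not None:
        recover(disk, log)  # loosely analogous to running db_recover
    return consistent(disk, keys)


print(demo(use_log=False))  # False: keys 4, 5, 6 are now stored twice
print(demo(use_log=True))   # True: the logged split is redone on recovery
```

The order of the two page writes only changes how the non-logged case breaks (duplicated keys here, lost keys if the old page were truncated first); either way only the logged variant can be repaired after the kill.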
I'm not sure about OpenLDAP, but I feel I know Berkeley DB well enough to say that it does not usually create or rename files on shutdown EXCEPT if there are bulk writes pending in a transactional database (which might then trigger creation of log.* files or flushing of the corresponding writes). So I'd be surprised if SOFTDEPs were a cause of db47 corruption here. SOFTDEPs may have side effects that influence the shutdown process as a whole, but then the shutdown process is already broken without softdeps.

So I'd rather make sure that the daemons are shut down properly at shutdown time, i.e. run the stop scripts and make sure they wait as long as the application needs to shut down cleanly. The database should be properly closed before halt(8) draws the SIGKILL shotgun and starts firing. IOW, check that your slapd stop is properly hooked into the shutdown procedure and waits long enough.

If your filesystems get corrupted at power failures, make sure your HDD write caches are turned off (unless they're battery-backed or otherwise non-volatile caches that survive the outage); you'd also need to check whether your hardware is allowed to reorder writes, and if so, whether writes get reordered across flush-cache primitives (aka write barriers). I'm not aware of current support in the disk/controller drivers and filesystems for preventing dangerous reorders with write caches enabled, though.
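The shutdown advice above (ask the daemon to stop, give it time to close its database, and only then use SIGKILL) is a generic escalation pattern. A minimal sketch, assuming a POSIX system: the child process merely stands in for slapd, CHILD, stop_gracefully and run_demo are hypothetical names (not part of FreeBSD's rc.d framework), and the timeouts are arbitrary.

```python
import signal
import subprocess
import sys
from typing import Tuple

# Stand-in for a daemon such as slapd: on SIGTERM it spends flush_seconds
# "closing its database" and then exits cleanly with status 0.
CHILD = r"""
import signal, sys, time
flush_seconds = float(sys.argv[1])

def on_term(signum, frame):
    time.sleep(flush_seconds)  # pretend to flush and close the database
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)  # tell the parent the handler is installed
while True:
    time.sleep(0.1)
"""


def stop_gracefully(proc: subprocess.Popen, timeout: float) -> str:
    """Send SIGTERM, wait up to timeout seconds, and only then use SIGKILL."""
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
        return "clean"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: the child gets no chance to clean up
        proc.wait()
        return "killed"


def run_demo(flush_seconds: float, timeout: float) -> Tuple[str, int]:
    proc = subprocess.Popen([sys.executable, "-c", CHILD, str(flush_seconds)],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # wait until the SIGTERM handler is in place
    outcome = stop_gracefully(proc, timeout)
    proc.stdout.close()
    return outcome, proc.returncode


print(run_demo(flush_seconds=0.2, timeout=5.0))  # ('clean', 0)
print(run_demo(flush_seconds=5.0, timeout=0.5))  # ('killed', -9)
```

The second call is the situation described in the thread: the grace period is shorter than the time the daemon needs, so it is killed while still "writing", which is exactly when a non-transactional database can be left corrupted.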