From owner-freebsd-current@FreeBSD.ORG  Wed Sep 23 12:52:03 2009
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C9CDC1065694
	for <freebsd-current@freebsd.org>; Wed, 23 Sep 2009 12:52:03 +0000 (UTC)
	(envelope-from matthias.andree@gmx.de)
Received: from mail.gmx.net (mail.gmx.net [213.165.64.20])
	by mx1.freebsd.org (Postfix) with SMTP id 2D8CD8FC08
	for <freebsd-current@freebsd.org>; Wed, 23 Sep 2009 12:52:02 +0000 (UTC)
Received: (qmail invoked by alias); 23 Sep 2009 12:51:59 -0000
Received: from balu.cs.uni-paderborn.de (EHLO balu.cs.uni-paderborn.de)
	[131.234.21.37]
	by mail.gmx.net (mp011) with SMTP; 23 Sep 2009 14:51:59 +0200
X-Authenticated: #428038
X-Provags-ID: V01U2FsdGVkX19XIeQv2gFXUYEKHDk3eOX9LD95k3Kf757a35N6JB
	H4pbQyYVqZboYa
Received: from localhost ([127.0.0.1])
	by balu.cs.uni-paderborn.de with esmtp (Exim 4.69)
	(envelope-from <matthias.andree@gmx.de>) id KQFD2O-00037C-3N
	for freebsd-current@freebsd.org; Wed, 23 Sep 2009 14:52:00 +0200
Message-ID: <4ABA19EF.8010503@gmx.de>
Date: Wed, 23 Sep 2009 14:51:59 +0200
From: Matthias Andree <matthias.andree@gmx.de>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de;
	rv:1.8.1.23) Gecko/20090812 Thunderbird/2.0.0.23 Mnenhy/0.7.6.666
MIME-Version: 1.0
To: freebsd-current@freebsd.org
References: <4AB8BAA9.1060100@zedat.fu-berlin.de>	<200909222248.16475.doconnor@gsoft.com.au>	<4AB93614.2080106@locolomo.org>	<200909231104.39234.doconnor@gsoft.com.au>	<4AB9DDD8.2020700@zedat.fu-berlin.de>
	<200909230856.n8N8u2hp062395@banyan.cs.ait.ac.th>
In-Reply-To: <200909230856.n8N8u2hp062395@banyan.cs.ait.ac.th>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Y-GMX-Trusted: 0
X-FuHaFi: 0.59
Subject: Re: LDAP server gone -> impossible to login locally!
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 23 Sep 2009 12:52:04 -0000

Olivier Nicole schrieb:
>> > On a related note, why is slapd so damn fragile? It's a righteous pain 
>> > in the bum the way you have to run db_recover-X.Y /var/db/openldap-data 
>> > if slapd fails to start.
>> Yes, this is a lot of pain. I have had issues the same way and never 
>> figured out what the reason was. /var/ is very often corrupted after a 
>> crash, power failure or unclean reboot. Maybe it is not slapd that is
>> fragile, but db47.
>  
> Last June, we had to shut down our openldap server every night, and I
> noticed that a simple halt(8) would leave the bdb backend database in
> a corrupted state.
> 
> It worked well if I ran /usr/local/etc/rc.d/slapd stop and sync(8) a couple
> of times before I ran halt(8).
> 
> After that I wrote a small script that would take a backup of the ldap
> data every 2 hours and keep 5 days of backup.
> 
> It seems that Berkeley DB has a lot of options that need to be
> configured for it to work optimally with openldap. Maybe soft updates
> should be deactivated on the filesystem where the db files reside.
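
For the "keep 5 days of backup" part mentioned above, a minimal sketch of the
pruning step might look like the following. Olivier's actual script is not
shown in this thread, so the function name, the directory layout and the use of
file modification times are all assumptions made for illustration. How the
backup itself is taken is deliberately left out.

```python
import os
import time


def prune_old_backups(backup_dir, max_age_days=5, now=None):
    """Delete regular files in backup_dir whose mtime is older than max_age_days.

    Hypothetical helper: returns the sorted list of file names it removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # age limit expressed in seconds
    removed = []
    for name in sorted(os.listdir(backup_dir)):
        path = os.path.join(backup_dir, name)
        # Only touch plain files; leave subdirectories alone.
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```

Run from cron every 2 hours right after the backup is written, this would keep
roughly 60 files around.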

This has nothing to do with the filesystem; it is about abuse of the
application (read: the LDAP daemon) and/or its Berkeley DB support.

If you kill the application before it can write everything it needs to write,
you may corrupt your database. This is particularly likely if you catch it in
the middle of a page split (which happens when a page in the DB file overflows)
and your database isn't transactional (i.e. doesn't use log.* files, which in
turn requires application support).

I'm not sure about OpenLDAP, but I feel I know Berkeley DB well enough to know
it does not usually create or rename files on shutdown EXCEPT if there are bulk
writes pending in a transactional database (which might then trigger creation of
log.* files or flushing of corresponding writes).

So I'd be surprised if SOFTDEPs were a cause of db47 corruption here.

SOFTDEPs may have side effects that influence the shutdown process as a whole,
but in that case the shutdown process is already broken even without SOFTDEPs.

So I'd rather make sure that the daemons are shut down properly at shutdown
time, i.e. run the stop scripts and make sure they wait long enough for the
application to shut down cleanly (as needed). The database should be properly
closed before halt(8) draws the SIGKILL shotgun and starts firing.

IOW, check that your slapd stop is properly hooked to the shutdown procedure and
waits long enough.
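
To illustrate what "waits long enough" means, here is a minimal sketch in
Python of the stop-then-wait pattern: send SIGTERM, then poll until the process
has really exited or a timeout expires, and only then let the shutdown proceed.
This is not how the slapd rc script is implemented; the function name
stop_gracefully and the timeout values are made up for the example, and it
operates on a child process started via subprocess so that it is
self-contained. A real stop script would find the daemon through its pid file
instead.

```python
import signal
import subprocess
import time


def stop_gracefully(proc, timeout=30.0, poll_interval=0.1):
    """Ask proc (a subprocess.Popen) to terminate and wait for it to exit.

    Hypothetical helper: returns True if the process exited within `timeout`
    seconds after SIGTERM, False if it was still running at the deadline.
    """
    proc.send_signal(signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # poll() returns None while the process is still running.
        if proc.poll() is not None:
            return True
        time.sleep(poll_interval)
    # One final check at the deadline.
    return proc.poll() is not None
```

The point is that the caller gets a False back when the daemon needs more time,
so it can wait longer or complain loudly, rather than blindly moving on to the
stage where everything gets SIGKILLed.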

If your filesystems get corrupted at power failures, make sure your HDD write
caches are turned off (unless they're battery backed or otherwise permanent
caches that survive the outage). You'd also need to check whether your hardware
is allowed to reorder writes, and if so, whether writes get reordered across
flush-cache primitives (a.k.a. write barriers). I'm not aware of any current
support for preventing dangerous reorders with enabled write caches in the
disk/controller drivers and filesystems, though.