Date:      Sat, 20 Oct 2007 11:32:08 -0700 (PDT)
From:      Doug Barton <dougb@FreeBSD.org>
To:        Jeremy Chadwick <koitsu@FreeBSD.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: BIND 9.3.4 assertion failure on restart
Message-ID:  <alpine.BSF.0.9999.0710201122130.50892@qbhto.arg>
In-Reply-To: <20071018193322.GA23372@eos.sc1.parodius.com>
References:  <20071018193322.GA23372@eos.sc1.parodius.com>

Jeremy,

I saw this on Thursday, but I also saw that Mark had answered you, and I 
had to focus on $REAL_LIFE, so sorry for the delay.

On Thu, 18 Oct 2007, Jeremy Chadwick wrote:

> The following is a reproducible problem on a couple of our DNS servers
> (one running 6.2-STABLE, one running 7.0-PRERELEASE):
>
> pid 52308 (named), uid 53: exited on signal 6
> Oct 18 12:10:21 anubis named[52308]: /usr/src/lib/bind/isc/../../../contrib/bind9/lib/isc/task.c:1238: INSIST((((manager->tasks).head == ((void *)0)) ? isc_boolean_true : isc_boolean_false)) failed
> Oct 18 12:10:21 anubis named[52308]: exiting (due to assertion failure)
>
> The problem only occurs when using "/etc/rc.d/named restart".  Doing a
> manual "/etc/rc.d/named stop" then "/etc/rc.d/named start" does not
> induce the problem.

I'm currently working on some improvements to the rc.d/named script that 
should help with that issue (unrelated to the bug Mark mentioned in BIND 
9.3.4).

> There was one random Internet user who posted about the same issue:
>
> http://forums.devshed.com/dns-36/weird-loggs-470845.html
>
> There's nothing bizarre about our BIND configuration on these boxes.
> I've re-written it (by hand) a couple times hoping it might be some
> syntax problem or other oddity, but it doesn't appear to be.  We're not
> chrooting,

You probably should be. :) You're correct in thinking that it's not a 
factor in this issue, though.

> and there are no jails.  The only "non-standard" named-related setting
> in rc.conf is named_flags="-4".

Yeah, that's both harmless and common.

> Both boxes exhibiting this problem are running on identical hardware
> (C2Ds, same memory amount, etc.), with an SMP kernel.  The 7.0 box uses
> the ULE scheduler, while the 6.2 box uses the 4BSD scheduler.  I mention
> this because the master server (running 6.2-STABLE on different
> hardware, non-SMP kernel, single-core P4 CPU) uses CPUTYPE?=prescott and
> does not have this problem.

If you're running 6.x and/or BIND 9.3.x, you should definitely not use 
threads, and limiting named to a single worker thread with -n1, as you 
suggested, is probably wise as well (even if the bug were not present).

I saw your follow-up to this post, so I'm a little unclear about which 
hardware we're talking about, but if you're using a dual-core or SMP 
machine, I strongly encourage you to upgrade to 7.0 and BIND 9.4.1-P1. Both 
new versions have significant improvements in how they handle threads, and 
Kris has done some great work profiling that combination and has shown that 
it significantly outperforms 6.2 with 9.3.x.

> I can't provide access to these boxes, but I can provide the
> configuration files and zones (there are not many) to those I trust
> (dougb@ that means you :) ).

Heh, thanks.

hth,

Doug

-- 

     This .signature sanitized for your protection



