Date:      Thu, 03 Apr 2008 10:46:53 -0700
From:      JINMEI Tatuya / 神明達哉 <Jinmei_Tatuya@isc.org>
To:        Attila Nagy <bra@fsn.hu>
Cc:        freebsd-performance@freebsd.org, bind-users@isc.org
Subject:   Re: max-cache-size doesn't work with 9.5.0b1
Message-ID:  <m21w5mltki.wl%Jinmei_Tatuya@isc.org>
In-Reply-To: <47F4F63E.80703@fsn.hu>
References:  <475B0F3E.5070100@fsn.hu> <m2lk6g71bc.wl%Jinmei_Tatuya@isc.org> <479DFE74.8030004@fsn.hu> <m2k5ltke09.wl%Jinmei_Tatuya@isc.org> <479F02A7.9020607@fsn.hu> <m24pcwt5b7.wl%Jinmei_Tatuya@isc.org> <47A614E9.4030501@fsn.hu> <m2wspkpl7r.wl%Jinmei_Tatuya@isc.org> <47A77A13.6010802@fsn.hu> <m2zlueohxk.wl%Jinmei_Tatuya@isc.org> <47B1D2F4.5070304@fsn.hu> <m2tzkexdo7.wl%Jinmei_Tatuya@isc.org> <47B2DD62.6020507@fsn.hu> <m2abm4y4by.wl%Jinmei_Tatuya@isc.org> <47BAE0B3.4090004@fsn.hu> <m2pruse24g.wl%Jinmei_Tatuya@isc.org> <47F4F63E.80703@fsn.hu>

At Thu, 03 Apr 2008 17:22:38 +0200,
Attila Nagy <bra@fsn.hu> wrote:

> Sorry again for the long delay, I've got other work to do, and our 9.4 
> servers work fine (at least on FreeBSD 6, though, see the other 
> -performance- problem)...

No problem.  I understand that testing a beta version cannot be
high-priority work.

> > BTW, is this reproduceable on FreeBSD 6.x?  If so, then I'd like to
> > see what happens if you specify some small value of datasize
> > (e.g. 512MB) and have named abort when malloc() fails with the "X"
> > _malloc_options.  (This option doesn't seem to work for FreeBSD 7.x,
> > at least at the moment).
> >   
> Yes, it's the same, even when there is a different (libpthreads, KSE) 
> threading library is in use.
> I've recompiled named with the following in main():
> ./work/bind-9.5.0b2/bin/named/main.c:   _malloc_options="X";
> 
> And set cache-size to 32MB.
> 
> At:
> 21664 bind        4  20    0   819M   819M kserel 0   5:32  0.00% named.950
> I pressed a CTRL-C:
> mem.c:1114: REQUIRE((((ctx) != ((void *)0)) && (((const isc__magic_t 
> *)(ctx))->magic == ((('M') << 24 | ('e') << 16 | ('m') << 8 | ('C')))))) 
> failed.

Hmm, this is odd in two respects:
1. the "X" malloc option doesn't seem to work as expected.  I expected
   a failing call to malloc() to trigger an assertion failure (within
   the malloc library) at a much earlier stage.  Does the behavior
   change if you try the alternative debugging approach I mentioned
   before?  That is:
  - create a symbolic link from "/etc/malloc.conf" to "X":
    # ln -s X /etc/malloc.conf
  - start named with a moderate limitation of virtual memory size, e.g.
    # /usr/bin/limits -v 384m $path_to_named/named <command line options>

2. Whether or not it's related to this max-cache-size issue, the
   assertion failure in mem.c was not expected; it is likely a bug in
   any case.  If the process dumped core, can you show its stack
   backtrace?
   (gdb) thread apply all bt full
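
The idea behind point 1 (cap the virtual memory size, then see where
malloc() first fails) can be sketched in C.  This is only an
illustration under stated assumptions: bytes_until_malloc_fails() is a
hypothetical helper, not part of BIND or of the FreeBSD malloc
library, and it uses the POSIX setrlimit() call with RLIMIT_AS as a
stand-in for what "limits -v" does from the shell.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: lower the soft address-space limit to
 * limit_bytes, call malloc(chunk_bytes) until it returns NULL, and
 * return how many bytes were handed out before the first failure.
 * Returns (size_t)-1 if the limit could not be read or changed.
 * The loop also stops once limit_bytes have been allocated, so it
 * terminates even on a platform that does not enforce RLIMIT_AS.
 */
size_t bytes_until_malloc_fails(size_t limit_bytes, size_t chunk_bytes)
{
    struct rlimit saved, lowered;
    void *head = NULL;   /* chunks are chained through their first bytes */
    size_t total = 0;

    assert(chunk_bytes >= sizeof(void *));

    if (getrlimit(RLIMIT_AS, &saved) != 0)
        return (size_t)-1;
    lowered = saved;
    lowered.rlim_cur = (rlim_t)limit_bytes;
    if (setrlimit(RLIMIT_AS, &lowered) != 0)
        return (size_t)-1;

    while (total < limit_bytes) {
        void *chunk = malloc(chunk_bytes);
        if (chunk == NULL)
            break;   /* with the "X" option, malloc would abort here instead */
        *(void **)chunk = head;
        head = chunk;
        total += chunk_bytes;
    }

    /* Release everything and restore the original soft limit. */
    while (head != NULL) {
        void *next = *(void **)head;
        free(head);
        head = next;
    }
    setrlimit(RLIMIT_AS, &saved);
    return total;
}
```

With a 384 MB limit, the returned total should stay below 384 MB,
because the process image and libraries also count against the limit.
If named keeps growing far beyond such a limit without any malloc()
failure, the limit is probably not being applied to the process.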

> > Some other questions:
> > - can we see your named.conf?  If you specify non-default
> >   configuration options, that might be the reason for, or related to,
> >   this problem.
> >   
> Of course (see at the end).
> 
> > - does your named produce lot of log messages?  If so, it might also
> >   be a reason (simply because it relies on standard libraries).
> >   
> grep named ns20080403.log | wc -l
> 1930006
> For today (17 hours and 18 minutes of logs).
> Is this a lot?

This means about 31 log messages per second.  This may not be
extremely frequent, but if some memory is lost for every log message,
I guess it could be a reason for the memory growth at the high rate
we've seen.
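
For reference, the rate above comes from dividing 1,930,006 messages
by 17 hours 18 minutes (62,280 seconds), which gives roughly 30.99.
A minimal C sketch of that arithmetic follows; messages_per_second()
is a hypothetical name used only for this illustration.

```c
#include <assert.h>

/*
 * Average number of log messages per second over an interval given
 * in hours and minutes (hypothetical helper for illustration).
 */
double messages_per_second(long messages, int hours, int minutes)
{
    long seconds = (long)hours * 3600 + (long)minutes * 60;

    assert(seconds > 0);
    return (double)messages / (double)seconds;
}
```

Calling messages_per_second(1930006, 17, 18) reproduces the figure
quoted above.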

What if you change the channel setting from:

>     channel syslog-ng {
>         syslog local5;
>         severity info;
>         print-category yes;
>         print-severity yes;
>         };

to this one?

     channel syslog-ng {
         null;
         severity info;
         print-category yes;
         print-severity yes;
         };

BTW,

> -hmm I haven't tried to change cleaning-interval, it was needed because 
> the default cache housekeeping effectively stopped the ns during the 
> cleanup-

This doesn't matter for 9.5.  It doesn't perform periodic cleaning
regardless of the value of cleaning-interval.

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.
