Date: Wed, 24 Dec 2003 11:08:46 -0500 (EST)
From: Robert Watson
To: Oliver Brandmueller
cc: freebsd-current@freebsd.org
In-Reply-To: <20031224154121.GA83770@e-Gitt.NET>
Subject: Re: file descriptor leak in 5.2-RC

On Wed, 24 Dec 2003, Oliver Brandmueller wrote:

> Hi.
>
> I just started (by accident) a new thread regarding the same topic...

Hmm. That makes multiple reports, so we definitely have a problem. Are
you using any sort of threaded applications? If so, which threading
packages are you using (linuxthreads, libc_r, libkse, et al.)? Do you
know if you're making use of /dev/fd/* or /dev/std* in scripts on your
system? Do you have any reports of unusual process exits (via signals,
etc.)?
If you look at the output of lsof or fstat while the system is actively
running, it might be interesting to get a list of the kinds of sockets
in use. Somewhere, presumably, we're slipping a file descriptor
reference, perhaps in a failure mode that turns up frequently in your
environment. Identifying what differentiates your environment from the
ones where this doesn't turn up may help track down the problem. The
areas I've asked you to look at above are "interesting" file descriptor
handling cases, and the problem might well be in one of these.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research

> On Sat, Dec 20, 2003 at 09:38:11PM +0100, Poul-Henning Kamp wrote:
> > In message , Robert Watson writes:
> >
> > >[...] so if we actually have a leak,
> > >fstat(8) should show a small number of files, but the sysctl
> > >kern.openfiles should reveal a large number of files open.
> >
> > sysctl kern.malloc | grep "file desc" ?
>
> I can reproduce this behaviour with no problems.
>
> The machine is a mail filtering server running exim, amavisd +
> SpamAssassin and ClamAV. I do have the machine currently in a testing
> environment and thus can do some experimentation.
>
> The machine gets the whole feed of messages we usually have (but just
> does not deliver any mail back to the main servers after filtering).
> This means about 3-5 mails per second going through the machine, which
> seems enough to reproduce the effect very fast.
>
> The following values were read (with SCHED_4BSD; SCHED_ULE gives the
> same) in single user mode after the machine had been up for about 25
> minutes and did 10 minutes of mail filtering.
> Of course none of the daemons are running anymore:
>
> # sysctl kern.openfiles
> kern.openfiles: 4715
> # lsof | wc -l
>       35
> # fstat | wc -l
>       23
> # sysctl kern.malloc | grep "file desc"
> file desc to leader     0     0K     1K        3  32
>       file desc   102    26K    58K    15408  256
> # ps ax
>   PID  TT  STAT      TIME COMMAND
>     0  ??  DLs    0:00.11 (swapper)
>     1  ??  ILs    0:00.64 /sbin/init --
>     2  ??  DL     0:00.11 (g_event)
>     3  ??  DL     0:02.30 (g_up)
>     4  ??  DL     0:01.70 (g_down)
>     5  ??  DL     0:00.00 (taskqueue)
>     6  ??  IL     0:00.00 (acpi_task0)
>     7  ??  IL     0:00.00 (acpi_task1)
>     8  ??  IL     0:00.00 (acpi_task2)
>     9  ??  DL     0:00.00 (pagedaemon)
>    10  ??  DL     0:00.00 (ktrace)
>    11  ??  RL    26:37.86 (idle: cpu3)
>    12  ??  RL    26:33.18 (idle: cpu2)
>    13  ??  RL    25:53.23 (idle: cpu1)
>    14  ??  RL    25:26.75 (idle: cpu0)
>    27  ??  WL     0:00.00 (irq14: ata0)
>    29  ??  WL     0:01.34 (irq16: uhci0)
>    37  ??  WL     0:01.61 (irq24: twe0)
>    61  ??  WL     0:02.00 (irq48: em0)
>    86  ??  WL     0:01.65 (swi8: tty:sio clock)
>    88  ??  WL     0:03.32 (swi1: net)
>    89  ??  DL     0:00.43 (random)
>    91  ??  WL     0:00.00 (swi7: acpitaskq)
>    92  ??  WL     0:00.00 (swi7: task queue)
>    94  ??  WL     0:00.00 (swi0: tty:sio)
>    95  ??  DL     0:05.38 (pagezero)
>    96  ??  DL     0:00.02 (bufdaemon)
>    97  ??  DL     0:00.01 (vnlru)
>    98  ??  DL     0:00.88 (syncer)
>   415  ??  DL     0:00.00 (usb0)
>   416  ??  DL     0:00.00 (usbtask)
> 15403  d0  Ss     0:00.01 -sh (sh)
> 15415  d0  R+     0:00.00 ps ax
> # uname -a
> FreeBSD lupin 5.2-CURRENT FreeBSD 5.2-CURRENT #13: Wed Dec 24 15:31:44 CET 2003 root@lupin.eusc.inter.net:/usr/obj/usr/src/sys/MOMAIL i386
> # uptime
>  4:35PM  up 29 mins, 1 user, load averages: 0.10, 0.75, 0.63
>
> There are no debugging options in the kernel and malloc.conf is linked
> to aj since I needed to do the performance testing. The machine has to
> go into production state on Sunday; I would like to stay with FBSD 5
> due to the better SMP performance and the ability to do FS snapshots.
> Only in the worst case I'd put a 4-STABLE on it. So I will give any
> help I can to solve the issue.
>
> Greetinx, merry x-mas, Oliver
>
> --
> | Oliver Brandmueller | Offenbacher Str. 1 | Germany D-14197 Berlin |
> | Fon +49-172-3130856 | Fax +49-172-3145027 | WWW: http://the.addict.de/ |
> | I am the Internet. As truly as I help God. |
> | Commercial use of any of the addresses contained here is not permitted! |
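
The check discussed in this thread (compare the kernel's count of open
files from kern.openfiles with what fstat can attribute to running
processes) can be sketched roughly as below. This is only an
illustration, not a tool from the thread: the function names and the
threshold of 1000 are hypothetical, and the fstat line count is a coarse
proxy, since fstat prints one line per per-process reference rather than
one line per kernel file structure. The code assumes the first non-empty
line of fstat output is the column header.

```python
import re


def parse_openfiles(sysctl_output: str) -> int:
    """Extract the integer from a line such as 'kern.openfiles: 4715'."""
    match = re.search(r"kern\.openfiles:\s*(\d+)", sysctl_output)
    if match is None:
        raise ValueError("no kern.openfiles value found")
    return int(match.group(1))


def count_fstat_entries(fstat_output: str) -> int:
    """Count non-empty fstat lines, excluding the header.

    Assumption: the first non-empty line is the column header.
    """
    lines = [line for line in fstat_output.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def unaccounted_files(openfiles: int, visible: int) -> int:
    """Open files the kernel counts but no process appears to hold."""
    return max(openfiles - visible, 0)


def looks_like_leak(openfiles: int, visible: int, threshold: int = 1000) -> bool:
    """Hypothetical heuristic: flag a large gap between the two counts.

    The default threshold is an arbitrary illustrative value.
    """
    return unaccounted_files(openfiles, visible) >= threshold
```

With the figures from the report above (kern.openfiles at 4715, and 23
lines of fstat output, taken here to be 22 entries plus a header),
unaccounted_files(4715, 22) gives 4693, which the heuristic would flag.
On a live system the two input strings would come from running
"sysctl kern.openfiles" and "fstat" and capturing their output.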